Architecture
Layout
src/
├── cli.ts # Commander entry
├── commands/ # User-facing CLI implementations
├── harnesses/ # One sub-directory per adapter
│ ├── types.ts # HarnessAdapter interface
│ ├── registry.ts # Central adapter registry
│ ├── detect.ts # Env-based active-harness resolver
│ ├── claude/ codex/ cursor/ opencode/ copilot/
│ │ └── hooks/ # Per-harness compiled hook sources
├── lib/ # Reusable building blocks
└── templates-source/ # Files copied into consumer repos
Build: tsup produces dist/cli.js (CLI binary) and dist/hooks/*.mjs (one bundle per hook). The prepare script copies templates-source/ to templates/ and drops compiled hooks into templates/claude/hooks/. The npm package ships dist/ and templates/.
Two CLI shapes
- Deterministic primitives: `init`, `doctor`, `status`, `lint`, `finddocs`, `node write`, `session-log stage-live`, `session-log update-proposals`, `curate-dedup`, `curate-persist`, `conflict prepare`, `drafts collect`, `schema`, `validate`, `pack import`, `pack export`, `rebalance trigger`, `rebalance move`, `index rebuild`, `logs prune`. No LLM. Pure Node helpers; skills compose them, CI/scripts may call them directly. `schema <name>` prints a JSON Schema generated from the Zod definitions in `src/lib/schemas.ts` (via the `src/lib/schema-registry.ts` name map) and `validate <name> [file]` validates a JSON artifact against it: the produce -> validate -> fix loop the curate/extract skills drive in place of narrated schemas. `conflict prepare` computes the curate conflict defaults/sort-group, `drafts collect` aggregates and schema-validates the parallel-path batch drafts, and `pack import` / `pack export` move already-reviewed markdown packs without LLM involvement.
- Launchers: `bootstrap`, `curate`, `node add`. Thin wrappers that exec `<harness> -p "/kk-<name>"` with `KENKEEP_BUILDER_INTERNAL=1` set on the child. The LLM call runs in the host harness session, not in a subprocess spawned by this CLI.
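As a rough illustration of the launcher shape, the sketch below builds (but does not run) the child invocation. `LauncherTarget`, `LaunchPlan`, and `buildLaunch` are hypothetical names, and the `claude` / `-p` values are assumed example inputs; only the `/kk-<name>` payload and the `KENKEEP_BUILDER_INTERNAL=1` variable come from the text above.

```typescript
// Hypothetical subset of an adapter: only the two launch-related members.
interface LauncherTarget {
  launchBinary: string; // executable name on PATH
  launchArgsPrefix: readonly string[]; // argv placed before the slash-command payload
}

interface LaunchPlan {
  binary: string;
  args: string[];
  env: Record<string, string | undefined>;
}

// Build the child-process invocation for a launcher such as `curate`.
function buildLaunch(
  target: LauncherTarget,
  skill: string,
  parentEnv: Record<string, string | undefined>,
): LaunchPlan {
  return {
    binary: target.launchBinary,
    args: [...target.launchArgsPrefix, `/kk-${skill}`],
    // The marker is set on the child only; the parent env object is not mutated.
    env: { ...parentEnv, KENKEEP_BUILDER_INTERNAL: "1" },
  };
}

// Assumed example values: a harness whose prefix is just `-p`.
const plan = buildLaunch(
  { launchBinary: "claude", launchArgsPrefix: ["-p"] },
  "curate",
  { PATH: "/usr/bin" },
);
console.log(plan.binary, plan.args.join(" "));
```

Keeping plan construction separate from the actual exec makes the launcher testable without a harness installed.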
Model config: the proposal-drain hook’s model and effort are set via proposalModel: { name, effort } in config.yaml. Curate and bootstrap run under whatever model the host harness session uses.
The one headless-subprocess site is the proposal-drain hook, which spawns the active harness’s headless driver (codex exec, agent -p, opencode run, copilot -p, …) per captured session log to extract candidates.
CAUTION
The Claude adapter’s drain hook is a deliberate no-op. Spawning a headless subprocess would double-bill the user’s Claude plan, so extraction instead runs inline during /kk-curate. Do not “fix” this hook to spawn a driver.
Pipelines
flowchart TB
subgraph capture[Capture]
H1[Stop / SessionEnd / PreCompact] --> KB1[kk-capture.mjs<br/>sync]
KB1 --> SL[_sessions/<log>.md<br/>pending]
end
subgraph extract[Extract candidates]
SS1[SessionStart] --> KB2[kk-proposal-drain.mjs<br/>async, headless harness]
SL --> KB2
KB2 --> SLD[_sessions/<log>.md<br/>done + candidates]
end
subgraph curate[Curate]
UC["/kk-curate slash command<br/>or curate launcher"] --> KB3[kk-curate skill<br/>in host harness session]
SLD --> KB3
KB3 -->|curate-dedup| PC[conflicts/<id>.md<br/>+ survivor batch]
PC -->|curate-persist| NODES[(nodes/<topic>/<id>.md)]
KB3 -->|rebalance trigger / move| IDX[per-folder index.md + ENTRY.md / GRAPH.md]
KB3 -->|index rebuild| IDX
end
subgraph review[Review]
NODES --> RV[git diff<br/>git commit / git restore]
PC --> SK["/kk-curate skill<br/>resolves with user"]
SK --> NODES
RV --> COMMIT[(committed nodes)]
end
subgraph consume[Consume]
SS2[SessionStart] --> KB4[kk-session-start.mjs<br/>sync]
IDX --> KB4
KB4 --> CTX[additionalContext → harness]
end
Parallel drafting and per-batch logs
When the host harness exposes native sub-agents (Claude Code and Cursor today), /kk-bootstrap and /kk-curate fan their drafting out across up to five sub-agents per wave, each reading its own slice in an isolated context. Harnesses without native sub-agents fall back to sequential drafting automatically: the skills probe their own tool surface at the start of each run and degrade silently, so a sequential run looks identical to a parallel one from the outside. /kk-add uses a single sub-agent only for context isolation, so the host transcript stays clean.
Each run drops a JSONL trace under .ai/kenkeep/_logs/, one file per batch (or one per run for /kk-add):
.ai/kenkeep/_logs/bootstrap/<runId>__<batchN>.jsonl
.ai/kenkeep/_logs/curator/<runId>__<batchN>.jsonl
.ai/kenkeep/_logs/kk-add/<runId>.jsonl
The parallel path additionally writes a <runId>__<batchN>.draft.json beside each .jsonl. If those are absent while .jsonl files exist, the sequential fallback ran. Everything under _logs/ is gitignored: per-user diagnostic state, not something to commit.
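The rule for telling the two paths apart can be sketched as a pure function over a directory listing. This is an illustration only: `draftingMode` and the `DraftingMode` type are hypothetical, and the batch suffixes used in the usage line are made-up values.

```typescript
type DraftingMode = "parallel" | "sequential" | "unknown";

// Classify one run from the file names found in a single _logs/<kind>/ directory.
function draftingMode(fileNames: readonly string[], runId: string): DraftingMode {
  const prefix = `${runId}__`;
  const mine = fileNames.filter((name) => name.startsWith(prefix));
  const hasTrace = mine.some((name) => name.endsWith(".jsonl"));
  const hasDraft = mine.some((name) => name.endsWith(".draft.json"));
  if (!hasTrace) return "unknown"; // no trace for this run: nothing to infer
  return hasDraft ? "parallel" : "sequential";
}

// Hypothetical listing: run "r1" has a draft beside its trace, run "r2" does not.
const listing = ["r1__1.jsonl", "r1__1.draft.json", "r2__1.jsonl"];
console.log(draftingMode(listing, "r1"), draftingMode(listing, "r2"));
```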
State files
| File | Owner | Purpose |
|---|---|---|
| `_sessions/<log>.md` | capture, extract, curate | Per-session checkpoint. Filename is `YYYYMMDD-HHmm-<sessionId>.md`; re-firing the hook for the same `session_id` overwrites in place. |
| `_logs/proposal/*.jsonl` | proposal-drain hook | Stream-JSON traces from the hook’s headless subprocess (non-Claude adapters). Gitignored. |
| `nodes/` (nested topical folders) | curator, node-add, bootstrap, human reviewer | Canonical knowledge. Reviewed via `git diff` and accepted via `git commit`. |
| `ENTRY.md` / `GRAPH.md` | curator, `index rebuild` (incl. `--stage` for opt-in pre-commit hooks) | Deterministic outputs derived from `nodes/`. Regenerated by the curator at end-of-run; consumers may also wire `index rebuild --stage` into their own pre-commit hook. |
| `.state/installed-version` | init | Package version + selected harnesses. Committed. |
| `.state/state.json` | drain, curator, bootstrap, consume | Lock + `last_nudged_at`. Gitignored. |
| `.state/bootstrap-state.json` | bootstrap | Doc SHA-256 cache. Gitignored. |
| `conflicts/<run-id>-<n>.md` | curator (write), kk-curate skill (resolve), status (read) | Curator-detected contradictions, one markdown file per conflict. Frontmatter carries `status: pending`; resolution is via `git restore` (Reject / Accept-after-apply) or `git commit` (Keep as record). |
| `.config/prompts/*` | init | Local prompt overrides. Committed. |
| `.state/usage.jsonl` | capture | Write-only ledger of which KB documents each session read, from both dedicated read tools and markdown paths named in shell/search commands (`{ document, type, session_id, used_at }`). One line per read occurrence; only `.md` under `nodes/` is recorded. Gitignored. |
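
The `_sessions/` filename convention in the table can be sketched as a small formatter. This is a minimal sketch under two assumptions not stated in the source: that the timestamp is fixed per session (which is what lets a re-fired hook overwrite the same file) and that it is rendered in UTC. `sessionLogName` is a hypothetical helper.

```typescript
const pad2 = (n: number): string => String(n).padStart(2, "0");

// Format `YYYYMMDD-HHmm-<sessionId>.md` from a per-session timestamp (UTC assumed).
function sessionLogName(startedAt: Date, sessionId: string): string {
  const date =
    `${startedAt.getUTCFullYear()}` +
    pad2(startedAt.getUTCMonth() + 1) + // getUTCMonth() is zero-based
    pad2(startedAt.getUTCDate());
  const time = pad2(startedAt.getUTCHours()) + pad2(startedAt.getUTCMinutes());
  return `${date}-${time}-${sessionId}.md`;
}

const exampleName = sessionLogName(new Date(Date.UTC(2025, 0, 7, 9, 5)), "abc123");
console.log(exampleName); // 20250107-0905-abc123.md
```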
Locking
Only the proposal-drain hook locks. It holds a proper-lockfile lock on state.json (a mkdir-atomic state.json.lock directory whose mtime is refreshed on a heartbeat while held; 60s stale threshold) to keep concurrent SessionStart drains from racing on the pending queue. A drain SIGKILLed by the host’s outer hook timeout can neither run its finally release nor proper-lockfile’s graceful-exit handler, so the lock only clears once it goes stale; the next drain auto-reclaims it on acquire (recovery within ~60s, vs. the 30-min state-file default used by other locks).
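The stale-reclaim behaviour can be illustrated with a pure decision function. This is a sketch of the idea, not proper-lockfile's implementation: `isStale`, `decide`, and the `AcquireDecision` labels are hypothetical, and only the 60s threshold comes from the text.

```typescript
const STALE_MS = 60000; // stale threshold quoted above

// A live holder refreshes the lock directory's mtime on a heartbeat;
// a SIGKILLed holder stops refreshing, so the mtime ages past the threshold.
function isStale(lockMtimeMs: number, nowMs: number, staleMs: number = STALE_MS): boolean {
  return nowMs - lockMtimeMs > staleMs;
}

type AcquireDecision = "acquire" | "reclaim" | "busy";

// lockMtimeMs is null when no state.json.lock directory exists.
function decide(lockMtimeMs: number | null, nowMs: number): AcquireDecision {
  if (lockMtimeMs === null) return "acquire";
  return isStale(lockMtimeMs, nowMs) ? "reclaim" : "busy";
}

console.log(decide(null, 1000), decide(0, 30000), decide(0, 90000));
```

Because staleness depends only on the last heartbeat, recovery time is bounded by the threshold regardless of how the previous holder died.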
Curate, bootstrap, and consume do not lock. Curate, /kk-session-extract, and bootstrap each run in a single host harness session per user invocation (single-author by design); the atomic tmp+rename writes inside node write, session-log stage-live, curate-dedup, and curate-persist provide durability.
Live session extraction
/kk-session-extract reuses the session-log boundary instead of a parallel pipeline:
- The in-host skill applies `proposal-extract.md` to the visible live context.
- `session-log stage-live` validates proposal JSON and writes or updates `_sessions/*.md` with `proposal_status: done` and `captured_by: manual`.
- The skill drafts curator actions and calls `curate-dedup --session-id <staged-id>` so only the staged log is stamped; unrelated done logs remain for `/kk-curate`.
- When capture later rewrites the same `session_id`, it preserves `curator_processed_at`, `curator_run_id`, and terminal proposal fields if the log was already curated.
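The last rule above (a later capture must not erase curation state) can be sketched as a frontmatter merge. The `SessionFrontmatter` shape and `mergeOnRecapture` are hypothetical, and treating `proposal_status` as the only terminal proposal field is an assumption made for brevity.

```typescript
// Hypothetical subset of the session-log frontmatter.
interface SessionFrontmatter {
  session_id: string;
  proposal_status: "pending" | "done";
  captured_by?: string | undefined;
  curator_processed_at?: string | undefined;
  curator_run_id?: string | undefined;
}

// Merge a fresh capture over an existing log for the same session_id.
function mergeOnRecapture(
  existing: SessionFrontmatter | undefined,
  fresh: SessionFrontmatter,
): SessionFrontmatter {
  // No prior file, or not yet curated: the fresh capture simply wins.
  if (existing === undefined || existing.curator_processed_at === undefined) {
    return fresh;
  }
  // Already curated: keep the curator stamps and the terminal proposal state.
  return {
    ...fresh,
    proposal_status: existing.proposal_status,
    curator_processed_at: existing.curator_processed_at,
    curator_run_id: existing.curator_run_id,
  };
}
```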
CAUTION
Running two curate (or bootstrap) launchers against the same repo concurrently is unsupported. The second writer’s session-stamp update may silently lose to the first: no data corruption, but some sessions reprocess on the next run.
Knowledge base storage (tree over DAG)
Leaf nodes (the documents) live in topical folders under .ai/kenkeep/nodes/ at any depth. Every folder carries a generated index.md (an index node): a deterministic, actionable table-of-contents that invites descent, ordered by graph in-degree then title. The top-level catalog ENTRY.md (the SessionStart entry point) is a purpose-built whole-tree launchpad — the branch list — distinct from the per-folder nodes/index.md; GRAPH.md is the full edge listing.
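The "in-degree then title" ordering can be illustrated as follows. This is a sketch under assumptions: the `Leaf` shape and `orderForIndex` are hypothetical, in-degree is counted over both `relates_to` and `depends_on` edges across the whole tree, and higher in-degree is assumed to sort first (the source does not state the direction).

```typescript
interface Leaf {
  id: string;
  title: string;
  relates_to: string[];
  depends_on: string[];
}

// Order one folder's leaves by whole-tree in-degree (descending), then title.
function orderForIndex(folderLeaves: readonly Leaf[], allLeaves: readonly Leaf[]): Leaf[] {
  const inDegree = new Map<string, number>();
  for (const leaf of allLeaves) {
    for (const target of [...leaf.relates_to, ...leaf.depends_on]) {
      inDegree.set(target, (inDegree.get(target) ?? 0) + 1);
    }
  }
  // Plain code-unit comparison keeps the order independent of the host locale.
  const byTitle = (a: Leaf, b: Leaf): number =>
    a.title < b.title ? -1 : a.title > b.title ? 1 : 0;
  return [...folderLeaves].sort(
    (a, b) => (inDegree.get(b.id) ?? 0) - (inDegree.get(a.id) ?? 0) || byTitle(a, b),
  );
}
```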
An index node body carries: an embedded one-line descent directive (from the single `KK_NAVIGATION_DIRECTIVE`), explicit guidance to open at least one relevant leaf when the index was loaded for substantive topic detail, a `↑ Parent` breadcrumb on non-root nodes, imperative ``Load [`name/`](…) for more information on <summary>`` descent pointers and `Open title to learn about: <summary>` leaf pointers (valid Markdown links that splice the target's summary, Title-cased name fallback when absent), a reworked `## By topic`, and **no body statistics** (counts/token estimates are diagnostics, kept in frontmatter only). The leaf-read guidance is specific to generated `nodes/**/index.md` bodies; `ENTRY.md` remains the bounded branch launchpad.
- Folder summary: authoring vs carrying vs rendering. The single non-deterministic field is each folder’s one-line `summary` (in `index.md` frontmatter; the root’s in `ENTRY.md`). It is authored rarely and semantically by an LLM at exactly two quarantined clustering moments: the v1→v2 migration (the `kk-migrate` skill) and the `rebalance` split-folder/create-branch/split-leaf steps (humans may hand-edit). It is carried deterministically by `generateIndex`, which harvests the prior on-disk value before regenerating and re-stamps it verbatim (a leaf edit never perturbs a sibling’s summary), and rendered deterministically into the imperative pointers. A missing summary renders the Title-cased name fallback; `index rebuild` warns and exits zero (warn, never block).
- Reworked `## By topic`. For each tag present among a folder’s direct leaves (bucket set/order unchanged: size DESC then alpha), it lists the ≤3 most-central nodes drawn from the whole tree carrying that tag, ranked by centrality = summed tag Jaccard (|A∩B|/|A∪B|) against the rest of that tag’s whole-tree cohort, tie-broken by in-degree then title. Each entry is a followable `Open [**title**](path) — <summary>`. The block points OUT to the canonical nodes per topic instead of re-listing the local components.
- `kind` is a facet, not a directory. `kind` (map/practice) drives only the Conventions / Components rendering split; folders are topical.
- Tree over DAG. Containment is a tree (one parent folder per leaf); `relates_to` / `depends_on` stay a cross-tree DAG overlay, resolved by `id`.
- Path is presentation; `id` is identity. No node references another by path; index generation resolves each `id` to its current path, so relocation never breaks a reference. `generateIndex` returns one `index.md` body per directory plus per-folder metrics (occupancy, tag diversity, leaf size); the metrics feed `rebalance` but are no longer printed in the body.
- `nodes_hash` excludes generated `index.md`. The per-folder `nodes_hash` covers that folder’s own direct leaves only; hashing the generated index nodes would be self-referential, and the whole-tree `## By topic` block is deliberately excluded from it so cross-tree churn reorders the rendered block without perturbing an unrelated folder’s stability hash.
- No schema bump. The `summary` field folds into the current released `schema_version: 2` index frontmatter (added optional); there is no v3 hop. The reader still rejects the old flat `nodes/<kind>/` layout (or `schema_version: 1`) with a migrate message that now points at the `kk-migrate` skill, which preserves ids and edges. The headless `migrate` command (which spawned a nested `<harness> -p` to cluster) has been removed: the in-host `kk-migrate` skill performs the clustering in the user’s current session and drives the deterministic `place` primitive (inventory + apply) for all file I/O, so a full migration now requires an interactive agent session.
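The centrality ranking behind the `## By topic` block can be sketched as below. The `TaggedNode` shape, `jaccard`, and `topForTag` are hypothetical names, `inDegree` is assumed to be precomputed, and higher in-degree is assumed to win ties; the formula itself (summed tag Jaccard against the rest of the tag's cohort, at most three entries) follows the description above.

```typescript
interface TaggedNode {
  id: string;
  title: string;
  tags: string[];
  inDegree: number; // assumed precomputed from the edge overlay
}

// Jaccard similarity of two tag sets: |A∩B| / |A∪B|.
function jaccard(a: readonly string[], b: readonly string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const shared = Array.from(setA).filter((tag) => setB.has(tag)).length;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Pick the most-central nodes for one tag from the whole tree.
function topForTag(tag: string, tree: readonly TaggedNode[], limit = 3): TaggedNode[] {
  const cohort = tree.filter((node) => node.tags.includes(tag));
  const scored = cohort.map((node) => ({
    node,
    // Sum similarity against every other member of the same cohort.
    centrality: cohort.reduce(
      (sum, other) => (other.id === node.id ? sum : sum + jaccard(node.tags, other.tags)),
      0,
    ),
  }));
  scored.sort(
    (x, y) =>
      y.centrality - x.centrality ||
      y.node.inDegree - x.node.inDegree ||
      (x.node.title < y.node.title ? -1 : x.node.title > y.node.title ? 1 : 0),
  );
  return scored.slice(0, limit).map((entry) => entry.node);
}
```

Because the cohort is whole-tree, the same three nodes appear under a tag in every folder that carries it, which is what lets the block point outward to canonical nodes.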
Determinism contract
- `computeNodesHash` is content-addressed, mtime-independent, and over leaf nodes only (generated `index.md` files are excluded).
- `generateIndex` emits one deterministic `index.md` body per directory; `generateIndex` / `generateGraph` are pure functions of `nodes/` plus an injected `now`. Repeated rebuilds over an unchanged leaf set are byte-identical.
- `slugify`, `deriveNodeId`, `ensureUniqueId` are pure.
- `crypto.randomUUID()` is the only randomness, scoped to `run_id` minting.
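A hash with these properties might look like the sketch below. It is not the project's `computeNodesHash`: the `LeafFile` shape, the `hashLeaves` name, the choice of SHA-256, and the digest-sorting scheme are all assumptions chosen to show content-addressing, mtime-independence, and the `index.md` exclusion.

```typescript
import { createHash } from "node:crypto";

interface LeafFile {
  relPath: string;
  content: string;
  mtimeMs: number; // present on disk, deliberately ignored by the hash
}

const sha256 = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

// Hash leaf contents only: skip generated index.md files, ignore mtime and
// listing order by sorting the per-file digests before combining them.
function hashLeaves(files: readonly LeafFile[]): string {
  const digests = files
    .filter((file) => file.relPath.split("/").pop() !== "index.md")
    .map((file) => sha256(file.content))
    .sort();
  return sha256(digests.join("\n"));
}

const first = hashLeaves([
  { relPath: "topic/x.md", content: "one", mtimeMs: 1 },
  { relPath: "topic/y.md", content: "two", mtimeMs: 2 },
]);
const second = hashLeaves([
  { relPath: "topic/y.md", content: "two", mtimeMs: 99 },
  { relPath: "topic/index.md", content: "generated", mtimeMs: 3 },
  { relPath: "topic/x.md", content: "one", mtimeMs: 50 },
]);
console.log(first === second); // true
```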
NOTE
Tests depend on this contract. See tests/lib/index-gen.test.ts for golden-file comparisons.
Adapter interface
src/harnesses/types.ts (abbreviated — see source for full JSDoc):
| Member | Kind | Notes |
|---|---|---|
| `id` | property | Stable id used in `--harnesses` and `installed-version` |
| `launchBinary` | property | Executable name on `PATH` (`claude`, `codex`, `opencode`, …) |
| `launchArgsPrefix` | property | Argv prepended before the slash-command payload |
| `hooks` | property | `readonly HookSpec[]`: lifecycle declarations |
| `paths(root)` | method | Returns harness-owned on-disk locations (`dir`, `skillsDir`, `hooksDir`/`pluginsDir`, …) |
| `install(opts)` | method | First-time install |
| `upgrade(opts)` | method | Idempotent refresh |
| `parseTranscript(text)` | method | Harness-native format → `RoleTaggedTranscript` |
| `renderTranscript(t)` | method | `RoleTaggedTranscript` → human-readable `[USER]:` / `[AGENT]:` form |
| `runHeadless(prompt, stdin, schema, opts?)` | method | Spawns harness headless driver; validates result against Zod schema |
| `buildHarnessOpts(settings, role)` | method | Translates `EffectiveSettings` → adapter-specific `harnessOpts` blob |
| `doctorChecks(paths)` | method | Harness-specific health probes |
| `listMemoryFiles(opts?)` | method | Returns `file://` IRIs of harness auto-memory files |
| `detectFromEnv?(env)` | optional method | Returns true when this harness is the active one |
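
To make the `renderTranscript` row concrete, here is a minimal sketch of the role-tagged rendering. The real `RoleTaggedTranscript` type is not shown in this document, so the `TranscriptSketch` shape (a flat list of turns), the blank-line separator, and the function name are assumptions; only the `[USER]:` / `[AGENT]:` prefixes come from the table.

```typescript
type Role = "user" | "agent";

interface Turn {
  role: Role;
  text: string;
}

// Hypothetical stand-in for RoleTaggedTranscript.
interface TranscriptSketch {
  turns: Turn[];
}

// Render each turn with its role tag, separated by blank lines.
function renderTranscriptSketch(transcript: TranscriptSketch): string {
  return transcript.turns
    .map((turn) => `[${turn.role === "user" ? "USER" : "AGENT"}]: ${turn.text.trim()}`)
    .join("\n\n");
}

console.log(
  renderTranscriptSketch({
    turns: [
      { role: "user", text: "Where do hooks live?" },
      { role: "agent", text: "Under each harness directory." },
    ],
  }),
);
```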
runHeadless spawns the harness’s headless driver (e.g. claude -p, codex exec, opencode run). It has exactly two consumers: the proposal-drain hook (per-session candidate extraction) and the CLI launchers (bootstrap, curate, node add, which exec the active harness against a slash-command).
Adding an adapter: implement HarnessAdapter, then register it in src/harnesses/registry.ts. See CONTRIBUTING.md for the full step-by-step.
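The register-then-detect flow can be illustrated with a reduced adapter shape. Everything here is hypothetical: `MiniAdapter` keeps only `id` and `detectFromEnv`, the `register` / `detectActive` helpers do not claim to mirror `registry.ts` or `detect.ts`, and the `EXAMPLE_*` environment variables are invented for the example.

```typescript
type Env = Record<string, string | undefined>;

// Reduced, hypothetical view of an adapter.
interface MiniAdapter {
  id: string;
  detectFromEnv?(env: Env): boolean;
}

const registry = new Map<string, MiniAdapter>();

// Register an adapter under its stable id; duplicate ids are a programming error.
function register(adapter: MiniAdapter): void {
  if (registry.has(adapter.id)) {
    throw new Error(`duplicate adapter id: ${adapter.id}`);
  }
  registry.set(adapter.id, adapter);
}

// Return the first registered adapter that claims the environment, if any.
function detectActive(env: Env): MiniAdapter | undefined {
  return Array.from(registry.values()).find(
    (adapter) => adapter.detectFromEnv?.(env) === true,
  );
}

register({ id: "alpha", detectFromEnv: (env) => env.EXAMPLE_ALPHA === "1" });
register({ id: "beta", detectFromEnv: (env) => env.EXAMPLE_BETA === "1" });
register({ id: "gamma" }); // no detector: never auto-selected

console.log(detectActive({ EXAMPLE_BETA: "1" })?.id);
```

Because `detectFromEnv` is optional, an adapter without it can still be chosen explicitly by id but is skipped during environment detection.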
Testing
- Unit + integration (`npm test`): pure-function tests for `src/lib/`, plus pipeline integration tests against a fake runner. CLI integration tests build the package and run the binary in a temp-dir sandbox. ~10s.
- Manual: see Manual test plan.
Where to extend
| Goal | Path |
|---|---|
| Change extraction | `src/templates-source/prompts/proposal-extract.md` |
| Change curate | `src/templates-source/skills/kk-curate/SKILL.md` (dedup logic in `src/commands/curate-dedup.ts`, survivor-batch persistence + placement in `src/commands/curate-persist.ts`, conflict defaults/sort in `src/commands/conflict-prepare.ts`, parallel-draft aggregation in `src/commands/drafts-collect.ts`) |
| Change a structured LLM↔primitive contract | the Zod schema in `src/lib/schemas.ts` and its name in `src/lib/schema-registry.ts` (surfaced via `schema` / `validate`); never hand-author JSON Schema |
| Change rebalance | `src/lib/rebalance.ts` (LLM-free trigger thresholds + grouped create-branch), `src/commands/rebalance.ts` (`trigger` / `move` primitives) |
| Change live session extract | `src/templates-source/skills/kk-session-extract/SKILL.md` (`session-log stage-live` in `src/commands/session-log-stage-live.ts`) |
| Change bootstrap | `src/templates-source/skills/kk-bootstrap/SKILL.md` (discovery primitive in `src/commands/finddocs.ts`, write primitive in `src/commands/node-write.ts`) |
| Change manual node add | `src/templates-source/skills/kk-add/SKILL.md` |
| New CLI subcommand | `src/commands/<name>.ts` + wire in `src/cli.ts` |
| New hook | `src/harnesses/<id>/hooks/<name>.ts` + register in `src/harnesses/<id>/hook-spec.ts` |
| New state file | Schema in `src/lib/schemas.ts`; add to gitignore block |
| New adapter | Implement `src/harnesses/types.ts`; register in `src/harnesses/registry.ts` |