Manual test plan

Checks that resist automation: real Claude Code sessions, non-Linux OS, and human judgment on capture quality.

BEFORE YOU BEGIN

Run before any significant release (minor bump, schema bump, or pinned @anthropic-ai/claude-code bump). Record results in the release PR.

</Callout>

Clean sandbox

mkdir kk-manual-test && cd kk-manual-test
git init
echo "node_modules" > .gitignore
npm pack ../path/to/kenkeep
npm init -y
npx ./e0ipso-kenkeep-<v>.tgz init --harnesses claude
npm install
npx kenkeep doctor

doctor should exit 0 (warnings OK).

1. Platform smoke

For each OS: set up a clean sandbox, trigger one Stop capture, and confirm _sessions/ has one new file with proposal_status: pending.

macOS (latest)
Linux (Ubuntu 22.04+)
Windows WSL2: hook command in .claude/settings.json uses node, forward slashes
Windows native PowerShell: best-effort. Watch for CRLF in .claude/hooks/*.mjs (must be LF).

2. PreCompact timing

The hook contract is ≤1s wall-clock.

Feed a session ~30k tokens until auto-compact triggers.
Time from indicator appearing to session resuming. Under 2s (our portion is the lead).
Resulting _sessions/<log>.md contains the full transcript slice, not a summary.
If the deadline fires, the next Stop/SessionEnd still produces a log covering the missed window.

TIP

Diagnostic: time node .claude/hooks/kk-capture.mjs < /dev/null should be under 200ms cold. Over 1s usually means a slow filesystem, or the consumer hasn’t run npm install.

</Callout>

3. End-to-end happy path

Quality judgment.

Fresh sandbox.
10-15 messages of substantive conversation about an invented project. End the session.
_sessions/ shows one proposal_status: pending log.
Open a new session. Wait 30-90s. _logs/proposal/ has a new .jsonl. Log flips to proposal_status: done.
curate writes 1-4 nodes under nodes/ and regenerates ENTRY.md. Are these the right facts to remember? (Target: ≥80% acceptance.)
git diff nodes/ shows the new files. git add the ones you want, git commit. The pre-commit hook regenerates and stages a fresh ENTRY.md/GRAPH.md into the same commit. git restore nodes/<unwanted>.md to drop the rest. (If you git restore nodes without committing another nodes/ change, the hook won’t fire — run npx kenkeep index rebuild so the generated index drops them too.)
One more session. Ask Claude “what do you know about this project?” The response references something committed.

If curator output is clearly noise, bump the proposal prompt’s Version: and tighten “what to skip”.

4. `init --upgrade` against an older install

5. `logs prune` on real logs

After §3, both _logs/proposal/ and _logs/curator/ have 1-3 JSONL files.
Backdate one: touch -d "60 days ago" <file>.
logs prune deletes the backdated file and reports pruned 1 files. Recent files survive.
Set logsRetentionDays: 0 in config.yaml, rerun: every *.jsonl under _logs/ is deleted.
Set logsRetentionDays: 365, rerun on an already-pruned tree: reports pruned 0 files, no error.

6. `/kk-bootstrap`

In a sandbox with a small public repo (README + architecture.md), init and open Claude Code.
Run /kk-bootstrap. Agent reads source docs, writes nodes directly under nodes/ (topical folders), reports a summary including any collisions skipped.
Walk git diff nodes/ (or use a tool like self-review). git commit the ones you want; git restore <path> the rest. Bootstrap already rebuilt the index over every written node, so if you git restore without committing another nodes/ change, run npx kenkeep index rebuild to drop the removed nodes from the generated index.
No node carries a literal secret or stale TODO from the source docs.

7. `bootstrap` discovery and hash-aware re-run

In a repo with 50+ markdown files (>200k chars), run finddocs to preview discovery. The output is one + <relpath> line per surviving file; confirm the count matches your expectation given .kkignore.
Run bootstrap --from <subset> against 3-5 docs. Inspect bootstrap-state.json. Each processed doc has an entry with content_sha256 and produced_nodes.
Re-run bootstrap --from <subset>. The skill should report that every doc was hash-skipped.
Edit one file. Re-run. Only that file is reprocessed.

8. Single-author skill sessions (no cross-process lock)

Curate and bootstrap take no state.json lock; they are single-author by design.

Run two curate launchers in parallel against the same repo (& in a shell). Both run to completion; neither errors out with “locked”. After both finish, run index rebuild and doctor. state.json and session-log frontmatter both parse cleanly (Zod validation passes), even though one writer’s session-stamp updates may have silently lost to the other.
Worst case is some sessions reprocess on the next curate run. Confirm: re-run curate, verify the unstamped sessions get processed.
The proposal-drain hook still takes its own proposal-drain lock (independent surface; proper-lockfile on state.json, 60s stale threshold). Trigger two SessionStart events in quick succession against the same repo; the second drain skips while the first holds the lock. Simulate an interrupted drain by killing the holder; within ~60s the stale lock is auto-reclaimed on the next drain (DrainSummary.recoveredStaleLock === true).

9. Settings file

With no config.yaml, curate uses built-in defaults (curationThreshold: 20, logsRetentionDays: 30, lintEveryNSessions: 50).
Add project .ai/kenkeep/config.yaml with curationThreshold: 3. Re-run honors 3.
Add an unknown key (e.g. foo: bar). curate exits with an error naming the file.

10. Doctor exit codes

Reset between scenarios.

Delete installed-version. Doctor errors, exit 1.
Hand-edit a node to add fake derived_from: nonexistent.md. Warning, exit 0.
Hand-edit a node’s summary after a curate run. ENTRY stale warning, exit 0.
Install v(N-1), drop v(N) binary without upgrading. Version-mismatch warning, exit 0.

What this plan does NOT cover

Automated tests cover these:

Frontmatter Zod parsing (tests/lib/schemas.test.ts)
nodes_hash, INDEX/GRAPH determinism (tests/lib/index-gen.test.ts)
Pipeline logic with mocked subprocess (tests/lib/{proposal-drain,curate,bootstrap}.test.ts)
CLI argument parsing (per-command integration tests)
Logs-prune duration parsing (tests/lib/logs-prune.test.ts)

If a manual check uncovers a regression that automation should have caught, write the missing test in the fix PR.

Manual test plan

Clean sandbox

1. Platform smoke

2. PreCompact timing

3. End-to-end happy path

4. init --upgrade against an older install

5. logs prune on real logs

6. /kk-bootstrap

7. bootstrap discovery and hash-aware re-run

8. Single-author skill sessions (no cross-process lock)

9. Settings file

10. Doctor exit codes

What this plan does NOT cover

4. `init --upgrade` against an older install

5. `logs prune` on real logs

6. `/kk-bootstrap`

7. `bootstrap` discovery and hash-aware re-run