Manual test plan
Checks that resist automation: real Claude Code sessions, non-Linux OS, and human judgment on capture quality.
BEFORE YOU BEGIN
Run before any significant release (minor bump, schema bump, or pinned @anthropic-ai/claude-code bump). Record results in the release PR.
Clean sandbox
mkdir kk-manual-test && cd kk-manual-test
git init
echo "node_modules" > .gitignore
npm pack ../path/to/kenkeep
npm init -y
npx ./e0ipso-kenkeep-<v>.tgz init --harnesses claude
npm install
npx kenkeep doctor
doctor should exit 0 (warnings OK).
1. Platform smoke
For each OS: set up a clean sandbox, trigger one Stop capture, and confirm _sessions/ has one new file with proposal_status: pending.
- macOS (latest)
- Linux (Ubuntu 22.04+)
- Windows WSL2: hook command in
.claude/settings.jsonusesnode, forward slashes - Windows native PowerShell: best-effort. Watch for CRLF in
.claude/hooks/*.mjs(must be LF).
2. PreCompact timing
The hook contract is ≤1s wall-clock.
- Feed a session ~30k tokens until auto-compact triggers.
- Time from indicator appearing to session resuming. Under 2s (our portion is the lead).
- Resulting
_sessions/<log>.mdcontains the full transcript slice, not a summary. - If the deadline fires, the next
Stop/SessionEndstill produces a log covering the missed window.
TIP
Diagnostic: time node .claude/hooks/kk-capture.mjs < /dev/null should be under 200ms cold. Over 1s usually means a slow filesystem, or the consumer hasn’t run npm install.
3. End-to-end happy path
Quality judgment.
- Fresh sandbox.
- 10-15 messages of substantive conversation about an invented project. End the session.
_sessions/shows oneproposal_status: pendinglog.- Open a new session. Wait 30-90s.
_logs/proposal/has a new.jsonl. Log flips toproposal_status: done. curatewrites 1-4 nodes undernodes/and regeneratesINDEX.md. Are these the right facts to remember? (Target: ≥80% acceptance.)git diff nodes/shows the new files.git addthe ones you want,git commit. The pre-commit hook regenerates and stages a freshINDEX.md/GRAPH.mdinto the same commit.git restore nodes/<unwanted>.mdto drop the rest.- One more session. Ask Claude “what do you know about this project?” The response references something committed.
If curator output is clearly noise, bump the proposal prompt’s Version: and tighten “what to skip”.
4. init --upgrade against an older install
- Install the last published version. Commit.
- Edit
proposal-extract.md(add a sentinel) and add a custom key toconfig.yaml. - Install the candidate build.
init --upgrade --dry-run. Read the changelist.init --upgrade. Confirm:- Sentinel comment preserved.
- Custom
config.yamlkey preserved. kk-capture.mjsreflects the new version.installed-versionshows the new version.
doctorexits 0, no version mismatch.
5. logs prune on real logs
- After §3, both
_logs/proposal/and_logs/curator/have 1-3 JSONL files. - Backdate one:
touch -d "60 days ago" <file>. logs prunedeletes the backdated file and reportspruned 1 files. Recent files survive.- Set
logsRetentionDays: 0inconfig.yaml, rerun: every*.jsonlunder_logs/is deleted. - Set
logsRetentionDays: 365, rerun on an already-pruned tree: reportspruned 0 files, no error.
6. /kk-bootstrap
- In a sandbox with a small public repo (README + architecture.md),
initand open Claude Code. - Run
/kk-bootstrap. Agent reads source docs, writes nodes directly undernodes/<kind>/, reports a summary including any collisions skipped. - Walk
git diff nodes/(or use a tool like self-review).git committhe ones you want;git restore <path>the rest. - No node carries a literal secret or stale TODO from the source docs.
7. bootstrap discovery and hash-aware re-run
- In a repo with 50+ markdown files (>200k chars), run
finddocsto preview discovery. The output is one+ <relpath>line per surviving file; confirm the count matches your expectation given.kkignore. - Run
bootstrap --from <subset>against 3-5 docs. Inspectbootstrap-state.json. Each processed doc has an entry withcontent_sha256andproduced_nodes. - Re-run
bootstrap --from <subset>. The skill should report that every doc was hash-skipped. - Edit one file. Re-run. Only that file is reprocessed.
8. Single-author skill sessions (no cross-process lock)
Curate and bootstrap take no state.json lock; they are single-author by design.
- Run two
curatelaunchers in parallel against the same repo (&in a shell). Both run to completion; neither errors out with “locked”. After both finish, runindex rebuildanddoctor.state.jsonand session-log frontmatter both parse cleanly (Zod validation passes), even though one writer’s session-stamp updates may have silently lost to the other. - Worst case is some sessions reprocess on the next
curaterun. Confirm: re-runcurate, verify the unstamped sessions get processed. - The proposal-drain hook still takes its own
proposal-drainlock (independent surface). Trigger twoSessionStartevents in quick succession against the same repo; the second drain skips while the first holds the lock and reclaims it after the 30-min TTL on the next run.
9. Settings file
- With no
config.yaml,curateuses built-in defaults (curationThreshold: 5,logsRetentionDays: 30,lintEveryNSessions: 50). - Add project
.ai/kenkeep/config.yamlwithcurationThreshold: 3. Re-run honors 3. - Add an unknown key (e.g.
foo: bar).curateexits with an error naming the file.
10. Doctor exit codes
Reset between scenarios.
- Delete
installed-version. Doctor errors, exit 1. - Hand-edit a node to add fake
derived_from: nonexistent.md. Warning, exit 0. - Hand-edit a node’s
summaryafter a curate run. INDEX stale warning, exit 0. - Install v(N-1), drop v(N) binary without upgrading. Version-mismatch warning, exit 0.
What this plan does NOT cover
Automated tests cover these:
- Frontmatter Zod parsing (
tests/lib/schemas.test.ts) nodes_hash, INDEX/GRAPH determinism (tests/lib/index-gen.test.ts)- Pipeline logic with mocked subprocess (
tests/lib/{proposal-drain,curate,bootstrap}.test.ts) - CLI argument parsing (per-command integration tests)
- Logs-prune duration parsing (
tests/lib/logs-prune.test.ts)
If a manual check uncovers a regression that automation should have caught, write the missing test in the fix PR.