Manual test plan
Checks that resist automation: real Claude Code sessions, non-Linux OS, and human judgment on capture quality. Run before any significant release (minor bump, schema bump, or pinned @anthropic-ai/claude-code bump). Record results in the release PR.
Clean sandbox
mkdir kb-manual-test && cd kb-manual-test
git init
echo "node_modules" > .gitignore
npm pack ../path/to/ai-knowledge-base
npm init -y
npx ./e0ipso-ai-knowledge-base-<v>.tgz init --harnesses claude
npm install
npx @e0ipso/ai-knowledge-base doctor
doctor should exit 0 (warnings OK).
1. Platform smoke
For each OS, set up a clean sandbox and trigger one Stop capture. Confirm _sessions/ contains one new file with secret_scan_status: clean and proposal_status: pending.
- macOS (latest)
- Linux (Ubuntu 22.04+)
- Windows WSL2: hook command in
.claude/settings.jsonusesnode, forward slashes - Windows native PowerShell: best-effort. Watch for CRLF in
.claude/hooks/*.mjs(must be LF).
2. PreCompact timing
The hook contract is ≤1s wall-clock.
- Feed a session ~30k tokens until auto-compact triggers.
- Time from indicator appearing to session resuming. Under 2s (our portion is the lead).
- Resulting
_sessions/<log>.mdcontains the full transcript slice, not a summary. - If the deadline fires, the next
Stop/SessionEndstill produces a log covering the missed window.
Diagnostic: time node .claude/hooks/kb-capture.mjs < /dev/null should be under 200ms cold. Over 1s usually means secretlint is loading from a slow filesystem, or the consumer hasn’t run npm install.
3. Secretlint per platform
After npm install:
- macOS / Linux / WSL2:
node_modules/@secretlint/coreexists.doctorreports secretlint resolvable. - Native Windows: same, since secretlint is a pure-JS Node package with no platform-specific binaries.
Redaction:
- Add a fake secret (e.g.
GITHUB_TOKEN=ghp_1234567890abcdefghijklmnopqrstuvwxyz) in a session. TriggerStop. - The log shows
secret_scan_status: redactedand the secret replaced by[REDACTED:<rule-id>].
4. End-to-end happy path
Quality judgment.
- Fresh sandbox.
- 10-15 messages of substantive conversation about an invented project. End the session.
_sessions/shows oneproposal_status: pendinglog.- Open a new session. Wait 30-90s.
_logs/proposal/has a new.jsonl. Log flips toproposal_status: done. curatewrites 1-4 nodes undernodes/and regeneratesINDEX.md. Are these the right facts to remember? (Target: ≥80% acceptance.)git diff nodes/shows the new files.git addthe ones you want,git commit. The pre-commit hook regenerates and stages a freshINDEX.md/GRAPH.mdinto the same commit.git restore nodes/<unwanted>.mdto drop the rest.- One more session. Ask Claude “what do you know about this project?” The response references something committed.
If curator output is clearly noise, bump the proposal prompt’s Version: and tighten “what to skip”.
5. init --upgrade against an older install
- Install the last published version. Commit.
- Edit
proposal-extract.md(add a sentinel) and add a custom key toconfig.yaml. - Install the candidate build.
init --upgrade --dry-run. Read the changelist.init --upgrade. Confirm:- Sentinel comment preserved.
- Custom
config.yamlkey preserved. kb-capture.mjsreflects the new version.installed-versionshows the new version.
doctorexits 0, no version mismatch.
6. logs prune on real logs
- After §4, both
_logs/proposal/and_logs/curator/have 1-3 JSONL files. - Backdate one:
touch -d "60 days ago" <file>. logs prunedeletes the backdated file and reportspruned 1 files. Recent files survive.- Set
logsRetentionDays: 0inconfig.yaml, rerun: every*.jsonlunder_logs/is deleted. - Set
logsRetentionDays: 365, rerun on an already-pruned tree: reportspruned 0 files, no error.
7. /kb-bootstrap
- In a sandbox with a small public repo (README + architecture.md),
initand open Claude Code. - Run
/kb-bootstrap. Agent reads source docs, writes nodes directly undernodes/<kind>/, reports a summary including any collisions skipped. - Walk
git diff nodes/(or use a tool like self-review).git committhe ones you want;git restore <path>the rest. - No node carries a literal secret or stale TODO from the source docs.
8. bootstrap-incremental chunking
- In a repo with 50+ markdown files (>200k chars), run
--dry-run. Reported batch count matchesceil(files / 20). - Run without
--dry-run. Inspectbootstrap-state.json. - Re-run. Output: 0 files to process.
- Edit one file. Re-run. Only that file is reprocessed.
9. Concurrent locking
curaterunning in one terminal. Startcuratein a second. Second exits with “locked by curator (pid …)”.- Same with
bootstrap-incrementalagainst itself. - Cross-pipeline:
curateandbootstrap-incrementaldo not block each other (distinct lock names). - Stale-lock recovery: edit
state.jsonto setacquired_at31 minutes ago. Nextcuratereclaims and proceeds.
10. Settings file
- With no
config.yaml,curateuses built-in defaults (curationThreshold: 5,logsRetentionDays: 30,lintEveryNSessions: 50). - Add project
.ai/knowledge-base/config.yamlwithcurationThreshold: 3. Re-run honors 3. - Add an unknown key (e.g.
foo: bar).curateexits with an error naming the file.
11. Doctor exit codes
Reset between scenarios.
- Delete
installed-version. Doctor errors, exit 1. - Delete
.secretlintrc.json. Doctor reports the commit-time scan as missing, exit 0 with a warning. - Hand-edit a node to add fake
derived_from: nonexistent.md. Warning, exit 0. - Hand-edit a node’s
summaryafter a curate run. INDEX stale warning, exit 0. - Install v(N-1), drop v(N) binary without upgrading. Version-mismatch warning, exit 0.
What this plan does NOT cover
Automated tests cover these:
- Frontmatter Zod parsing (
tests/lib/schemas.test.ts) nodes_hash, INDEX/GRAPH determinism (tests/lib/index-gen.test.ts)- Pipeline logic with mocked subprocess (
tests/lib/{proposal-drain,curate,bootstrap}.test.ts) - CLI argument parsing (per-command integration tests)
- Logs-prune duration parsing (
tests/lib/logs-prune.test.ts)
If a manual check uncovers a regression that automation should have caught, write the missing test in the fix PR.