feat: /titan-run orchestrator with diff review, semantic assertions, arch snapshots#557
feat: /titan-run orchestrator with diff review, semantic assertions, arch snapshots#557carlos-alm wants to merge 29 commits intomainfrom
Conversation
…in backlog These two items deliver the highest immediate impact on agent experience and graph accuracy without requiring Rust porting or TypeScript migration. They should be implemented before any Phase 4+ roadmap work. - #83: hook-optimized `codegraph brief` enriches passively-injected context - #71: basic type inference closes the biggest resolution gap for TS/Java
Impact: 14 functions changed, 0 affected
Add new Phase 4 covering the port of JS-only build phases to Rust: - 4.1-4.3: AST nodes, CFG, dataflow visitor ports (~587ms savings) - 4.4: Batch SQLite inserts (~143ms) - 4.5: Role classification & structure (~42ms) - 4.6: Complete complexity pre-computation - 4.7: Fix incremental rebuild data loss on native engine - 4.8: Incremental rebuild performance (target sub-100ms) Bump old Phases 4-10 to 5-11 with all cross-references updated. Benchmark evidence shows ~50% of native build time is spent in JS visitors that run identically on both engines.
Take main's corrected #57 section anchors; keep HEAD's v2.7.0 version reference. Impact: 10 functions changed, 11 affected
…ative-acceleration Impact: 25 functions changed, 46 affected
- Add COMMITS=0 guard in publish.yml to return clean version when HEAD is exactly at a tag (mirrors bench-version.js early return) - Change bench-version.js to use PATCH+1-dev.COMMITS format instead of PATCH+COMMITS-dev.SHA (mirrors publish.yml's new scheme) - Fix fallback in bench-version.js to use dev.1 matching publish.yml's no-tags COMMITS=1 default Impact: 1 functions changed, 0 affected
The release skill now scans commit history using conventional commit rules to determine major/minor/patch automatically. Explicit version argument still works as before.
…ns, and architectural snapshot Add /titan-run skill that dispatches the full Titan pipeline (recon → gauntlet → sync → forge) to sub-agents with fresh context windows, enabling end-to-end autonomous execution. Hardening layers added across the pipeline: - Pre-Agent Gate (G1-G4): git health, worktree validity, state integrity, backups - Post-phase validation (V1-V15): artifact structure, coverage, consistency checks - Stall detection with per-phase thresholds and no-progress abort - Mandatory human checkpoint before forge (unless --yes) New validation tools integrated into forge and gate: - Diff Review Agent (forge Step 9): verifies each diff matches the gauntlet recommendation and sync plan intent before gate runs - Semantic Assertions (gate Step 5): export signature stability, import resolution integrity, dependency direction, re-export chain validation - Architectural Snapshot Comparator (gate Step 5.5): community stability, cross-domain dependency direction, cohesion delta, drift detection vs pre-forge baseline
Greptile SummaryThis PR adds Two issues remain:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant TitanRun as /titan-run (Orchestrator)
participant Recon as /titan-recon Sub-Agent
participant Gauntlet as /titan-gauntlet Sub-Agent
participant Sync as /titan-sync Sub-Agent
participant Forge as /titan-forge Sub-Agent
participant Gate as /titan-gate Sub-Agent
User->>TitanRun: /titan-run [args]
TitanRun->>TitanRun: Step 0 — Pre-flight (worktree, args, git sync)
TitanRun->>TitanRun: Step 0.5 — Artifact pre-validation (if --start-from)
TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
TitanRun->>Recon: Dispatch sub-agent
Recon-->>TitanRun: titan-state.json, GLOBAL_ARCH.md
TitanRun->>TitanRun: V1-V4 Post-recon validation
loop Until gauntlet-summary.json complete=true (max 50 iters)
TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
TitanRun->>Gauntlet: Dispatch sub-agent (batch-size N)
Gauntlet-->>TitanRun: gauntlet.ndjson (incremental)
TitanRun->>TitanRun: Stall check + progress tracking
end
TitanRun->>TitanRun: V5-V7 + NDJSON integrity
TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
TitanRun->>Sync: Dispatch sub-agent
Sync-->>TitanRun: sync.json
TitanRun->>TitanRun: V8-V10 Post-sync validation
TitanRun->>TitanRun: Step 3.5a — Capture arch-snapshot.json
TitanRun->>User: FORGE CHECKPOINT — show plan, await confirmation
loop Until all sync phases complete (max 20 iters)
TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
TitanRun->>Forge: Dispatch sub-agent (--phase N [--yes])
loop Per target in phase
Forge->>Forge: Step 8 — git add staged files
Forge->>Forge: Step 9 — Diff Review (D1-D5)
Forge->>Forge: Step 10 — Run tests
Forge->>Gate: Invoke /titan-gate (Step 11)
Note over Gate: Step 5 Semantic assertions<br/>Step 5.5 Arch snapshot compare
Gate-->>Forge: PASS / WARN / FAIL
Forge->>Forge: Step 12 — git commit (on PASS)
Forge->>Forge: Step 13 — rollback (on FAIL)
end
Forge-->>TitanRun: titan-state.json updated
TitanRun->>TitanRun: V11-V13 Post-phase validation
end
TitanRun->>TitanRun: V14-V15 Final state consistency
TitanRun->>User: TITAN PIPELINE COMPLETE report
|
|
|
||
| if currentAuditedCount == previousAuditedCount: | ||
| stallCount += 1 | ||
| Print: "WARNING: Gauntlet iteration <iteration> made no progress (stall <stallCount>/<maxStalls>)" | ||
| if stallCount >= maxStalls: | ||
| Stop: "Gauntlet stalled for <maxStalls> consecutive iterations at <currentAuditedCount>/<expectedTargetCount> targets. Likely stuck on a problematic target. Check gauntlet.ndjson for the last successful entry and investigate the next target in the batch." |
There was a problem hiding this comment.
Undefined variable
previousAuditedCountBeforeAgent in gauntlet efficiency check
The efficiency check references previousAuditedCountBeforeAgent, which is never defined in the pseudocode. By the time this line is reached, previousAuditedCount has already been updated to currentAuditedCount on the line just above, so using it there would always yield 0.
To correctly compute how many targets this iteration processed, you need to capture the pre-agent count before the update. For example:
previousAuditedCount = currentAuditedCount # update for next iteration's stall check
# Efficiency check: if progress is very slow (< 2 targets per iteration), warn
targetsThisIteration = currentAuditedCount - previousAuditedCountBeforeAgent # ← undefined
Should be:
# Save count before update for efficiency check
countBeforeUpdate = previousAuditedCount
previousAuditedCount = currentAuditedCount # update for next iteration's stall check
# Efficiency check
targetsThisIteration = currentAuditedCount - countBeforeUpdate
if targetsThisIteration == 1 and iteration > 3:
Print: "WARNING: Only 1 target per iteration..."
This same issue exists in the identical copy at docs/examples/claude-code-skills/titan-run/SKILL.md at the same line.
There was a problem hiding this comment.
Fixed — saved pre-update count to countBeforeUpdate before reassignment. Applied to both .claude/skills/ and docs/examples/ copies.
Impact: 1662 functions changed, 760 affected
|
Resolved merge conflicts with main and fixed the undefined previousAuditedCountBeforeAgent variable in the gauntlet loop efficiency check. |
| If any SHA doesn't exist → **WARN**: "Commit <sha> recorded in state but not found in git history. State may be out of sync." | ||
|
|
||
| **V15. Gate log consistency:** | ||
| If `.codegraph/titan/gate-log.ndjson` exists: |
There was a problem hiding this comment.
Incomplete merge conflict detection — misses
AU, UA, DU, UD markers
The check only lists UU, AA, DD as conflict indicators in git status --porcelain output. Git also produces AU (added by us, modified by them), UA (modified by us, added by them), DU (deleted by us, modified by them), and UD (modified by us, deleted by them) in the porcelain format for merge conflicts. Missing these cases could allow the pipeline to proceed with unresolved conflicts into a forge sub-agent that commits.
| If `.codegraph/titan/gate-log.ndjson` exists: | |
| - **Merge conflicts** (lines starting with `UU`, `AA`, `DD`, `AU`, `UA`, `DU`, `UD`): Stop immediately: "Unresolved merge conflict detected. Resolve before continuing." |
There was a problem hiding this comment.
Fixed — added AU, UA, DU, UD to the merge conflict detection markers in both .claude/skills/ and docs/examples/ copies.
| Gate now includes semantic assertions (Step 5) and architectural snapshot comparison (Step 5.5). | ||
| The arch snapshot is at .codegraph/titan/arch-snapshot.json. | ||
| 6. Commit on success, rollback on failure | ||
|
|
There was a problem hiding this comment.
--start-from forge silently disables architectural comparison without warning
When using --start-from forge, Step 3.5a (architectural snapshot capture) is skipped entirely since it lives inside Step 3.5, which comes after sync. The gate's Step 5.5 gracefully handles a missing arch-snapshot.json by skipping the comparison, but the user receives no warning that the architectural drift check will be inactive for the entire forge run.
Consider adding an explicit notice at the start of the forge loop when arch-snapshot.json is absent and the start phase was forge:
if startPhase == "forge" and not exists(".codegraph/titan/arch-snapshot.json"):
Print: "NOTE: No arch-snapshot.json found. Architectural comparison in /titan-gate (Step 5.5) will be skipped for this run.
To enable it, run '/titan-run --start-from sync' to re-capture the pre-forge snapshot."
There was a problem hiding this comment.
Fixed — added a pre-loop check in Step 4a that warns when arch-snapshot.json is missing, advising the user to run --start-from sync to re-capture it.
- Fix undefined previousAuditedCountBeforeAgent variable in gauntlet efficiency check (save pre-update count before reassignment) - Add AU, UA, DU, UD to merge conflict detection markers - Add warning when --start-from forge runs without arch-snapshot.json
- Fix undefined previousAuditedCountBeforeAgent variable in gauntlet efficiency check (save pre-update count before reassignment) - Add AU, UA, DU, UD to merge conflict detection markers - Add warning when --start-from forge runs without arch-snapshot.json
| 13. **On failure (test, gate, or diff-review):** | ||
| ```bash | ||
| git checkout -- <changed files> |
There was a problem hiding this comment.
Incomplete rollback — staged files not unstaged in Step 13
The new flow stages files in Step 8 (git add) before running tests (Step 10) and gate (Step 11). When a test or gate failure routes to Step 13, only the working-tree restoration is performed:
git checkout -- <changed files>This restores the file contents on disk but does not remove the files from the staging area. The changes remain indexed in git. If the orchestrator loops back and the next target's git add picks up different files, the leftover staged changes from the failed target will be silently included in the next commit.
The rollback should unstage first, then restore the working tree:
| 13. **On failure (test, gate, or diff-review):** | |
| ```bash | |
| git checkout -- <changed files> | |
| 13. **On failure (test, gate, or diff-review):** | |
| ```bash | |
| git reset HEAD <changed files> | |
| git checkout -- <changed files> | |
| ``` |
Note: the DIFF FAIL path in Step 9 already says "Unstage and revert changes" correctly — this gap only affects Step 13. The same fix is needed in the docs mirror at docs/examples/claude-code-skills/titan-forge/SKILL.md.
There was a problem hiding this comment.
Fixed — added unstaging step before working tree restore in Step 13 rollback, so staged files are properly cleared before checkout. Applied to both .claude/skills/ and docs/examples/ copies.
Both commands are called in titan-forge's new diff review step (Step 9): fn-impact for deletion audit (D4) and exports for re-export chain checks.
|
Addressed the remaining open item from the summary: README command table (P2): Restored The |
| | `codegraph exports` | GAUNTLET | Per-symbol export consumers | | ||
| | `codegraph fn-impact` | GAUNTLET, SYNC | Blast radius | | ||
| | `codegraph search` | GAUNTLET | Duplicate code detection (needs embeddings) | |
There was a problem hiding this comment.
Command table stale after new diff-review and semantic-assertion additions
Three commands were removed from FORGE/GATE in the table, but the new steps introduced in this PR use them:
-
codegraph fn-impact— listed as "GAUNTLET, SYNC" only, but is still called in titan-forge Step 4 (pre-touch understanding) and in the new Step 9 D4 (deletion audit). It's also now used in titan-gate Step 5a. All three users (FORGE, GATE) are missing. -
codegraph context— listed as "SYNC" only, but titan-forge Step 4 still callscodegraph context <target> -T --json(this code is unchanged and confirmed in the current file). FORGE should be restored. -
codegraph exports— listed as "GAUNTLET" only, but the new titan-gate Step 5a explicitly runscodegraph exports <changed-file> -T --json. GATE should be added.
The same table appears in .claude/skills/ (mirrored) so the same correction applies there.
Suggested correction:
| | `codegraph exports` | GAUNTLET | Per-symbol export consumers | | |
| | `codegraph fn-impact` | GAUNTLET, SYNC | Blast radius | | |
| | `codegraph search` | GAUNTLET | Duplicate code detection (needs embeddings) | | |
| | `codegraph exports` | GAUNTLET, GATE | Per-symbol export consumers | | |
| | `codegraph fn-impact` | GAUNTLET, SYNC, FORGE, GATE | Blast radius | |
And separately restore FORGE to the codegraph context row:
| `codegraph context` | SYNC, FORGE | Full function context |
There was a problem hiding this comment.
Fixed — added GATE to exports and fn-impact rows, added FORGE to context row in the command table. Verified against actual usage in titan-gate Step 5a and titan-forge Steps 4 and 9.
There was a problem hiding this comment.
Fixed — added GATE to codegraph communities, codegraph structure, and codegraph deps in the command table. These are all used by GATE Step 5.5 (architectural snapshot comparison) but were missing from the "Used by" column.
| 5. Run /titan-gate — read .claude/skills/titan-gate/SKILL.md and follow it exactly. | ||
| Gate now includes semantic assertions (Step 5) and architectural snapshot comparison (Step 5.5). | ||
| The arch snapshot is at .codegraph/titan/arch-snapshot.json. | ||
| 6. Commit on success, rollback on failure |
There was a problem hiding this comment.
--yes description inaccurate — forge no longer accepts this flag
The argument hint says --yes is "passed through to forge", but this PR removes --yes from titan-forge's supported arguments list (the --yes → skip confirmation prompt line was deleted from titan-forge). The flag is now passed to forge's dispatch invocation (line ~835 of this file: yesFlag = "--yes" if autoConfirm else ""), but forge silently ignores it since it no longer handles that argument.
--yes still has meaningful effects within the orchestrator itself (auto-resume on existing state, skip the pre-pipeline and forge-checkpoint confirmations), so it should stay — but the description should be updated to reflect its actual scope.
| 6. Commit on success, rollback on failure | |
| - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) |
There was a problem hiding this comment.
Fixed — updated the --yes description to reflect its actual scope: orchestrator-level prompts only (pre-pipeline, forge checkpoint, and resume). Applied to both .claude/skills/ and docs/examples/ copies.
There was a problem hiding this comment.
Fixed — updated the Rules section description of --yes to clarify both purposes: (1) passed through to forge so it skips its per-phase confirmation prompt, and (2) within the orchestrator itself, skips pre-pipeline, forge checkpoint, and resume prompts. Updated in both .claude/skills/titan-run/SKILL.md and the example copy.
Impact: 43 functions changed, 46 affected
…ons" This reverts commit a296b58.
…ptave/codegraph into feat/release-skill-auto-semver Impact: 345 functions changed, 405 affected
|
Addressed round 2 Greptile feedback:
All applied to both .claude/skills/ and docs/examples/ copies. |
.claude/skills/titan-gate/SKILL.md
Outdated
| **A1. Community stability:** | ||
| Compare community assignments between snapshot and current. For each symbol that **moved** to a different community: | ||
| - If the symbol was the target of this forge phase → **OK** (expected) | ||
| - If the symbol was NOT touched in the diff → **WARN**: "Symbol `<name>` shifted from community <X> to <Y> as a side effect" | ||
| - If > 5 untouched symbols shifted communities → **FAIL**: "Significant community restructuring detected — <N> symbols shifted communities. This change may have unintended architectural impact." |
There was a problem hiding this comment.
Louvain community IDs are non-deterministic — A1 comparison will produce false positives
The Louvain algorithm (used by codegraph communities) assigns community IDs non-deterministically. Two runs on the exact same codebase can produce different ID assignments for identical clusters (e.g., "community 1" pre-forge might be "community 3" post-forge even if nothing changed). Comparing community assignments by ID between the pre-forge snapshot and the current state will therefore flag nearly every symbol as having "moved," triggering spurious WARN and FAIL verdicts on completely valid commits.
The same issue exists at docs/examples/claude-code-skills/titan-gate/SKILL.md lines 189–193.
To make this comparison reliable, community membership should be compared by content (e.g., match communities across snapshots by finding the best overlap in member sets — Hungarian algorithm or a simple "largest intersection" heuristic), not by raw ID. Alternatively, codegraph communities --drift is already designed to track community evolution across time and should be used for this comparison instead of directly diffing the raw communities output.
There was a problem hiding this comment.
Fixed — A1 comparison now uses codegraph communities --drift instead of raw community IDs. This uses content-based matching (largest-intersection heuristic) rather than fragile IDs that change across runs. Applied to both .claude/skills/ and docs/examples/ copies.
.claude/skills/titan-run/SKILL.md
Outdated
| - `--skip-gauntlet` → skip gauntlet (assumes artifacts exist) | ||
| - `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge` | ||
| - `--gauntlet-batch-size <N>` → batch size for gauntlet (default: 5) | ||
| - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) |
There was a problem hiding this comment.
--yes description contradicts the Rules section and the dispatch code
Line 21 says --yes applies to "orchestrator-level prompts only (pre-pipeline, forge checkpoint, and resume prompts)", but:
- Line 574 (Rules section) explicitly says "Pass --yes through to forge if the user provided it, so forge doesn't prompt for confirmation on each phase."
- The dispatch code on line 421 confirms this:
yesFlag = "--yes" if autoConfirm else "", which is then passed to forge on line 422. titan-forgeStep 0.8 still checks for--yesto skip per-phase confirmation.
The description was updated as part of a previous fix but the Rules section (line 574) was not reconciled with it — and the Rules section is actually the more accurate description of the real behavior.
The same inconsistency exists in docs/examples/claude-code-skills/titan-run/SKILL.md at the corresponding lines.
| - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) | |
| - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation) |
There was a problem hiding this comment.
Fixed — updated --yes description to explicitly include forge scope: 'skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)'. Consistent with the Rules section and dispatch code.
.claude/skills/titan-gate/SKILL.md
Outdated
| codegraph communities -T --json > /tmp/titan-arch-current-communities.json | ||
| codegraph structure --depth 2 --json > /tmp/titan-arch-current-structure.json | ||
| codegraph communities --drift -T --json > /tmp/titan-arch-current-drift.json |
There was a problem hiding this comment.
Hardcoded
/tmp paths not cleaned up — potential stale data across gate invocations
The three temp files are written to hardcoded /tmp/titan-arch-current-*.json paths and never deleted within the gate skill. Because /tmp is shared across worktrees, two concurrent invocations of /titan-gate (e.g., from parallel forge sessions in different worktrees) would overwrite each other's files mid-run, producing subtly wrong comparisons without any error signal.
Even within a single session, stale files from a prior gate run would sit in /tmp until the next run overwrites them — harmless in the common case but fragile.
Consider either:
- Using a unique path per invocation:
/tmp/titan-arch-<timestamp>-current-communities.json - Adding a cleanup step at the end of Step 5.5 (or in the gate's exit path) to
rm -f /tmp/titan-arch-current-*.json
The same applies to docs/examples/claude-code-skills/titan-gate/SKILL.md at the corresponding lines.
There was a problem hiding this comment.
Fixed — temp files now use unique paths with epoch+PID suffix (/tmp/titan-arch-${TITAN_TMP_ID}-current-*.json) and are cleaned up at the end of Step 5.5. No more cross-worktree collisions.
.claude/skills/titan-gate/SKILL.md
Outdated
| > "GATE FAIL: [reason]. Graph restored, changes unstaged but preserved. Fix and re-stage." | ||
|
|
||
| For structural-only failures (Steps 2-4, 6-8), do NOT auto-rollback — report and let user decide. | ||
| For structural-only failures (Steps 1-3, 5-7), do NOT auto-rollback — report and let user decide. |
There was a problem hiding this comment.
Stale step range excludes new semantic assertions from auto-rollback
The auto-rollback exception "Steps 1-3, 5-7" was written when Step 5 was "Branch structural diff." This PR inserts the new Step 5 (Semantic assertions) and Step 5.5 (Architectural snapshot) before the old Step 5, which is now Step 6. The step numbers in the exception were never updated.
As a result, the new Step 5 (semantic assertions) and Step 5.5 (architectural snapshot) are now implicitly classified as "structural-only" failures that skip auto-rollback. But a semantic FAIL — e.g., "Removed export parseConfig still imported by 3 files" — is a real code-correctness breakage, not a structural advisory. Leaving staged changes in place after such a FAIL means the next operation could commit broken code.
The exception should be updated to refer to the old structural/advisory checks by their new numbers:
| For structural-only failures (Steps 1-3, 5-7), do NOT auto-rollback — report and let user decide. | |
| For structural-only failures (Steps 1-3, 6-8), do NOT auto-rollback — report and let user decide. |
The same line exists in docs/examples/claude-code-skills/titan-gate/SKILL.md at the corresponding position and needs the identical fix.
There was a problem hiding this comment.
Fixed — updated auto-rollback exception from 'Steps 1-3, 5-7' to 'Steps 1-3, 6-8' to reflect the step renumbering after new Step 5 (semantic assertions) and Step 5.5 (architectural snapshot) were inserted.
| - **Back up state before every sub-agent.** The `.bak` file is your safety net against mid-write crashes. | ||
| - **Mandatory pause before forge** unless `--yes` is set. Analysis is safe; code changes deserve human review. | ||
| - **Stall detection is strict for forge** (2 retries) and looser for gauntlet (3 retries) since gauntlet is more likely to hit context limits legitimately. | ||
| - **Respect --start-from.** Skip phases before the specified starting point, but verify their artifacts exist AND pass validation. |
There was a problem hiding this comment.
--start-from artifact validation promised in Rules but absent from the procedure
The Rules section states:
Respect --start-from. Skip phases before the specified starting point, but verify their artifacts exist AND pass validation.
However, none of Steps 1–3 (or any preamble step) contains the corresponding "if phase is being skipped, run its V-checks" logic. When the orchestrator runs --start-from forge, it jumps straight to Step 3.5/4 and reads sync.json directly. A missing or partially-corrupt gauntlet.ndjson (e.g., 35 % corrupt lines from a previous crash) would silently pass through — the V5/V6 integrity checks only run when Step 2 is NOT skipped.
The practical failure mode: an agent running --start-from forge on a corrupted or incomplete gauntlet file will proceed to forge and make real commits based on bad recommendations, with no upfront warning.
Consider adding an explicit pre-loop artifact validation block at the top of each skipped phase, or consolidating it into a "Step 0.5 — Artifact pre-validation" that runs for all phases preceding startPhase. At minimum, the V-checks for the prior phases should be extracted and re-run even when the phase itself is skipped.
The same gap exists in docs/examples/claude-code-skills/titan-run/SKILL.md.
There was a problem hiding this comment.
Fixed — added Step 0.5 (Artifact pre-validation) that runs V-checks for all phases before the start phase. If recon is skipped, runs V1/V2/V4; if gauntlet is skipped, runs V5/V6/V7 + NDJSON integrity; if sync is skipped, runs V8/V9/V10. Missing or invalid artifacts stop the pipeline before any sub-agent is spawned.
…rollback range, FAIL template - Use codegraph communities --drift for A1 comparison instead of raw IDs - Add unique /tmp paths per invocation (epoch+PID) with cleanup - Update auto-rollback exception from Steps 1-3,5-7 to Steps 1-3,6-8 - Split FAIL template: test/lint (rollback) vs structural/semantic (preserved)
… validation - Update --yes description to include forge per-phase confirmation - Add Step 0.5 artifact pre-validation for --start-from skipped phases
|
Addressed all 6 outstanding Greptile items from Round 3:
All changes applied to both |
.claude/skills/titan-run/SKILL.md
Outdated
|
|
||
| ## Rules | ||
|
|
||
| - **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. The ONE exception: the post-forge test run (V13) and NDJSON integrity checks are run directly since they're pure validation. |
There was a problem hiding this comment.
Arch snapshot commands missing from Rules exception list
The Rules section declares exactly two exceptions to "never run codegraph commands directly": the post-forge test run (V13) and NDJSON integrity checks. However, Step 3.5a runs three codegraph commands directly from the orchestrator:
codegraph communities -T --json
codegraph structure --depth 2 --json
codegraph communities --drift -T --json
An agent following the Rules strictly would interpret Step 3.5a as a violation and skip it, resulting in no arch-snapshot.json being written. Since gate's Step 5.5 silently skips when arch-snapshot.json is absent, the entire architectural comparison layer would be silently disabled for every forge run — with no indication to the user.
Step 3.5a should be explicitly added to the Rules exception list, e.g.:
| - **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. The ONE exception: the post-forge test run (V13) and NDJSON integrity checks are run directly since they're pure validation. | |
| - **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. The ONE exception: the post-forge test run (V13), NDJSON integrity checks, and the pre-forge architectural snapshot capture (Step 3.5a) are run directly since they're pure validation or one-time data collection. |
The same line exists in docs/examples/claude-code-skills/titan-run/SKILL.md and needs the same fix.
There was a problem hiding this comment.
Fixed — added Step 3.5a (pre-forge architectural snapshot capture) to the Rules exception list. The wording now reads "Exceptions (pure validation/snapshot, no code changes): the post-forge test run (V13), NDJSON integrity checks, and the pre-forge architectural snapshot capture (Step 3.5a) are run directly by the orchestrator." Applied to both .claude/skills/ and docs/examples/ copies.
|
|
||
| ## Step 0.5 — Artifact pre-validation (--start-from only) | ||
|
|
||
| **Only run this step if `--start-from` was specified.** Phases before the start point are being skipped — their artifacts must exist and be valid before proceeding. | ||
|
|
There was a problem hiding this comment.
--skip-recon/--skip-gauntlet bypass Step 0.5 artifact pre-validation
Step 0.5 says "Only run this step if --start-from was specified." But --skip-recon and --skip-gauntlet also skip phases without triggering Step 0.5's validation. The table in Step 0.5 covers exactly what's needed when these phases are skipped:
--skip-recon→ should run V1/V2/V4 (but doesn't)--skip-gauntlet→ should run V5/V6/V7 + NDJSON integrity check (but doesn't)
The failure mode is concrete: a user running /titan-run --skip-gauntlet with a partially-corrupt gauntlet.ndjson (e.g., from a prior crash) will skip directly to sync, which produces a sync.json based on bad data, and then forge makes real commits against wrong recommendations — all with no upfront warning. --start-from gauntlet and --skip-recon are semantically equivalent, but only one triggers the guard.
The condition in Step 0.5 should also include the skip flags:
| ## Step 0.5 — Artifact pre-validation (--start-from only) | |
| **Only run this step if `--start-from` was specified.** Phases before the start point are being skipped — their artifacts must exist and be valid before proceeding. | |
| **Only run this step if `--start-from` was specified OR if `--skip-recon` or `--skip-gauntlet` is set.** Phases before the start point are being skipped — their artifacts must exist and be valid before proceeding. |
The same issue exists in docs/examples/claude-code-skills/titan-run/SKILL.md at the corresponding location.
There was a problem hiding this comment.
Fixed — Step 0.5 now triggers when any phase-skipping flag is set: --start-from, --skip-recon, or --skip-gauntlet. The heading was updated to "Artifact pre-validation (phase skip)" and the condition text specifies which artifacts to validate for each flag. Applied to both copies.
| # V13. Test suite still green after forge commits | ||
| # Quick sanity — run tests to make sure the cumulative commits haven't broken anything | ||
| npm test 2>&1 | ||
| if tests fail: | ||
| Print: "CRITICAL: Test suite fails after forge phase <nextPhase>. Stopping pipeline." | ||
| Print: "Commits from this phase: git log --oneline <headBefore>..<headAfter>" | ||
| Print: "Consider reverting: git revert <headBefore>..<headAfter>" | ||
| Stop. |
There was a problem hiding this comment.
V13 hardcodes
npm test — inconsistent with gate's tool detection
The gate's Step 4 detects the available test runner by checking package.json scripts before running npm test. V13 here skips that check and unconditionally runs npm test. On a project using pnpm, yarn, bun, or a project with no test script defined, this produces a misleading failure (e.g., npm error Missing script: "test") rather than a clean "no test runner found" message, and the pipeline would stop on a false positive.
The same hardcoded npm test appears in titan-forge Step 10 (line 190 of .claude/skills/titan-forge/SKILL.md). Consider either detecting the runner first (matching gate's approach), or at minimum documenting the assumption that only npm projects are supported.
The same issue exists in docs/examples/claude-code-skills/titan-run/SKILL.md and docs/examples/claude-code-skills/titan-forge/SKILL.md.
There was a problem hiding this comment.
Fixed — V13 in titan-run and Step 10 in titan-forge now say "detect the project's test command from package.json scripts — npm test, yarn test, pnpm test, etc." with \ as the placeholder instead of hardcoded \Unknown command: test"
| TITAN_TMP_ID="$(date +%s)-$$" | ||
| codegraph communities -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json | ||
| codegraph structure --depth 2 --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json | ||
| codegraph communities --drift -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json | ||
| ``` | ||
|
|
||
| ### Compare | ||
|
|
||
| **A1. Community stability:** | ||
| Use the drift output (which uses content-based matching, not raw IDs, to track community movements across runs): | ||
|
|
||
| ```bash | ||
| # Compare current drift report against snapshot drift baseline | ||
| # New drift warnings not present in arch-snapshot.json → side-effect restructuring | ||
| ``` | ||
|
|
||
| Read `.codegraph/titan/arch-snapshot.json → drift` (the pre-forge drift baseline) and compare against `/tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json`: | ||
| - For each **new** drift warning in current that was NOT present in the snapshot: if the drifted symbol was NOT touched in the diff → **WARN**: "Symbol `<name>` drifted community as a side effect" | ||
| - If > 5 untouched symbols appear in new drift warnings → **FAIL**: "Significant community restructuring detected — <N> symbols drifted communities. This change may have unintended architectural impact." | ||
|
|
||
| **A2. Dependency direction between domains:** | ||
| From `GLOBAL_ARCH.md`, extract the expected dependency direction between domains (e.g., "presentation depends on features, not the reverse"). | ||
|
|
||
| Check if any new cross-domain dependency violates the expected direction: | ||
| ```bash | ||
| codegraph deps <changed-file> --json | ||
| ``` | ||
| - New upward dependency (lower layer importing higher layer) not present in snapshot → **FAIL** | ||
| - New lateral dependency within the same layer → **OK** | ||
|
|
||
| **A3. Cohesion delta:** | ||
| Compare directory cohesion scores from `structure`: | ||
| - If any directory's cohesion dropped by > 0.2 → **WARN**: "Directory `<dir>` cohesion dropped from <X> to <Y>" | ||
| - If a directory went from above 0.5 to below 0.3 → **FAIL**: "Directory `<dir>` became tangled (cohesion <X> → <Y>)" | ||
|
|
||
| **A4. New drift warnings:** | ||
| Compare drift warnings between snapshot and current: | ||
| - New drift warning not in snapshot → **WARN** with details | ||
| - Drift warning resolved → note as positive | ||
|
|
||
| ### Cleanup | ||
|
|
||
| ```bash | ||
| rm -f /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json \ | ||
| /tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json \ | ||
| /tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json |
There was a problem hiding this comment.
TITAN_TMP_ID shell variable won't survive between separate Bash tool invocations
TITAN_TMP_ID is set in the "Capture current state" block (line 182), but the "Cleanup" block (lines 225-227) and the inline A1 comparison prose (line 198) reference ${TITAN_TMP_ID} as if it's still in scope. In Claude Code, each bash code block is run as a separate shell invocation — shell variables do not persist between them.
When an AI agent executes the cleanup block independently, TITAN_TMP_ID is unset, so the rm -f becomes rm -f /tmp/titan-arch--current-*.json, which silently matches nothing. The actual temp files at /tmp/titan-arch-<timestamp>-<pid>-current-*.json are left behind — re-introducing the stale-file problem that the TITAN_TMP_ID fix was meant to solve.
The same issue exists at docs/examples/claude-code-skills/titan-gate/SKILL.md:182-227.
Options to fix:
- Combine the capture, compare, and cleanup into a single bash block so the variable stays in scope:
TITAN_TMP_ID="$(date +%s)-$$"
codegraph communities -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json
codegraph structure --depth 2 --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json
codegraph communities --drift -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json
# ... comparison logic inline ...
rm -f /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json \
/tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json \
/tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json- Or use a fixed but gated per-invocation directory (mkdir + trap):
TITAN_TMP_DIR="$(mktemp -d /tmp/titan-arch-XXXXXX)"
trap 'rm -rf "$TITAN_TMP_DIR"' EXIT
codegraph communities -T --json > "$TITAN_TMP_DIR/communities.json"
codegraph structure --depth 2 --json > "$TITAN_TMP_DIR/structure.json"
codegraph communities --drift -T --json > "$TITAN_TMP_DIR/drift.json"4e93913 to
9274f3f
Compare
Summary
/titan-runorchestrator skill that dispatches the full Titan pipeline (recon → gauntlet → sync → forge) to sub-agents with fresh context windows, enabling hands-free end-to-end execution/titan-forge(Step 9) — verifies each diff matches the gauntlet recommendation and sync plan intent before gate runs (scope check, intent match, commit message accuracy, deletion audit, leftover check)/titan-gate(Step 5) — export signature stability, import resolution integrity, dependency direction assertions, re-export chain validation/titan-gate(Step 5.5) — captures pre-forge architectural baseline, compares community stability, cross-domain dependency direction, cohesion delta, and drift after each commit/titan-run: Pre-Agent Gate (G1-G4), post-phase validation (V1-V15), stall detection, state file backup/recovery, NDJSON integrity checks, mandatory human checkpoint before forgeTest plan
/titan-runon a test codebase in a worktree — verify full pipeline completes--start-from forgeskips analysis phases but validates their artifacts exist