
feat: type analysis modules, parser WASM maps, extract canStripTypes (#554 follow-up) #558

Open
carlos-alm wants to merge 23 commits into main from feat/ts-deferred-followups

Conversation

@carlos-alm
Contributor

Summary

  • Type db: any → BetterSqlite3.Database across all 9 domain/analysis/*.ts modules, with NodeRow, RelatedNodeRow, and other types from types.ts replacing untyped parameters
  • Type WASM parser maps in domain/parser.ts using web-tree-sitter exported types (Parser, Language, Query, Tree) with a new LanguageRegistryEntry interface
  • Extract canStripTypes Node version probe to tests/helpers/node-version.js shared helper (ready for import when CLI test guards land)

All three items were deferred as follow-ups acknowledged in PR #554 comments.

Test plan

  • npx tsc --noEmit — zero type errors
  • npm run lint — zero errors (only pre-existing style warnings)
  • npx vitest run — 2002 tests pass, 107 test files pass (3 pre-existing CLI spawn failures unchanged)
  • Verify no runtime regressions in CI

…pTypes helper (#554)

Address deferred review feedback from PR #554:
- Convert all 9 domain/analysis/*.js → .ts with BetterSqlite3.Database
  and NodeRow type annotations replacing untyped db/node parameters
- Type WASM parser cache maps (Parser, Language, Query) in domain/parser.ts
  with LanguageRegistryEntry interface and full function signatures
- Extract canStripTypes Node version probe to tests/helpers/node-version.js
  shared helper (ready for import when CLI test TS guards land)
@claude

claude bot commented Mar 21, 2026

Claude finished @carlos-alm's task — View job


Code Review Analysis

Conducting comprehensive 5-phase review for PR #558...

Todo List

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity verification - ensure no weakened assertions
  • Phase 4: Code quality analysis - review TypeScript typing improvements
  • Phase 5: Backlog compliance evaluation - rate against foundation principles
  • Final assessment and recommendation

Root Cause Analysis

Problem Being Solved: This PR addresses deferred TypeScript typing improvements from PR #554. It converts 9 analysis modules from .js to .ts with proper type annotations, particularly replacing db: any with BetterSqlite3.Database and using proper row types from types.ts.

Why This Approach: The approach is sound - these are legitimate typing improvements that:

  1. Replace untyped any parameters with proper BetterSqlite3.Database types
  2. Use well-defined row interfaces (NodeRow, RelatedNodeRow, etc.) from types.ts
  3. Add proper WASM parser types using web-tree-sitter exports
  4. Extract a shared helper to reduce code duplication
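As a rough illustration of the first two points, the conversion pattern looks like the following. This is a minimal, self-contained sketch: the Db/Stmt interfaces and NodeRow fields are stand-ins for the project's BetterSqlite3Database and types.ts shapes, and the stub db exists only so the snippet runs without better-sqlite3.

```typescript
// Hypothetical row shape, standing in for NodeRow from types.ts.
interface NodeRow {
  id: number;
  name: string;
  file: string;
  line: number;
}

// Minimal slice of the better-sqlite3 surface used below, standing in
// for the project's BetterSqlite3Database interface.
interface Stmt {
  get(...params: unknown[]): unknown;
}
interface Db {
  prepare(sql: string): Stmt;
}

// Before: function findNodeById(db: any, id: any) { ... }
// After: typed parameters plus a typed cast on the row result.
function findNodeById(db: Db, id: number): NodeRow | undefined {
  return db
    .prepare('SELECT id, name, file, line FROM nodes WHERE id = ?')
    .get(id) as NodeRow | undefined;
}

// Stub db so the sketch runs without a real SQLite database.
const stubDb: Db = {
  prepare: () => ({
    get: (id) => ({ id: Number(id), name: 'fn', file: 'a.ts', line: 1 }),
  }),
};
console.log(findNodeById(stubDb, 7)?.name);
```

The cast on `.get()` is the key move: better-sqlite3 returns untyped rows, so each query site asserts the row interface it expects.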

Risk Assessment: Very low risk. This is purely additive typing with no runtime behavior changes.

Code Quality Assessment

Excellent TypeScript Migration:

  • Proper database typing: All db: any → BetterSqlite3.Database conversions are correct
  • Row type usage: Consistent use of NodeRow, RelatedNodeRow, ChildNodeRow from types.ts
  • WASM parser types: Good use of web-tree-sitter exports (Parser, Language, Query, Tree)
  • Interface design: New LanguageRegistryEntry interface is well-structured
  • Helper extraction: canStripTypes helper properly extracted to shared location

Type Safety Improvements:

  • All SQL query results properly typed with database row interfaces
  • Better-sqlite3 import type correctly used for database parameters
  • WASM type imports from web-tree-sitter are appropriate
  • Consistent biome-ignore comments for legitimate any usage (dynamic config shapes, native addon)

Code Quality: High quality conversion that maintains all existing functionality while adding proper type safety. No runtime changes detected.

Backlog Compliance

  • Zero-dep: ✅ — No new runtime dependencies added
  • Foundation-aligned: ✅ — Fully compliant with all principles, improves type safety without affecting core architecture
  • Problem-fit: 4/5 — Improves developer experience and reduces potential type-related bugs, which helps AI agents work with the codebase more reliably
  • Breaking: No — Purely additive typing, no API changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent TypeScript conversion that improves type safety without introducing risk. Clean implementation following existing patterns, proper use of defined interfaces, and good documentation. Only minor deduction because this is primarily maintenance work rather than feature development.

@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR completes the type-safety follow-up from #554, replacing db: any with BetterSqlite3Database across all nine domain/analysis/*.ts modules, typing the WASM parser maps with proper web-tree-sitter types, and extracting the canStripTypes Node version probe to a shared test helper. It also bundles a solid set of performance fixes previously identified in review: hoisted db.prepare() calls out of BFS loops, WeakMap-cached schema probes, and elimination of duplicate DB roundtrips.
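The WeakMap-cached schema probe mentioned above can be sketched like this. It is a self-contained approximation: the real probe in exports.ts checks the exported column through better-sqlite3, which is replaced here by stub handles.

```typescript
interface Db {
  prepare(sql: string): { raw(toggle?: boolean): unknown };
}

// Cache the probe result per database handle so the try/catch runs
// once per db, not once per call. WeakMap keys on the handle itself,
// so entries disappear when the db is garbage-collected.
const _hasExportedColCache = new WeakMap<Db, boolean>();

function hasExportedCol(db: Db): boolean {
  const cached = _hasExportedColCache.get(db);
  if (cached !== undefined) return cached;
  let result = false;
  try {
    db.prepare('SELECT exported FROM nodes LIMIT 0').raw(true);
    result = true;
  } catch {
    result = false; // column missing on older schemas
  }
  _hasExportedColCache.set(db, result);
  return result;
}

// Stubs: one db whose probe succeeds, one whose probe throws.
const okDb: Db = { prepare: () => ({ raw: () => undefined }) };
const oldDb: Db = {
  prepare: () => {
    throw new Error('no such column: exported');
  },
};
console.log(hasExportedCol(okDb), hasExportedCol(oldDb));
```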

Key changes:

  • All nine domain/analysis/ modules migrate db: any → BetterSqlite3Database with typed row casts (NodeRow, RelatedNodeRow, etc.)
  • parser.ts types _cachedParsers as Map<string, Parser | null>, _cachedLanguages as Map<string, Language>, and _queryCache as Map<string, Query>; promotes TS_BACKFILL_EXTS to a module-level constant
  • nodes.ts adds getLineCountForNode and getMaxEndLineForFile using the cachedStmt pattern, exported through db/index.js
  • types.ts tightens SqliteStatement (adds raw()), BetterSqlite3Database.exec (returns this), and transaction generic (preserves full function signature F)
  • vendor.d.ts adds raw(toggle?: boolean): this to the ambient better-sqlite3 Statement declaration
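The transaction generic tightening in types.ts can be illustrated as follows; this is a sketch against an identity stub, while better-sqlite3's real transaction() wraps the function in BEGIN/COMMIT.

```typescript
// Before: transaction<T>(fn: (...args: any[]) => T): (...args: any[]) => T
// erased parameter types at the call site. Constraining on the whole
// function type F preserves them:
interface Db {
  transaction<F extends (...args: any[]) => any>(fn: F): F;
}

// Stub: the identity wrapper is enough to demonstrate the preserved
// signature; the real implementation adds transactional semantics.
const db: Db = { transaction: (fn) => fn };

const insertPair = db.transaction((a: number, b: string) => `${a}:${b}`);
// insertPair is still typed (a: number, b: string) => string, so callers
// keep full parameter checking instead of (...args: any[]) => string.
console.log(insertPair(1, 'x'));
```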

Issues found:

  • LanguageRegistryEntry is now defined twice with inconsistent field types: the original in src/types.ts (typed LanguageId/ExtractorFn) and a new weaker copy in src/domain/parser.ts (string/any). The types.ts definition was not removed.
  • module-map.ts uses import type BetterSqlite3 from 'better-sqlite3' (package types) while all other nine analysis modules use the project's own BetterSqlite3Database interface.
  • The new _getLineCountForNodeStmt and _getMaxEndLineForFileStmt WeakMaps in nodes.ts lack the StmtCache<T> type annotation used elsewhere.
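For reference, the StmtCache<T> annotation the third point asks for might look like this. Names and the cachedStmt helper are illustrative, not the project's exact implementation, and a stub db makes the sketch runnable.

```typescript
interface Stmt<T> {
  get(...params: unknown[]): T | undefined;
}
interface Db {
  prepare(sql: string): Stmt<unknown>;
}

// A per-db cache of one prepared statement, typed by its row shape.
type StmtCache<T> = WeakMap<Db, Stmt<T>>;

interface LineCountRow {
  line_count: number;
}

// Untyped: new WeakMap() — lookups come back effectively untyped.
// Typed: the row shape flows through cachedStmt to every caller.
const _getLineCountForNodeStmt: StmtCache<LineCountRow> = new WeakMap();

function cachedStmt<T>(cache: StmtCache<T>, db: Db, sql: string): Stmt<T> {
  let stmt = cache.get(db);
  if (!stmt) {
    stmt = db.prepare(sql) as Stmt<T>;
    cache.set(db, stmt);
  }
  return stmt;
}

// Stub db for the sketch.
const db: Db = { prepare: () => ({ get: () => ({ line_count: 42 }) }) };
const row = cachedStmt(_getLineCountForNodeStmt, db, 'SELECT ...').get(1);
console.log(row?.line_count);
```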

Confidence Score: 3/5

  • Safe to merge with the LanguageRegistryEntry duplication addressed — it causes no runtime error today but will create confusion as the type diverges over time.
  • The bulk of the PR is a clean, well-executed type migration with zero runtime risk. The performance fixes (hoisted statements, cached probes) are correct and already validated by 2002 passing tests. Score is held at 3 because of the genuine LanguageRegistryEntry duplication: two exported interfaces with the same name but different field types (LanguageId/ExtractorFn in types.ts vs string/any in parser.ts) now coexist in the codebase. Nothing currently imports it by name, so there is no active breakage, but leaving both in place sets a maintenance trap for the next person who adds a registry entry or types a function parameter against it.
  • src/domain/parser.ts (LanguageRegistryEntry duplication with types.ts), src/domain/analysis/module-map.ts (inconsistent DB type import), src/db/repository/nodes.ts (untyped WeakMap caches)

Important Files Changed

Filename Overview
src/domain/parser.ts Adds proper types for WASM parser maps (Map<string, Parser | null>, Map<string, Language>, Map<string, Query>) and defines a new LanguageRegistryEntry interface.
src/domain/analysis/module-map.ts Replaces db: any with BetterSqlite3.Database (from the better-sqlite3 package directly), which is inconsistent with all other nine analysis modules that use the project's BetterSqlite3Database interface from types.ts.
src/db/repository/nodes.ts Adds getLineCountForNode and getMaxEndLineForFile helpers using the cachedStmt pattern, but the WeakMap caches (_getLineCountForNodeStmt, _getMaxEndLineForFileStmt) are untyped (new WeakMap()), inconsistent with the StmtCache<T> annotation pattern used elsewhere.
src/domain/analysis/context.ts Thorough type migration: replaces db: any with BetterSqlite3Database, types all DB row results, adds cachedStmt (_explainNodeStmtCache: StmtCache<NodeRow>), imports getLineCountForNode/getMaxEndLineForFile, and eliminates duplicate findCallers DB roundtrip via allCallerRows.
src/domain/analysis/dependencies.ts Hoists upstreamStmt and nodeByIdStmt out of BFS/closure loops, replaces db: any with BetterSqlite3Database, types all row shapes, and eliminates per-node redundant ID re-query by including n.id in the upstream SELECT.
src/domain/analysis/exports.ts Adds _hasExportedColCache WeakMap to cache schema probe per db handle, hoists four db.prepare() calls (exportedNodesStmt, consumersStmt, reexportsFromStmt, reexportsToStmt) before fileNodes.map(), and removes db: any in favour of BetterSqlite3Database.
src/domain/analysis/impact.ts Hoists defsStmt before changedRanges loop, types _hasImplementsCache and bfsTransitiveCallers with BetterSqlite3Database, and types all level/result arrays. impactAnalysisData opts narrowed to { noTests? } (limit/offset were accepted but never applied).
src/types.ts Adds raw(toggle?: boolean): this to SqliteStatement, tightens exec to return this, and improves transaction generic from T to F extends (...args: any[]) => any to preserve full caller signature. LanguageRegistryEntry at line 470 (with typed LanguageId and ExtractorFn fields) is now shadowed by a weaker duplicate in parser.ts.
src/vendor.d.ts Adds raw(toggle?: boolean): this to the Statement interface, consistent with the SqliteStatement addition in types.ts, allowing the schema probe in exports.ts to drop the as any cast.
tests/helpers/node-version.js New shared helper that exports canStripTypes based on Node version (>= 22.6). Simple, correct, and ready for import by CLI test guards.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph "Type Sources"
        T["src/types.ts\nBetterSqlite3Database\nLanguageRegistryEntry (LanguageId/ExtractorFn)\nNodeRow, RelatedNodeRow, StmtCache&lt;T&gt;"]
        V["src/vendor.d.ts\nbetter-sqlite3 Statement\n+ raw(toggle?: boolean): this"]
    end

    subgraph "DB Layer"
        DBI["src/db/index.js\n+ getLineCountForNode\n+ getMaxEndLineForFile"]
        DBRN["src/db/repository/nodes.ts\n+ getLineCountForNode\n+ getMaxEndLineForFile\n(cachedStmt pattern)"]
    end

    subgraph "Analysis Modules — db: any → BetterSqlite3Database"
        A1["brief.ts"]
        A2["context.ts\n+ cachedStmt _explainNodeStmtCache\n+ allCallerRows dedup"]
        A3["dependencies.ts\n+ upstreamStmt hoisted\n+ nodeByIdStmt hoisted"]
        A4["exports.ts\n+ _hasExportedColCache\n+ 4 stmts hoisted"]
        A5["impact.ts\n+ defsStmt hoisted"]
        A6["implementations.ts"]
        A7["module-map.ts\nBetterSqlite3.Database ⚠️"]
        A8["roles.ts"]
        A9["symbol-lookup.ts"]
    end

    subgraph "Parser"
        P["src/domain/parser.ts\nMap&lt;string, Parser|null&gt;\nMap&lt;string, Language&gt;\nMap&lt;string, Query&gt;\nLanguageRegistryEntry ⚠️ (duplicate)"]
    end

    subgraph "Tests"
        H["tests/helpers/node-version.js\nexport canStripTypes (Node >= 22.6)"]
    end

    T -->|"type imports"| A1 & A2 & A3 & A4 & A5 & A6 & A8 & A9
    V -->|"raw() type fix"| A4
    DBRN -->|"exports"| DBI
    DBI -->|"imports"| A2
    T -.->|"LanguageRegistryEntry also in"| P

Reviews (8): Last reviewed commit: "fix: add null guard for result.get(relPa..."

Comment on lines +5 to +6
const [_major, _minor] = process.versions.node.split('.').map(Number);
export const canStripTypes = _major > 22 || (_major === 22 && _minor >= 6);
Contributor

P2 Underscore-prefix on used variables is misleading

Both _major and _minor carry the underscore "potentially-unused" convention, yet both are referenced in the canStripTypes expression. Biome's noUnusedVariables rule silently skips identifiers whose names start with _, so future readers (and the linter) receive a false signal that these bindings are intentionally discarded.

Consider dropping the leading underscores so the intent is clear:

Suggested change
- const [_major, _minor] = process.versions.node.split('.').map(Number);
- export const canStripTypes = _major > 22 || (_major === 22 && _minor >= 6);
+ const [major, minor] = process.versions.node.split('.').map(Number);
+ export const canStripTypes = major > 22 || (major === 22 && minor >= 6);

Contributor Author

Fixed — dropped the underscore prefix from major and minor since both are referenced in the canStripTypes expression.
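For context, the resulting helper and a plain guard around it would look roughly like the following sketch; the real helper lives in tests/helpers/node-version.js and exports canStripTypes, and the vitest skipIf wiring is only one possible consumer.

```typescript
// Mirrors tests/helpers/node-version.js: Node can strip TypeScript
// types natively from 22.6 onward (--experimental-strip-types).
const [major, minor] = process.versions.node.split('.').map(Number);
const canStripTypes = major > 22 || (major === 22 && minor >= 6);

// A CLI test guard could branch on it (e.g. vitest's
// describe.skipIf(!canStripTypes)); shown here as a plain check:
if (!canStripTypes) {
  console.log('skipping TS-strip CLI tests on Node', process.versions.node);
}
console.log(typeof canStripTypes);
```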

Comment on lines +116 to +121
  try {
-   db.prepare('SELECT exported FROM nodes LIMIT 0').raw();
+   // biome-ignore lint/suspicious/noExplicitAny: Statement.raw() exists at runtime but is missing from type declarations
+   (db.prepare('SELECT exported FROM nodes LIMIT 0') as any).raw(true);
    hasExportedCol = true;
- } catch (e) {
-   debug(`exported column not available, using fallback: ${e.message}`);
+ } catch (e: unknown) {
+   debug(`exported column not available, using fallback: ${(e as Error).message}`);
Contributor

P2 as any cast for Statement.raw() may be unnecessary

The biome-ignore comment explains that Statement.raw() is "missing from type declarations," but current better-sqlite3 TypeScript typings (v9+) do declare Statement.raw(raw?: boolean): this. If the project's installed type version already includes this signature, the as any cast and its accompanying biome-ignore directive are both unnecessary — and the latter suppresses the Biome noExplicitAny warning for all code on that line.

It's worth checking whether dropping the cast resolves cleanly:

db.prepare('SELECT exported FROM nodes LIMIT 0').raw(true);

If tsc still complains, it confirms the local typings are outdated and the workaround is warranted.

Contributor Author

Fixed — added raw(toggle?: boolean): this to the Statement interface in vendor.d.ts, which allows us to remove the as any cast and the biome-ignore comment.

@carlos-alm
Contributor Author

@greptileai

Comment on lines +468 to +477
relatedTests,
...implInfo,
};
});

const base = { name, results };
return paginateResult(base, 'results', { limit: opts.limit, offset: opts.offset });
} finally {
db.close();
}
Contributor

P2 Inline SQL bypasses the db/index.js abstraction layer

Two queries in explainFileImpl are issued directly on db rather than going through the functions in db/index.js:

const metric = db
  .prepare(`SELECT nm.line_count FROM node_metrics nm WHERE nm.node_id = ?`)
  .get(fn.id) as { line_count: number } | undefined;
...
const maxLine = db
  .prepare(`SELECT MAX(end_line) as max_end FROM nodes WHERE file = ?`)
  .get(fn.file) as { max_end: number | null } | undefined;

Keeping all SQL behind the DB layer makes future schema changes easier to track and ensures uniform error handling. If these queries are genuinely one-off, a brief comment explaining why they're inlined (e.g., "no existing helper for node_metrics") would make the intent clear to future readers.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Contributor Author

Fixed — extracted the two inline SQL queries into getLineCountForNode and getMaxEndLineForFile functions in db/repository/nodes.js, using the standard cachedStmt pattern. context.ts now imports and calls these instead of issuing raw SQL.

@carlos-alm
Contributor Author

Addressed all Greptile review feedback:

  1. Underscore prefix on used vars — dropped _major/_minor to major/minor in tests/helpers/node-version.js
  2. as any cast for Statement.raw() — added raw() to vendor.d.ts Statement interface, removed the cast and biome-ignore in exports.ts
  3. Inline SQL bypasses db abstraction — extracted getLineCountForNode and getMaxEndLineForFile into db/repository/nodes.js using cachedStmt pattern
  4. N+1 re-query in buildTransitiveCallers — callers already carry .id from findCallers; updated parameter type to accept id directly, eliminating per-caller SELECT
  5. customDbPath type inconsistency — aligned implementations.ts to string (was string | undefined)

All passing: zero type errors, zero lint errors, 546 integration tests green.

@carlos-alm
Contributor Author

@greptileai

Comment on lines +80 to +93
const upstream = db
.prepare(`
SELECT n.name, n.kind, n.file, n.line
FROM edges e JOIN nodes n ON e.source_id = n.id
WHERE e.target_id = ? AND e.kind = 'calls'
`)
.all(f.id) as Array<{ name: string; kind: string; file: string; line: number }>;
for (const u of upstream) {
if (noTests && isTestFile(u.file)) continue;
const uid = (
db
.prepare('SELECT id FROM nodes WHERE name = ? AND kind = ? AND file = ? AND line = ?')
.get(u.name, u.kind, u.file, u.line) as { id: number } | undefined
)?.id;
Contributor

P1 db.prepare() inside nested BFS loops forces redundant SQL compilation and an extra DB roundtrip per node

Two db.prepare() calls are issued on every inner-loop iteration during BFS traversal. better-sqlite3 compiles the query string on each call, so the first prepare runs O(depth × frontier) compilations instead of one. More critically, the first query omits n.id from its SELECT, forcing a second query (SELECT id FROM nodes WHERE name = ? AND kind = ? AND file = ? AND line = ?) for every upstream node just to obtain the ID — even though the ID is already available on the join side (e.source_id).

For comparison, bfsShortestPath in the same file handles this correctly: the statement is compiled once before the loops, and n.id is included in the single-query SELECT.

Fix: hoist both statements out of the loops and add n.id to the first SELECT to eliminate the second lookup entirely:

// Hoist before the loop
const upstreamStmt = db.prepare(`
  SELECT n.id, n.name, n.kind, n.file, n.line
  FROM edges e JOIN nodes n ON e.source_id = n.id
  WHERE e.target_id = ? AND e.kind = 'calls'
`);

for (let d = 2; d <= depth; d++) {
  const nextFrontier: typeof frontier = [];
  for (const f of frontier) {
    if (visited.has(f.id)) continue;
    visited.add(f.id);
    const upstream = upstreamStmt.all(f.id) as Array<{ id: number; name: string; kind: string; file: string; line: number }>;
    for (const u of upstream) {
      if (noTests && isTestFile(u.file)) continue;
      if (!visited.has(u.id)) {
        nextFrontier.push(u);
      }
    }
  }
  ...
}

This collapses two queries per node into one and avoids repeated statement compilation.

Contributor Author

Fixed — hoisted the upstream query statement out of the BFS loops and added n.id to the SELECT. This eliminates both the repeated db.prepare() compilation (O(depth × frontier) → 1) and the per-node ID re-query. The second SELECT id FROM nodes is completely removed since e.source_id already provides the ID via the join.

Contributor Author

Already fixed in the prior round (commit 3e3cb50) — the upstream query is hoisted to upstreamStmt before the BFS loop at line 75, and neighborStmt is hoisted at line 277 in bfsShortestPath. No db.prepare() remains inside any loop in this file.

const patched = patchNativeResult(result);
// Only backfill typeMap for TS/TSX — JS files have no type annotations,
// and the native engine already handles `new Expr()` patterns.
const TS_BACKFILL_EXTS = new Set(['.ts', '.tsx']);
Contributor

P2 Duplicate inline Set construction — should be a module-level constant

new Set(['.ts', '.tsx']) is constructed three separate times across the same module:

  • parseFileAuto (line 505) as TS_BACKFILL_EXTS
  • parseFilesAuto (line 557) as TS_EXTS
  • parseFileIncremental (line 669) as TS_BACKFILL_EXTS

All three sets represent the same concept. Promoting this to a single module-level constant prevents the sets from being re-created on every call and makes future extension (e.g. adding .mts) a one-line change.

Suggested change
const TS_BACKFILL_EXTS = new Set(['.ts', '.tsx']);
const TS_BACKFILL_EXTS = new Set(['.ts', '.tsx']);

Then reference TS_BACKFILL_EXTS at lines 505, 557 (replacing TS_EXTS), and 669.

Contributor Author

Fixed — promoted TS_BACKFILL_EXTS to a module-level constant at the top of parser.ts. Removed the 3 duplicate inline new Set(['.ts', '.tsx']) constructions at lines 505, 557 (TS_EXTS), and 669. All now reference the single constant.

Contributor Author

Already fixed in the prior round (commit b3fc6d5) — TS_BACKFILL_EXTS is a module-level constant at line 52. All three inline new Set(['.ts', '.tsx']) constructions were removed.

@carlos-alm
Contributor Author

Addressed round 2 Greptile feedback:

  1. Hoisted db.prepare() out of BFS loops and added n.id to SELECT — eliminates repeated compilation and extra roundtrip per node
  2. Promoted TS_BACKFILL_EXTS to module-level constant, removed 3 duplicate inline Set constructions

All passing: zero type errors, 546 integration tests green.

@carlos-alm
Contributor Author

@greptileai

Comment on lines +338 to +342
const testCallerRows = findCallers(db, node.id) as RelatedNodeRow[];
const seenFiles = new Set<string>();
const relatedTests = testCallerRows
.filter((r) => isTestFile(r.file) && !seenFiles.has(r.file) && seenFiles.add(r.file))
.map((r) => ({ file: r.file }));
Contributor

P1 Duplicate findCallers() call — redundant DB roundtrip per node

findCallers(db, node.id) is called twice on every iteration of the nodes.slice(0, 10).map(...) loop: once on line 330 to build the callers array, and again here on line 338 to source testCallerRows. Since the DB query is identical (find all callers of node.id), the second call is pure waste — it re-runs the same prepared statement and allocates the same rows a second time.

The first call's raw rows should be retained before the .map() transform so they can be reused for relatedTests:

Suggested change
- const testCallerRows = findCallers(db, node.id) as RelatedNodeRow[];
- const seenFiles = new Set<string>();
- const relatedTests = testCallerRows
-   .filter((r) => isTestFile(r.file) && !seenFiles.has(r.file) && seenFiles.add(r.file))
-   .map((r) => ({ file: r.file }));
+ const rawCallerRows = findCallers(db, node.id) as RelatedNodeRow[];
+ let callers = rawCallerRows.map((c) => ({
+   name: c.name,
+   kind: c.kind,
+   file: c.file,
+   line: c.line,
+ }));
+ if (noTests) callers = callers.filter((c) => !isTestFile(c.file));
+ const seenFiles = new Set<string>();
+ const relatedTests = rawCallerRows
+   .filter((r) => isTestFile(r.file) && !seenFiles.has(r.file) && seenFiles.add(r.file))
+   .map((r) => ({ file: r.file }));

Contributor Author

Fixed — stored findCallers(db, node.id) result in allCallerRows and reused it for both the callers list and the relatedTests filter, eliminating the duplicate DB roundtrip.

Comment on lines +141 to +161
const buildSymbolResult = (s: NodeRow, fileLines: string[] | null) => {
let consumers = db
.prepare(
`SELECT n.name, n.file, n.line FROM edges e JOIN nodes n ON e.source_id = n.id
WHERE e.target_id = ? AND e.kind = 'calls'`,
)
.all(s.id) as Array<{ name: string; file: string; line: number }>;
if (noTests) consumers = consumers.filter((c) => !isTestFile(c.file));

return {
name: s.name,
kind: s.kind,
line: s.line,
endLine: s.end_line ?? null,
role: s.role || null,
signature: fileLines ? extractSignature(fileLines, s.line, displayOpts) : null,
summary: fileLines ? extractSummary(fileLines, s.line, displayOpts) : null,
consumers: consumers.map((c) => ({ name: c.name, file: c.file, line: c.line })),
consumerCount: consumers.length,
};
};
Contributor

P2 db.prepare() inside per-symbol closure — same pattern as the BFS-loop issue already fixed in this PR

buildSymbolResult is defined inside exportsFileImpl and called once per exported symbol (exported.map(...)) and once per re-exported symbol (for (const s of targetExported)). Because db.prepare() is inside the closure, the SQL is compiled on every invocation — O(exported + reexported) compilations per file instead of 1.

The same "SELECT * FROM nodes WHERE file = ? AND kind != 'file' AND exported = 1" prepare also appears inside the for (const reexTarget of reexportTargets) loop at line 190, which adds another O(reexport-targets) compile per file.

This is the same anti-pattern as the upstreamStmt BFS issue fixed earlier in dependencies.ts. The fix is to hoist both statements before the closures/loops:

// Hoist before buildSymbolResult is defined
const consumersStmt = db.prepare(
  `SELECT n.name, n.file, n.line FROM edges e JOIN nodes n ON e.source_id = n.id
   WHERE e.target_id = ? AND e.kind = 'calls'`,
);
const exportedByFileStmt = db.prepare(
  "SELECT * FROM nodes WHERE file = ? AND kind != 'file' AND exported = 1 ORDER BY line",
);

This pattern also appears at:

  • src/domain/analysis/dependencies.ts:335 — db.prepare('SELECT name, kind, file, line FROM nodes WHERE id = ?') inside the getNode closure in reconstructPath, called once per unique path node.
  • src/domain/analysis/impact.ts:344 — db.prepare(...) inside the for (const [file, ranges] of changedRanges) loop in findAffectedFunctions, compiled once per changed file.

Contributor Author

Fixed — hoisted consumersStmt = db.prepare(...) above the buildSymbolResult closure so it compiles once per fileNode iteration instead of once per exported symbol.

…t of per-symbol closure

context.ts: reuse allCallerRows for both callers list and relatedTests
instead of calling findCallers(db, node.id) twice per iteration.

exports.ts: hoist consumersStmt above buildSymbolResult closure so
db.prepare() runs once per fileNode instead of once per exported symbol.
@carlos-alm
Contributor Author

@greptileai

Comment on lines +128 to +194
// Use the exported column populated during build
exported = db
.prepare(
"SELECT * FROM nodes WHERE file = ? AND kind != 'file' AND exported = 1 ORDER BY line",
)
.all(fn.file) as NodeRow[];
} else {
// Fallback: symbols that have incoming calls from other files
const exportedIds = findCrossFileCallTargets(db, fn.file) as Set<number>;
exported = symbols.filter((s) => exportedIds.has(s.id));
}
const internalCount = symbols.length - exported.length;

const consumersStmt = db.prepare(
`SELECT n.name, n.file, n.line FROM edges e JOIN nodes n ON e.source_id = n.id
WHERE e.target_id = ? AND e.kind = 'calls'`,
);

const buildSymbolResult = (s: NodeRow, fileLines: string[] | null) => {
let consumers = consumersStmt.all(s.id) as Array<{ name: string; file: string; line: number }>;
if (noTests) consumers = consumers.filter((c) => !isTestFile(c.file));

return {
name: s.name,
kind: s.kind,
line: s.line,
endLine: s.end_line ?? null,
role: s.role || null,
signature: fileLines ? extractSignature(fileLines, s.line, displayOpts) : null,
summary: fileLines ? extractSummary(fileLines, s.line, displayOpts) : null,
consumers: consumers.map((c) => ({ name: c.name, file: c.file, line: c.line })),
consumerCount: consumers.length,
};
};

const results = exported.map((s) => buildSymbolResult(s, getFileLines(fn.file)));

const totalUnused = results.filter((r) => r.consumerCount === 0).length;

// Files that re-export this file (barrel -> this file)
const reexports = (
db
.prepare(
`SELECT DISTINCT n.file FROM edges e JOIN nodes n ON e.source_id = n.id
WHERE e.target_id = ? AND e.kind = 'reexports'`,
)
.all(fn.id) as Array<{ file: string }>
).map((r) => ({ file: r.file }));

// For barrel files: gather symbols re-exported from target modules
const reexportTargets = db
.prepare(
`SELECT DISTINCT n.file FROM edges e JOIN nodes n ON e.target_id = n.id
WHERE e.source_id = ? AND e.kind = 'reexports'`,
)
.all(fn.id) as Array<{ file: string }>;

const reexportedSymbols: Array<ReturnType<typeof buildSymbolResult> & { originFile: string }> =
[];
for (const reexTarget of reexportTargets) {
let targetExported: NodeRow[];
if (hasExportedCol) {
targetExported = db
.prepare(
"SELECT * FROM nodes WHERE file = ? AND kind != 'file' AND exported = 1 ORDER BY line",
)
.all(reexTarget.file) as NodeRow[];
Contributor

P1 Repeated db.prepare() calls inside map and nested loop

Four db.prepare() calls still live inside the fileNodes.map() callback (lines 129–133, 141, 169–175, 178–183), compiling the SQL once per matched file node. Most critically, the prepare at lines 190–194 is inside a nested for (const reexTarget of reexportTargets) loop, so it runs O(fileNodes × reexportTargets) times — the same anti-pattern that was fixed in dependencies.ts earlier in this PR.

Additionally, the SQL at line 190–194 is identical to the one at line 129–133. Both should share a single hoisted statement.

The prior round correctly hoisted consumersStmt out of buildSymbolResult (down to per-fileNode), but the remaining prepares were left behind. Hoisting all five prepare sites before fileNodes.map(...) (four distinct statements, since the exported-nodes query is shared) reduces compilation from O(fileNodes × reexportTargets) to a small constant per call.

// Hoist before fileNodes.map()
const exportedNodesStmt = db.prepare(
  "SELECT * FROM nodes WHERE file = ? AND kind != 'file' AND exported = 1 ORDER BY line",
);
const reexportsFromStmt = db.prepare(
  `SELECT DISTINCT n.file FROM edges e JOIN nodes n ON e.source_id = n.id
   WHERE e.target_id = ? AND e.kind = 'reexports'`,
);
const reexportsToStmt = db.prepare(
  `SELECT DISTINCT n.file FROM edges e JOIN nodes n ON e.target_id = n.id
   WHERE e.source_id = ? AND e.kind = 'reexports'`,
);
const consumersStmt = db.prepare(
  `SELECT n.name, n.file, n.line FROM edges e JOIN nodes n ON e.source_id = n.id
       WHERE e.target_id = ? AND e.kind = 'calls'`,
);

return fileNodes.map((fn) => {
  ...
  if (hasExportedCol) {
    exported = exportedNodesStmt.all(fn.file) as NodeRow[];
  }
  ...
  const reexports = (reexportsFromStmt.all(fn.id) as Array<{ file: string }>).map((r) => ({ file: r.file }));
  const reexportTargets = reexportsToStmt.all(fn.id) as Array<{ file: string }>;
  ...
  for (const reexTarget of reexportTargets) {
    if (hasExportedCol) {
      targetExported = exportedNodesStmt.all(reexTarget.file) as NodeRow[];  // reuse the same statement
    }
    ...
  }
});

Contributor Author

Fixed — hoisted all five db.prepare() call sites into four shared statements (exportedNodesStmt, consumersStmt, reexportsFromStmt, reexportsToStmt) before fileNodes.map(). The duplicate prepare inside the reexport loop now reuses the hoisted exportedNodesStmt. SQL compilation reduced from O(fileNodes × reexportTargets) to a constant four per exportsFileImpl call.

…rts.ts

Moves exportedNodesStmt, consumersStmt, reexportsFromStmt, and reexportsToStmt
before the fileNodes.map() callback. Eliminates O(fileNodes × reexportTargets)
SQL compilations — now compiles each statement exactly once per exportsFileImpl call.
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +333 to +342
const getNode = (id: number) => {
if (nodeCache.has(id)) return nodeCache.get(id)!;
const row = db.prepare('SELECT name, kind, file, line FROM nodes WHERE id = ?').get(id) as {
name: string;
kind: string;
file: string;
line: number;
};
nodeCache.set(id, row);
return row;
Contributor

P2 db.prepare() inside getNode closure — same pattern fixed elsewhere in this PR

reconstructPath hoists nothing: every unique path node that misses the nodeCache triggers a fresh db.prepare() call, compiling the same SQL once per node. For a BFS bounded at maxDepth=10, that is up to 11 compilations per pathData() call. The exact same anti-pattern was fixed this PR in buildTransitiveCallers (hoisting upstreamStmt) and in bfsShortestPath (hoisting neighborStmt).

Hoist the statement outside getNode:

Suggested change
const getNode = (id: number) => {
if (nodeCache.has(id)) return nodeCache.get(id)!;
const row = db.prepare('SELECT name, kind, file, line FROM nodes WHERE id = ?').get(id) as {
name: string;
kind: string;
file: string;
line: number;
};
nodeCache.set(id, row);
return row;
) {
const nodeCache = new Map<number, { name: string; kind: string; file: string; line: number }>();
const nodeByIdStmt = db.prepare('SELECT name, kind, file, line FROM nodes WHERE id = ?');
const getNode = (id: number) => {
if (nodeCache.has(id)) return nodeCache.get(id)!;
const row = nodeByIdStmt.get(id) as {
name: string;
kind: string;
file: string;
line: number;
};
nodeCache.set(id, row);
return row;
};

Contributor Author

Fixed — hoisted db.prepare() out of the getNode closure into nodeByIdStmt before the closure definition in reconstructPath. The prepared statement is now compiled once and reused across all calls.

Comment on lines +308 to +312
let nodes = db
.prepare(
`SELECT * FROM nodes WHERE name LIKE ? AND kind IN ('function','method','class','interface','type','struct','enum','trait','record','module','constant') ORDER BY file, line`,
)
.all(`%${target}%`) as NodeRow[];
Contributor

P2 db.prepare() recompiled on each recursive explainCallees call

explainFunctionImpl is called once per unique callee during explainCallees traversal (depth × breadth calls for a deep query). Each invocation compiles this SQL string again. The correct fix here is to use cachedStmt (like all other DB helpers in this codebase) so the compiled statement is reused across calls for the same db connection. Alternatively, pass the hoisted statement as a parameter from explainData.

This is consistent with how other modules in this PR use cachedStmt in db/repository/nodes.js (see getLineCountForNode and getMaxEndLineForFile).

Contributor Author

Fixed — replaced the inline db.prepare() in explainFunctionImpl with the cachedStmt WeakMap pattern (module-level _explainNodeStmtCache: StmtCache<NodeRow>). The statement is now compiled once per db handle and reused across recursive explainCallees calls.

Comment on lines +114 to +121
// Detect whether exported column exists
let hasExportedCol = false;
try {
db.prepare('SELECT exported FROM nodes LIMIT 0').raw(true);
hasExportedCol = true;
} catch (e: unknown) {
debug(`exported column not available, using fallback: ${(e as Error).message}`);
}
Contributor

P2 Schema probe re-runs on every exportsData call

exportsFileImpl re-evaluates the exported column existence on every invocation. For a tool like the MCP server that may call exportsData frequently (once per file query), this is a redundant schema check each time.

impact.ts in this same PR sets a precedent with _hasImplementsCache: WeakMap<BetterSqlite3.Database, boolean> — the same pattern applies here:

const _hasExportedColCache = new WeakMap<BetterSqlite3.Database, boolean>();

function hasExportedCol(db: BetterSqlite3.Database): boolean {
  if (_hasExportedColCache.has(db)) return _hasExportedColCache.get(db)!;
  let result = false;
  try {
    db.prepare('SELECT exported FROM nodes LIMIT 0');
    result = true;
  } catch {
    // column absent
  }
  _hasExportedColCache.set(db, result);
  return result;
}

Then replace the inline probe with if (hasExportedCol(db)) {.

Contributor Author

Fixed — added a module-level _hasExportedColCache: WeakMap<BetterSqlite3.Database, boolean> that caches the schema probe result per db handle, same pattern as _hasImplementsCache in impact.ts. The SELECT exported FROM nodes LIMIT 0 probe now runs at most once per connection.

Impact: 423 functions changed, 501 affected
… feedback

- dependencies.ts: hoist db.prepare() out of getNode closure in reconstructPath
- context.ts: use cachedStmt WeakMap pattern for explainFunctionImpl node query
- exports.ts: cache exported-column schema probe with WeakMap per db handle

Impact: 4 functions changed, 4 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +344 to +348
const defs = db
.prepare(
`SELECT * FROM nodes WHERE file = ? AND kind IN ('function', 'method', 'class') ORDER BY line`,
)
.all(file) as any[];
.all(file) as NodeRow[];
Contributor

P2 db.prepare() inside loop — same anti-pattern fixed elsewhere in this PR

findAffectedFunctions iterates over every entry in changedRanges and calls db.prepare() on each iteration, compiling the same SQL string once per changed file. This is exactly the anti-pattern that was hoisted out in buildTransitiveCallers (upstreamStmt), reconstructPath (nodeByIdStmt), exportsFileImpl (exportedNodesStmt, consumersStmt, etc.), and explainFunctionImpl (cachedStmt). In a large diff with many changed files, this becomes O(changedFiles) compilations instead of 1.

Hoist the statement before the loop:

Suggested change
const defs = db
.prepare(
`SELECT * FROM nodes WHERE file = ? AND kind IN ('function', 'method', 'class') ORDER BY line`,
)
.all(file) as any[];
.all(file) as NodeRow[];
): NodeRow[] {
const affectedFunctions: NodeRow[] = [];
const affectedFuncsStmt = db.prepare(
`SELECT * FROM nodes WHERE file = ? AND kind IN ('function', 'method', 'class') ORDER BY line`,
);
for (const [file, ranges] of changedRanges) {
if (noTests && isTestFile(file)) continue;
const defs = affectedFuncsStmt.all(file) as NodeRow[];

Contributor Author

Fixed — hoisted db.prepare() out of the changedRanges loop into defsStmt before the loop. The statement is now compiled once and reused for each file iteration.

Comment on lines +570 to +580
const symbols = result.get(relPath);
symbols.typeMap =
extracted.symbols.typeMap instanceof Map
? extracted.symbols.typeMap
: new Map(
(extracted.symbols.typeMap as any).map((e: any) => [e.name, e.typeName]),
extracted.symbols.typeMap.map((e: { name: string; typeName: string }) => [
e.name,
e.typeName,
]),
);
(symbols as any)._typeMapBackfilled = true;
symbols._typeMapBackfilled = true;
Contributor

P2 Dropped ! non-null assertion — silent failure path

The previous code used result.get(relPath)! to assert the entry exists before writing typeMap. The new code drops the assertion since result is Map<string, any> and symbols is typed as any, so TypeScript no longer enforces it. If result.get(relPath) returned undefined for any reason, symbols.typeMap = ... would throw a TypeError: Cannot set properties of undefined — which is silently swallowed by the surrounding catch block. This means a missing-entry bug would fail quietly instead of surfacing.

Restoring a non-null assertion (or an explicit guard) keeps the intent clear:

Suggested change
const symbols = result.get(relPath);
symbols.typeMap =
extracted.symbols.typeMap instanceof Map
? extracted.symbols.typeMap
: new Map(
(extracted.symbols.typeMap as any).map((e: any) => [e.name, e.typeName]),
extracted.symbols.typeMap.map((e: { name: string; typeName: string }) => [
e.name,
e.typeName,
]),
);
(symbols as any)._typeMapBackfilled = true;
symbols._typeMapBackfilled = true;
const symbols = result.get(relPath);
if (!symbols) continue;
symbols.typeMap =

Contributor Author

Fixed — added an explicit null guard (if (!symbols) continue;) before the property access. This replaces the dropped ! non-null assertion with a safe runtime check that skips the entry when the map doesn't contain it.

…tions (#558)

The prepared statement was being compiled on every iteration of the
changedRanges loop. Hoisted to compile once before the loop.
The ! non-null assertion was dropped during TS migration. Added an
explicit guard with continue to skip safely when the entry is missing,
preventing silent TypeError on the subsequent property access.
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +57 to +64
export interface LanguageRegistryEntry {
id: string;
extensions: string[];
grammarFile: string;
// biome-ignore lint/suspicious/noExplicitAny: extractor signatures vary per language
extractor: (...args: any[]) => any;
required: boolean;
}
Contributor

P1 LanguageRegistryEntry duplicated with weaker types

This PR adds a new LanguageRegistryEntry definition here in parser.ts, but the original definition in src/types.ts at line 470 was not removed. Both are now exported under the same name from different modules:

  • src/types.ts:470: id: LanguageId, extractor: ExtractorFn (stricter, project-typed)
  • src/domain/parser.ts:57: id: string, extractor: (...args: any[]) => any (looser, any-typed)

LANGUAGE_REGISTRY (line 331) and _extToLang (line 411) in parser.ts are typed against the local looser version. Any consumer that imports LanguageRegistryEntry from types.ts and passes a registry entry to a function expecting the parser.ts variant (or vice versa) will encounter a mismatch in the id and extractor fields.

The fix is to remove the duplicate here and restore the import from types.ts:

import type { LanguageRegistryEntry } from '../types.js';

@@ -1,4 +1,5 @@
import path from 'node:path';
import type BetterSqlite3 from 'better-sqlite3';
Contributor

P2 Inconsistent DB type source across analysis modules

All nine other analysis modules changed in this PR use import type { BetterSqlite3Database } from '../../types.js' (the project's own minimal interface). This file is the only one that imports directly from the better-sqlite3 package:

import type BetterSqlite3 from 'better-sqlite3';

Using the package import couples module-map.ts to the better-sqlite3 type declarations while the rest of the layer is decoupled through BetterSqlite3Database. For consistency, replace with:

Suggested change
import type BetterSqlite3 from 'better-sqlite3';
import type { BetterSqlite3Database } from '../../types.js';

and replace all BetterSqlite3.Database occurrences with BetterSqlite3Database.

Comment on lines +278 to +279

// ─── Metric helpers ──────────────────────────────────────────────────────
Contributor

P2 Untyped WeakMap caches — inconsistent with StmtCache<T> pattern

Both new caches are declared without generic parameters:

const _getLineCountForNodeStmt = new WeakMap();
const _getMaxEndLineForFileStmt = new WeakMap();

The established pattern in this codebase (see _explainNodeStmtCache in context.ts) uses the StmtCache<T> type alias for these module-level statement caches:

Suggested change
// ─── Metric helpers ──────────────────────────────────────────────────────
const _getLineCountForNodeStmt: StmtCache<{ line_count: number }> = new WeakMap();
const _getMaxEndLineForFileStmt: StmtCache<{ max_end: number | null }> = new WeakMap();

This also requires importing StmtCache from ../../types.js.

