feat(knowledge): add Ollama embedding provider support#3714

Open
teedonk wants to merge 47 commits into simstudioai:staging from teedonk:feat/ollama-embedding-support

Conversation


@teedonk teedonk commented Mar 22, 2026

Summary

  • Adds Ollama as an embedding provider for knowledge bases, enabling fully offline/private RAG workflows
  • Implements per-KB dynamic pgvector tables to handle variable embedding dimensions across Ollama models
  • Auto-detects embedding dimension and context length from Ollama's /api/show endpoint
  • Adds provider selection UI with preset models (nomic-embed-text, mxbai-embed-large, all-minilm, etc.)

Fixes #3037

  • Per-KB embedding tables solve the variable dimension problem
  • Context-aware chunking and smart batching prevent context length overflow
  • Cross-provider search normalizes distance scores across OpenAI and Ollama KBs
  • All existing OpenAI/Azure paths remain untouched
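The dimension auto-detection from `/api/show` can be sketched as follows. This is a hedged, illustrative parser, not the PR's actual code: Ollama's `/api/show` returns a `model_info` map whose keys are architecture-prefixed (e.g. `nomic-bert.embedding_length`), so the detector matches on key suffixes rather than hardcoding an architecture name.

```typescript
// Hedged sketch (not the PR's actual code) of auto-detecting embedding
// dimension and context length from Ollama's /api/show response. model_info
// keys are architecture-prefixed (e.g. "nomic-bert.embedding_length"), so we
// match on key suffixes instead of hardcoding the architecture name.
interface OllamaShowResponse {
  model_info: Record<string, unknown>
}

function detectEmbeddingInfo(show: OllamaShowResponse): {
  dimension: number | null
  contextLength: number | null
} {
  let dimension: number | null = null
  let contextLength: number | null = null
  for (const [key, value] of Object.entries(show.model_info)) {
    if (typeof value !== 'number') continue
    if (key.endsWith('.embedding_length')) dimension = value
    else if (key.endsWith('.context_length')) contextLength = value
  }
  return { dimension, contextLength }
}
```

In the PR, the detected dimension then sizes the per-KB pgvector column, e.g. `vector(768)` for nomic-embed-text.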

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

Tested with Ollama (nomic-embed-text) on local setup:

  • Created KB with Ollama provider — dimension auto-detected
  • Uploaded small and large documents (40+ pages) — all chunks processed successfully
  • Search returns relevant results across both OpenAI and Ollama KBs
  • All 4267 existing tests pass, zero TypeScript errors, Biome lint clean

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the [Contributor License Agreement (CLA)](./CONTRIBUTING.md#contributor-license-agreement-cla)

Video.Project.1.mp4

waleedlatif1 and others added 30 commits February 16, 2026 00:36
…stash, algolia tools; isolated-vm robustness improvements, tables backend (simstudioai#3271)

* feat(tools): advanced fields for youtube, vercel; added cloudflare and dataverse tools (simstudioai#3257)

* refactor(vercel): mark optional fields as advanced mode

Move optional/power-user fields behind the advanced toggle:
- List Deployments: project filter, target, state
- Create Deployment: project ID override, redeploy from, target
- List Projects: search
- Create/Update Project: framework, build/output/install commands
- Env Vars: variable type
- Webhooks: project IDs filter
- Checks: path, details URL
- Team Members: role filter
- All operations: team ID scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style(youtube): mark optional params as advanced mode

Hide pagination, sort order, and filter fields behind the advanced
toggle for a cleaner default UX across all YouTube operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* added advanced fields for vercel and youtube, added cloudflare and dataverse block

* added desc for dataverse

* add more tools

* ack comment

* more

* ops

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(tables): added tables (simstudioai#2867)

* updates

* required

* trashy table viewer

* updates

* updates

* filtering ui

* updates

* updates

* updates

* one input mode

* format

* fix lints

* improved errors

* updates

* updates

* changes

* doc strings

* breaking down file

* update comments with ai

* updates

* comments

* changes

* revert

* updates

* dedupe

* updates

* updates

* updates

* refactoring

* renames & refactors

* refactoring

* updates

* undo

* update db

* wand

* updates

* fix comments

* fixes

* simplify comments

* updates

* renames

* better comments

* validation

* updates

* updates

* updates

* fix sorting

* fix appearance

* updating prompt to make it user sort

* rm

* updates

* rename

* comments

* clean comments

* simplification

* updates

* updates

* refactor

* reduced type confusion

* undo

* rename

* undo changes

* undo

* simplify

* updates

* updates

* revert

* updates

* db updates

* type fix

* fix

* fix error handling

* updates

* docs

* docs

* updates

* rename

* dedupe

* revert

* uncook

* updates

* fix

* fix

* fix

* fix

* prepare merge

* readd migrations

* add back missed code

* migrate enrichment logic to general abstraction

* address bugbot concerns

* adhere to size limits for tables

* remove conflicting migration

* add back migrations

* fix tables auth

* fix permissive auth

* fix lint

* reran migrations

* migrate to use tanstack query for all server state

* update table-selector

* update names

* added tables to permission groups, updated subblock types

---------

Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
Co-authored-by: waleed <walif6@gmail.com>

* fix(snapshot): changed insert to upsert when concurrent identical child workflows are running (simstudioai#3259)

* fix(snapshot): changed insert to upsert when concurrent identical child workflows are running

* fixed ci tests failing

* fix(workflows): disallow duplicate workflow names at the same folder level (simstudioai#3260)

* feat(tools): added redis, upstash, algolia, and revenuecat (simstudioai#3261)

* feat(tools): added redis, upstash, algolia, and revenuecat

* ack comment

* feat(models): add gemini-3.1-pro-preview and update gemini-3-pro thinking levels (simstudioai#3263)

* fix(audit-log): lazily resolve actor name/email when missing (simstudioai#3262)

* fix(blocks): move type coercions from tools.config.tool to tools.config.params (simstudioai#3264)

* fix(blocks): move type coercions from tools.config.tool to tools.config.params

Number() coercions in tools.config.tool ran at serialization time before
variable resolution, destroying dynamic references like <block.result.count>
by converting them to NaN/null. Moved all coercions to tools.config.params
which runs at execution time after variables are resolved.

Fixed in 15 blocks: exa, arxiv, sentry, incidentio, wikipedia, ahrefs,
posthog, elasticsearch, dropbox, hunter, lemlist, spotify, youtube, grafana,
parallel. Also added mode: 'advanced' to optional exa fields.

Closes simstudioai#3258

* fix(blocks): address PR review — move remaining param mutations from tool() to params()

- Moved field mappings from tool() to params() in grafana, posthog,
  lemlist, spotify, dropbox (same dynamic reference bug)
- Fixed parallel.ts excerpts/full_content boolean logic
- Fixed parallel.ts search_queries empty case (must set undefined)
- Fixed elasticsearch.ts timeout not included when already ends with 's'
- Restored dropbox.ts tool() switch for proper default fallback

* fix(blocks): restore field renames to tool() for serialization-time validation

Field renames (e.g. personalApiKey→apiKey) must be in tool() because
validateRequiredFieldsBeforeExecution calls selectToolId()→tool() then
checks renamed field names on params. Only type coercions (Number(),
boolean) stay in params() to avoid destroying dynamic variable references.

* improvement(resolver): resolved empty sentinel to not pass through unexecuted valid refs to text inputs (simstudioai#3266)

* fix(blocks): add required constraint for serviceDeskId in JSM block (simstudioai#3268)

* fix(blocks): add required constraint for serviceDeskId in JSM block

* fix(blocks): rename custom field values to request field values in JSM create request

* fix(trigger): add isolated-vm support to trigger.dev container builds (simstudioai#3269)

Scheduled workflow executions running in trigger.dev containers were
failing to spawn isolated-vm workers because the native module wasn't
available in the container. This caused loop condition evaluation to
silently fail and exit after one iteration.

- Add isolated-vm to build.external and additionalPackages in trigger config
- Include isolated-vm-worker.cjs via additionalFiles for child process spawning
- Add fallback path resolution for worker file in trigger.dev environment

* fix(tables): hide tables from sidebar and block registry (simstudioai#3270)

* fix(tables): hide tables from sidebar and block registry

* fix(trigger): add isolated-vm support to trigger.dev container builds (simstudioai#3269)

Scheduled workflow executions running in trigger.dev containers were
failing to spawn isolated-vm workers because the native module wasn't
available in the container. This caused loop condition evaluation to
silently fail and exit after one iteration.

- Add isolated-vm to build.external and additionalPackages in trigger config
- Include isolated-vm-worker.cjs via additionalFiles for child process spawning
- Add fallback path resolution for worker file in trigger.dev environment

* lint

* fix(trigger): update node version to align with main app (simstudioai#3272)

* fix(build): fix corrupted sticky disk cache on blacksmith (simstudioai#3273)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Lakee Sivaraya <71339072+lakeesiv@users.noreply.github.com>
Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
Co-authored-by: Vikhyath Mondreti <vikhyathvikku@gmail.com>
… fixes, removed retired models, hex integration
…ogle tasks and bigquery integrations, workflow lock

vercel bot commented Mar 22, 2026

@teedonk is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.


cursor bot commented Mar 22, 2026

PR Summary

High Risk
High risk because it adds a new embedding provider, introduces dynamic per-knowledge-base pgvector tables with raw SQL, and changes document ingestion + search paths that affect core data storage and retrieval behavior.

Overview
Adds Ollama as an embedding provider for knowledge bases, including creation-time validation against Ollama’s /api/show, optional ollamaBaseUrl handling, and UI controls to pick preset/custom models and dimensions.

Introduces lib/knowledge/dynamic-tables.ts to manage per-KB pgvector tables (create/drop, insert/delete, and vector/tag search), and wires this into the system: KB deletion drops the per-KB table, document processing routes embeddings into either the shared embedding table (OpenAI) or the KB-specific table (Ollama), and search now splits requested KBs by provider, generates provider-specific query embeddings, queries both backends, then normalizes/merges scores before returning results.

Updates embedding/chunking to be Ollama-aware: generateEmbeddings/generateSearchEmbedding can call Ollama’s /api/embed with smart batching/truncation, and document chunk sizing/overlap are capped based on Ollama model context length; token estimation for chunking becomes model-dependent.
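As a rough sketch of the context-aware chunk sizing described above — the 0.3 × context-length cap appears in the PR's sequence diagram, and the ~3 chars/token ratio for Ollama models is cited in review comments; the function name and return convention are illustrative, not the PR's actual code:

```typescript
// Illustrative sketch of the context-aware chunk cap. The 0.3 factor comes
// from the PR's sequence diagram; the ~3 chars/token Ollama ratio is an
// assumption cited in review comments. Names here are hypothetical.
function effectiveChunkSizeChars(
  requestedTokens: number,
  modelContextLength: number,
  charsPerToken = 3
): number {
  // Cap each chunk at 30% of the model's context window so batched
  // /api/embed inputs stay well under the limit.
  const capTokens = Math.floor(modelContextLength * 0.3)
  return Math.min(requestedTokens, capTokens) * charsPerToken
}
```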

Written by Cursor Bugbot for commit 075b005. This will update automatically on new commits.


greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR adds Ollama as a fully local/private embedding provider for knowledge bases, implementing per-KB dynamic pgvector tables to handle variable embedding dimensions across different Ollama models. The overall architecture is well-reasoned — auto-detecting dimensions from /api/show, context-aware chunking, smart batching, and cross-provider search normalization — and the OpenAI/Azure paths are left untouched.

Key issues found:

  • Data loss on Ollama document reprocessing (service.ts:611): The Ollama path deletes existing embeddings then inserts new ones without a transaction. A failure mid-insert leaves the document with no embeddings and no recovery path. The OpenAI path correctly uses db.transaction; Ollama should too.
  • Orphaned KB row on table creation failure (route.ts:136): If createKBEmbeddingTable throws after createKnowledgeBase has already committed, the KB row is left in the database with no corresponding embedding table. The catch block should delete the orphaned KB row before re-throwing.
  • Document status update regression (service.ts:628): The refactor extracted processingStatus: 'completed' from the db.transaction to a standalone call. For the OpenAI path this is a regression — if the embedding transaction succeeds but the status update fails, the document is permanently stuck in 'processing' state.
  • sql.raw vector interpolation (dynamic-tables.ts:238): The query vector is interpolated into a sql.raw template string. While the value is always JSON.stringify(number[]) today, this pattern bypasses Drizzle's parameterization and is fragile for future changes. A sql tagged template with a placeholder is safer.
  • Native <select> in the UI (create-base-modal.tsx): The model-picker uses a plain HTML <select> with hand-written Tailwind classes instead of the project's Select component, which won't inherit dark-mode tokens or future design-system updates.
  • Module-level cache in serverless (embeddings.ts:39): ollamaModelInfoCache won't persist across serverless invocations, making the TTL cache a no-op in typical deployment targets.

Confidence Score: 3/5

  • Not yet safe to merge — two P1 data-integrity bugs (non-atomic Ollama delete+insert and orphaned KB row) need to be fixed before this lands.
  • The feature design is solid and well-scoped, but service.ts introduces a data-loss scenario for Ollama document processing (no transaction around delete+insert) and a regression for OpenAI (status update pulled out of its transaction). The KB creation flow also leaves orphaned DB rows if pgvector table creation fails. These three issues touch the core data path and would cause observable bugs in production. The fixes are small and targeted, but they need to land before merge.
  • apps/sim/lib/knowledge/documents/service.ts (data-loss and regression), apps/sim/app/api/knowledge/route.ts (orphaned KB row), apps/sim/lib/knowledge/dynamic-tables.ts (sql.raw interpolation)

Important Files Changed

Filename Overview
apps/sim/lib/knowledge/dynamic-tables.ts New file implementing per-KB dynamic pgvector tables for Ollama embeddings. Core logic is sound (kbId → table name, create/drop/insert/search), but the vector literal is interpolated via sql.raw rather than parameterized, and the search functions don't wrap delete+insert atomically.
apps/sim/lib/knowledge/documents/service.ts Two bugs introduced: (1) Ollama delete + insert embeddings are not wrapped in a transaction — partial insert failure leaves document with no embeddings (data-loss); (2) the document status update was extracted from the OpenAI transaction, meaning status can get stuck in 'processing' if the update call fails after embeddings are successfully inserted (regression).
apps/sim/app/api/knowledge/route.ts Adds Ollama KB creation flow with dimension auto-detection. Bug: if createKBEmbeddingTable throws after createKnowledgeBase succeeds, the KB row is left orphaned in the DB with no corresponding embedding table and no cleanup.
apps/sim/lib/knowledge/embeddings.ts Adds getOllamaModelInfo (with TTL cache), getOllamaModelContextLength, callOllamaEmbeddingAPI, and Ollama branches in generateEmbeddings/generateSearchEmbedding. Logic is clean; module-level cache is a minor concern in serverless environments.
apps/sim/app/api/knowledge/search/route.ts Refactored to split OpenAI and Ollama KB IDs, generate separate query embeddings per provider, and merge results. The per-provider routing logic is clean and correctly handles mixed KB searches.
apps/sim/app/workspace/[workspaceId]/knowledge/components/create-base-modal/create-base-modal.tsx Adds provider toggle (OpenAI/Ollama) with preset model selector and custom model/dimension inputs. Functional, but uses a raw HTML <select> element instead of the project's component-library Select, which won't inherit dark-mode or design-system updates automatically.

Sequence Diagram

sequenceDiagram
    participant UI as CreateBaseModal
    participant API as POST /api/knowledge
    participant OllamaAPI as Ollama /api/show
    participant DB as Database
    participant DynTable as Per-KB Table

    UI->>API: POST {name, embeddingModel: "ollama/nomic-embed-text", ollamaBaseUrl}
    API->>OllamaAPI: POST /api/show {name: "nomic-embed-text"}
    OllamaAPI-->>API: {embedding_length: 768, context_length: 2048}
    API->>DB: createKnowledgeBase(data)
    DB-->>API: newKnowledgeBase {id}
    API->>DynTable: CREATE TABLE kb_embeddings_{id} (embedding vector(768))
    DynTable-->>API: OK

    Note over UI,DynTable: Document Upload Flow
    UI->>API: POST /api/knowledge/{id}/documents
    API->>OllamaAPI: GET context length (cached)
    OllamaAPI-->>API: contextLength=2048
    API->>API: chunk document (effectiveChunkSize = 0.3 × contextLength)
    API->>OllamaAPI: POST /api/embed {model, input: chunks[]}
    OllamaAPI-->>API: embeddings[]
    API->>DynTable: DELETE WHERE document_id=X
    API->>DynTable: INSERT embeddings (batched, no transaction)

    Note over UI,DynTable: Search Flow
    UI->>API: POST /api/knowledge/search
    API->>DB: SELECT embeddingModel FROM knowledgeBase WHERE id IN (...)
    DB-->>API: [{id, embeddingModel: "ollama/..."}]
    API->>OllamaAPI: POST /api/embed {model, input: [query]}
    OllamaAPI-->>API: queryVector[]
    API->>DynTable: SELECT ... ORDER BY embedding <=> queryVector LIMIT topK
    DynTable-->>API: SearchResult[]
    API-->>UI: merged results

Reviews (1): Last reviewed commit: "test(knowledge): update search tests for..."

Comment on lines +611 to +614
if (embeddingProvider === 'ollama') {
  // Per-KB table: delete old chunks then bulk-insert new ones
  await deleteKBDocumentEmbeddings(knowledgeBaseId, documentId)
  await insertKBEmbeddings(knowledgeBaseId, embeddingRecords, kb[0].embeddingDimension)

P1 Non-atomic delete + insert risks data loss

The Ollama path deletes all existing embeddings for the document and then inserts new ones without a wrapping transaction. If insertKBEmbeddings fails mid-way (e.g., after a partial batch insert), the old embeddings have already been deleted but the new ones are only partially written — leaving the document with fewer embeddings or none at all, with no way to recover automatically.

The OpenAI path below correctly wraps both operations in a db.transaction. Consider wrapping the Ollama path similarly. Since deleteKBDocumentEmbeddings and insertKBEmbeddings both use db.execute / db.insert, they should participate in a transaction too.
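The atomicity argument can be demonstrated with a minimal, dependency-free sketch (illustrative names, not the project's Drizzle helpers): a transaction stages work on a copy and commits only if the whole callback succeeds, so a failure mid-insert leaves the old embeddings intact.

```typescript
// Dependency-free sketch of why the wrapping transaction matters. Names are
// illustrative, not the project's actual helpers.
type Store = string[]

async function transaction(store: Store, fn: (staged: Store) => Promise<void>) {
  const staged = [...store] // work on a copy
  await fn(staged) // any throw propagates before the commit below
  store.length = 0
  store.push(...staged) // commit only on success
}

async function reprocessDocument(store: Store, newChunks: string[], failAt?: number) {
  await transaction(store, async (staged) => {
    staged.length = 0 // delete old embeddings for the document
    newChunks.forEach((chunk, i) => {
      if (i === failAt) throw new Error('insert failed mid-batch')
      staged.push(chunk)
    })
  })
}
```

Without the wrapper, the delete commits immediately and a mid-batch failure leaves the document with partial or zero embeddings — the scenario described above.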

Comment on lines 136 to +147
const newKnowledgeBase = await createKnowledgeBase(createData, requestId)

if (provider === 'ollama') {
  try {
    await createKBEmbeddingTable(newKnowledgeBase.id, effectiveDimension)
  } catch (tableError) {
    logger.error(
      `[${requestId}] Failed to create embedding table for KB ${newKnowledgeBase.id}`,
      tableError
    )
    throw tableError
  }

P1 Orphaned KB row when embedding table creation fails

createKnowledgeBase persists the KB row to the database at line 136 before createKBEmbeddingTable is called. If createKBEmbeddingTable throws (e.g. the pgvector extension isn't enabled or a naming collision occurs), the error is re-thrown and the request returns 500 — but the KB record is left behind in the database without a corresponding embedding table. Any subsequent document upload to this KB will fail with a table-not-found error and the user has no way to fix it from the UI.

A safe fix is to delete the orphaned KB row in the catch block before re-throwing:

} catch (tableError) {
  logger.error(...)
  // Clean up orphaned KB row
  await deleteKnowledgeBase(newKnowledgeBase.id, requestId).catch(() => {})
  throw tableError
}

Comment on lines +238 to +246
  // Use drizzle's insert API with dynamic table schema
  await db.insert(dynamicTable).values(batch)
} catch (err: unknown) {
  const pg = err as { code?: string; detail?: string; message?: string; cause?: unknown }
  logger.error(`insertKBEmbeddings failed for table ${table}`, {
    code: pg.code,
    detail: pg.detail,
    message: pg.message,
    cause: pg.cause,

P1 Vector literal injected via sql.raw — fragile and risky pattern

queryVector is constructed by the caller as JSON.stringify(emb) where emb is a number[], so in practice it contains no single quotes. However, interpolating it directly into a sql.raw string template is fragile: a future refactor that passes a different value could silently introduce a SQL injection or at minimum a syntax error.

Drizzle's parameterized approach is safer and just as performant:

// Instead of:
const vecLiteral = sql.raw(`'${queryVector}'::vector`)

// Prefer:
const vecLiteral = sql`${queryVector}::vector`

Using a Drizzle placeholder ensures the value is always parameterized correctly, regardless of its content.

Comment on lines +628 to +638
await db
  .update(document)
  .set({
    chunkCount: processed.metadata.chunkCount,
    tokenCount: processed.metadata.tokenCount,
    characterCount: processed.metadata.characterCount,
    processingStatus: 'completed',
    processingCompletedAt: now,
    processingError: null,
  })
  .where(eq(document.id, documentId))

P1 Document status update moved outside the transaction (regression)

In the original code, both the embedding inserts and the processingStatus: 'completed' update were inside a single db.transaction, so they were atomic. This PR extracts the status update to a separate await db.update(document)... call that runs after the transaction.

For the OpenAI path, if the transaction (embedding inserts) succeeds but the subsequent status update fails, the document stays permanently in 'processing' state — even though its embeddings are fully in place. There's no retry that would recover this (the catch block at line 656 sets status to 'failed', which is also incorrect since the embeddings are already there).

This is a regression for the OpenAI path that was introduced by the refactor to support the Ollama code path. The processingStatus update should remain inside the db.transaction for the OpenAI path.

Comment on lines +477 to +495
onChange={(e) => {
  const preset = OLLAMA_PRESET_MODELS.find(
    (m) => m.value === e.target.value
  )
  setOllamaPreset(e.target.value)
  if (preset && preset.dimension > 0) {
    setOllamaDimension(preset.dimension)
  }
}}
className='rounded-[4px] border border-[var(--border-1)] bg-[var(--surface-1)] px-[10px] py-[6px] text-[12px] text-[var(--text-primary)]'
>
  {OLLAMA_PRESET_MODELS.map((m) => (
    <option key={m.value} value={m.value}>
      {m.label}
    </option>
  ))}
</select>
</div>


P2 Native <select> instead of the component library's Select

The rest of the form uses components from the existing @/components/ui library (e.g. Input, Label, Button). This <select> is a raw HTML element with hand-written Tailwind classes for the border/background/text, which won't automatically respect dark mode tokens or any future design-system updates.

Consider replacing it with the project's Select / SelectContent / SelectItem / SelectTrigger pattern used elsewhere in the codebase.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  baseUrl = 'http://localhost:11434'
): Promise<OllamaModelInfo> {
  const cacheKey = `${modelName}@${baseUrl}`
  const cached = ollamaModelInfoCache.get(cacheKey)

P2 Module-level cache ineffective in serverless/edge deployments

ollamaModelInfoCache is a module-level Map. In serverless environments (Next.js API routes deployed to Vercel, AWS Lambda, etc.) each invocation may spawn a fresh process where the module is re-initialized, so the cache is effectively a no-op. Additionally, multiple concurrent invocations will bypass the cache and issue redundant /api/show calls.

If the project already uses a shared cache layer (e.g. Redis via lib/core/storage), consider using it here. Otherwise, this is worth documenting as a known limitation so future maintainers aren't surprised.
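For reference, a minimal TTL cache of the kind described (a module-level Map; names illustrative, not the project's implementation). Hits only occur within one warm Node process; each serverless cold start re-initializes the Map, which is exactly the limitation noted above.

```typescript
// Minimal module-level TTL cache (illustrative). Works in a long-lived Node
// process; re-initialized on every serverless cold start.
const cache = new Map<string, { value: unknown; expiresAt: number }>()

function cacheGet<T>(key: string, now = Date.now()): T | undefined {
  const entry = cache.get(key)
  if (!entry || entry.expiresAt <= now) {
    cache.delete(key) // drop expired entries lazily
    return undefined
  }
  return entry.value as T
}

function cacheSet(key: string, value: unknown, ttlMs: number, now = Date.now()): void {
  cache.set(key, { value, expiresAt: now + ttlMs })
}
```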

)
results = [...openaiResults, ...ollamaResults]
.sort((a, b) => a.distance - b.distance)
.slice(0, validatedData.topK)

Cross-provider normalization skips single-result sets causing unfair ranking

Low Severity

normalizeScores returns single-item arrays unchanged (if (items.length <= 1) return items). In cross-provider search, this means if one provider returns only 1 result, its raw distance (e.g., 0.5) is sorted alongside the other provider's normalized 0–1 distances. The raw score from one embedding space isn't comparable to normalized scores from another, producing incorrect ranking when merged.
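One possible shape of the fix, sketched below (illustrative, not the PR's actual normalizeScores): always min-max normalize, mapping a degenerate set — a single item, or all-identical distances — to 0 so every provider's results land on a common 0–1 scale before merging. Treating a lone result as best-in-set (0) is itself a design choice worth validating.

```typescript
// Illustrative min-max normalization that also handles degenerate sets.
// Not the PR's actual implementation.
interface Scored {
  id: string
  distance: number
}

function normalizeScores(items: Scored[]): Scored[] {
  if (items.length === 0) return items
  const distances = items.map((i) => i.distance)
  const min = Math.min(...distances)
  const range = Math.max(...distances) - min
  return items.map((i) => ({
    ...i,
    // With one item (or identical distances) range is 0: map to 0 so the
    // merged cross-provider sort stays on a single 0–1 scale.
    distance: range === 0 ? 0 : (i.distance - min) / range,
  }))
}
```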


// Store ollamaBaseUrl inside chunkingConfig JSONB to avoid a schema migration
chunkingConfig: data.ollamaBaseUrl
? { ...data.chunkingConfig, ollamaBaseUrl: data.ollamaBaseUrl }
: data.chunkingConfig,

Returned chunkingConfig omits stored ollamaBaseUrl field

Low Severity

createKnowledgeBase stores ollamaBaseUrl inside chunkingConfig JSONB in the database (line 104–106), but the returned object uses the original data.chunkingConfig without ollamaBaseUrl (line 123). This means the API response from KB creation reflects a different chunkingConfig than what's persisted, which could cause inconsistencies if consumers cache or rely on the returned value.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).



let openaiQueryVector: string | null = null
if (hasQuery && openaiKbIds.length > 0) {
  const emb = await generateSearchEmbedding(validatedData.query!, undefined, workspaceId)
  openaiQueryVector = JSON.stringify(emb)

OpenAI search embedding ignores KB's configured model

High Severity

The search route generates a single OpenAI query embedding using the hardcoded default text-embedding-3-small, ignoring each KB's actual embeddingModel. This diff newly allows text-embedding-3-large in the creation schema, but search always calls generateSearchEmbedding(query, undefined, workspaceId) where undefined defaults to text-embedding-3-small. A KB created with text-embedding-3-large (3072 dimensions) would fail at search time because the query vector (1536 dimensions) can't be compared against stored vectors via pgvector's <=> operator. The Ollama path correctly generates per-model embeddings, but the OpenAI path does not.
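A sketch of the suggested direction (hypothetical helper, not the PR's code): group KB ids by their configured embedding model so the route can generate one query embedding per model, mirroring what the Ollama path already does.

```typescript
// Hypothetical helper: group knowledge bases by embedding model so the
// search route generates one query embedding per model instead of a single
// hardcoded default.
interface KBInfo {
  id: string
  embeddingModel: string
}

function groupKbsByModel(kbs: KBInfo[]): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const kb of kbs) {
    const ids = groups.get(kb.embeddingModel) ?? []
    ids.push(kb.id)
    groups.set(kb.embeddingModel, ids)
  }
  return groups
}
```

The route would then call generateSearchEmbedding once per map key, so a text-embedding-3-large KB gets a 3072-dimension query vector instead of the 1536-dimension default.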


tokenCount: Math.ceil(chunk.text.length / 4),
embedding: embeddings[chunkIndex] || null,
embeddingModel: 'text-embedding-3-small',
embeddingModel: embeddingModelName,

Stored token count uses wrong ratio for Ollama

Low Severity

The tokenCount in embedding records is always calculated as Math.ceil(chunk.text.length / 4), using the OpenAI token estimation ratio. For Ollama models, the TextChunker uses a ratio of 3 characters per token, so the stored tokenCount is inconsistent with how the chunker actually estimated tokens. This affects any downstream analytics or display that relies on per-chunk token counts.
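A minimal sketch of a provider-consistent estimate (illustrative function, not the project's code), using the ratios cited above — ~4 chars/token for OpenAI models, ~3 for Ollama:

```typescript
// Illustrative: keep the stored tokenCount on the same chars-per-token ratio
// the chunker used for the given provider.
function estimateTokenCount(text: string, provider: 'openai' | 'ollama'): number {
  const charsPerToken = provider === 'ollama' ? 3 : 4
  return Math.ceil(text.length / charsPerToken)
}
```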


3 participants