Skip to content

Embeddings & Vector Search

Embeddings power two features in Lumiverse: semantic world book activation (finding lorebook entries by meaning, not just keywords) and long-term chat memory (recalling relevant past moments). Both require an embedding provider to be configured.


What Are Embeddings?

An embedding is a numerical representation of text — a list of numbers that captures the meaning of a passage. Similar texts produce similar embeddings. This lets Lumiverse find relevant content based on what it means, not just whether exact keywords match.

Without embeddings: World book entries activate only on keyword matches. Chat history outside the context window is lost.

With embeddings: World book entries can activate on semantically similar concepts. Past conversation moments can be recalled based on relevance.


Setting Up

Open Settings > Embeddings and follow the setup checklist:

1. Enable Embeddings

Toggle the master switch on.

2. Select a Provider

Provider Notes
OpenAI Official OpenAI API (text-embedding-3-small recommended)
OpenAI Compatible Any service implementing the OpenAI embeddings API (local models, self-hosted)
OpenRouter Aggregation service
ElectronHub Model aggregator
BananaBread Lumiverse's local embedding server. Defaults to http://localhost:8008/v1/embeddings and pulls its model list from /v1/models.
Nano-GPT Pay-per-token aggregator

3. Configure the Connection

Field Description
API URL Base URL for the provider. Auto-appends /v1/embeddings if no path is specified.
Embedding Model Model name (e.g., text-embedding-3-small)
API Key Your provider's authentication key
Dimensions Vector size — auto-detected when you run a test
Send Dimensions Whether to include the dimension value in API requests (some providers require it, others reject it)

4. Test the API

Click Test API to verify your setup. A successful test auto-detects the model's native dimensions and applies them.


What Gets Vectorized

Enable vectorization for the content types you want:

Content Setting What It Does
World Book Entries vectorize_world_books Enables semantic search for lorebook entries — activates entries by meaning, not just keywords
Chat Documents vectorize_chat_documents Indexes databank and chat-attached documents for #slug mentions and document RAG
Chat Messages vectorize_chat_messages Enables long-term memory — recalls relevant past messages during generation

When chat-message vectorization is enabled, the Memory Retrieval Mode (chat_memory_mode) controls how aggressively past messages are recalled:

Mode Behavior
Conservative Fewer, high-quality memories — strict threshold
Balanced Standard retrieval (recommended)
Aggressive More memories, lower threshold — better for long epics

World Book Vector Presets

A quick preset row above the chunk parameters auto-tunes lorebook vectorization:

Preset Best For
Lean Tight token budgets, short chunks
Balanced General use (recommended)
Deep Large lorebooks where each entry is dense
Custom Manual control — editing any value switches the row to Custom

The preset row drives the Retrieved Entries, Chunk Target / Max / Overlap Tokens, and Stored Chunks Per Entry values.


Retrieval Settings

Similarity Threshold

Maximum cosine distance for matches. Lower values = stricter matching.

  • 0 — No filtering (accept all matches)
  • 0.3-0.5 — Moderate filtering
  • 0.8+ — Very strict (only highly similar content)

Cosine distance can exceed 1.0 in LanceDB's implementation, so this isn't capped at 1.

Rerank Cutoff

For world book vectors: minimum score required after boost/penalty adjustments. Helps filter out low-quality matches after post-processing. Set to 0 to disable.


Hybrid Weight

Controls the balance between traditional keyword matching and semantic vector search:

Mode Behavior
Keyword First Prioritize exact word matches; use vectors as a tiebreaker
Balanced Weight both methods equally (recommended)
Vector First Prioritize semantic similarity; keywords are secondary

Runtime

Setting Description
Batch Size Entries or chunks embedded per request during reindexing (1-200, default 50)
Request Timeout Per-request timeout in seconds (0 disables, max 300). Useful for slow self-hosted models.
Preferred Context Size Recent messages used to build the chat-memory search query (default 6, max 64)

Tips

Start with OpenAI's small model

text-embedding-3-small is cheap, fast, and effective. It's the best starting point for most users.

Enable world book vectorization first

Semantic world book search is the highest-impact use of embeddings. Long-term memory is valuable too, but world book vectorization gives immediate improvement with less configuration.

Test after setup

Always click Test API after configuration. This verifies your credentials work and auto-detects the correct dimensions — getting dimensions wrong produces garbage results.