Building a RAG on SQLite

With SQLite you have the feeling that everything just works. One file, no server, no ops overhead. Drop it into your app, and it quietly does its job.

Recently, we developed two extensions that bring vector search and AI capabilities to SQLite.

SQLite-Vector brings vector search capabilities to your embedded database. You can store embeddings and run similarity queries directly in SQL.
SQLite-AI lets SQLite talk to local AI models for embedding generation and semantic tasks, all from within the database.

To show what they can do together, we built SQLite RAG, an example of semantic search that runs inside SQLite.

Hybrid Semantic Search with SQLite

SQLite RAG is built with Python, SQLite, SQLite-AI, and SQLite-Vector. Python is the glue between the database, the vector store, and the embedding generator, but the embeddings, the vectors, and the logic for both similarity and full-text search all stay within SQLite.

The system ingests documents, generates embeddings through sqlite-ai, stores them as vectors with sqlite-vector, and runs hybrid searches that combine FTS and semantic matching. Despite its simplicity, it provides a complete end-to-end retrieval pipeline, all within one database file.

Document → SQLite-AI (embeddings) → SQLite-Vector (storage/search)
                        ↓
                     SQLite FTS5
                        ↓
                    Hybrid RRF results
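
The flow in the diagram can be approximated with nothing but Python's standard library. The sketch below is not SQLite RAG's code: toy_embed is a made-up stand-in for the embeddings that sqlite-ai would generate, the brute-force cosine scan stands in for sqlite-vector's search, and the table and function names are assumptions chosen for illustration. It only assumes an SQLite build with FTS5 enabled.

```python
import math
import sqlite3
import struct


def toy_embed(text, dims=16):
    """Hypothetical stand-in for a real model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pack(vec):
    """Serialize a vector as a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Read a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def build_db(docs):
    """Store each document with its embedding and index it for full-text search."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chunks (id INTEGER PRIMARY KEY, content TEXT, embedding BLOB)"
    )
    db.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(content)")
    for text in docs:
        cur = db.execute(
            "INSERT INTO chunks (content, embedding) VALUES (?, ?)",
            (text, pack(toy_embed(text))),
        )
        db.execute(
            "INSERT INTO chunks_fts (rowid, content) VALUES (?, ?)",
            (cur.lastrowid, text),
        )
    return db


def keyword_search(db, query, k=5):
    """Ranked ids from FTS5 (best match first)."""
    rows = db.execute(
        "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    )
    return [row[0] for row in rows]


def vector_search(db, query, k=5):
    """Ranked ids by cosine similarity (a plain dot product, since vectors are normalized)."""
    q = toy_embed(query)
    scored = []
    for row_id, blob in db.execute("SELECT id, embedding FROM chunks"):
        score = sum(a * b for a, b in zip(q, unpack(blob)))
        scored.append((score, row_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [row_id for _, row_id in scored[:k]]
```

Both functions return a ranked list of row ids from the same database, which is the input a rank-fusion step needs. Note that keyword_search passes the query straight to MATCH, so free-form user text containing FTS5 query syntax would need escaping first.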

With a compact Python codebase, SQLite RAG can process documents, store their embeddings, and retrieve semantically relevant content in just a few hundred milliseconds. It can be used either as a Python module in your app or directly via its CLI.

sqlite-rag add /path/to/docs
sqlite-rag search "how does synchronization work?"

Documentation Search

Our first documentation search relied on SQLite’s Full Text Search (FTS5), which worked well for exact words but couldn’t capture meaning: searching for 'what’s OffSync?' wouldn’t match documents about 'SQLite-Sync', even though they refer to the same concept.
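
The limitation is easy to reproduce with Python's built-in sqlite3 module. This is a minimal sketch, assuming an SQLite build with FTS5 enabled; the docs table, the helper names, and the sample body text are made up for illustration.

```python
import sqlite3


def fts_phrase(text):
    """Quote user text so FTS5 reads it as one phrase, not as query syntax."""
    return '"' + text.replace('"', '""') + '"'


def matching_titles(db, query):
    """Titles of documents whose tokens match the query, best match first."""
    rows = db.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank",
        (fts_phrase(query),),
    )
    return [title for (title,) in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
db.execute(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    ("SQLite-Sync", "Offline-first synchronization between devices."),
)

print(matching_titles(db, "synchronization"))  # the word occurs in the body
print(matching_titles(db, "what's OffSync?"))  # no shared token, so nothing matches
```

FTS5 compares tokens, so "OffSync" can only match a document that contains that exact token, however close the meaning is.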

With SQLite RAG, we combined the precision of FTS5 with the flexibility of semantic search powered by sqlite-vector. Queries now run through both systems and results are merged using Reciprocal Rank Fusion (RRF), a simple ranking method that promotes results appearing high in either list. The outcome is a balanced hybrid search that returns relevant results whether the query matches exact keywords or just the underlying idea.
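
As an illustration of the merging step, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not taken from SQLite RAG: the function name is hypothetical, and k=60 is the constant commonly used in the RRF literature, not necessarily the value SQLite RAG uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], str(doc_id)))


fts_results = ["a", "b", "c"]      # ranked ids from full-text search
vector_results = ["c", "a", "d"]   # ranked ids from vector search
print(reciprocal_rank_fusion([fts_results, vector_results]))
```

Because only ranks enter the formula, the FTS5 and vector scores never have to be normalized against each other. An id that ranks well in both lists accumulates two contributions and rises above ids that appear in only one.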

Running on the Edge

Our first use case for SQLite RAG runs in a resource-constrained environment and combines two stages: a build-time process that prepares the searchable database and a runtime process that serves user queries.

At build time, a GitHub Action handles embedding generation whenever the documentation changes. Our documentation includes 182 files, each averaging about 640 words. On a standard GitHub Runner, the full embedding process (which generates both chunk-level and sentence-level embeddings to improve search precision and result previews) takes about 25 minutes. The output is a single SQLite database containing all the document vectors.
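
To make the two embedding levels concrete, the sketch below splits a document into overlapping word-based chunks and then splits each chunk into sentences. It is an assumption about how such a step could look, not SQLite RAG's actual chunking logic: the function names, the 200-word chunk size, the 20-word overlap, and the regex-based sentence splitter are all hypothetical.

```python
import re


def split_chunks(text, max_words=200, overlap=20):
    """Split text into chunks of up to max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def split_sentences(chunk):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]
```

In a setup like this, each chunk and each sentence would get its own embedding: chunk vectors decide which passages are retrieved, and sentence vectors can pick the line to show as a preview.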

At runtime, the search is served directly from this database through an SQLite Cloud Edge Function. When a user performs a query, a dedicated lightweight server (4 vCPUs, using roughly 100 MB of memory) generates the embedding for that query using the Gemma Embedding 300M Q8 model. The resulting vector is sent to the Edge Function, which executes the hybrid search on the user’s database. The full query-response cycle takes about 370 ms on average.

We distribute this documentation search setup, based on SQLite RAG, through sqlite-aisearch-docs. See our guide to building AI search for your documentation.

A Laboratory for SQLite Extensions

SQLite RAG is our laboratory for improving and developing real use cases around SQLite-AI and SQLite-Vector, especially for edge and local scenarios. It’s a project that lets us experiment with new use cases, enhance the extensions’ capabilities, and explore simplified solutions for running local AI on the edge with SQLite.

Our next steps include supporting text generation to answer user queries directly from retrieved results, extending the same pipeline to images and audio, and refining SQLite RAG for faster performance and better results.
