SQLite Virtual Tables
A virtual table is SQLite's extension mechanism for tables backed by user code rather than
the built-in b-tree. To SQL it looks identical: SELECT, INSERT,
UPDATE, joins all work. Underneath, a module implements the
callbacks that produce rows, evaluate constraints, and report cost estimates to the query
planner. SQLite ships with several powerful built-in modules — FTS5 for
full-text search, R*Tree for spatial indexes, JSON1 for JSON
values, dbstat for introspecting the database file, and csv for
reading CSV files. This page walks the module API, the planner integration via
xBestIndex, and what each shipped module does.
Virtual Table Architecture
Key Numbers
Why a Virtual Table API
Defining a Virtual Table
CREATE VIRTUAL TABLE invokes the module's xCreate.
-- Create an FTS5 full-text index
CREATE VIRTUAL TABLE docs USING fts5(title, body, tokenize='porter');
-- Insert and query
INSERT INTO docs VALUES('Intro to SQL', 'SELECT FROM WHERE GROUP BY...');
SELECT * FROM docs WHERE docs MATCH 'sql AND select';
-- ^ MATCH is the FTS5 query op
-- Create an R*Tree on bounding boxes
CREATE VIRTUAL TABLE bounds USING rtree(id, minX, maxX, minY, maxY);
INSERT INTO bounds VALUES(1, 0.0, 5.0, 0.0, 5.0);
SELECT id FROM bounds WHERE minX <= 3 AND maxX >= 1 AND minY <= 3 AND maxY >= 1;
Behind the scenes, the FTS5 module creates several "shadow tables" — real b-tree-backed
tables holding the inverted index data. They have names like docs_data,
docs_idx, docs_content, docs_docsize,
docs_config. Direct queries on shadow tables are allowed but typically
meaningless to applications; FTS5 itself manages them.
xBestIndex: Cost Negotiation with the Planner
The most important callback for performance.
When SQL hits a virtual table in a SELECT, the planner gathers the predicates and ORDER BY
and calls xBestIndex with them as input. The module returns: an estimated cost
(number, lower is better), an estimated number of rows, an idxNum (an integer the module
assigns meaning to), an idxStr (similar but a string), and a flag indicating whether the
results will already be sorted.
int xBestIndex(sqlite3_vtab *vt, sqlite3_index_info *info) {
// info->aConstraint: list of WHERE constraints
// info->aOrderBy: requested ORDER BY columns
// info->estimatedCost: output, lower wins
// info->estimatedRows: output, used for join planning
// info->idxNum, info->idxStr: output, opaque to SQLite, passed to xFilter
// info->orderByConsumed: output, 1 if this access method already sorts
// FTS5 example: see if there's a MATCH constraint on a fts column
for (int i = 0; i < info->nConstraint; i++) {
if (info->aConstraint[i].usable && matches_fts_match(...)) {
info->estimatedCost = 100.0; // cheap with index
info->idxNum = 1;
info->aConstraintUsage[i].argvIndex = 1; // tell SQLite to pass this arg to xFilter
return SQLITE_OK;
}
}
info->estimatedCost = 1000000.0; // expensive without
info->idxNum = 0;
return SQLITE_OK;
}
When SQLite decides to use this access method, it calls xFilter(idxNum, idxStr,
argv) on the cursor; the module uses idxNum to dispatch (1=index scan, 0=full scan,
etc.) and uses argv for the constraint values.
FTS5: Full-Text Search
Inverted index over text columns with relevance ranking.
CREATE VIRTUAL TABLE docs USING fts5( title, body, tokenize = 'unicode61 remove_diacritics 1', content = '' -- contentless: don't store original text ); INSERT INTO docs(rowid, title, body) VALUES(1, 'Hello', 'World of SQL'); -- Phrase, prefix, NEAR, AND/OR/NOT SELECT rowid FROM docs WHERE docs MATCH 'sql NOT mongodb'; SELECT rowid FROM docs WHERE docs MATCH 'NEAR(world sql, 5)'; SELECT rowid, bm25(docs) FROM docs WHERE docs MATCH 'sql' ORDER BY bm25(docs);
Tokenizers split text into tokens (words). Built-in: simple,
porter (English stemming), unicode61 (Unicode-aware),
ascii, trigram (n-gram, for substring queries). Custom tokenizers
can be registered via the FTS5 API.
The on-disk structure is an inverted index: for each unique token, a list of (rowid,
column, position) entries. Queries combine these lists with AND/OR/NEAR. BM25 ranking
gives relevance scores. Snippets and highlights are accessible via the
snippet() and highlight() auxiliary functions.
R*Tree: Spatial Indexing
Bounding-box queries in 1-5 dimensions.
CREATE VIRTUAL TABLE places USING rtree( id, minLat, maxLat, -- 1st dimension (latitude) minLng, maxLng -- 2nd dimension (longitude) ); INSERT INTO places VALUES(1, 37.7, 37.8, -122.5, -122.4); -- SF area INSERT INTO places VALUES(2, 40.7, 40.8, -74.1, -74.0); -- NYC area -- Find all places overlapping a query box SELECT id FROM places WHERE maxLat >= 37.0 AND minLat <= 38.0 AND maxLng >= -123.0 AND minLng <= -122.0;
The R*-tree is a balanced tree of bounding boxes; each interior node contains the bounding box of its children. A query box descends only into branches whose box overlaps the query — pruning aggressively for sparse data. For 1M random points in 2D, R-tree queries return candidates in O(log N) time, typically 5-10x faster than a btree index on lat or lng alone.
Limitations: bounding boxes can be approximate (a polygon's box overlaps the query but the polygon doesn't). After R-tree returns candidates, application code typically does exact geometry tests. R*Tree handles 1-5 dimensions; for higher-D vector search, use a different module or external library.
JSON1: Querying JSON Values
Built-in functions for SELECT WHERE on JSON columns.
SELECT json_extract(data, '$.user.name') FROM events;
SELECT * FROM events WHERE json_extract(data, '$.user.id') = 42;
-- json_each is a vtab: expand JSON arrays to rows
SELECT value FROM json_each('[1,2,3,4]');
-- 1, 2, 3, 4
SELECT * FROM events, json_each(events.tags) WHERE json_each.value = 'admin';
JSON1 isn't a single virtual table — it's a collection of scalar functions (json_extract,
json_set, json_remove, json_array, json_object, etc.) plus two table-valued functions
(json_each, json_tree) that act as virtual tables. json_each(arr) expands an
array into rows; json_tree(doc) walks a nested JSON structure recursively.
Other Bundled Modules
| Module | Use case | Form |
|---|---|---|
| dbstat | per-page database introspection | SELECT FROM dbstat |
| stmt | list of currently prepared statements | SELECT FROM stmt |
| generate_series | numeric sequence generator | SELECT FROM generate_series(1,100) |
| geopoly | polygon-based spatial | CREATE VIRTUAL TABLE … USING geopoly |
| csv | read CSV files as tables | CREATE VIRTUAL TABLE … USING csv |
| fts5vocab | list FTS5 tokens and stats | CREATE VIRTUAL TABLE … USING fts5vocab |
| completion | SQL keyword completion | SELECT FROM completion |
FAQ
How do virtual tables differ from regular tables?
A virtual table looks identical to SQL — you SELECT, INSERT, UPDATE, DELETE the same way — but the storage is provided by user code (a 'module') rather than the b-tree backend. The module implements callbacks for xCreate, xConnect, xBestIndex, xOpen, xFilter, xNext, xColumn, xRowid, etc. SQLite's query planner consults xBestIndex during planning to ask 'what's the cost of using these constraints?' and routes execution through the module. This is how FTS5 (full-text search), R*Tree (spatial indexes), JSON1 (JSON queries), and CSV (read CSV files as tables) all integrate.
What is xBestIndex doing?
It's the cost estimator for the virtual table. SQLite's planner shows the module a list of constraints (e.g. 'WHERE name = ?' and 'ORDER BY date') and asks: 'if you used these, what would the cost be? what would the row order be?' The module returns a cost estimate plus an idxNum and idxStr that describe its chosen access method to itself. The planner uses this to compare virtual-table plans against regular-table plans during query planning. A poorly-implemented xBestIndex makes the planner pick bad plans.
What is FTS5 actually?
Full-Text Search version 5 — a virtual table module shipped with SQLite that maintains an inverted index over text columns and supports MATCH queries (with phrase, prefix, NEAR operators, BM25 ranking, custom tokenizers). The on-disk storage is several shadow tables (you can see them with .tables in CLI: x_data, x_idx, x_content, x_docsize, x_config) that hold the inverted index, doclist data, and metadata. Queries that use MATCH on the FTS5 vtab use the inverted index; non-MATCH queries fall back to scanning x_content.
What about R*Tree?
R*Tree is a virtual table module for spatial indexing — minimum bounding rectangles in 1D, 2D, 3D, 4D, or 5D. CREATE VIRTUAL TABLE bounds USING rtree(id, minX, maxX, minY, maxY) creates an R-tree index on min/max coordinates. SELECT id FROM bounds WHERE minX <= 5 AND maxX >= 3 AND minY <= 4 AND maxY >= 2 uses the R-tree to return only candidate rows whose rectangles intersect the query rectangle. Used heavily in geospatial systems built on SQLite.
Can I write my own virtual table?
Yes. Implement the sqlite3_module struct with the required callbacks, register it via sqlite3_create_module(), and now CREATE VIRTUAL TABLE foo USING mymod(...) creates instances. Common uses: expose CSV files as tables, expose REST APIs as tables, expose another database engine, expose process /proc filesystems. The csv vtab in SQLite's source is a good template — about 800 lines for a complete CSV reader.
What's dbstat?
A built-in virtual table that exposes the on-disk page-by-page layout of the database itself. SELECT * FROM dbstat returns one row per page: name (which table/index it belongs to), pageno, pagetype (leaf/interior, table/index), ncell, payload bytes, unused bytes, etc. Useful for understanding storage layout, finding pages with lots of fragmentation, and debugging space usage. Compile with -DSQLITE_ENABLE_DBSTAT_VTAB to enable; it's on by default in many distributions.