Cursor Codebase Indexing: How the AI Understands Your Repository
Cursor indexes your codebase into searchable vectors using its own embedding model. Indexing starts automatically when you open a workspace and finishes in the background; semantic search becomes available at 80% completion. Code is encrypted at indexing time and never stored in plaintext — Cursor only stores embeddings, not your source.
What does codebase indexing do in Cursor?
When you open a project, Cursor reads your files and converts them into vector embeddings using a custom embedding model. Those embeddings let the agent answer questions like "where do we handle authentication?" or "find all places that call this function" by meaning — not just by exact string match.
Research from Cursor's team shows that combining semantic search with grep yields 12.5% higher accuracy when answering codebase questions than grep alone. For large codebases where the relevant code isn't obviously named, the accuracy gap grows further.
When the agent finds the right file without you naming it, it has used the semantic index: it turned your natural-language description of the task into a query against the embeddings and pulled the most relevant code segments.
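To make the retrieval step concrete, here is a minimal sketch of ranking code chunks by cosine similarity to a query vector. The three-dimensional vectors, the file names, and the `rank_chunks` helper are hypothetical illustrations. Cursor's actual embedding model and retrieval pipeline are not described here and will differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty vectors
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunk_vecs, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (assumption: real embeddings have hundreds of dimensions).
chunks = {
    "auth/login.py": [0.9, 0.1, 0.0],
    "billing/invoice.py": [0.0, 0.2, 0.9],
    "auth/session.py": [0.8, 0.3, 0.1],
}
# Pretend this vector is the embedding of "where do we handle authentication?"
query = [1.0, 0.2, 0.0]

top = rank_chunks(query, chunks)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

The two authentication files rank above the billing file even though the query shares no exact string with their names, which is the property that separates semantic search from grep.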
How does Cursor build and maintain the index?
1. You open a workspace and Cursor begins indexing immediately.
2. Semantic search becomes available at 80% completion; a progress indicator appears in the status bar.
3. The index syncs automatically every 5 minutes, processing only changed files (not the whole repo each time).
4. If you haven't opened the project in 6 weeks, the hosted index is deleted; reopening triggers a fresh index.
- Included: All files not covered by .gitignore or .cursorignore (a file listing paths Cursor must never index or read, kept separate from .gitignore).
- Excluded: Files listed in .gitignore or .cursorignore; binary files; very large files above the size limit.
- Scope: The full workspace (all files in the opened folder and its subfolders).
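The "only changed files" behavior in step 3 can be illustrated with content fingerprints. This is a minimal sketch under the assumption that change detection compares a hash of each file against the hash recorded at the previous sync. The `plan_sync` helper and the file names are hypothetical, and the article does not state how Cursor actually detects changes.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(content).hexdigest()

def plan_sync(previous, current_files):
    """Decide what to re-index.

    previous:      dict of path -> fingerprint from the last sync
    current_files: dict of path -> file contents (bytes) right now
    Returns (paths to index, paths to remove, new fingerprint state).
    """
    current = {path: fingerprint(data) for path, data in current_files.items()}
    # New or modified files: fingerprint is missing or differs.
    to_index = sorted(p for p, h in current.items() if previous.get(p) != h)
    # Deleted files: present last time, gone now.
    to_remove = sorted(p for p in previous if p not in current)
    return to_index, to_remove, current

# First sync: everything is new.
first = {"app.py": b"print('v1')", "util.py": b"def helper(): pass"}
initial_index, _, state = plan_sync({}, first)

# Later sync: app.py was edited, new.py was added, util.py is untouched.
second = {
    "app.py": b"print('v2')",
    "util.py": b"def helper(): pass",
    "new.py": b"x = 1",
}
to_index, to_remove, state = plan_sync(state, second)
print(to_index, to_remove)
```

Only the edited and newly added files are queued on the second pass, so sync cost scales with the size of the change rather than the size of the repository.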
Is my source code safe during indexing?
Cursor encrypts code chunks during indexing — source is held in memory briefly, then discarded. Filenames are obfuscated and code content is never stored in plaintext on Cursor's servers. What Cursor stores are encrypted embeddings (numeric vectors), which cannot be reversed into your original code.
| Data | What Cursor stores | What Cursor discards |
|---|---|---|
| Source code | Never stored in plaintext | Discarded after embedding is computed |
| Filenames | Obfuscated identifiers | Real filenames not stored on server |
| Embeddings | Encrypted vector representations | Deleted after 6 weeks of inactivity |
Source: cursor.com/blog/secure-codebase-indexing
Cursor's blog post on secure codebase indexing details the technical implementation. For enterprise teams with strict data-handling requirements, read cursor.com/security alongside this before rollout.
How do I control what gets indexed?
The index respects .gitignore automatically. To exclude additional files without adding them to .gitignore, create a .cursorignore file at the project root — it uses the same syntax as .gitignore.
- `node_modules/`, `dist/`, `build/`: typically already in `.gitignore`, so excluded by default.
- Secrets or env files: add them to `.cursorignore` if not already in `.gitignore`.
- Large generated files: files above Cursor's size limit are excluded automatically; consider excluding other large generated files too, since they slow indexing.
- Sensitive internal docs: add to `.cursorignore` if you don't want them fed to the semantic search.
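As an illustration of the cases above, a `.cursorignore` at the project root might look like the following. Every path and pattern here is a hypothetical example, not a recommended default; adjust them to your repository's layout.

```gitignore
# .cursorignore (same pattern syntax as .gitignore)

# Secrets and environment files (example patterns)
.env
.env.*
*.pem

# Sensitive internal docs (hypothetical directory)
docs/internal/

# Large generated artifacts (hypothetical paths)
coverage/
*.min.js
```

Entries already covered by `.gitignore`, such as `node_modules/`, do not need to be repeated here.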
When should I still add context manually with @?
The semantic index is automatic context the agent searches in the background. For tasks where you know exactly which files are relevant, adding them manually with @filename is faster and more reliable than hoping the index surfaces the right results.
| Situation | Best approach |
|---|---|
| You know the exact file to edit | @filename — explicit is faster |
| You want the agent to find related code across the repo | Describe the task; let the index work |
| Bug hunt across unfamiliar code | Let the agent search semantically first, then pin files with @ once found |
| Architecture question spanning many modules | Use @codebase or let Max Mode + index cover the breadth |
Explicit @ context and semantic indexing are complementary, not alternatives.
Frequently asked questions
Can I check whether my codebase has finished indexing?
Yes. A progress indicator in the Cursor status bar shows the indexing percentage. Semantic search is available from 80% onward. You can also open Settings → Cursor → Indexing to see the current index status.
Does codebase indexing work on private or air-gapped repositories?
Indexing requires an internet connection to Cursor's servers to compute and store the embeddings. Air-gapped or fully offline deployments cannot use the hosted index. For strict air-gap requirements, Cursor does not currently offer a self-hosted embedding server; this is a known gap in the enterprise offering.
How does the index handle very large repositories?
Cursor indexes all files up to a per-file size limit and excludes files in ignore lists. For very large monorepos, the initial index takes longer but still completes in the background. Semantic search becomes available at 80%, so you can start working before indexing is complete.
Does adding @codebase in a prompt use the semantic index?
Yes. @codebase triggers a semantic search against the index, retrieves the most relevant code segments and injects them as context. It is the explicit way to invoke the same indexing that the agent uses automatically.
If I add new files, do they get indexed automatically?
Yes. The automatic sync runs every 5 minutes and processes only changed or new files. New files are indexed within a few minutes of being saved without any manual action.
Sources & last verified
- Cursor - Codebase Indexing
- Cursor - Semantic Search in Agent
- Cursor - Secure Codebase Indexing (Blog)
- Cursor - Improving agent with semantic search (Blog)
- Cursor - Context: Codebase Indexing (Docs)
Cursor ships frequently. Facts verified against primary sources on June 25, 2026.