
Cursor Codebase Indexing: How the AI Understands Your Repository

By Learn Cursor team. Updated June 25, 2026.

Cursor indexes your codebase into searchable vectors using its own embedding model. Indexing starts automatically when you open a workspace and finishes in the background; semantic search becomes available at 80% completion. Code is encrypted at indexing time and never stored in plaintext — Cursor only stores embeddings, not your source.

What does codebase indexing do in Cursor?

When you open a project, Cursor reads your files and converts them into vector embeddings using a custom embedding model. Those embeddings let the agent answer questions like "where do we handle authentication?" or "find all places that call this function" by meaning — not just by exact string match.

Research from Cursor's team shows that combining semantic search with grep produces 12.5% higher accuracy answering codebase questions compared to grep alone. For large codebases where the relevant code isn't obviously named, the accuracy gap grows further.

Semantic search is why the agent can 'just know'

When the agent finds the right file without you specifying it, it used the semantic index. You gave it a natural language description of the task; it turned that into a query against the embeddings and pulled the most relevant code segments.
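
The ranking step described above can be sketched in a few lines. This is a toy illustration, not Cursor's implementation: the `embed` function below is a bag-of-words stand-in for a real embedding model (which would also place synonyms close together), and the file names and chunk texts in `CHUNKS` are hypothetical. It only shows the mechanics of turning a description into a vector and ranking code chunks by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the names of the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)
    return ranked[:top_k]


# Hypothetical chunk summaries standing in for indexed code segments.
CHUNKS = {
    "auth/session.py": "verify user login password and create session token",
    "billing/invoice.py": "compute invoice total with tax and discount",
    "utils/strings.py": "trim whitespace and normalize unicode strings",
}

best = search("where do we handle user login", CHUNKS, top_k=1)
```

Here `best` contains `auth/session.py`, because that chunk shares the most terms with the query. A production index would replace `embed` with a learned model and store the vectors rather than recomputing them per query.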

How does Cursor build and maintain the index?

1. You open a workspace, and Cursor begins indexing immediately.
2. Semantic search becomes available at 80% completion; a progress indicator appears in the status bar.
3. The index syncs automatically every 5 minutes, processing only changed files (not the whole repo each time).
4. If you haven't opened the project in 6 weeks, the hosted index is deleted; reopening triggers a fresh index.
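
One way to picture "processing only changed files" is a comparison of content hashes between two sync runs. The sketch below is an assumption about how such change detection could work, not a description of Cursor's internals; `plan_sync` and `file_digest` are hypothetical names, and files are passed in as in-memory bytes to keep the example self-contained.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Content hash used to detect whether a file changed between syncs."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(
    previous: dict[str, str], files: dict[str, bytes]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare the last snapshot with the current files.

    Returns (paths to re-embed, paths to drop from the index, new snapshot).
    """
    current = {path: file_digest(data) for path, data in files.items()}
    changed = sorted(p for p, d in current.items() if previous.get(p) != d)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed, current


# First sync: everything is new, so everything is embedded.
changed, removed, snapshot = plan_sync({}, {"a.py": b"x = 1", "b.py": b"y = 2"})

# Later sync: only b.py was edited and c.py was added.
changed_2, removed_2, snapshot = plan_sync(
    snapshot, {"a.py": b"x = 1", "b.py": b"y = 3", "c.py": b""}
)
```

After the second call, `changed_2` is `["b.py", "c.py"]` and `a.py` is skipped, which is why a periodic sync on a large repository can stay cheap.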

What gets indexed

Included: All files not covered by .gitignore or .cursorignore (a file listing paths Cursor must never index or read, kept separate from .gitignore).
Excluded: Files listed in .gitignore or .cursorignore; binary files; files above the size limit.
Scope: The full workspace (all files in the opened folder and its subfolders).
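
The inclusion rules above amount to a per-file filter. The following is a rough sketch under stated assumptions: `MAX_BYTES` is a made-up limit (the article does not give Cursor's actual size limit), the binary check is a simple NUL-byte heuristic, and the pattern matching is a simplified subset of .gitignore syntax without negation or anchoring.

```python
import fnmatch

MAX_BYTES = 1_000_000  # hypothetical size limit, not Cursor's real value


def is_binary(sample: bytes) -> bool:
    """Heuristic: treat content containing a NUL byte as binary."""
    return b"\x00" in sample


def should_index(path: str, data: bytes, ignore_patterns: list[str]) -> bool:
    """Decide whether a file would be indexed under the simplified rules."""
    parts = path.split("/")
    for pattern in ignore_patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything under a directory with that name.
            if pattern.rstrip("/") in parts[:-1]:
                return False
        elif fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(parts[-1], pattern):
            return False
    if len(data) > MAX_BYTES:
        return False
    return not is_binary(data[:8192])


IGNORE = ["node_modules/", "*.env"]  # example patterns only
keep = should_index("src/app.py", b"print('hi')", IGNORE)
skip = should_index("node_modules/lib/index.js", b"x", IGNORE)
```

With these example patterns, `keep` is `True` and `skip` is `False`. Real ignore-file handling has more rules than this sketch covers.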

Is my source code safe during indexing?

Cursor encrypts code chunks during indexing — source is held in memory briefly, then discarded. Filenames are obfuscated and code content is never stored in plaintext on Cursor's servers. What Cursor stores are encrypted embeddings (numeric vectors), which cannot be reversed into your original code.
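
To make "obfuscated filenames" concrete, here is one possible scheme, shown purely as an illustration. The article does not say how Cursor obfuscates paths, so everything in this sketch is an assumption: the keyed HMAC per path segment, the 12-character truncation, and the `obfuscate_path` name are all hypothetical.

```python
import hashlib
import hmac


def obfuscate_path(path: str, key: bytes) -> str:
    """Replace each path segment with a keyed, truncated HMAC digest.

    The directory shape survives (same number of segments), but the
    real names cannot be read without the key.
    """

    def mask(segment: str) -> str:
        digest = hmac.new(key, segment.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:12]

    return "/".join(mask(segment) for segment in path.split("/"))


secret = b"local-only-key"  # hypothetical key that would stay on the client
masked = obfuscate_path("src/auth/login.py", secret)
sibling = obfuscate_path("src/auth/logout.py", secret)
```

In this sketch, `masked` and `sibling` share their first two segments because the files live in the same directory, while the final segments differ. A server holding only these identifiers could keep the index organized by directory without learning the real names.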

Data	What Cursor stores	What Cursor discards
Source code	Never stored in plaintext	Discarded after embedding is computed
Filenames	Obfuscated identifiers	Real filenames not stored on server
Embeddings	Encrypted vector representations	Deleted after 6 weeks of inactivity

Source: cursor.com/blog/secure-codebase-indexing

Review the full security doc for enterprise use

Cursor's blog post on secure codebase indexing details the technical implementation. For enterprise teams with strict data-handling requirements, read cursor.com/security alongside this before rollout.

How do I control what gets indexed?

The index respects .gitignore automatically. To exclude additional files without adding them to .gitignore, create a .cursorignore file at the project root — it uses the same syntax as .gitignore.

`node_modules/`, `dist/`, `build/`: typically already in .gitignore, so they are excluded by default.
Secrets or env files: add them to .cursorignore if they are not already in .gitignore.
Large generated files: Cursor skips files above its size limit automatically; add other large generated files to .cursorignore, because they slow indexing.
Sensitive internal docs: add them to .cursorignore if you don't want them surfaced by semantic search.
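
As a starting point, a .cursorignore covering the cases above might look like the fragment below. The specific paths and patterns are hypothetical examples, not recommendations from Cursor; adjust them to your repository.

```gitignore
# .cursorignore (same pattern syntax as .gitignore)

# Secrets and environment files
.env
.env.*
*.pem

# Large generated artifacts not already covered by .gitignore
coverage/
*.min.js
*.map

# Sensitive internal docs (hypothetical path)
docs/internal/
```

Entries already present in .gitignore do not need to be repeated here, since the index respects .gitignore on its own.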

When should I still add context manually with @?

The semantic index is automatic context the agent searches in the background. For tasks where you know exactly which files are relevant, adding them manually with @filename is faster and more reliable than hoping the index surfaces the right results.

Situation	Best approach
You know the exact file to edit	@filename — explicit is faster
You want the agent to find related code across the repo	Describe the task; let the index work
Bug hunt across unfamiliar code	Let the agent search semantically first, then pin files with @ once found
Architecture question spanning many modules	Use @codebase or let Max Mode + index cover the breadth

Explicit @ context and semantic indexing are complementary, not alternatives.

Frequently asked questions

Can I check whether my codebase has finished indexing?

Yes. A progress indicator in the Cursor status bar shows the indexing percentage. Semantic search is available from 80% onward. You can also open Settings → Cursor → Indexing to see the current index status.

Does codebase indexing work on private or air-gapped repositories?

Indexing requires an internet connection to Cursor's servers to compute and store the embeddings. Air-gapped or fully offline deployments cannot use the hosted index. For strict air-gap requirements, Cursor does not currently offer a self-hosted embedding server; this is a known gap in the enterprise offering.

How does the index handle very large repositories?

Cursor indexes all files up to a per-file size limit and excludes files in ignore lists. For very large monorepos, the initial index takes longer but still completes in the background. Semantic search becomes available at 80%, so you can start working before indexing is complete.

Does adding @codebase in a prompt use the semantic index?

Yes. @codebase triggers a semantic search against the index, retrieves the most relevant code segments, and injects them as context. It is the explicit way to invoke the same semantic search that the agent uses automatically.

If I add new files, do they get indexed automatically?

Yes. The automatic sync runs every 5 minutes and processes only changed or new files. New files are indexed within a few minutes of being saved without any manual action.

Sources & last verified

Cursor ships frequently. Facts verified against primary sources on June 25, 2026.
