
Cursor for Data Scientists: Notebooks, Pipelines and Analysis

By Learn Cursor team, updated July 28, 2026

Data scientists use Cursor to write and refactor Python, debug pipelines, explain unfamiliar analysis code and scaffold ML experiments. Give it your data schema and conventions via @-context and rules. Keep notebooks reproducible by reviewing the generated code before running it.

On this page

What do data scientists use Cursor for?
What order should a data scientist set this up in?
How do I set up Jupyter notebooks in Cursor?
How do I keep Cursor's data code reproducible?
Does any of this change on a two-person team?

What do data scientists use Cursor for?

Writing fresh code is the part people expect, and it's the part that matters least. Cursor will turn a plain-English description into wrangling code and scaffold the boilerplate around an ML experiment, sure. But the reading-and-fixing work earns its keep: making sense of analysis someone else wrote, untangling an ETL or feature pipeline that broke. The full list is below.

Data wrangling from a plain-English description.
Pipelines for ETL and feature work.
Explain & refactor inherited analysis code.
ML scaffolding for training loops, evaluation and plotting.
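To illustrate the first item, a request like "monthly total funding per region, largest first" might come back as something close to the sketch below. The column names (region, date, amount) are hypothetical stand-ins for your own schema. The point is that the output is short enough to read line by line against the sentence you typed.

```python
import pandas as pd


def monthly_total_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly total per region, largest totals first.

    Assumes hypothetical columns: region (str), date (parseable), amount (numeric).
    """
    out = df.assign(
        # Truncate each date to its calendar month so rows group by month.
        month=pd.to_datetime(df["date"]).dt.to_period("M")
    )
    return (
        out.groupby(["region", "month"], as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False, ignore_index=True)
    )


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "region": ["north", "north", "south"],
            "date": ["2026-01-05", "2026-01-20", "2026-01-11"],
            "amount": [100.0, 50.0, 120.0],
        }
    )
    print(monthly_total_by_region(sample))
```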

Generated code is the cheapest kind to check, because you wrote the prompt and you know what you asked for. Inherited analysis is the expensive kind, since you are reconstructing intent from somebody else's variable names, and a wrong reconstruction looks identical to a right one until the number lands in a deck. I'd hand a new user the inherited-notebook case first for that reason, even though generation makes the better demo.

Start with the data science cookbook

Cursor publishes role-specific cookbooks (workflow recipes), and the data science one is the recommended place to start. It covers Jupyter notebook development and connecting to databases: Supabase and Postgres, plus extensions for BigQuery, SQLite and Snowflake inside Cursor.

Cursor has a number of different cookbooks, which are basically workflows for different types of roles.


What order should a data scientist set this up in?

Context first, then a rule, then a plan, then let the agent write. Hand it your schema or a sample row before you ask for a line of code, add a Python rule so the output matches your stack, plan anything longer than a couple of cells, and execute last.

Each of those steps removes a class of error the next one would otherwise carry forward. Without the schema the model picks plausible column names, and plausible is the dangerous outcome here, because created_at reads perfectly well right up to the moment your table turns out to say created_ts. Without the rule the code works and arrives in a dialect nobody on your team writes, so review becomes an argument about style rather than about the analysis. Without the plan you get a notebook that runs, which is a lower bar than a notebook you can defend when somebody asks why those rows went missing.
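A cheap guard against the created_at / created_ts trap is to assert the columns your code expects before any analysis runs, so a wrong guess fails at the read instead of in the chart. Here is a minimal stdlib-only sketch. check_columns and SchemaMismatch are hypothetical names, not part of Cursor or pandas.

```python
import difflib
from collections.abc import Iterable


class SchemaMismatch(ValueError):
    """Raised when columns the analysis expects are missing from the table."""


def check_columns(expected: Iterable[str], actual: Iterable[str]) -> None:
    """Raise SchemaMismatch listing every missing column, with a near-match hint."""
    present = [str(col) for col in actual]
    problems = []
    for name in expected:
        if name in present:
            continue
        # Rank the real column names by similarity to the missing one.
        close = difflib.get_close_matches(name, present, n=1, cutoff=0.6)
        hint = f" (did you mean {close[0]!r}?)" if close else ""
        problems.append(f"missing column {name!r}{hint}")
    if problems:
        raise SchemaMismatch("; ".join(problems))
```

In a notebook you would call it right after the read, for example check_columns(["created_at"], df.columns). Against a table that actually says created_ts, it fails and the hint names the real column.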

The plan is the step that actually gets dropped. Cursor's own guidance points plan mode at work with several valid approaches or many files in play, and a real analysis is usually both.

There is a case for breaking the order, and Cursor's docs name it. For a quick change, or something you have done many times, going straight to agent mode is fine. A throwaway lookup against a table you already know cold does not need a rule and a plan wrapped around it. The test I use is whether anyone else will quote the output. If they will, do the setup, because the setup is most of what makes the number defensible a month later.

How do I set up Jupyter notebooks in Cursor?

Install the Jupyter extensions from the same marketplace Cursor inherits from VS Code. Pick the ones published by MS Tools AI (publisher ID ms-toolsai; verify by publisher and download count) and grab all the relevant notebook extensions while you're there.
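Once the extensions are installed, confirm the notebook is talking to the environment you think it is. A kernel pointed at the wrong interpreter produces import errors that look like code bugs. A small first cell like this one prints the interpreter path and version. It is a suggestion, not something the extensions require, and kernel_info is a hypothetical helper name.

```python
import platform
import sys


def kernel_info() -> dict[str, object]:
    """Report which interpreter the notebook kernel is actually running."""
    return {
        "executable": sys.executable,  # path of the Python binary behind the kernel
        "python": platform.python_version(),
        # True when running inside a venv-style virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in kernel_info().items():
        print(f"{key}: {value}")
```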

Cursor may not run your cells for you

One caveat that shows up in practice: Cursor can't reliably execute Jupyter cells automatically, so you will often run them yourself. Someone may have built an MCP server for this (MCP, the Model Context Protocol, is a standard that lets an AI agent pull in context from outside the repo), but it isn't a settled part of the workflow yet. The deeper setup walkthrough lives in the Jupyter guide.

Excel in, notebook out

Cursor handles spreadsheet imports well. Point it at a large Excel file (say, a state education funding workbook) and ask it to "take some of the funding data and create a new Jupyter notebook querying that data." Ask it to display a CSV as a table and it offers several approaches to choose from. When you hand it a creative task, it automatically invokes a brainstorming skill Cursor built for exactly that purpose. The presenter's rule of thumb: if you don't know, ask; once you've asked, implement.
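A first cell in a notebook like that usually takes inventory of the workbook before querying it. Here is a sketch of what that might look like. load_workbook and summarize_sheets are illustrative helpers, not something Cursor generates verbatim.

```python
import pandas as pd


def load_workbook(path: str) -> dict[str, pd.DataFrame]:
    """Read every sheet of an Excel file; .xlsx reading needs openpyxl installed."""
    # sheet_name=None returns a dict of DataFrames keyed by sheet name.
    return pd.read_excel(path, sheet_name=None)


def summarize_sheets(sheets: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """One row per sheet: its name, shape and column names, for a first look."""
    rows = [
        {
            "sheet": name,
            "rows": len(frame),
            "cols": frame.shape[1],
            "columns": ", ".join(map(str, frame.columns)),
        }
        for name, frame in sheets.items()
    ]
    return pd.DataFrame(rows, columns=["sheet", "rows", "cols", "columns"])
```

In the notebook, summarize_sheets(load_workbook("education_funding.xlsx")) gives a one-screen overview. The file name there is hypothetical.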

How do I keep Cursor's data code reproducible?

Reproducibility comes from feeding the model your real context up front and never running its output blind. Give it the schema so the column names are correct, encode your stack in a Python rule, read every cell before you execute it, and pin seeds and versions so the numbers come back the same tomorrow.

Give it your schema/sample via @-context so column names are right.
Add a Python rule (.cursor/rules) for typing, linting and your libraries.
Review before running. Do not execute generated cells blind.
Pin seeds and versions so results reproduce.
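For the last item, here is a minimal sketch: seed the random number generators you use and record package versions next to the output. It assumes NumPy. SEED, seed_everything and package_versions are hypothetical names. Libraries with their own RNGs (PyTorch, for instance) need seeding separately. A lock file or pinned requirements still does the actual version pinning; this only records what ran.

```python
import importlib.metadata
import random

import numpy as np

SEED = 20260728  # any fixed integer; what matters is that it is written down


def seed_everything(seed: int = SEED) -> np.random.Generator:
    """Seed the stdlib and NumPy RNGs and return a dedicated NumPy Generator."""
    random.seed(seed)
    np.random.seed(seed)  # legacy global state, still used by many libraries
    return np.random.default_rng(seed)


def package_versions(names: list[str]) -> dict[str, str]:
    """Record installed versions so results can be tied to an environment."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions
```

Print package_versions(["pandas", "numpy", "scikit-learn"]) in the notebook's first cell, and the versions travel with the results.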

One piece of the standard Cursor loop does not survive contact with a notebook. Everywhere else you let the agent run, read the diff and then keep or reject. Here you are the runtime, since Cursor cannot reliably execute cells for you, so the edit and the result arrive at separate moments and you hold both in your head. That gap is where blind cell execution creeps in. Read the cell, then run it, in that order, and plan for a loop that is slower than the demos make it look.

Model choice matters less on this work than on most Cursor work, or that is my read of it. Python is heavily represented in what these models were trained on, which is probably why they handle venv and version tangles so well. A stronger model earns its cost at the planning end, before any code exists, where the open question is which comparison the chart is supposed to support.

When the code looks fine and the answer looks wrong, check the row count on either side of every join, then the dtypes after each read. Both checks are cheap, and between them they catch most of the damage. A filter that silently matched nothing is the other one worth a glance, since an empty frame will render a chart quite happily.
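Those three checks are easy to make routine. Below is a minimal pandas sketch. The helper names are hypothetical, and the fan-out check only covers left joins. For key uniqueness specifically, pd.merge also accepts a real validate= argument (for example "many_to_one") that raises when keys are duplicated. Because checked_merge passes keyword arguments through, you can combine the two.

```python
import pandas as pd


def checked_merge(left: pd.DataFrame, right: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """pd.merge that reports row counts on both sides and refuses a silent fan-out."""
    merged = pd.merge(left, right, **kwargs)
    print(f"left={len(left)} right={len(right)} merged={len(merged)}")
    # For a left join, more output rows than left rows means duplicate keys on the right.
    if kwargs.get("how") == "left" and len(merged) > len(left):
        raise ValueError("left join multiplied rows: duplicate keys in right frame")
    return merged


def dtype_report(df: pd.DataFrame) -> dict[str, str]:
    """Column -> dtype name; 'object' where you expected numbers is a red flag."""
    return df.dtypes.astype(str).to_dict()


def assert_nonempty(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Stop before plotting if a filter matched nothing."""
    if df.empty:
        raise ValueError(f"{label}: filter matched zero rows")
    return df
```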

Plan mode as a living design doc

Plan mode, which makes no edits but researches the codebase and produces an editable plan you review before any code changes, is the recommended on-ramp for a first-time user. Because the plan lives inside Cursor, you see code and plan side by side: the plan can link directly to the files and the exact spots a step will change, instead of sitting in Notion disconnected from the code. Edit it directly: delete a step like "sleep charts" and the agent picks up the change, or add a step asking for tests or a different chart. Check the plan into Git for the team, and run several plans at once (say, a notebook and a web app drawing from the same DB) via separate agents.

I would always have the technical design doc in Notion or Google Drive. With plan mode you'll have everything in one place.

Does any of this change on a two-person team?

Not much, beyond how much you bother writing down. On your own, a rule file and one or two skills cover it. On a bigger analytics team the artifacts other people depend on are the ones worth real effort, which in practice means the rule lives in the shared repo and the schema lookup lives in a skill rather than in your chat history.

The expensive part of a bigger team is not producing the analysis; it is agreeing on what the analysis did. A plan somebody can read before the notebook exists gives them something cheap to disagree with. I used to think the point of writing the plan was better output from the agent. It does deliver that, but the larger effect turns out to be on the reviewer, who now argues with a document instead of with a finished notebook and a sunk afternoon. So these days I'd write the plan even for work where I already know exactly what to build.

Skills are where I would spend the effort if you only pick one thing. Let the agent crawl the database once and pay for that exploration, then ask it to write a skill covering the lookup it just worked out. Later runs reach for the skill and skip the rediscovery. On a team the same file doubles as onboarding, because a new analyst can run it instead of reading a wiki page that describes it.
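The skill itself is a file the agent reads. The lookup it captures can be as simple as a script that walks the schema once and writes it down. Here is a sketch against SQLite that uses only the standard library. describe_schema is a hypothetical helper. Other databases need their own catalog queries (information_schema on Postgres, for example).

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render every user table and its columns as a Markdown outline for a skill file."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"## {table}")
        quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            flags = [flag for flag, on in (("primary key", pk), ("not null", notnull)) if on]
            suffix = f" ({', '.join(flags)})" if flags else ""
            lines.append(f"- {name}: {col_type or 'untyped'}{suffix}")
    return "\n".join(lines)
```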

Frequently asked questions

Does Cursor work with Jupyter notebooks?

Yes. Install the Jupyter extensions published by MS Tools AI from the marketplace. Cursor edits and refactors notebook code well, though it often can't run cells automatically, so you may need to execute them yourself. Give it the schema and conventions for best results.

In what order should I set up Cursor for data work?

Context, then a rule, then a plan, then execution. Give it the schema or a sample so column names are real, add a Python rule so generated code matches your stack, plan anything longer than a couple of cells, and let the agent write last. For a quick lookup against data you know well, Cursor's docs say going straight to agent mode is fine.

What should I check when a generated analysis returns a wrong number?

Start with the row count either side of every join and the dtypes after each read. Silent damage in generated data code usually happens at a join or a type coercion rather than in the visible logic. Also check that your filter matched anything at all, since an empty frame still renders a chart.

Is Cursor good for machine learning code?

Yes for scaffolding training loops, evaluation and plotting. It is also useful for explaining or refactoring existing ML code. Review generated code and pin seeds or versions for reproducibility.

Which databases does Cursor connect to for analysis?

The data science cookbook covers Supabase and Postgres, plus extensions for BigQuery, SQLite and Snowflake inside Cursor. Start with the cookbook for the role-specific workflow recipes.

Sources & last verified

Cursor: For data science

Cursor ships frequently. Last updated July 28, 2026.
