Adapted for learning · June 2026

The Cursor Field Engineer
Visual Field Manual

Your 7-day prep plan told you what to learn. This teaches you the actual material — every mental model explained, every system drawn, every interview line backed by reasoning you can defend under pressure.

8 teaching modules · 16+ diagrams · Facts verified vs. live docs · Reference customer: Northstar Financial

START

Foundations — the lens you carry all week

Before any day: understand what kind of system you are walking into, and what your job actually is inside it.

A Cursor Field Engineer is not hired to demo an editor. You are hired to walk into a large company's existing engineering system — its pipelines, its gates, its auditors, its skeptics — and show, credibly, where an AI coding tool makes good changes easier to produce and evidence cheaper to generate without asking anyone to dismantle a control they need. Everything in this manual serves that one job.

◆ The thesis — memorize it

Cursor is not a parallel SDLC. It helps teams execute their existing engineering standards with better context, faster feedback, stronger repeatability, and carefully governed increases in autonomy. The field posture is never "remove the gates." It is: make good changes easier to produce, make evidence cheaper to generate, and keep risk-based gates exactly where they are justified.

The enterprise delivery system is five connected layers

Almost every question you'll get — technical, security, or commercial — is really a question about one of these five layers. Learn them as a stack: the top four are where work flows, and the bottom one is the control plane that governs all of it. Cursor improves layers 1–4. It must respect layer 5.

Figure 0.1 — The five-layer delivery system

Read it top-down: an idea enters at layer 1 and only becomes revenue-bearing software after passing through 2, 3 and 4, all while layer 5 watches, authorizes, and records every step.

▲ Why this matters in the room

When a skeptical VP says "won't this just generate more code and overload my reviewers?", they're pointing at layer 2. When security says "where does our code go?", that's layer 5. When a platform lead asks about rollback, that's layer 3. Naming the layer first shows you see the whole system, not just the editor.

Meet Northstar Financial — one customer, all week

You will rehearse against a single, realistic reference customer so every exercise compounds. Internalize this profile now; by Day 7 you should be able to narrate it from memory.

Northstar Financial — primary

Scale: 200 engineers · 6 product teams + platform, security, SRE

Stack: Java monolith + TypeScript services + Terraform

Toolchain: GitHub Enterprise · Jira · Jenkins · Artifactory · Kubernetes · LaunchDarkly · Datadog · ServiceNow

Process: Scrum per team · SAFe-ish quarterly planning · Kanban for platform/SRE

Gates: 2 reviewers + passing CI · 5 envs (dev→int→QA→staging→prod) · weekly change windows · CAB for high-risk

Regulation: SOX-relevant services need traceable requirements, approvals, test evidence, separation of duties

Pains: slow onboarding · brittle tests · review queues · legacy migrations · recurring CI failures · inconsistent docs

Aurora Health — the contrast

Use for stretch exercises when you want to prove you can adapt to a harder environment.

Scale: 500 engineers

Stack: Azure DevOps

Constraint: strict network controls · cloud agents not approved

▶ Why a contrast customer earns points

Designing a useful pilot for Aurora using only the lower autonomy rungs (no cloud agents) proves you fit the product to the customer's risk tolerance instead of forcing your favorite feature. That's the whole job in miniature.

How to use this companion

Each day below follows the same rhythm so the material sticks: Teach (the concept, explained and drawn) → See it (a diagram you could reproduce on a whiteboard) → Say it (the interview line, with the reasoning underneath) → Check (a self-quiz). Read in order the first time; the mental models stack.

✦ Verified vs. live docs — June 2026 (re-verify before any interview)

This product moves weekly. The facts below were checked against Cursor's live docs/blog the day this manual was built. Treat them as current but perishable; never present a capability as a commitment without confirming availability and the customer's plan entitlement.

Organizations — now GA to Enterprise (shipped ~first week of June 2026): one admin plane over multiple teams, each team with its own security/governance/budget/feature settings; lightweight Groups for cohort-level model access, spend limits, and agent permissions.
Bugbot — June 2026 update: ~3× faster, 22% cheaper, ~10% more bugs found, 90% of runs finish under 3 minutes. Autofix spins isolated cloud-VM agents that push fix commits; ~35% of autofix changes get merged by developers. (The "~70% of flags resolved pre-merge" figure in your plan is an older directional field stat — verify before quoting.)
Agent surface — Cursor 3.1 (Apr 14 2026) added CLI /debug; 3.5 (May 20 2026) shipped Cloud Agents in isolated cloud VMs with terminal/browser access, parallel multi-repo work, async report-back, plus Composer 2.5.
Security posture — SOC 2 Type II, AES-256 at rest, TLS 1.2+ in transit, annual third-party pen testing, Privacy Mode with zero-data-retention terms (note: ZDR does not apply when using your own API keys). Private connectivity via AWS PrivateLink and Cloudflare Tunnel.
New proof point — Cursor's enterprise page now claims "trusted by 64% of the Fortune 500." Box case study figures (85%+ daily, 30–50% throughput, 80–90% less migration effort, +75% usage in 6 weeks via mentorship) are confirmed on Cursor's blog.
Pricing — Business/Teams list at ~$40/user/mo; Enterprise negotiated, volume discounts at 100+ seats. List price is a starting point, not the deal.


DAY 1

Reconstruct the enterprise SDLC

See the customer's real delivery system — not its stated methodology.

The single most common rookie mistake is to take a customer's word for how they ship. They'll say "we're agile," and you'll picture continuous flow — but the actual change still waits two weeks for an integration environment, a security review, and a Thursday change window. Day 1 trains you to reconstruct the delivery system as it truly operates, with named artifacts, owners, and the gates where work actually waits.

1 · The full lifecycle — forward path and the return loop most people forget

A feature doesn't travel in a straight line and stop at "deployed." It travels idea → production, and then a second value stream runs incident → corrective change. That return loop is a goldmine of Cursor use cases (faster triage, characterization tests, smaller fixes), and naming it sets you apart from candidates who only know the happy path.

Figure 1.1 — The enterprise lifecycle as two connected value streams

Every ⛔ is a human gate — a place where work waits for a person, not a machine. Mapping where work waits is worth more than knowing the methodology's name.

▶ The two-minute "how a feature ships" narration

Be able to say this cold: "A PM frames an opportunity, it becomes a PRD and epics in Jira, a tech lead writes an RFC or ADR the architect reviews, an IC implements on a short-lived branch, opens a PR that CODEOWNERS must approve with passing CI, it clears QA and test-evidence gates, a release manager or train promotes it through environments behind a change ticket, and SRE operates it with runbooks and on-call. When it breaks, the incident becomes a postmortem and a corrective change that re-enters the same pipeline." Then add the kicker: "handoffs, queues, and environment availability usually matter more than typing speed."
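The kicker is easier to defend with a number behind it. The sketch below computes flow efficiency (active work as a share of total lead time) for one change. It is illustrative only: the stage names loosely follow the narration, every hour figure is invented, and `Stage`, `flow_efficiency`, and `longest_wait` are hypothetical helpers rather than part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    touch_hours: float  # time someone is actively working on the change
    wait_hours: float   # time the change sits in a queue before this stage


def flow_efficiency(stages: list[Stage]) -> float:
    """Share of total lead time spent on active work (0.0 to 1.0)."""
    touch = sum(s.touch_hours for s in stages)
    total = touch + sum(s.wait_hours for s in stages)
    return touch / total if total else 0.0


def longest_wait(stages: list[Stage]) -> Stage:
    """The stage whose queue contributes the most waiting time."""
    return max(stages, key=lambda s: s.wait_hours)


# Hypothetical numbers for one change; replace with the customer's real data.
value_stream = [
    Stage("implement", touch_hours=6, wait_hours=2),
    Stage("code review", touch_hours=1, wait_hours=30),
    Stage("integration env", touch_hours=2, wait_hours=80),
    Stage("change window", touch_hours=1, wait_hours=48),
]

print(f"flow efficiency: {flow_efficiency(value_stream):.0%}")
print(f"longest queue:   {longest_wait(value_stream).name}")
```

With these made-up figures, implementation is a small slice of the lead time and the integration environment is the longest queue, which is the shape of the argument you want to make with the customer's own data.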

2 · Methodology vs. lifecycle — don't confuse the cadence with the system

This is the distinction that makes you sound senior. Methodologies organize when and how much work moves. They do not replace the engineering practices of design, test, release, and operate. A team can be devoutly Scrum and still batch-release quarterly behind a CAB. Know what each framework actually contributes — and what it conspicuously leaves untouched.

Scrum

Supplies: team cadence — sprints, backlog, standups, a Definition of Done.

Silent on: how you design, how you test, how you release, how you operate. DoD is a checklist, not a pipeline.

Kanban

Supplies: flow management — WIP limits, cycle-time visibility, pull-based work. Common for platform/SRE.

Silent on: ceremonies and estimation; it manages the queue, not the engineering.

SAFe

Supplies: coordination at scale — PI planning, ARTs, release trains, dependency & investment management.

Why regulated orgs adopt it: predictability, dependency management, and an audit-friendly paper trail.

◆ The line that lands

"A customer may run Scrum, but work can still wait two weeks for an integration environment, a security review, or a change window. I map the end-to-end value stream, the required evidence, and the recovery path — then decide where Cursor removes toil or improves quality." Note how "we're agile" and heavyweight downstream gates coexist comfortably in regulated orgs; that coexistence is normal, not hypocrisy.

3 · The artifact graph and its systems of record

Here is the field-engineer reframe that turns a process diagram into a Cursor opportunity map. Every artifact is context the AI either has access to or doesn't, and every handoff between systems is friction Cursor can reduce. Trace one change across its systems of record and you'll see both the audit trail an auditor reconstructs and the context boundaries an agent must respect.

Figure 1.2 — The artifact graph: one change across five systems of record

MCP integrations (Jira, Confluence, internal docs) are how you hand the agent the context that lives outside the repo — the gap this picture exposes.
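One way to picture the audit trail is as a record with one link per system of record, where any empty link is a place an auditor (or an agent) loses the thread. The sketch below assumes the five systems are Northstar's ticketing, source control, CI, artifact store, and change management tools; the field names and the `missing_links` helper are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ChangeTrail:
    """Links one change should carry across systems of record (illustrative names)."""
    ticket_id: Optional[str]         # assumed: the Jira issue
    pull_request: Optional[str]      # assumed: the GitHub PR
    pipeline_run: Optional[str]      # assumed: the Jenkins build
    artifact_version: Optional[str]  # assumed: the Artifactory artifact
    change_record: Optional[str]     # assumed: the ServiceNow change


def missing_links(trail: ChangeTrail) -> list[str]:
    """Names of the links that are absent or empty, in declaration order."""
    return [f.name for f in fields(trail) if not getattr(trail, f.name)]


# Placeholder identifiers, not real records.
trail = ChangeTrail(
    ticket_id="TICKET-1",
    pull_request="PR-1",
    pipeline_run="BUILD-1",
    artifact_version=None,
    change_record="",
)
print(missing_links(trail))
```

A trail with no missing links is what lets an auditor walk from requirement to production; the same links mark which context sits outside the repo.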

4 · The persona map — who blocks you, and what they fear

Nine roles touch the lifecycle. Each owns something, is measured on something, and can block you for a specific reason. This map becomes the backbone of Day 6's value mapping — you'll sell a different headline to each one. Learn the fear column most: objections come from fears, and you disarm a fear by naming it before they do.

Persona	Owns / cares about	What they fear about AI coding	Cursor headline (Day 6)
IC developer	Their flow, shipping their tickets	Tool that slows them or writes code they must babysit	Faster navigation, local test help, flow
Tech lead	Code standards, review load	50 devs each prompting their own way	Project Rules encode standards once, enforced everywhere
Eng manager	Throughput, onboarding time	A rollout that's bought but not used	Cycle time, time-to-first-commit for new hires
Architect	Consistency, documented decisions	Inconsistent, undocumented generated designs	Consistency via rules; RFC/ADR drafting
QA lead	Test coverage & quality	Tests that mirror the bug, not the intent	Characterization tests, test-debt reduction
Platform / DevOps	CI/CD health, the pipeline	More code → more flaky builds to triage	CI failure triage, migration tooling
SRE	Reliability, incidents, MTTR	Unreviewable changes raising blast radius	Faster, evidenced corrective changes
Security	Data flow, controls, audit	Code exfiltration, unaccountable agents	Controls + audit + replacing shadow AI
Release manager	Change records, approvals	Author/approver blur if agents auto-commit	Richer evidence, unchanged separation of duties

▶ Say it like this

"I separate the named methodology from the actual delivery system. I map the end-to-end value stream, the required evidence, and the recovery path — then decide where Cursor removes toil or improves quality. The gates exist for coordination, compliance, and blast radius. Cursor works inside those gates, not around them."

Knowing that a customer "runs agile" is not enough to plan a pilot, because methodology describes cadence, not the delivery system. An agile team can still have a two-week wait for an integration environment, a mandatory security review, a CAB, and quarterly batch releases. You plan a pilot around where work actually waits and what evidence is mandatory, and you only learn that by reconstructing the real value stream and its gates, not by hearing the framework name.

The return loop is the second value stream: incident → postmortem → corrective change, which re-enters the forward pipeline under the same gates. It matters because it's dense with high-value, low-risk Cursor use cases — faster failure triage (CLI /debug), characterization tests before a fix, smaller evidenced diffs — and naming it shows you understand operations, not just feature development.

▲ Resource quality test (so you don't waste study time)

Good Day-1 sources show decisions, ownership, flow, and trade-offs and name specific tools (first-person posts from Shopify/Uber/Stripe engineering, DORA, Team Topologies). Reject neat linear diagrams, methodology ideology, and principle-only content — they teach the textbook, not the system you'll actually walk into.


DAY 2

CI/CD, release engineering & the toolchain

Trace one change from commit to production and understand every control it encounters.

If Day 1 was the org chart of delivery, Day 2 is the machine. The pipeline is the executable definition of what "acceptable change" means at this company. You never position Cursor as a way around it — you position Cursor as a way to feed the machine smaller, better-tested changes that generate the evidence the machine already demands.

1 · Branching strategy — and why you must argue both directions

Branching model isn't religion; it's a trade-off between integration frequency and isolation. Elite orgs trend toward trunk-based + feature flags because it maximizes integration frequency and shrinks merge risk. But enterprises rationally keep release branches for parallel supported versions, audit comfort, and CAB alignment. A field engineer who can only sell one model looks naive; one who can explain when each is correct earns trust.

Figure 2.1 — Three branching models on one axis of integration frequency

The model determines where AI-generated code meets the gates. Cursor is agnostic to the model; your recommendation is not.

▶ Branching memo soundbite

"Most enterprises sit between GitFlow and trunk-based — release branches for audit comfort, flags creeping in. I'd move a 200-dev GitFlow shop toward short-lived branches plus flags incrementally, but I would not recommend pure trunk-based where they maintain several supported versions or where the CAB needs a stable release branch to point an auditor at."

2 · The pipeline and its gates — standard vs. regulated

Memorize the canonical stage sequence, then learn what a regulated org adds on top. The difference between these two diagrams is Day 3's entire subject, previewed here. Mark every place a human must act — those red gates are where separation of duties lives and where "AI just writes the code faster" runs into reality.

Figure 2.2 — Standard pipeline (≈300-dev SaaS)

Figure 2.3 — Regulated variant: the same change, more gates & evidence

This is the picture you'll annotate on Day 3 — at each control point, does AI strengthen evidence or threaten it?

Progressive delivery & the rollback truth

Know the four ways to release gradually and the one thing that breaks rollback:

Blue/green: two identical envs, flip traffic, instant rollback
Canary: small % first, watch metrics, widen
Rings: internal → beta → broad, staged cohorts
Rolling: replace instances incrementally

▲ The rollback gotcha interviewers probe

Database migrations are what actually constrain rollback. You can flip traffic back in seconds, but you can't un-drop a column. That's why mature teams use expand/contract migrations, decouple deploy from release with flags, and often prefer roll-forward over rollback for schema changes. If you say "just roll back" without mentioning data, a platform lead will catch you.
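The expand half of an expand/contract migration can be sketched with Python's built-in `sqlite3` module. The `customers` table and its columns are invented for illustration, and a real migration would run through the team's migration tool rather than ad hoc statements. The point is that after the expand and backfill steps, both the old and the new application versions can still read the table, so a traffic rollback stays safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customers (first_name, last_name) VALUES ('Ada', 'Lovelace')")

# Expand: add the new column alongside the old ones. Nothing is removed,
# so the previous application version keeps working if the deploy is rolled back.
conn.execute("ALTER TABLE customers ADD COLUMN full_name TEXT")

# Backfill: populate the new column from the existing data.
conn.execute(
    "UPDATE customers SET full_name = first_name || ' ' || last_name "
    "WHERE full_name IS NULL"
)

# The old read path (previous release) and the new read path both succeed.
old_read = conn.execute("SELECT first_name, last_name FROM customers").fetchone()
new_read = conn.execute("SELECT full_name FROM customers").fetchone()
print(old_read, new_read)

# Contract: dropping first_name and last_name is a separate, later change,
# made only after no deployed version reads them. That is the irreversible
# step, which is why it is isolated from the feature release.
```

Keeping the destructive step in its own later change is what lets the release itself remain reversible.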

3 · The toolchain map — learn the archetype, then the swaps

You'll meet a hundred tool combinations. They're variations on one archetype. Memorize the archetype and what stays structurally identical when a vendor swaps (gates, evidence, promotion logic) versus what only changes syntax (YAML dialect, plugin names).

Role in the chain	Northstar's pick	Common alternatives	What changes on swap
Source control	GitHub Enterprise	GitLab · Bitbucket · Azure DevOps	UI & API; protected-branch logic is the same
CI	Jenkins	GitHub Actions · GitLab CI · CircleCI · Buildkite	Pipeline syntax & plugins only
Artifacts	Artifactory	Nexus · GitHub Packages	Almost nothing structural
Infra as code	Terraform	Pulumi · CloudFormation	Language; the promotion model holds
Flags	LaunchDarkly	Unleash · Split · home-grown	SDK; the decouple-deploy-from-release idea is constant
Observability	Datadog	Splunk · Grafana · New Relic	Query language & dashboards
Change / ITSM	ServiceNow	Jira Service Mgmt	Where the change record lives; it always exists

The archetype to say out loud: GitHub + Jenkins + Artifactory + Terraform + Datadog + ServiceNow. When you hear any stack, map it onto this and you'll never sound lost.

4 · DORA — the scoreboard you don't have to invent

The four DORA metrics are the lingua franca of delivery performance, and crucially, the platform team already reports them. Anchoring a pilot on DORA means you argue on the customer's existing scoreboard instead of inventing a new one nobody trusts. Two measure speed, two measure stability — and you must always pair them, because speed without stability is not a credible enterprise story.

Figure 2.4 — The four DORA metrics: speed × stability

Pair them or be wrong. "Cursor lifts deployment frequency" is incomplete without "and holds change failure rate flat."
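To make the pairing concrete, here is a minimal sketch that derives all four numbers from the same set of deployment records, so a speed figure is never reported without its stability partner. The `Deployment` shape, the choice of medians, and the sample data are assumptions for illustration; a real platform team will already have its own definitions and data source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is recovered


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Two speed metrics and two stability metrics for one reporting period."""
    if not deploys:
        raise ValueError("no deployments in the period")
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [_hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        # speed
        "deploys_per_week": len(deploys) / (period_days / 7),
        "median_lead_time_hours": median(lead_times),
        # stability
        "change_failure_rate": len(failures) / len(deploys),
        "median_restore_hours": median(restores) if restores else 0.0,
    }


# Hypothetical two-week sample: four deploys, one of which failed.
t0 = datetime(2026, 6, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=24)),
    Deployment(t0, t0 + timedelta(hours=48), failed=True,
               restored_at=t0 + timedelta(hours=50)),
    Deployment(t0, t0 + timedelta(hours=72)),
    Deployment(t0, t0 + timedelta(hours=96)),
]
print(dora_summary(sample, period_days=14))
```

Returning all four values from one function is a deliberate design choice: it makes it awkward to quote deployment frequency without change failure rate sitting next to it.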

◆ Constraint thinking — the senior move

Faster code generation can overload the system's real constraint — usually review, test, or release, not typing. Optimize the constraint, not the activity that's already fast. "If review is the bottleneck, generating code 2× faster just grows the review queue. So I'd target review quality and CI triage first, and watch the QA/review constraints as usage climbs."
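The arithmetic behind that quote fits in a few lines. This is a deliberately crude model with invented numbers (a fixed weekly review capacity and a constant rate of new PRs), not a forecast; `review_queue_after` is a hypothetical helper.

```python
def review_queue_after(weeks: int, prs_opened_per_week: int,
                       prs_reviewed_per_week: int) -> int:
    """PRs still waiting for review after `weeks`, starting from an empty queue."""
    queue = 0
    for _ in range(weeks):
        queue += prs_opened_per_week
        queue -= min(queue, prs_reviewed_per_week)
    return queue


# Hypothetical team whose reviewers can clear 40 PRs a week.
balanced = review_queue_after(weeks=8, prs_opened_per_week=40, prs_reviewed_per_week=40)
doubled = review_queue_after(weeks=8, prs_opened_per_week=80, prs_reviewed_per_week=40)
print(balanced, doubled)
```

When review capacity stays fixed and generation doubles, the queue grows by the difference every week, which is why the first target is the constraint rather than typing speed.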

▶ Say it like this

"I'd never position Cursor as a way around the pipeline — the pipeline is the executable definition of acceptable change. The opportunity is helping engineers produce smaller, better-tested changes, diagnose failures faster, and generate the evidence the process already requires. And I anchor pilots on DORA, because the platform team already reports it — I don't want to invent a new scoreboard to argue about."

"Cursor lifts deployment frequency" is incomplete on its own because deployment frequency is a speed metric with no stability counterpart. An enterprise audience hears "more deploys" as "more risk" unless you pair it with change failure rate and MTTR. It can also be a vanity result: more deploys that overload review or raise the defect escape rate are worse, not better. Always present DORA as paired speed + stability, and tie the speed gain to a constraint you've shown won't buckle.

Database migrations are what actually constrain rollback. Traffic and binaries roll back fast; schema changes often can't (you can't un-drop a column or un-migrate data cleanly). Mature answer: expand/contract migrations, decouple deploy from release with flags, and prefer roll-forward for schema. Naming this signals you understand release engineering, not just deployment mechanics.


DAY 3

Governance, compliance & Cursor's control plane

The day most candidates skip — and exactly where you differentiate.

Most candidates can demo. Very few can sit across from a bank's security team and speak the language of controls and evidence fluently. Day 3 is layer 5 from the foundations: the controls the customer must keep, matched one-for-one against the controls Cursor actually ships. Master this and you stop being "an AI tool rep" and start being someone security can work with.

1 · SOX / ITGC change management — risk and evidence, not bureaucracy

For SOX-relevant services (anything touching financial reporting), auditors require IT General Controls over how code changes reach production. The point isn't paperwork for its own sake — it's to guarantee that no single person can unilaterally push an unreviewed change to a system that affects the financials. The mechanism is separation of duties.

Figure 3.1 — Separation of duties: the control that makes AI autonomy a conversation

Why this is THE governance concept for Cursor: if an agent both writes and auto-commits a change, it can blur author ≠ approver. The guardrail: agents propose, humans approve — separation of duties survives intact.
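The guardrail can be expressed as a check that a pipeline or an auditor could run over a change record. The sketch below treats author, approver, and deployer as pairwise distinct, which is a stricter reading than some organizations apply, and it adds a rule that at least one approver must be human. `ChangeRecord`, the `agent_identities` field, and the account names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    author: str
    approvers: frozenset[str]
    deployer: str
    agent_identities: frozenset[str] = frozenset()  # hypothetical: accounts used by agents


def sod_violations(change: ChangeRecord) -> list[str]:
    """List the ways a change breaks author != approver != deployer."""
    problems = []
    if not (change.approvers - change.agent_identities):
        problems.append("no human approver")
    if change.author in change.approvers:
        problems.append("author approved own change")
    if change.deployer == change.author:
        problems.append("author deployed own change")
    if change.deployer in change.approvers:
        problems.append("approver deployed the change")
    return problems


agents = frozenset({"agent-bot"})

# Agent proposes, humans approve, the pipeline deploys: duties stay separate.
proposed = ChangeRecord("agent-bot", frozenset({"alice", "bob"}), "release-pipeline", agents)

# Agent writes, approves, and ships its own change: every boundary collapses.
autonomous = ChangeRecord("agent-bot", frozenset({"agent-bot"}), "agent-bot", agents)

print(sod_violations(proposed))
print(sod_violations(autonomous))
```

An agent-authored change passes cleanly as long as the approval and deploy identities are someone or something else, which is the "agents propose, humans approve" posture in executable form.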

◆ How modern orgs satisfy auditors (say this and you sound experienced)

Auditors increasingly accept automated evidence: PR approvals plus pipeline logs are the audit trail. Standard, low-risk changes ride pre-approved automated paths (no CAB meeting); only higher-risk changes need human CAB approval. So the framing is: "Governance is risk management and evidence. AI assistance doesn't change the risk tiers — it makes the evidence for the standard tier cheaper and richer to produce."
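A small routing function makes the "AI doesn't change the risk tiers" claim checkable. The two risk criteria below are invented placeholders; a real organization defines its own tiering in its change policy. The `ai_assisted` flag is accepted but deliberately ignored, to show that how the code was written does not move a change between tiers.

```python
from enum import Enum


class ApprovalPath(Enum):
    PRE_APPROVED = "standard change: automated evidence, no CAB meeting"
    CAB = "higher-risk change: human CAB approval required"


def approval_path(touches_sox_service: bool, has_schema_migration: bool,
                  ai_assisted: bool) -> ApprovalPath:
    """Route a change by risk. The criteria here are hypothetical examples."""
    # ai_assisted is intentionally unused: authorship method is not a risk factor.
    if touches_sox_service or has_schema_migration:
        return ApprovalPath.CAB
    return ApprovalPath.PRE_APPROVED


for assisted in (False, True):
    print(assisted, approval_path(False, False, ai_assisted=assisted).name)
    print(assisted, approval_path(True, False, ai_assisted=assisted).name)
```

The same change lands in the same tier with or without AI assistance; what changes is how cheaply the evidence for the standard tier gets produced.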

2 · Security's seats in the SDLC — and the review where you are the vendor

Security shows up in the pipeline as automated gates (SAST, DAST, SCA, secrets scanning) and as human review gates for sensitive changes, plus supply-chain concerns (SBOM, provenance, the SLSA vocabulary). But on Day 3 there's a second security story: the vendor security review of Cursor itself. A bank will run you through a questionnaire — data-flow diagrams, SOC 2 evidence, sub-processors, retention. You must be able to answer it from memory and then verify against the docs.

SAST: static analysis
DAST: dynamic/runtime
SCA: dependency & license
Secrets scanning
SBOM: bill of materials
SLSA: supply-chain levels
Provenance: who built what

3 · Cursor's control surface, mapped to what the customer asks for

This is the table you should be able to reproduce on a whiteboard. For every customer control requirement, name the Cursor control that answers it. Group it into five families: identity, data, policy, network, visibility — plus Organizations as the cross-team admin plane.

Family	Customer asks…	Cursor control that answers it
Identity	"Enforce our SSO, provision/deprovision, scope roles"	SSO (SAML/OIDC), SCIM, RBAC, disable local login, MDM policies
Data	"Our code can't be retained or train models"	Privacy Mode with contractual zero-data-retention, AES-256 at rest, TLS 1.2+ in transit
Policy	"Restrict which models, repos, MCP servers, commands"	Model / MCP-server / repo allowlists & blocklists, hooks, terminal sandboxing, agent guardrails
Network	"Keep traffic on our network, reach private repos"	IP allowlisting, proxies, AWS PrivateLink, Cloudflare Tunnel
Visibility	"Prove who did what; show me AI usage"	Audit logs, admin analytics, AI-code tracking
Org plane	"Different policy per team / subsidiary"	Organizations (GA): per-team security/governance/budget; Groups for cohort-level model & agent permissions

✦ Verified facts to state with confidence (June 2026)

SOC 2 Type II · AES-256 at rest · TLS 1.2+ in transit · annual third-party penetration testing · Privacy Mode zero-data-retention terms with model providers. Honest nuance to volunteer: ZDR does not apply when a team brings its own model API keys, and Privacy Mode is a setting that admins should enforce org-wide — don't imply it's automatic on every plan. Cursor's enterprise page also now cites "trusted by 64% of the Fortune 500."

4 · The trust equation — what turns a blocker into an ally

Senior engineers and security become blockers when AI changes are unreviewable, undisclosed, or unaccountable. The same people become allies when each of those is inverted. Memorize this as a transform:

Figure 3.2 — The trust equation

5 · "10 hard questions" — and the two where honesty wins

Your strongest trust signal is admitting a limit and pairing it with a mitigation. Prepare honest answers; here are worked examples of the genre — note the two that concede a real limitation.

"Where does our code go, and is it retained?"

Code is sent to the model provider to serve completions/agent actions; with Privacy Mode + ZDR terms, providers don't store it or train on it. Encrypted in transit (TLS 1.2+) and at rest (AES-256). Limit to volunteer: ZDR doesn't apply if you use your own API keys — so for a SOX pilot I'd standardize on Cursor-managed access with Privacy Mode enforced.

"What can the agent actually execute?"

As much or as little as you allow: terminal sandboxing, command allow/blocklists, and hooks gate execution; cloud agents run in isolated VMs. Limit: autonomy is a spectrum and misconfiguration is possible — so I scope minimum privilege for the pilot and expand only on evidence.

"How do we audit AI usage?"

Admin analytics, audit logs, and AI-code tracking; Organizations rolls usage and spend up per team. You can answer "who used what, where" for an auditor.

"Sub-processors & tenancy?"

Point to the Trust Center / SOC 2 report and sub-processor list rather than improvising. Discipline: never invent a compliance, integration, or roadmap claim — record it as a follow-up and send the document.

▶ Say it like this

"Cursor's job is to fit inside the control framework, not fight it. The human PR review gate stays exactly where it is — AI raises the quality of what arrives at the gate, and the admin plane gives security the policy and audit controls to govern how it's used. An auditor reconstructing a deploy sees the same trail — ticket, PR, approvals, pipeline log — often richer, because the change is better described."

▲ Vocabulary discipline

Use "separation of duties," "risk tier," "evidence," "ITGC" naturally — once or twice, not as a tic. Most candidates can demo; few can say these correctly. Overusing the jargon reads as memorized; deploying it precisely reads as experienced.

The control at stake is separation of duties (author ≠ approver ≠ deployer). If the agent both authors and commits/merges, it can collapse the author/approver boundary an ITGC auditor relies on. The guardrail: agents propose changes as PRs that a human must review and approve; you keep the human gate and audit trail intact. This is why "AI proposes" is default-safe and "AI commits autonomously" is a governed exception, not the starting posture.

Security teams assume every vendor oversells. A rep who says "here's the limitation, here's the mitigation" (e.g., "ZDR doesn't apply with your own API keys, so we'd enforce Cursor-managed access for the SOX repos") is more credible, not less — it signals you'll tell them the truth when it matters operationally. Honesty about limits is a field engineer's strongest trust signal, and it converts the security gatekeeper from adversary to collaborator.


DAY 4

Cursor team workflows in a shared repository

Move from individual prompting to governed, repeatable, repository-grounded team workflows.

A clever prompt helps one person once. The unit that matters for an enterprise is a repeatable, reviewable workflow encoded close to the repository. This is the day you stop thinking "AI assistant" and start thinking "encoded team standards that happen to be executed by an agent." That reframe — rules and shared context turn 50 individual prompters into one team — is governance, not convenience.

1 · The progression from ad-hoc to governed

Teams climb this ladder whether or not anyone plans it. Your job is to make the climb deliberate. Each rung moves capability out of one person's head and into shared, version-controlled artifacts the whole team inherits.

Figure 4.1 — From 50 prompters to one team

Project Rules are the pivot: "how a tech lead encodes standards once and has them enforced in every AI interaction."

2 · Context strategy in large repos — an engineering problem, not a prompt-length problem

In a Java monolith plus dozens of TS services, "just paste more" fails. Context quality comes from system design: let codebase indexing and search discover code, give exact artifacts when you know them (@-mention files, symbols, docs), pull external context via MCP (Jira, Confluence), and exclude sensitive or irrelevant paths with ignore controls. Start from the task and the system boundaries, not from a wall of text.

Do

Start from the task & the system boundary
Let indexing/search discover code
@-mention exact files/symbols/docs you know
Bring in Jira/Confluence via MCP
Exclude secrets & irrelevant paths (ignore)

Don't

Dump the whole repo and hope
Treat context as a length contest
Leak sensitive config into prompts
Assume the agent sees Confluence/Jira by default

3 · Change-shaping discipline — "done" ≠ "ready to merge"

The phrase to repeat: "Agent completed the task" does not mean "the change is ready to merge." Shape every change toward reviewability: small diffs, explicit constraints, tests as verifiable targets, isolated branches/worktrees, plan-before-implement, and a clear handoff to human review. The explicit anti-goal is the one-enormous-AI-rewrite PR no human can responsibly approve.
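The gap between "done" and "ready to merge" can be written down as an explicit checklist. In this sketch the 400-line threshold, the field names, and the `readiness_gaps` helper are placeholders chosen for illustration; each team would set its own limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedChange:
    lines_changed: int
    includes_tests: bool
    ci_passed: bool
    plan_approved: bool


def readiness_gaps(change: ProposedChange, max_lines: int = 400) -> list[str]:
    """Reasons a finished agent task is not yet a reviewable PR."""
    gaps = []
    if not change.plan_approved:
        gaps.append("plan was not approved before implementation")
    if change.lines_changed > max_lines:
        gaps.append(f"diff of {change.lines_changed} lines exceeds {max_lines}; split it")
    if not change.includes_tests:
        gaps.append("no tests to act as verifiable targets")
    if not change.ci_passed:
        gaps.append("CI has not passed")
    return gaps


# The agent reports success, but the result is one enormous untested rewrite.
agent_says_done = ProposedChange(lines_changed=2300, includes_tests=False,
                                 ci_passed=True, plan_approved=True)
for gap in readiness_gaps(agent_says_done):
    print("not ready:", gap)
```

An empty list is the signal to hand off to human review; anything else goes back to the agent or the author first.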

4 · The agentic surface — each step up in autonomy needs a step up in guardrails

Know what runs where, and which admin controls and audit visibility apply at each layer. The mental model is a staircase: as you move from tab-complete to autonomous cloud agents, autonomy rises — and so must the guardrails, in lockstep.

Figure 4.2 — Autonomy vs. guardrails must rise together

Never lead with the highest-autonomy capability in a low-trust environment. Match the rung to the customer's evidence and risk tolerance.
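The staircase can be modeled as an ordered table of rungs, each with the guardrails it requires, plus a rule that you stop climbing at the first rung the customer has not approved or cannot guard. The rung names and guardrail names below are simplified assumptions for illustration, not a list of product settings.

```python
from typing import Optional

# Hypothetical rungs and guardrails, ordered from least to most autonomous.
REQUIRED_GUARDRAILS: dict[str, set[str]] = {
    "tab_complete": {"privacy_mode"},
    "ide_agent": {"privacy_mode", "project_rules", "pr_review"},
    "cli_headless": {"privacy_mode", "project_rules", "pr_review", "command_allowlist"},
    "cloud_agent": {"privacy_mode", "project_rules", "pr_review", "command_allowlist",
                    "isolated_vm", "audit_logs"},
}


def highest_permitted_rung(enabled: set[str], approved: set[str]) -> Optional[str]:
    """Climb until a rung is unapproved or its guardrails are not all enabled."""
    permitted = None
    for rung, required in REQUIRED_GUARDRAILS.items():
        if rung not in approved or not required <= enabled:
            break
        permitted = rung
    return permitted


all_guardrails = set().union(*REQUIRED_GUARDRAILS.values())

# An Aurora-style customer: every guardrail available, cloud agents not approved.
aurora_rung = highest_permitted_rung(
    enabled=all_guardrails,
    approved={"tab_complete", "ide_agent", "cli_headless"},
)
print(aurora_rung)
```

The pilot for a customer like Aurora is then designed on the rungs below the returned ceiling, and the ceiling only moves when approval and guardrails move together.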

The four prompt patterns worth memorizing verbatim

These encode change-shaping discipline directly. Practice them until they're muscle memory — they double as demo narration on Day 7.

# 1 · Explore before proposing (read-only)
Trace how this behavior works today. Identify entry points, data flow,
tests, configuration, operational dependencies. Cite files; separate
confirmed behavior from assumptions. Do not edit.

# 2 · Plan a reviewable change
Propose the smallest implementation satisfying these acceptance criteria.
Include affected files, interfaces, test changes, rollout concerns,
unresolved decisions. Do not implement until I approve the plan.

# 3 · Constrain a refactor
Preserve public behavior and interfaces. Change only this bounded subsystem.
Add characterization tests first, make the smallest coherent diff,
stop if tests reveal undocumented behavior.

# 4 · Prepare human review
Review the final diff against the ticket and repo conventions. Summarize
behavior changes, risks, test evidence, operational impact, and
what the reviewer must verify manually.

The capability × SDLC-phase one-pager (your whiteboard asset)

Map each Cursor capability to a lifecycle phase and to the enterprise control that governs it. This is what you draw from memory when someone says "show me where this fits."

SDLC phase	Cursor capability	Governing control
Design	Ask mode exploration; RFC/ADR drafting; Plan mode	Architect review stays human
Implement	IDE Agent under Project Rules; small diffs	Branch protection; rules as encoded standards
Review	Bugbot PR review; "prepare human review" prompt	CODEOWNERS + 2 reviewers unchanged
Test	Characterization & unit test generation	Coverage gate; tests assert intent
CI / integrate	CLI `/debug` triage; headless in pipeline	Required checks; no disabling gates
Release	Richer change descriptions / evidence	CAB, change windows, separation of duties
Operate	Incident triage; corrective-change drafting	Postmortem ownership; audit logs

▶ Say it like this

"For a team, the unit that matters is not the clever prompt — it's a repeatable, reviewable workflow encoded close to the repository. Shared rules and commands, planning required for ambiguous work, tests and linters as feedback, small diffs, and the same branch and PR controls the team already trusts. Project rules are how a tech lead encodes standards once and has them enforced in every AI interaction."

◆ The ownership rule (put it in the playbook)

Generated code is owned by the developer who submits it. That one sentence resolves most "who's accountable for AI code?" anxiety and keeps the human in the loop where governance needs them.

Project rules matter because they move a standard out of individual habit and into a shared, version-controlled artifact that every agent interaction inherits. Without them you have 50 developers each prompting their own way, with inconsistent style, tests, and architectural boundaries. With them, a tech lead encodes the standard once (style, test requirements, forbidden patterns, architectural boundaries) and it's enforced uniformly. That's the same function as a linter or a CODEOWNERS file: encoded, repeatable governance.

When a customer asks for the highest-autonomy capability first, map the request to the autonomy/guardrails staircase: the highest-autonomy rung requires the most guardrails and the most earned trust, so you don't start there. Propose a maturity path: useful assistive and governed-workflow use now, with higher autonomy made conditional on evidence and controls (scoped permissions, sandboxing, audit, and a verified low change-failure rate on bounded tasks). "Never lead with the highest-autonomy capability in a low-trust environment."

▲ Be ready for: "what shipped recently that matters to enterprises?"

Have two or three changelog items with the enterprise reason each matters: Organizations (per-team governance/budget at scale), Bugbot Autofix (independent review signal + isolated-VM fixes), CLI /debug + Cloud Agents in isolated VMs (scriptable, auditable automation with sandboxing). Knowing the warts too (monorepo indexing friction, rule tuning) is credibility, not weakness.

DAY 5

AI-assisted review, testing, CI debugging & anti-patterns

Design AI assistance around independent verification and existing PR controls — and know exactly how teams break trust.

The unit of trust in an enterprise is the PR. Everything Cursor does should land as a well-scoped, well-described, well-tested PR — so that nothing downstream of the merge has to change. Day 5 is about adding AI as an independent signal inside the review process the team already trusts, never as a replacement for required human approval — and about naming the ways teams destroy trust before anyone asks.

1 · Layered review — AI is one independent signal, not the gate

Review in a serious org is layered. Each layer catches a different class of problem, and in regulated orgs the final human approval is mandatory — separation of duties survives intact. AI slots in as an extra, independent pass that improves what reaches the human, not as a substitute for them.

Figure 5.1 — The layered review stack

✦ Bugbot mechanics — verified June 2026

Auto-runs on each PR update; reads the diff and top-level + inline PR comments for context; leaves inline comments at the exact issue location with suggested fixes; catches logic bugs, edge cases, security and quality issues. Custom rules via .cursor/BUGBOT.md in natural language, scopable per area (backend vs. frontend vs. migrations). Autofix spins isolated cloud-VM agents that push fix commits — ~35% of autofix changes get merged. June 2026: ~3× faster, 22% cheaper, ~10% more bugs found, 90% of runs under 3 minutes. (Your plan's "~70% of flags resolved pre-merge" is an older directional stat — verify before quoting.)

▲ What actually destroys trust in AI review

False positives and ungrounded comments. A bot that cries wolf trains reviewers to ignore it — and then it's worse than nothing. So tuning is a first-class activity with a named owner and a cadence. Treat BUGBOT.md rules like code: scoped, severity-rated, reviewed, and pruned. "We turned Bugbot off, it's noise" is almost always an untuned-rules / wrong-expectations problem, not a product failure.
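Treating rules like code implies measuring them. The following is a minimal Python sketch of the false-positive tracking that could feed a pruning review. The `Finding` record, the rule labels, and both thresholds are hypothetical, chosen only for illustration; none of this reflects a Bugbot export format or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str       # hypothetical label the team assigns to a review rule
    accepted: bool  # True if a human reviewer judged the finding valid


def rule_stats(findings):
    """Return {rule: (total findings, findings rejected by reviewers)}."""
    stats = defaultdict(lambda: [0, 0])
    for finding in findings:
        stats[finding.rule][0] += 1
        if not finding.accepted:
            stats[finding.rule][1] += 1
    return {rule: (total, rejected) for rule, (total, rejected) in stats.items()}


def prune_candidates(findings, max_fp_rate=0.5, min_samples=4):
    """Rules with enough samples whose false-positive rate reaches the threshold."""
    return sorted(
        rule
        for rule, (total, rejected) in rule_stats(findings).items()
        if total >= min_samples and rejected / total >= max_fp_rate
    )


# Hypothetical dispositions collected over one tuning cadence.
sample = (
    [Finding("migrations-need-rollback", True)] * 5
    + [Finding("migrations-need-rollback", False)] * 1
    + [Finding("style-nitpick", False)] * 4
    + [Finding("style-nitpick", True)] * 1
)
print(prune_candidates(sample))
```

In this sample only `style-nitpick` clears both bars, so it is the rule the owner would rescope or remove at the next tuning review.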

2 · Test generation with verification discipline

The trap: generated tests that mirror the implementation instead of asserting intended behavior — they pass, prove nothing, and lock in bugs. The discipline: tests must assert intent; write characterization tests before a refactor to pin existing behavior; treat coverage as a gate input, never the goal. Coverage-as-goal is how you get 90% coverage of meaningless assertions.
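The difference is easiest to see side by side. This is a small Python sketch, assuming a hypothetical `late_fee` function and an invented business rule (5% of the balance, capped at 25.00); neither comes from a real codebase.

```python
from decimal import Decimal


def late_fee(balance: Decimal) -> Decimal:
    """Hypothetical rule: 5% of the balance, capped at 25.00."""
    return min(balance * Decimal("0.05"), Decimal("25.00"))


def test_mirrors_implementation():
    # Anti-pattern: the expected value is recomputed with the same formula,
    # so this test keeps passing even if the formula itself is wrong.
    balance = Decimal("1000")
    assert late_fee(balance) == min(balance * Decimal("0.05"), Decimal("25.00"))


def test_asserts_intent():
    # Intent taken from the (hypothetical) ticket, written as literal values.
    assert late_fee(Decimal("100")) == Decimal("5.00")
    assert late_fee(Decimal("1000")) == Decimal("25.00")  # the cap applies


def test_characterization_zero_balance():
    # Characterization test: pins what the code does today, before a refactor.
    assert late_fee(Decimal("0")) == Decimal("0")
```

All three tests pass and all three add coverage, but only the last two would catch a regression in the rule itself. That is the check a human reviewer applies to generated tests.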

3 · Cursor in the CI loop — the bright line

Cursor's CLI/agent can triage failing builds from logs (/debug), generate tests to clear coverage gates, and draft fixes as normal PRs; it can run headless inside controlled automation. The bright line you must articulate clearly:

"AI proposes" — default-safe

Drafts a fix, opens a PR, suggests tests. A human still reviews and approves. No new control surface; fits existing gates.

"AI commits autonomously" — governed exception

Requires explicit guardrails, sandboxing, audit, scoped permissions. Earned with evidence, never the starting posture.

◆ Never "fix" a pipeline by disabling a check

If a gate is failing, the answer is a better change or a conversation with the policy owner — never silently disabling the check. Say this unprompted and platform leads relax.

The 5-step CI break-fix workflow you'd teach a customer team

1. Summarize the failure: have the agent read the logs and state what failed, citing evidence.
2. Separate primary from secondary: one root cause usually cascades; isolate it from noise.
3. Reproduce: propose the minimal reproduction steps before touching code.
4. Identify likely causes and state uncertainty: ranked hypotheses, not false confidence.
5. Smallest safe fix as a PR: minimal diff, tests added, normal review path.
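The "separate primary from secondary" step can be sketched as a simple heuristic: treat the earliest failure in the log as the primary candidate and everything after it as a possible cascade. This Python sketch assumes a hypothetical log convention (lines containing `ERROR` or `FAILED`) and invented file and test names; real CI logs will need their own patterns, and the result is a starting hypothesis, not a verdict.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical convention: failures are marked with ERROR or FAILED.
FAILURE_PATTERN = re.compile(r"\b(ERROR|FAILED)\b")


@dataclass
class Triage:
    primary: Optional[Tuple[int, str]]  # (line number, text) of the earliest failure
    secondary: List[Tuple[int, str]]    # later failures, likely cascades


def triage_log(log_text: str) -> Triage:
    """Split failure lines into a primary candidate and secondary noise."""
    hits = [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if FAILURE_PATTERN.search(line)
    ]
    if not hits:
        return Triage(primary=None, secondary=[])
    return Triage(primary=hits[0], secondary=hits[1:])


sample_log = """\
INFO  compiling module billing
ERROR billing/Invoice.java:42 cannot find symbol TaxRate
INFO  running tests
FAILED InvoiceTest.testTotals
FAILED LedgerTest.testPosting
"""

result = triage_log(sample_log)
print("primary:", result.primary)
print("secondary count:", len(result.secondary))
```

Keeping the line number with each hit is deliberate: it is the cited evidence the first step asks for, and it lets a reviewer check the claim against the raw log.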

4 · The anti-pattern taxonomy — know ~9 cold, each with its guardrail

A candidate who says "here's how teams misuse this and how I'd prevent it" reads as field-experienced. One who only lists features reads as a fan. Learn each anti-pattern as a triple: the failure → who loses trust → the preventing guardrail.

1 · Unreviewed vibe-merges

Merging AI code nobody understood. Loses: seniors, security. Guardrail: required human review unchanged; ownership rule.

2 · Mega-diff PRs

One giant rewrite no one can review. Loses: reviewers. Guardrail: small-diff discipline; slice the work.

3 · Prompt-and-pray debugging

Guessing fixes with no reproduction. Loses: platform/SRE. Guardrail: evidence-first break-fix workflow.

4 · Fabricated confidence

Uncited claims stated as fact. Loses: everyone. Guardrail: require citations; state uncertainty.

5 · Hidden generated code

No disclosure of AI authorship. Loses: security, auditors. Guardrail: disclosure norms; AI-code tracking.

6 · Context rot

Stale/contradictory rules. Loses: the team. Guardrail: rules owned, reviewed, pruned on a cadence.

7 · Secrets & prompt injection

Sensitive context exposed; untrusted content hijacks the agent. Loses: security. Guardrail: ignore controls, sandboxing, minimum privilege.

8 · Excessive agent permissions

Agent can run/network more than needed. Loses: security, SRE. Guardrail: least privilege; expand on evidence only.

9 · Volume as success + mandated usage

Counting AI lines; forcing adoption with no enablement (seniors excluded → they become the resistance). Loses: seniors, EMs. Guardrail: outcome metrics; enablement, not mandates.

▶ Say it like this

"AI review adds an independent signal; it never collapses separation of duties. I give agents the minimum context, tools, network access, and command autonomy the use case requires, and expand only after evidence. And I'll tell you the failure modes before you ask — vibe-merges, mega-diffs, review-noise fatigue — because preventing them is the actual job."

When a team says "Bugbot is noise," it is almost never a product failure. The cause is untuned rules, wrong severity thresholds, no tuning owner, or wrong expectations. Recovery: assign an owner and cadence for BUGBOT.md; scope rules per area; set severities so only real blockers block (others comment); reset expectations that AI is one independent signal, not the gate; track false-positive rate and prune. False positives are precisely what destroy trust, so tuning is first-class work, not an afterthought.

High coverage is not proof of quality, because tests can mirror the implementation rather than assert intended behavior: the number looks good, proves nothing, and can lock in bugs. Coverage is a useful gate input but a terrible goal. The discipline: tests assert intent, characterization tests pin existing behavior before a refactor, and a human verifies the assertions are meaningful. Optimizing coverage as the target is Goodhart's law in miniature.

DAY 6

Discovery, value mapping & the 90-day pilot

Diagnose before demonstrating; then turn a useful tool into a controlled organizational change. The core of the job.

Everything before this was preparation; this is the job. A field engineer who leads with a demo is a sales engineer doing it backwards. You diagnose first — uncover how work moves, where it waits, what security needs, and who decides — then prescribe a bounded, measurable, governed pilot. The deliverable from this day, a 90-day rollout plan for a SOX-constrained 200-dev org, is your single best interview asset.

1 · A discovery framework you own — and label everything

Run discovery across seven dimensions. For every answer, tag it as fact, hypothesis, or unknown; that labeling discipline is what separates discovery from a pitch in disguise.

Dimension	What you're trying to learn
Org shape	Teams, repos, monorepo?, languages, who's representative vs. exceptional
SDLC	Methodology, artifacts, design-doc culture
CI/CD	Tools, gates, cadence, who owns the pipeline
Security posture	Data classification, vendor review, existing AI policy
Current AI usage	Sanctioned and shadow — there's always shadow usage
Pain	Cycle time, review latency, onboarding, legacy/test debt, migrations
Buying process	Champion, economic buyer, security gatekeeper, procurement

◆ Shadow AI is a sale, not a scandal

There is always shadow AI usage. Don't moralize about it — replacing ungoverned usage with governed usage is a security win you can sell. "You already have AI in your codebase; the question is whether it's under your SSO, your model allowlist, and your audit log — or not."

2 · Value mapping — pain → capability → metric, per persona

This is where Day 1's persona map and Day 4's capability map combine. For each persona, trace a line from their pain to a Cursor capability to a metric they'd accept. Then translate it for the economic buyer in engineer-hours, cycle time, and new-hire time-to-first-commit — the only dialect that funds a deal.

Figure 6.1 — The value-map spine (one row per persona)

Prioritize use cases by value, user pull, repo readiness, testability, risk, time-to-evidence, and repeatability.
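One way to make that prioritization explicit is a simple scoring pass. The Python sketch below assumes 1-to-5 ratings with equal weights, with `risk` inverted so that lower risk scores higher and `time_to_evidence` rated so that 5 means fastest. The candidate use cases and their ratings are hypothetical; the weighting is an illustration, not a prescribed method.

```python
CRITERIA = (
    "value", "user_pull", "repo_readiness", "testability",
    "risk", "time_to_evidence", "repeatability",
)


def score(ratings: dict) -> int:
    """Sum 1-5 ratings; 'risk' is inverted so that lower risk scores higher."""
    total = 0
    for criterion in CRITERIA:
        rating = ratings[criterion]
        total += (6 - rating) if criterion == "risk" else rating
    return total


# Hypothetical candidates with hypothetical ratings.
candidates = {
    "ci-failure-triage": dict(value=4, user_pull=5, repo_readiness=4, testability=4,
                              risk=2, time_to_evidence=5, repeatability=4),
    "legacy-characterization-tests": dict(value=4, user_pull=4, repo_readiness=3, testability=5,
                                          risk=2, time_to_evidence=4, repeatability=5),
    "autonomous-ticket-to-prod": dict(value=5, user_pull=3, repo_readiness=2, testability=2,
                                      risk=5, time_to_evidence=1, repeatability=2),
}

ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
for name in ranked:
    print(name, score(candidates[name]))
```

The numbers matter less than the conversation they force: a high-value, high-risk candidate landing last gives a documented reason to defer it.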

3 · The maturity ladder — stage capabilities, not just users

The subtle rollout insight: most people stage users in waves. You also stage capabilities in rungs. Different teams advance at different rates, and you never lead with the highest-autonomy rung in a low-trust environment.

Figure 6.2 — The five-rung capability maturity ladder

4 · The 90-day pilot for 100–500 engineers

Representative-but-motivated cohort (2–3 teams including one legacy codebase). Guardrails day one. Enablement you can cite from Box. Baselined metrics. Pre-agreed expand/modify/pause/stop criteria. Here's the shape:

Figure 6.3 — The 90-day rollout

Enablement you can cite: Box's mentorship model — power users coaching peers 1 hr/week — lifted usage +75% in six weeks. Market pattern: 3–6 month pilots → 500–5,000-seat deployments.

▲ The pilot scorecard — baseline everything first

Four lenses, or the readout becomes a vibes argument:

Adoption & capability: weekly active users in the cohort, repeat use across workflows (not autocomplete acceptance), time to first review-ready workflow.
Flow: work-start to first review-ready PR, PR cycle time, CI-repair time, migration progress.
Quality & safety: change failure rate, escaped defects, rework, review-finding acceptance and false-positive rate, policy exceptions.
Experience & trust: developer-reported time saved (with examples), reviewer confidence, security/platform confidence, champion narratives.
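Once baselines exist, the Flow and Quality lenses reduce to before/after arithmetic. This Python sketch compares median PR cycle time and change failure rate between a baseline period and a pilot period. All sample numbers are hypothetical, and the metric definitions are simplified assumptions that should be replaced with the customer's own.

```python
from statistics import median


def change_failure_rate(deployments: int, failed: int) -> float:
    """Share of deployments in a period that needed remediation."""
    if deployments <= 0:
        raise ValueError("need at least one deployment in the period")
    return failed / deployments


def relative_change(baseline: float, pilot: float) -> float:
    """Signed change relative to baseline; negative means the value went down."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (pilot - baseline) / baseline


# Hypothetical PR cycle times in hours (work-start to merge).
baseline_pr_hours = [30, 52, 41, 75, 48]
pilot_pr_hours = [28, 36, 40, 60, 33]

cycle_delta = relative_change(median(baseline_pr_hours), median(pilot_pr_hours))
cfr_baseline = change_failure_rate(deployments=40, failed=6)
cfr_pilot = change_failure_rate(deployments=50, failed=7)

print(f"median PR cycle time change: {cycle_delta:+.0%}")
print(f"change failure rate: {cfr_baseline:.0%} -> {cfr_pilot:.0%}")
```

Medians are used rather than means so that one stalled PR does not dominate the readout. Samples this small would be directional evidence only, not proof.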

Discovery question bank — the openers that work

The killer opener: "Walk me through the last meaningful change from request to production — where did it wait?" Waiting time is where the value is.
"Which repos and teams are representative, and which are exceptional?"
"Where do engineers spend time understanding rather than changing code?"
"Which failures escape to production, and which checks catch them?"
"What would security need to approve a pilot — and what would make them stop one?"
"What result after 30 days justifies expansion; what signal causes a pause?"
"What did your last dev-tool rollout teach you?" (failures teach selection & enablement)

▶ Say it like this

"I wouldn't start with 200 identical seats and hope. I'd pick a representative but motivated cohort, set security and workflow baselines, prove two repeatable use cases, and expand against explicit criteria — segmenting permissions and autonomy by risk, staging capabilities, not just users. Guardrails first, enablement second, expansion third. Skip enablement and the tool gets bought but not used; skip guardrails and security kills the deal at week six."

Saying "not yet" to a technically possible but organizationally premature use case is a credibility move: it proves you're optimizing for the customer's success and risk tolerance, not for showing off the flashiest capability. It also protects the pilot: a single high-risk use case that goes wrong can sink trust for the whole rollout. You select one primary, one secondary, and one explicitly deferred, with the reason stated.

Volume of AI-generated code measures usage, not value, and usage is only a leading indicator. More generated code can actually increase downstream load on review and QA, so volume can correlate with worse outcomes. Baseline cycle time, review latency, change failure rate, and developer-reported time saved before the pilot, then argue the result on those. Otherwise the readout is a vibes argument the skeptics will win.

DAY 7

Demo, objections & the interview loop

Performance day. Combine discovery, governance, workflow, and value into credible field execution.

Everything compounds here. The demo, the objection handling, and your point of view are where six days of system-thinking either land as credible field execution or evaporate into a feature tour. The bar: if your audience walks away remembering features, you failed. They should be able to retell the problem, the workflow change, the preserved controls, and the next decision.

1 · Demo craft — anchor on their world, not a todo app

Anchor on their stack and a realistic repo (a 15-year-old Java monolith, never a toy). Structure every segment as tell → show → tell, and run the whole demo on one arc:

Figure 7.1 — The demo arc: every segment ends by naming the control

The 15–20 minute Northstar demo, in order

1. Jira-style ticket: business context, acceptance criteria, constraints, out-of-scope.
2. Explore current behavior (Ask mode): read-only, cite files, separate fact from assumption.
3. Reviewable plan (Plan mode): smallest change, unknowns surfaced, you approve before code.
4. Small bounded change under Project Rules: show where the rules changed the output.
5. Meaningful tests: asserting intent, then run them.
6. Local diff review → PR with risk & test evidence in the description.
7. Bugbot review layered under the required human reviewers.
8. Triage a deliberately failing CI check → smallest safe fix.

◆ Planned imperfection beats fake perfection

Show inspection and correction, not a flawless first take. Demo the failure path on purpose: slow indexing, a wrong agent assumption, a failing test, an unhelpful review finding, a blocked command, and a capability you can't confirm — answered with "I'll verify and follow up," said with confidence. A recovered failure builds more trust than a suspiciously perfect run.

2 · Objection fluency — the canonical seven, concede first

Answer each in ~30 seconds, and always concede what's true before you counter. The concession is what makes the counter land; skip it and you sound like a brochure.

1 · "Security / IP — where does our code go?"

Concede: legitimate, non-negotiable concern. Counter: Privacy Mode + ZDR, SOC 2 Type II, AES-256/TLS, allowlists, audit logs, PrivateLink/Tunnel; scope minimum privilege for the pilot. Offer the Trust Center doc.

2 · "Copilot is basically free in our MS bundle."

Concede: real budget logic — don't pretend it isn't. Counter: differentiate on agentic depth, rules-as-governance, Bugbot, admin/control plane, and measured outcomes — not on price. Propose a head-to-head on one workflow.

3 · "Our seniors hate AI code."

Concede: good — their skepticism is a feature. Counter: involve them as reviewers/rule-authors early; small diffs, disclosure, unchanged gates. Seniors excluded become the resistance; seniors enlisted become champions.

4 · "We tried AI and it wrote garbage."

Concede: believe them; ungoverned use does that. Counter: the difference is rules, scoped context, plan-first, and tests as targets — governed workflow, not raw prompting. Offer to reproduce a real task live.

5 · "Juniors will stop learning."

Concede: a real risk worth designing against. Counter: Ask mode as a teacher, review discipline, and pairing norms; the goal is understanding what they submit (ownership rule), not blind acceptance.

6 · "Seat cost math doesn't work."

Concede: ~$40/seat is real money at scale. Counter: translate to engineer-hours, cycle time, onboarding ramp; pilot proves it on a baseline before you expand the seat count.

7 · "Our codebase is too weird / legacy / big."

Concede: indexing a giant monolith has real friction. Counter: that's exactly where exploration + characterization tests + migration slicing shine; pick the legacy team as the pilot cohort.
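For the seat-cost objection (6), the translation into engineer-hours is simple break-even arithmetic. The Python sketch below uses the ~$40/seat figure from the objection and assumes it is a monthly price. The loaded hourly cost and the seat count are hypothetical placeholders to be replaced with the customer's own numbers.

```python
def break_even_hours(seat_cost: float, loaded_hourly_cost: float) -> float:
    """Hours one engineer must save per billing period for the seat to pay for itself."""
    if loaded_hourly_cost <= 0:
        raise ValueError("loaded_hourly_cost must be positive")
    return seat_cost / loaded_hourly_cost


SEAT_COST = 40.0            # the ~$40/seat figure; billing period assumed monthly
LOADED_HOURLY_COST = 100.0  # hypothetical fully loaded cost of one engineer-hour
SEATS = 200                 # hypothetical seat count

hours = break_even_hours(SEAT_COST, LOADED_HOURLY_COST)
print(f"break-even: {hours * 60:.0f} minutes saved per engineer per period")
print(f"total seat spend per period: ${SEAT_COST * SEATS:,.0f}")
```

Under these assumptions the break-even is well under an hour per engineer per period. That is why the argument should rest on the pilot's measured baseline rather than on the arithmetic alone.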

The discipline under all seven

Never improvise a security, compliance, integration, or roadmap claim. "I'll verify and follow up" — recorded as a real follow-up — beats a confident guess that later proves wrong.

3 · Your 90-second point of view

Interviewers remember a thesis and forget feature recitals. Have an opinionated, specific view of your own ready. A strong default, in your own words:

▶ A 90-second POV you can adapt

"AI in the enterprise SDLC isn't about typing speed — that was never the constraint. The constraint is the system: review queues, environment waits, evidence generation, and trust. So the win isn't more code; it's smaller, better-described, better-tested changes that move through the existing gates faster and leave a richer audit trail. The teams that succeed treat it as encoded standards and governed autonomy — guardrails first, staged by risk — not as a magic autocomplete. My job is to make Cursor useful inside the customer's real engineering system, prove it on their scoreboard, and increase autonomy only as fast as the evidence and controls allow."

Demo design rules (the checklist)

Begin with the customer's workflow and problem, never a feature tour.
Narrate ownership: what the engineer owns, what Cursor assists, what the existing system validates.
Use their vocabulary: ticket, repo, required checks, environment, release record, approval.
One bounded workflow end-to-end — not ten fragments.
Show a failure and a recovery on purpose.
The test: if they mainly remember features, redesign it.

▶ Close by interviewing them

End with a field question that shows you think like the role: "Where do your enterprise pilots stall most often — security review, enablement, or champion turnover?" And tune to the posting: FE interviews stress evaluation/POC motion (Days 6–7); FDE interviews stress shipping workflows in customer environments (Days 4–5 — go deep on rules, CLI, Bugbot config, migration slicing).

Concede first, because the concession earns the right to be heard. Enterprise buyers have been pitched by people who deny every concern; conceding the true part ("yes, ~$40/seat is real money," "yes, ungoverned AI writes garbage") signals you're honest and lowers their defenses, so the counter actually lands. Skipping it makes you sound like a brochure and confirms their suspicion that you'll oversell. Concede-then-counter is also how you handle the "seniors hate AI" objection: you agree their skepticism is valuable, then enlist it.

The test of a demo: after it ends, can the audience retell the problem, the workflow change, the preserved controls, and the next decision without listing features? If they mainly remember features, the demo failed and you redesign it. This is why you anchor on their stack, run one bounded workflow end-to-end on the pain→workflow→guardrail→metric arc, and show a deliberate failure and recovery. Memorable thesis beats feature recital.

★ EXAM

Capstone, the interview spine & self-assessment

The structures you fall back on when a question is ambiguous — and the drills that prove you're ready.

The one-page interview spine

For any ambiguous enterprise question, answer in this order. When you're nervous, this sequence is your safety rail — it forces you to diagnose before you prescribe, exactly like the job.

Figure ★ — The 7-step answer spine

The 10 capstone drills (spoken, 10–15 min each)

Have your second assistant challenge assumptions and ask follow-ups, then grade on enterprise realism, technical accuracy, honesty about limits, persona-awareness, and structure. Pass bar: 8/10 fluent with specifics — named controls, named Cursor features, named metrics. Each drill maps to the day that arms it.

1 · Skeptical VP D2·D3

"Why will Cursor improve delivery rather than overload review/QA — without weakening one SOX control?" Must name: separation of duties, audit evidence, unchanged gates, the system constraint.

2 · Regulated rollout D6

3-month plan for 400 engineers, GitHub+Jenkins+ServiceNow CAB, SOX services. Cohorts, guardrails, enablement, metrics, expand/kill criteria.

3 · Security lead eval D3

"Where does code go, what's retained, who sees it, what can agents execute, how do I audit?" Accurate, under 3 min, incl. scoping agent permissions.

4 · Discovery sim D6

30 min with eng leader + platform + security; surface enough SDLC/CI-CD to pick one credible pilot use case — and say what each question was for.

5 · "Bugbot is noise" D5

Diagnose untuned rules / wrong severities / no owner / wrong expectations; propose the recovery path.

6 · Mixed pilot results D6

High usage, flat cycle time, worse defect escape, lukewarm seniors. What do you investigate; modify/pause/stop?

7 · Enterprise demo design D7

20 min for a 15-yr Java monolith shop, low coverage, formal approvals. What do you show, in what order, why?

8 · "Copilot is free" D7

Respond honestly, concede what's true, then differentiate on depth/governance/outcomes — not price.

9 · Senior objection D5·D7

"Harder to review, juniors stop learning, lowers standards." Respond without dismissing any of it.

10 · Autonomy boundary D4·D6

"We want agents taking tickets straight to prod." Propose a maturity path; higher autonomy conditional on evidence + controls.

Final self-assessment

Score each 1–5; anything under 4 is a follow-up target for the morning. Tick the ones you can already do fluently, out loud, with specifics.

Draw a credible enterprise SDLC with owner, artifact, system, exit condition, and risk at every phase.
Trace a change through branching, CI, artifacts, environments, deployment, verification, and rollback.
Explain governance as risk and evidence — and name the Cursor control answering each customer control.
Map Cursor to roles and phases without forcing a feature into every box.
Design shared-repo AI workflows that produce small, tested, reviewable changes.
Explain how AI review complements PR controls without collapsing separation of duties.
Run discovery that uncovers constraints, stakeholders, metrics, and adoption risks.
Design a staged, measurable, governed rollout for 100–500 engineers.
Run a customer-specific demo and recover credibly when something fails.
State uncertainty and product limits without losing authority.

Portfolio artifacts — your interview evidence

Referencing these unprompted ("I built a 90-day pilot plan for a hypothetical SOX-constrained 200-dev org — here's how I structured the guardrails") beats any credential. Build them as you go.

Output	From	Use in the interview
`01-current-state-sdlc.md`	Day 1	"Who cares about what" backbone; 2-min lifecycle narration
`02-cicd-toolchain-map.md`	Day 2	90-second pipeline whiteboard; any-toolchain fluency
`03-governance-control-map.md`	Day 3	CISO/security objections; the differentiator
`04-team-ai-workflow.md`	Day 4	Product depth; rules-as-governance; prompt patterns
`05-review-test-trust-model.md`	Day 5	"How do you keep AI changes safe"; failure-mode credibility
`06-discovery-and-90-day-rollout.md`	Day 6	The core FE-motion evidence; your best asset
`07-field-interview-pack.md`	Day 7	The performance itself

▶ The mindset to walk in with

"My job is to make Cursor useful inside the customer's real engineering system. I reconstruct how work and risk move, select a bounded high-value workflow, configure the right context and controls, and prove the result against delivery and quality measures. The demo, pilot, and rollout all reflect their actual repo, pipeline, and governance." Evidence that you do the job unprompted is the whole game.