TL;DR — Every senior engineer on LinkedIn claims to "use AI daily." Almost none of them can describe what they do when Cursor generates a confident answer that's subtly wrong. This post is the framework we use at Recruo to separate signal from noise on AI fluency — four dimensions, a 45-minute interview loop, and a rubric a hiring manager can actually apply.
Why "do they use AI?" is the wrong question
In 2022, asking an engineer whether they used AI was a real question. In 2026, it's the same as asking whether they use a keyboard. 78% of engineering teams have Copilot or Cursor in their default toolchain. Everyone uses AI.
What separates a senior engineer who gets real leverage from AI from one who is silently bottlenecked by it is the quality of their judgment around the model, not their exposure to it. Most interview loops miss this difference entirely.
Two candidates for the same backend role, same seniority, same stack, same stated AI experience:
Both will tell you in an interview that they "use Cursor daily for backend work." Only one of them is actually a force multiplier with it.
Your interview needs to measure the second behaviour, not the first.
What AI fluency looks like in a senior engineer
We break AI fluency into four dimensions. A strong senior shows up on all four; a weak one usually fails on the same two.
Prompt engineering without ceremony. They talk to the model the way they would talk to a junior engineer on their team — precise, with context, with the constraints named up front. They don't do "prompt magic." They do clear briefing. The tell: they read back their own prompt before sending it and edit it down.
Hallucination detection as a reflex. They assume the model might be subtly wrong. They know which kinds of tasks the model is reliable on (syntax, well-known algorithms, boilerplate) and which it is unreliable on (anything involving up-to-date library versions, proprietary business logic, non-obvious concurrency, cross-service state). The tell: they read the output with the same skepticism they would read a stranger's PR.
Decomposition and scope control. They never ask the model to "write the feature." They decompose the feature into units small enough that the model's output is verifiable in under a minute each. They use the model as a fast autocomplete for well-scoped units, not as an autonomous agent for whole PRs. The tell: their prompts are narrow, and the time they save per task comes from compounding many small wins.
Knowing when not to use it. A fluent engineer reaches past the model for anything that requires architectural judgment, cross-team context, or a decision with downstream blast radius. They use the model for acceleration, not for judgment. The tell: they can articulate, per task type, whether the model helps, hurts, or is neutral.
A candidate who is strong on two of these and weak on two is common. That's the engineer who appears productive but occasionally ships quiet bugs, or who is productive on known-good patterns but gets bulldozed the first time the model confidently hallucinates an API that doesn't exist. Neither is a senior hire for a scale-up that ships.
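One cheap version of the hallucination-detection reflex is to confirm that a model-suggested symbol exists before building on it. The sketch below is illustrative only: `symbol_exists` is a hypothetical helper name, and it checks only the library version installed locally, so it says nothing about whether the call behaves the way the model claimed.

```python
import importlib


def symbol_exists(module_name, dotted_attr):
    """Return True if the module imports and the dotted attribute resolves on it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or misnamed.
        return False
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


# json.loads is real; json.parse is a plausible-sounding invention.
print(symbol_exists("json", "loads"))  # True
print(symbol_exists("json", "parse"))  # False
```

A fluent engineer does the equivalent in seconds, usually by jumping to the definition or the docs, before the invented call ends up in a PR.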
A 4-part framework for the interview
Here is the actual structure we run, start to finish, when we evaluate AI fluency as part of a senior engineering screen. Total time: 45 minutes.
Part 1: The prompt post-mortem (8 minutes)
Ask the candidate to describe the three most recent occasions in the past two weeks when AI materially changed their output. For each, ask:
A fluent engineer has specific, recent, rich answers. They can quote their own prompt roughly. They remember the moment they spotted the hallucination. A non-fluent engineer gives you three generic answers that sound like they were generated ten seconds ago.
Part 2: The live AI-assisted task (20 minutes)
Give the candidate a realistic, scoped task — fifteen minutes of work for a senior — and tell them they can use any AI tool they want, with full screen share. The task should have at least one subtle wrinkle that a model will handle wrong on the first pass (an outdated library convention, a concurrency edge, a domain-specific constraint you've briefed them on but the model doesn't know).
You are not scoring whether they finish. You are scoring:
Record what they do. The gap between engineers on this task is enormous and very visible. Fluent seniors finish slightly faster than they would without AI, with higher-quality output. Non-fluent candidates finish slightly slower with worse output, because they are fighting the model instead of driving it.
Part 3: The hallucination trap (10 minutes)
Give the candidate a one-page snippet of AI-generated code with three problems embedded. Two are subtle hallucinations of the kind a model produces confidently: for example, a library method that doesn't exist, a flag that was removed in the latest version, or a confidently wrong error-handling path. The third is a genuine bug that the model would plausibly produce.
Ask them to code-review it in real time. You are looking for:
Part 4: The meta question (7 minutes)
Close with a conversation about their team's relationship to AI. Specifically:
This last part is the senior filter. Mid-level engineers use AI well. Senior engineers use AI well and have opinions about how their team uses it. If the candidate has nothing to say here, they are probably a strong mid, not a senior.
A scorable rubric
If you want a scorable version, here is ours. Each dimension is scored 1–5, with 3 as the hiring bar for a senior role.
| Dimension | 1 — Avoid | 3 — Hire bar | 5 — Exceptional |
|---|---|---|---|
| Prompting quality | Vague, single-turn, no context | Specific, gives context, iterates 2–3 times | Writes prompts like PRDs, treats context as first-class input |
| Hallucination detection | Accepts model output at face value | Spots obvious hallucinations within a minute | Has internal priors about model failure modes, checks proactively |
| Scope control | Prompts for whole features | Breaks features into verifiable chunks | Instinctively calibrates scope to "verifiable in under 60 seconds" |
| Meta-awareness | No team-level opinions | Has opinions, can articulate tradeoffs | Drives team policy on when to use vs. not use AI |
| Code ownership | Cannot modify AI-generated code under pressure | Modifies comfortably, explains choices | Rewrites AI output from scratch when faster than editing |
A candidate landing below 3 on two or more dimensions is not a senior hire in 2026, regardless of how their CV reads.
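If you track scores in a spreadsheet or script, the rule above is easy to encode. This is a minimal sketch, not our internal tooling: the dimension keys, the `review` function, and the constant names are all hypothetical, and it encodes only the stated rule-out condition (two or more dimensions below 3), not a full hiring decision.

```python
# Hypothetical encoding of the rubric above; all names are illustrative.
HIRE_BAR = 3        # score needed on a dimension for a senior role
RULE_OUT_AT = 2     # this many dimensions below the bar rules a candidate out

DIMENSIONS = (
    "prompting_quality",
    "hallucination_detection",
    "scope_control",
    "meta_awareness",
    "code_ownership",
)


def review(scores):
    """Return the dimensions scored below the bar and whether that rules the candidate out."""
    mismatched = set(scores) ^ set(DIMENSIONS)
    if mismatched:
        raise ValueError(f"scores must cover exactly the rubric dimensions: {sorted(mismatched)}")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    below = [d for d in DIMENSIONS if scores[d] < HIRE_BAR]
    return {"below_bar": below, "ruled_out": len(below) >= RULE_OUT_AT}


candidate = {
    "prompting_quality": 4,
    "hallucination_detection": 2,
    "scope_control": 3,
    "meta_awareness": 2,
    "code_ownership": 4,
}
print(review(candidate))
# {'below_bar': ['hallucination_detection', 'meta_awareness'], 'ruled_out': True}
```

Returning the list of below-bar dimensions rather than a single total keeps the debrief focused on where the candidate fell short, which a summed score would hide.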
Red flags and green flags
Beyond the rubric, a few high-signal cues that tend to predict the rubric score before you get to it.
Red flags:
Green flags:
How this fits into the rest of the interview
An AI-fluency screen does not replace your existing technical loop. It augments one stage of it.
Run it after the first-round technical phone screen, before the live secure technical interview. Reason: you want to know whether the candidate is AI-fluent before you see them code under pressure. That context changes how you interpret their output in the next round. An engineer who produces clean code while visibly fighting the model is a different hire than one producing clean code as a force multiplier.
If you already run the defensible four-stage interview loop described in our AI-cheating post, the AI-fluency screen replaces or augments Stage 1 (the asynchronous AI-aware screen). In practice we run it live rather than async, because the micro-behaviours that reveal fluency are much harder to fake in real time than in a recording.
What we do at Recruo
Every candidate we present for a senior engineering role goes through an AI Skills Validation step, which runs the 4-part framework above inside Recruo Secure Browser. The output is a dedicated AI Fluency score that travels with the candidate alongside the standard technical evaluation, and it is visible on the sample scorecard we publish.
Two things that fall out of this, consistently.
First, roughly one in three candidates who pass a standard technical screen score below the senior bar on AI fluency. The hiring bar for "senior who can actually use AI" is meaningfully higher than "senior who writes good code." Most internal interview loops aren't measuring the gap, so they don't see it.
Second, clients who see the AI Fluency score often shift their own internal bar. They start re-evaluating existing team members on the same framework and surface a couple of surprising gaps — and a couple of surprising strengths among engineers they had quietly underrated.
This is not a pitch; it is the single most predictive sub-score in our 2026 placement data. Engineers who land above 4 on AI fluency are placed faster, pass client final rounds at a higher rate, and are the ones clients ask us to prioritise for future searches.
What to do this week
Two concrete moves if you are hiring senior engineers right now.
Run the 4-part framework on one of your last three senior hires. It takes 45 minutes and tells you whether your existing loop would have caught fluency gaps you are currently absorbing.
Rewrite your first-round coding task so that AI use is allowed, encouraged, and observed. Most loops still treat AI as forbidden fruit. In 2026 that tests the wrong thing — it selects for engineers who can perform without their normal tools, which is the opposite of what you want on the team.
If you want us to run the full AI Skills Validation on your open senior roles as part of a pre-validated shortlist, book a 20-minute call — we'll walk through the framework, the rubric, and a live scorecard from a recent placement.
Related reading: How AI Is Breaking Technical Interviews — And What CTOs Can Do About It in 2026 · How Recruo's 6-Step Process Works
