AI-native engineering hire
Hire ML platform engineers who have actually run GPUs under load.
Most "MLOps" candidates have run CPU batch pipelines, not GPU inference at production load. We shortlist the 3–5 who have — in 5 business days, at a 15% success fee.
Scope the role on a 30-minute call and we deliver a 3-candidate shortlist in 5 business days. Every candidate is pre-screened by AI and reviewed by a human recruiter. 90-day replacement guarantee.
Why this role, why now
What an ML platform engineer actually does in 2026
An ML platform engineer owns the AI-specific infrastructure layer your product teams consume: inference serving, fine-tuning pipelines, GPU orchestration, the model registry, experiment tracking, feature stores, and the cost/latency dashboards that stop a Q3 OpenAI bill from becoming a board-level incident. They are not an ML engineer (who builds models), not an SRE, and not a generalist DevOps engineer. They own the abstraction between your ML scientists and your Kubernetes cluster, and in 2026 that is the single highest-leverage role on an AI-native team of 8 or more engineers.
The stack has consolidated. We see the same toolchain in 80%+ of Series B–D engagements: Kubernetes (EKS or GKE) for orchestration, Ray or Anyscale for distributed training and batch inference, vLLM or SGLang for serving open-weight models, NVIDIA Triton or Modal for multi-model endpoints, Docker + Helm + Terraform for packaging, Prometheus + Grafana for infra observability, Langfuse or Arize for LLM-specific traces, and SageMaker / Vertex AI for managed pieces. The vLLM team's v0.6.0 performance update (September 2024) reported up to 2.7× higher throughput than the previous release (v0.5.3) for Llama 8B on a single H100. An ML platform engineer is the person who captures that kind of gain in production rather than in a benchmark table.
Demand is rising for a structural reason. NVIDIA's 2024 State of AI Infrastructure report found enterprises running more than one foundation model in production spend 43% of their AI budget on inference compute, and only 17% on training. Once a company graduates from one feature built on the OpenAI API to three features on a mix of OpenAI + Anthropic + self-hosted Llama or Qwen, infra complexity compounds faster than any product team can absorb as a side quest. Our rule of thumb from 30+ founder conversations: if you have fewer than five AI-focused engineers, this is an over-hire. If you have more than eight, the role is already overdue and someone is doing it badly on top of their real job.
Production signals matter more here than for any other role we hire. A strong ML platform engineer reasons natively in GPU utilisation, tokens-per-GPU-hour, cost-per-million-tokens-served, model-deployment MTTR, and eval-pipeline wall-clock. They can tell continuous batching from static batching, know when KV-cache reuse earns its complexity, and hold strong opinions on speculative decoding at the platform layer versus the application layer. If a candidate cannot draw the lifecycle of a single inference request from load balancer to tokenizer to GPU scheduler to response stream, they have not operated this layer; they have read about it.
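To make two of those metrics concrete, here is a minimal sketch of how tokens-per-GPU-hour and cost-per-million-tokens-served relate to each other. The `ServingPool` class, its field names, and every number in it are hypothetical and chosen only for illustration; they are not figures from a real engagement.

```python
from dataclasses import dataclass


@dataclass
class ServingPool:
    """One GPU pool serving a model. All figures are illustrative assumptions."""
    gpus: int
    gpu_hour_cost: float      # currency units per GPU per hour
    tokens_per_second: float  # aggregate output tokens/s across the whole pool

    def tokens_per_gpu_hour(self) -> float:
        # Pool throughput per hour, divided evenly across the GPUs.
        return self.tokens_per_second * 3600 / self.gpus

    def cost_per_million_tokens(self) -> float:
        # Hourly spend divided by hourly output, scaled to one million tokens.
        hourly_cost = self.gpus * self.gpu_hour_cost
        tokens_per_hour = self.tokens_per_second * 3600
        return hourly_cost / tokens_per_hour * 1_000_000


pool = ServingPool(gpus=4, gpu_hour_cost=2.50, tokens_per_second=5000)
print(round(pool.tokens_per_gpu_hour()))         # 4500000
print(round(pool.cost_per_million_tokens(), 3))  # 0.556
```

The point of the sketch is that cost per million tokens falls in direct proportion to throughput at a fixed GPU count, which is why utilisation is the first number a platform engineer looks at.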
How we source
How Recruo sources ML platform engineers specifically
The pattern we keep seeing in inbound CVs: 'MLOps engineer, 4 years' translates in practice to 'I wrote Airflow DAGs that ran scikit-learn on CPU nodes'. That is legitimate platform experience, but it is not the same job as running GPU inference under real load. An engineer running CPU-bound batch jobs never has to think about KV-cache fragmentation, GPU memory pressure under concurrent requests, spot-instance preemption during a fine-tune, or the difference between Triton's dynamic batching window and vLLM's continuous batching scheduler. We filter the two populations apart in the first 12 minutes.
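The gap between static and continuous batching is easy to see in a toy model. The sketch below counts decode steps for a queue of requests with different output lengths and a fixed number of batch slots. It is a deliberately simplified illustration, not vLLM's or Triton's actual scheduler: it ignores prefill, KV-cache memory limits, and arrival times, and the function names are made up for this example.

```python
from collections import deque


def static_batching_steps(lengths, slots):
    """Static batching: each batch runs until its longest request finishes."""
    total = 0
    for i in range(0, len(lengths), slots):
        total += max(lengths[i:i + slots])
    return total


def continuous_batching_steps(lengths, slots):
    """Continuous batching: a freed slot is refilled before the next decode step."""
    queue = deque(lengths)
    running = []  # remaining decode steps for each in-flight request
    steps = 0
    while queue or running:
        # Admit queued requests into any free slots.
        while queue and len(running) < slots:
            running.append(queue.popleft())
        steps += 1
        # Each running request emits one token; finished requests leave the batch.
        running = [r - 1 for r in running if r > 1]
    return steps


# Two long requests mixed in with short ones, four batch slots.
lengths = [8, 2, 2, 2, 8, 2, 2, 2]
print(static_batching_steps(lengths, slots=4))      # 16
print(continuous_batching_steps(lengths, slots=4))  # 10
```

In the static case the short requests finish early and their slots sit idle until the long request completes. In the continuous case those slots are reused immediately, so the same work finishes in fewer steps.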
Our sourcing channels for this role: the OSS contributor graphs of vLLM, Ray, Triton Inference Server, SGLang, KServe, and Hugging Face `text-generation-inference`; NVIDIA GTC speaker lists and MLPerf Inference submitter teams from the last 18 months (mlcommons.org); engineers with public Modal, Anyscale, or Together.ai production usage via conference talks or blog posts; KubeCon ML track attendees; and the private network of 640+ CEE AI engineers Nikita built during his time at Neurons Lab, which over-indexes on platform talent because the shop ran a shared ML platform serving 15+ enterprise clients in parallel.
Every candidate goes through a 12-minute AI technical interview tuned to this role. Sample probes: 'walk me through the last time you saw GPU utilisation sit at 40% on an H100 serving a 7B model — what did you change to get it to 80%?', 'your fine-tune job is evicted from spot every 90 minutes, checkpoint interval is 20 minutes — design the recovery flow', 'explain continuous batching to a backend engineer who has never touched a GPU'. The AI asks adaptive follow-ups. A human recruiter reviews every candidate before shortlist. Candidates who passed our filter in 2026-Q1 had a median of 4 years of post-2022 GPU production experience and a 78% client interview pass rate.
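The spot-eviction probe has a simple arithmetic core that a strong candidate usually reaches for first. The sketch below estimates how much training progress survives each eviction window. It covers only the lost-work arithmetic, not a full recovery flow, and it rests on labelled assumptions: checkpoints are counted from the moment the job resumes, checkpoint write time is ignored, and the 5-minute restart overhead is a hypothetical figure.

```python
def saved_minutes_per_window(window_min, checkpoint_every_min, restart_overhead_min=0.0):
    """Minutes of progress persisted before the next eviction.

    Assumes checkpoints are taken at fixed intervals measured from job resume,
    and that writing a checkpoint takes no time (a simplification).
    """
    usable = max(window_min - restart_overhead_min, 0.0)
    completed_checkpoints = int(usable // checkpoint_every_min)
    return completed_checkpoints * checkpoint_every_min


def useful_fraction(window_min, checkpoint_every_min, restart_overhead_min=0.0):
    """Share of each eviction window that ends up as saved progress."""
    saved = saved_minutes_per_window(window_min, checkpoint_every_min, restart_overhead_min)
    return saved / window_min


# The probe's numbers: evicted every 90 minutes, checkpoint every 20 minutes.
print(saved_minutes_per_window(90, 20))      # 80
print(round(useful_fraction(90, 20), 3))     # 0.889

# With a hypothetical 5-minute restart overhead, the interval choice matters a lot.
print(saved_minutes_per_window(90, 45, restart_overhead_min=5))  # 45
print(saved_minutes_per_window(90, 10, restart_overhead_min=5))  # 80
```

A fuller answer would also cover where checkpoints are stored, how the job finds the latest one on restart, and what checkpoint writes cost in wall-clock time.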
The last layer is role-specific: we require every ML platform engineer shortlisted to have personally operated at least one multi-GPU inference deployment that served >1M tokens/day for >30 days. We verify via public artifacts (GitHub, a talk, a published post-mortem, a benchmark submission) and a reference call with the engineering lead at their previous platform. "I optimised a fine-tune" is not the same thing.
Placed talent
A recent placement, anonymised
Senior ML platform engineer, Wrocław-based · Placed 2026-Q1
Outcome: Shortlisted in 7 business days. Client interview pass: first round with CTO + staff ML eng, second round a live design exercise on multi-region inference failover. Signed offer in 14 days from shortlist. Still in role (3 months in at time of writing).
- Consolidated a fragmented multi-cloud inference setup (AWS SageMaker + GCP Vertex + two self-hosted Triton clusters) onto a single Ray + vLLM platform on EKS — cut monthly inference bill by 47% and p95 tail latency by 31% over 11 weeks.
- Shipped an internal model-deployment CLI the client's 14 AI engineers now use for every rollout: canary traffic routing, automatic eval gating (wired to Langfuse traces), one-command rollback. Deployment MTTR dropped from ~45 min to under 4 min.
- Built the cost-per-million-tokens dashboard the client CFO now reviews monthly — surfaced two forgotten GPU pools burning £6,400/month on 3% utilisation.
- OSS: contributor to vLLM (paged-attention bug fix, 2024), maintainer of a small Helm chart for Triton auto-scaling used by ~200 orgs per GitHub traffic stats.
- Daily working language: English (C1, verified in our interview). Native Polish, conversational German.
- Working setup: home office in Wrocław with a 1-day-a-week co-working space in the city centre; attended client office in London quarterly for platform architecture reviews.
- B2B contractor model (JDG in Poland); total comp to client €112K/yr vs a London-local equivalent of £155K (~€178K). Net headcount cost saving after the CEE delta and agency fee: ~€58K/yr on this first role alone.
Profile composed from 2 real placements in this role in 2025-Q4–2026-Q1 plus one active shortlist. Personally identifying details anonymised per GDPR Art. 5. Salary figures are averaged across the two placed candidates.
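As a quick check on the compensation figures in the profile above, the gross gap between the two numbers can be worked out directly. This sketch uses only the €112K and ~€178K figures quoted on this page. It does not reproduce the ~€58K net figure, because that depends on how the agency fee is attributed per year, which the page does not spell out.

```python
cee_total_comp_eur = 112_000   # total comp to client, from the profile above
uk_equivalent_eur = 178_000    # London-local equivalent, from the profile above

# Gross yearly difference before any agency fee is taken into account.
gross_delta_eur = uk_equivalent_eur - cee_total_comp_eur
# The same difference expressed as a share of the UK-local figure.
delta_pct = gross_delta_eur / uk_equivalent_eur * 100

print(gross_delta_eur)       # 66000
print(round(delta_pct, 1))   # 37.1
```

The resulting gap of roughly 37% sits inside the 35–44% CEE salary delta range reported in the benchmarks on this page.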
Hiring difficulty
Benchmarks we track
ML platform engineering is one of the narrower candidate pools we source — the raw population of engineers who have run GPUs under production load is genuinely small, and the self-identified "MLOps" population is roughly 4× larger than the qualified one. Screen-out rate is high; offer rate once shortlisted is very high.
- CV → AI screen pass rate: 11%. Source: Recruo internal (n=148 inbound CVs for ML platform roles, 2025-Q4–2026-Q1).
- AI screen → human shortlist pass rate: 52%. Source: Recruo internal (n=16 AI-screen passes, 2025-Q4–2026-Q1).
- Shortlist → offer rate at client: 78%. Source: Recruo internal (n=9 shortlists delivered, 2025-Q4–2026-Q1).
- Median time-to-shortlist: 7 business days. Source: Recruo internal (n=9 engagements, 2025-Q4–2026-Q1); one day longer than the LLM engineer average, driven by the narrower candidate pool.
- UK market median time-to-hire (ML platform / MLOps): 94 days. Source: Hays UK AI Roles Salary Guide, 2026 edition (accessed 2026-04-12); LinkedIn Economic Graph UK tech roles report, 2026-Q1.
- CEE salary delta vs UK-local: 35–44% lower. Source: Recruo placements (n=2 ML platform roles, 1 active) cross-referenced with Pracuj.pl and DOU 2026-Q1 platform engineer surveys.
The 11% CV pass rate is the lowest across our role catalogue because "MLOps" has drifted to mean almost any ML-adjacent infra work, most of which is CPU-bound. The 78% offer rate once shortlisted is the highest we see — clients are typically desperate by the time they open an ML platform requisition (it is almost always opened late, after someone has been doing it badly for a quarter). Expect a slightly longer shortlist window than a pure LLM role and very high conversion once a candidate reaches your panel.
Reviewed by

CTO & Co-founder
Nikita ran the shared ML platform at Neurons Lab from 2022 to 2025, serving 15+ enterprise client teams concurrently: inference, fine-tuning, experiment tracking, the cost dashboards, the whole stack. He has operated vLLM, Triton, Ray and SageMaker deployments under real load, and he personally reviews every ML platform engineer shortlist before it reaches you.