Salary: £60,000 – £80,000
Location: Remote (London)
What if your AI systems didn’t just predict clicks or optimise ads, but decided whether someone was qualified to enter a profession? This is a rare opportunity to build high-impact AI assessment systems that directly influence education, certification, and access to skilled careers. You’ll work at the intersection of LLMs, evaluation science, and real-world decision-making, owning systems where accuracy, fairness, and trust truly matter.
You’ll be joining a UK-based AI company transforming workforce certification and assessment. Their mission is to make education and professional accreditation more accessible, fair, and scalable through intelligent automation.
This is a small, highly capable, and pragmatic team where engineers have real ownership. There's no heavy bureaucracy, just smart people solving meaningful problems. You'll have genuine influence over technical direction, architecture, and product outcomes, while working closely with domain experts shaping the future of assessment.
What’s in it for you:
High autonomy and trust in your technical judgement
Work on socially meaningful problems with real-world impact
End-to-end ownership of production AI systems
A collaborative, low-ego engineering culture
Exposure to cutting-edge LLM evaluation, multimodal AI, and human-in-the-loop systems
As an AI Engineer, you’ll design and deploy intelligent auto-assessment systems that evaluate knowledge and skills at scale.
Your work will include:
Building AI-powered assessment pipelines using marking rubrics, example answers, SME feedback, and historical scoring data
Designing robust evaluation frameworks, golden datasets, and regression tests aligned to marking criteria
Experimenting with and optimising LLM workflows, balancing accuracy, latency, and cost
Applying statistical testing to rigorously compare model performance and validate improvements
Developing multimodal workflows that analyse text, images, and video for accurate scoring
Generating clear, actionable feedback for learners, including confidence signals and rationales
Instrumenting systems with observability, tracing, and online evaluation
Designing guardrails such as confidence thresholds and human review pathways
Building APIs and production-ready systems on cloud infrastructure
Helping shape the overall AI architecture and technical direction
You’ll own the full model lifecycle, from data preparation and experimentation through to deployment and continuous improvement.
You don’t need to tick every box, but you’ll likely bring most of the following:
4+ years’ experience in ML or AI engineering, or a relevant PhD with applied industry experience
Proven experience owning end-to-end model lifecycles in production
Strong independence and confidence making technical decisions
Hands-on experience with LLMs and agentic workflows
Excellent Python skills
Experience building APIs
Deep experience designing evaluation frameworks and tuning model performance
Familiarity with experiment tracking, observability, and human-in-the-loop systems
Cloud deployment experience (AWS preferred)
A strong understanding of fairness, auditability, privacy, and system integrity
Comfort working in a small, fast-moving environment
Nice to have: experience in EdTech, assessment systems, NLP, multimodal AI, or automated scoring.
If you're excited by the idea of building AI systems that make real, high-stakes decisions, and you want the autonomy to shape how those systems are built, this role is well worth exploring. Apply now or get in touch for a confidential conversation to learn more.