Description:
Ciklum is looking for an Automation QA Engineer to join our team full-time in Poland.
We are a custom product engineering company that supports both multinational organizations and scaling startups to solve their most complex business challenges. With a global team of over 4,000 highly skilled developers, consultants, analysts and product owners, we engineer technology that redefines industries and shapes the way people live.
About the role:
As an Automation QA Engineer, you will become part of a cross-functional development team engineering the experiences of tomorrow.
The Project: We are partnering with B&R Industrial Automation to significantly upgrade the Retrieval-Augmented Generation (RAG) architecture of their AS client's system. Our goal is to drastically reduce AI hallucinations in code generation and optimize retrieval latency without re-architecting their existing platform.
Responsibilities:
- Own the evaluation lifecycle, offline acceptance testing, and KPI measurement for the AS client's RAG pipeline
- Lead the co-creation and management of the project's "golden dataset" to consistently benchmark AI performance
- Implement and manage the RAGAS evaluation harness and automated CI/CD regression testing
- Track, classify, and build root-cause taxonomies for LLM hallucinations, with a specialized focus on code-generation correctness
- Golden Dataset & Baselines: Collaborate with client domain experts and technical leads to build a robust synthetic test set (90+ queries across multiple categories) and establish baseline metrics for Faithfulness, Context Precision, and Answer Relevance
- Evaluation Harness: Build and automate evaluation pipelines using RAGAS and custom Python scripts, enabling A/B comparisons between the baseline, MVP, and full implementation
- Regression & CI/CD Guardrails: Implement automated CI/CD regression checks within Azure DevOps, ensuring that a >5% drop in core metrics automatically blocks pipeline deployments
- Hallucination Tracking: Develop a root-cause taxonomy for hallucinations and track code-generation queries separately to ensure the AI generates functionally correct and compilable output
- Performance Benchmarking: Measure and monitor pipeline latency, rigorously validating P95 latency targets (sub-4.5s) under representative concurrent load
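For illustration only, the regression guardrail and the P95 latency check described above could be sketched roughly as follows. This is a minimal sketch, not the project's actual implementation: the metric keys, the function names, and the reading of a ">5% drop" as a drop relative to the baseline score are all assumptions.

```python
# Hypothetical metric keys, named after the baseline metrics listed above.
CORE_METRICS = ("faithfulness", "context_precision", "answer_relevance")
MAX_RELATIVE_DROP = 0.05   # ">5% drop", read here as relative to baseline (assumption)
P95_TARGET_SECONDS = 4.5   # "sub-4.5s" P95 latency target


def regressed_metrics(baseline, candidate, max_drop=MAX_RELATIVE_DROP):
    """Return core metrics whose score fell by more than max_drop relative to baseline."""
    failures = []
    for name in CORE_METRICS:
        base, cand = baseline[name], candidate[name]
        if base > 0 and (base - cand) / base > max_drop:
            failures.append(name)
    return failures


def p95(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latencies (seconds)."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def gate(baseline, candidate, latencies):
    """Return 0 if the candidate passes both checks, else 1 (usable as an exit code)."""
    failed = regressed_metrics(baseline, candidate)
    too_slow = p95(latencies) >= P95_TARGET_SECONDS
    return 1 if failed or too_slow else 0
```

In a CI/CD pipeline, a step could pass the return value of `gate(...)` to `sys.exit`, so that a non-zero exit code fails the step and blocks the deployment.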
Requirements:
- Background: Mid-to-Senior level experience in Data Science, Machine Learning Evaluation, AI Quality Assurance, or Data Engineering
- Evaluation Frameworks: Deep, hands-on experience with LLM evaluation frameworks (e.g., RAGAS, DeepEval, TruLens) and establishing human-anchored or synthetic benchmarks
- Technical Stack: Strong proficiency in Python. Solid experience with CI/CD tools (especially Azure DevOps) and integrating complex test suites into automated deployment pipelines
- Data & Observability: Experience working with databases (PostgreSQL) and integrating custom telemetry or observability data (e.g., Azure App Insights) into evaluation reports
- Analytical Mindset: Strong attention to detail with the ability to perform rigorous error analysis, build structured taxonomies for failures, and identify embedding drift
Personal skills:
- Highly collaborative and data-driven; comfortable working directly with client SMEs to validate queries and presenting evaluation scorecards to guide engineering decisions
What’s in it for you?
- Strong community: Work alongside top professionals in a friendly, open-door environment
- Growth focus: Take on large-scale projects with a global impact and expand your expertise
- Tailored learning: Boost your skills with internal events (meetups, conferences, workshops), Udemy access, language courses, and company-paid certifications
- Endless opportunities: Explore diverse domains through internal mobility, finding the best fit to gain hands-on experience with cutting-edge technologies
- Flexibility: Enjoy the option of fully remote work
- Care: We’ve got you covered with company-paid medical insurance, mental health support, and financial & legal consultations
About us:
At Ciklum, we are always exploring innovations, empowering each other to achieve more, and engineering solutions that matter. With us, you’ll work with cutting-edge technologies, contribute to impactful projects, and be part of a One Team culture that values collaboration and progress.
With delivery centers in Wrocław and Gdańsk, our 300+ professionals in Poland drive forward-thinking solutions for global clients. Join a community where collaboration sparks innovation and your impact reaches millions.
Want to learn more about us? Follow us on Instagram, Facebook, and LinkedIn.
Explore, empower, engineer with Ciklum!
Interested already? We would love to get to know you! Submit your application. We can’t wait to see you at Ciklum.