About the role
Antimetal is seeking a Platform Engineer specializing in Applied Evaluations to define and implement quality standards for the agentic systems that drive our investigation and automation engine.
In this role you will manage both online and offline evaluation pipelines that process petabytes of infrastructure data, and refine agent platform abstractions so that our agents are measurable, debuggable, and dependable. You will collaborate closely with platform, product, and research teams, using quality signals to speed up iteration across the company.
About Antimetal
Antimetal is at the forefront of developing the next generation of infrastructure management. We are dedicated to creating a platform that investigates, resolves, and prevents issues, allowing engineers to reclaim their time and focus on delivering outstanding products.
Your Responsibilities:
Manage the evaluation ecosystem: Develop online and offline evaluation pipelines that assess agent quality across transient, large-scale MELT (metrics, events, logs, traces) data, code, and unstructured documents. Establish the key metrics that define user experience.
Define quality at scale: Address production incidents that span numerous services and are ephemeral and high-volume, with only approximate ground truth. Create evaluations that track trajectory quality, not merely final results, and ensure your metrics predict real-world outcomes.
Architect platform abstractions for agents: Design foundational agent architectures and enhance internal frameworks (e.g., sub-agents, MCPs, middleware) that let product, platform, and research teams iterate confidently and deploy faster.
Production readiness: Take ownership of system latency, observability, and uptime.
Your Qualifications:
A minimum of 3 years of experience in ML platform engineering, data engineering, or a similar role, ideally in a high-growth environment.
Experience designing evaluation systems where the data is high-volume and ground truth is noisy and hard to label (e.g., in computer vision or deep research pipelines).
Solid system design capabilities: you understand data flow in distributed systems and the compounding effects of decisions at scale.
Demonstrated proficiency in writing clean, scalable code and a commitment to quality.
