About the job
Senior Machine Learning Engineer - Model Evaluations for the Public Sector
The Public Sector Machine Learning team at Scale AI pioneers the deployment of cutting-edge AI systems, including Large Language Models (LLMs), agentic models, and multimodal pipelines, within critical government operations. We build robust evaluation frameworks that ensure these models perform reliably, safely, and effectively in real-world scenarios. As a Senior Machine Learning Engineer, you will architect, implement, and improve automated evaluation pipelines that enable our clients to trust and use advanced AI systems in defense, intelligence, and federal missions.
Your Responsibilities Include:
- Creating and maintaining automated evaluation pipelines for machine learning models, covering functionality, performance, robustness, and safety metrics, including LLM-as-judge evaluations.
- Designing test datasets and benchmarks to assess generalization, bias, explainability, and potential failure modes.
- Building evaluation frameworks for LLM agents, including infrastructure for scenario-based and environment-based testing.
- Conducting comparative analyses of model architectures, training procedures, and evaluation results.
- Implementing tools for continuous monitoring, regression testing, and quality assurance of machine learning systems.
- Designing and executing stress tests and red-teaming workflows to identify vulnerabilities and edge cases.
- Collaborating with operations teams and subject matter experts to generate high-quality evaluation datasets.
This position requires an active security clearance or the ability to obtain one.
