About the job
About Mentis
At Mentis, we specialize in developing evaluation datasets and reinforcement learning environments that ensure AI reliability in high-stakes sectors such as finance, healthcare, and legal. Our dedicated team curates expert training data, establishes precise rubrics, and creates verifiable task environments for AI laboratories and startups that aim to push the boundaries of model capabilities within regulated industries.
As a close-knit team based in London, we pride ourselves on adapting quickly and taking hard problems seriously. Every team member contributes directly to our projects, and we value initiative and ownership as core parts of our culture. If you're interested in shaping how cutting-edge AI learns to operate in real-world scenarios, we'd love to hear from you.
About the Role
As a Member of Technical Staff on our Applied AI team, you will focus on creating the tasks and environments that AI labs use to train and assess their agents across the finance, healthcare, and legal domains.
Day to day, you will design reinforcement learning environments around documents, spreadsheets, and professional workflows; develop verification logic and reward functions; and collaborate with domain experts to define what a correct response looks like, whether that's an LBO model or a set of clinical notes. The role spans both engineering and research, with a shared goal: producing the ground truth against which advanced models are evaluated.
What You'll Do
Develop reinforcement learning environments across finance, healthcare, and legal sectors.
Contribute to the design of tasks featuring definitive answers, calibrated rubrics, and programmatic reward signals.
Implement verification logic and reward functions to differentiate between effective and ineffective model outputs.
Engage directly with domain experts (investment analysts, physicians, attorneys) to convert intricate professional workflows into structured tasks.
Innovate new methods for evaluation, verification, and synthetic data creation.
Who We're Looking For
Hands-on experience with large language models (LLMs): prompting, evaluation, and agent integration. You've built systems that work in practice, not just in theory.
Demonstrated initiative and technical proficiency. You spot what's needed, work out a solution, and execute without waiting for direction.
Adaptability to work in diverse contexts. Your role will frequently involve shifting between engineering, evaluation design, and collaborative work with domain experts.
A bias toward shipping and iterating. We're a small team, and we prioritize action over prolonged review processes.