About the job
LILT is building a global network of domain experts dedicated to improving AI evaluation quality through rigorous training, benchmarking, red-teaming, and continuous model monitoring. We invite customer service and support professionals to lend their expertise to the human-in-the-loop AI evaluation workflows used by leading enterprises and hyperscalers.
This position is for professionals with a deep understanding of customer support interactions in real operational environments who can apply that knowledge to assess and improve multilingual AI systems in customer-facing roles.
Your expertise will have a direct impact on the quality, safety, and deployment readiness of multilingual AI models.
This role offers two distinct expert tracks, differentiated by experience level and responsibilities.
Track A: Customer Service & Support AI Rater
Raters perform structured evaluation tasks following well-defined rubrics and instructions.
Responsibilities
- Evaluate AI-generated outputs related to customer service and support interactions.
- Conduct structured scoring, comparison, classification, and judgment tasks.
- Assess AI outputs for accuracy, clarity, tone, helpfulness, and adherence to support best practices.
- Identify hallucinations, misleading responses, policy violations, and unsafe guidance.
- Consistently apply domain-specific customer support guidelines across evaluations.
Ideal Background
- Customer support professionals, service operations specialists, or customer experience practitioners.
- Experience in managing customer inquiries, support workflows, or service escalations.
- Strong attention to detail and comfort with structured evaluation criteria.
Track B: Customer Service & Support AI Evaluator (Senior Track)
Evaluators provide advanced domain oversight and help shape evaluation processes.
Responsibilities
- Validate and refine evaluation rubrics and handle edge cases.
- Resolve disputes among raters through adjudication.
- Conduct error analyses and qualitative reviews of model behavior.
- Collaborate with LILT's research, product, and engineering teams to enhance evaluation methodologies.