About the Role and Team
As a Research Engineer on the ML Systems team at Character.AI, you will do pioneering work on advanced machine learning training and inference systems. Your expertise will be instrumental in optimizing the performance and efficiency of our GPU clusters and in creating new technologies that improve leading consumer AI models. Your contributions will ensure our systems can handle over 20K queries per second in production using large language models. At Character.AI, your creativity and skills will be pivotal in ushering in an era where AI technology becomes a daily companion rather than just a tool.
What You'll Do
The ML Systems team researches and deploys systems that maximize GPU utilization for AI-driven products.
In this role, you will collaborate across teams and technologies to improve our training performance and inference runtime, directly shaping the conversational experience of millions of users every day.
Examples of projects you may work on include:
Crafting efficient Triton kernels and optimizing them for our specific models and hardware
Designing prefix-aware routing algorithms to improve serving cache hit rates
Training and distilling large language models to reduce latency while maintaining accuracy and engagement
Building a robust and scalable distributed RLHF stack that supports model innovations
Developing systems for efficient multimodal (image and video generation) model training and inference
Who You Are
Hold a PhD (or equivalent) in a relevant field, with research experience
Write clear, maintainable production system code
Have a strong grasp of contemporary machine learning methods, including reinforcement learning and transformers
Have a proven track record of notable research or innovative ML systems projects
Are comfortable writing model code (in PyTorch) for training or inference
Nice to Have
Experience training large models in distributed environments using PyTorch distributed, DeepSpeed, or Megatron
Familiarity with GPUs and collective communication (training, serving, debugging) and experience writing kernels
