About the job
At Databricks, we are dedicated to empowering data teams to tackle some of the world’s most challenging issues, ranging from security threat detection to breakthroughs in cancer drug development. We achieve this by creating and operating the premier data and AI platform, allowing our clients to concentrate on the high-value challenges central to their missions.
The Mosaic AI division equips organizations to develop AI models and systems using their own data, with technologies that span from fine-tuning large language models (LLMs) for specific enterprise domains to building complex AI systems that incorporate retrieval and agents. We believe that a company's AI models are as valuable as any other intellectual property and that high-quality AI models should be accessible to all.
Job Summary:
Our research team is focused on advancing the boundaries of “domain adaptation” — discovering how to create LLMs and AI systems that excel in specialized domains. We are investigating open research challenges across a variety of themes, including scaling and automating evaluation, fine-tuning with synthetic data, retrieval augmentation, and optimizing inference speed and efficiency.
As a PhD GenAI Research Scientist Intern, you will collaborate with our research team on projects that aim to adapt LLMs and AI systems for enterprise settings. Your tasks may include:
- Enhancing, refining, and assessing methodologies from existing literature.
- Designing novel approaches for effective domain adaptation.
- Combining various methods to formulate innovative strategies for efficient post-training.
- Conducting evaluations of LLMs and AI systems.

