About the job
Are you ready to redefine the potential of frontier AI models? Join Flower Labs as a founding member of the Flower Frontier Model Team, a pioneering group dedicated to building groundbreaking models that combine cutting-edge techniques with Flower’s decentralized learning methods. This position offers a unique opportunity to work on a transformative approach that not only enables GPU scaling but also unlocks data silos previously untapped for frontier model training.
Our mission is to develop models with superhuman capabilities across various domains, including science, health, finance, and drug discovery. This is your chance to contribute to the foundational training paradigms that will shape the next decade of AI, working on technologies that will be widely studied and emulated.
About the Role
We welcome all passionate individuals, especially those with post-training expertise, to apply.
As a founding Machine Learning Engineer, you will play a vital role in building state-of-the-art large language models (LLMs) and foundation models within a compact, high-impact team of research and engineering professionals. This role merges rapid development with disciplined software engineering, enabling you to create a reliable, maintainable, and scalable software stack that produces industry-leading open-source models and integrates them into new Flower Labs products.
Your responsibilities will encompass the design, implementation, and optimization of essential components across all stages of frontier model development: from data curation and evaluations to pre-training and post-training. While experience in these areas is advantageous, a strong aptitude for problem-solving and collaborative learning is crucial for success. Familiarity with distributed training and scaling strategies, as well as experience running multi-node training on GPU clusters, will be essential. You will diagnose GPU and kernel issues, resolve memory and storage bottlenecks, recover from multi-node failures, and collaborate on debugging training instabilities and related challenges.
