Machine Learning Engineer - Multimodal Foundation Models
The Bot Company
Full-time|On-site|San Francisco The Bot CompanyAt The Bot Company, we are on a mission to create an innovative robotic assistant for every household.Our dynamic team, composed of talented engineers, designers, and operators, is based in San Francisco. We have a rich background from industry leaders such as Tesla, Cruise, OpenAI, Google, and Pixar, and we have successfully delivered products to hundreds of millions of users, honing our ability to create exceptional products and experiences.We pride ourselves on maintaining a streamlined team structure that fosters swift decision-making and minimizes bureaucracy. Each member is considered an Individual Contributor, granted substantial autonomy, ownership, and accountability. Our culture enables us to work across the technology stack with an emphasis on rapid iteration and execution.What We Seek in CandidatesCandidates for all positions at The Bot Company must exhibit remarkable sharpness and the capacity to thrive in high-pressure environments. We expect candidates to showcase:Exceptional Cognitive Abilities: You possess quick thinking, instant learning capabilities, and the ability to reason across diverse domains.Engineering Curiosity: You demonstrate an innate desire to understand how systems function, even beyond your area of expertise.Performance-Driven Attitude: You excel in fast-paced settings, effectively navigate ambiguity, and thrive under demanding circumstances.Machine Learning: Multimodal Foundation ModelsWe are developing unified foundation models capable of reasoning across text, images, video, and kinematics to inform intelligent robotic behaviors.You will engage with large-scale multimodal networks, overseeing the complete process from data handling to model training and deployment.Your ResponsibilitiesConstruct Native Multimodal Policies: Create architectures where vision, language, and other modalities are represented in a unified manner.Enhance Cross-Modal Reasoning: Explore and implement strategies to ensure that the model not only correlates modalities but also comprehends them (e.g., linking visual physics to kinematic constraints).Manage the Training Loop from Start to Finish: Design, execute, troubleshoot, and refine large-scale training experiments; identify failure points, enhance data mixtures, and tighten evaluations to achieve measurable improvements.Deploy and Refine Real Systems: Integrate models into practical robotic frameworks, enhance robot code for model deployment, and optimize performance for edge inference.
Feb 25, 2026