About Liquid AI
Spun out of MIT CSAIL, Liquid AI builds general-purpose AI systems designed to run seamlessly across platforms, from data center accelerators to on-device hardware. Our focus is on low latency, efficient memory use, privacy, and reliability. We work with organizations across sectors including consumer electronics, automotive, life sciences, and financial services. As we grow rapidly, we are looking for outstanding people to join our mission.
The Opportunity
The Training Infrastructure team builds the distributed systems that power our next-generation Liquid Foundation Models. As our training runs scale, we need to design, build, and improve the infrastructure critical to large-scale training.
This role centers on high ownership of training systems, with an emphasis on runtime, performance, and reliability, rather than a typical platform or SRE function. You will work within a small, agile team, building critical systems from the ground up rather than operating pre-existing infrastructure.
While San Francisco and Boston are preferred, we are open to other locations.
What We're Looking For
We are seeking an individual who:
- Embraces the complexity of distributed systems: Our team keeps long training runs stable, debugs training failures across GPU clusters, and improves end-to-end performance.
- Is passionate about building: We value team members who take pride in developing robust, efficient, and reliable infrastructure.
- Excels amid uncertainty: Our systems must support evolving model architectures. You will make decisions with incomplete information and iterate quickly.
- Aligns with team goals and delivers results: The best engineers on our team commit to collective priorities while pushing back with data when something isn't working.
The Work
- Design and build core systems that keep large training runs fast and reliable.
- Create scalable distributed training infrastructure for GPU clusters.
- Implement and refine parallelism and sharding strategies for evolving architectures.
- Optimize distributed efficiency through topology-aware collectives, communication/compute overlap, and straggler mitigation.
- Develop data loading systems to eliminate I/O bottlenecks for multimodal datasets.

