Cerebras Systems logoCerebras Systems logo

AI Infrastructure Operations Engineer at Cerebras Systems | Sunnyvale, CA / Toronto, Canada

Cerebras SystemsSunnyvale CA or Toronto Canada
On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.


Experience Level

Experience

Qualifications

Key Qualifications:Proven experience in managing Linux-based systems. Expertise in containerization technologies such as Docker and Kubernetes. Strong troubleshooting skills for complex distributed systems. Proven track record in optimizing compute clusters and ensuring high availability. Excellent communication and collaboration skills.

About the job

Cerebras Systems revolutionizes the AI landscape with the creation of the world’s largest AI chip, a remarkable 56 times larger than conventional GPUs. Our innovative wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming efforts for users. This unique approach enables Cerebras to achieve unparalleled training and inference speeds, empowering machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing hundreds of GPUs or TPUs.

Our clientele includes leading model laboratories, global enterprises, and pioneering AI-native startups. Notably, OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, significantly enhancing key workloads with ultra-high-speed inference.

Thanks to our groundbreaking wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the performance of GPU-based hyperscale cloud inference services by over ten times. This significant speed enhancement transforms the user experience of AI applications, facilitating real-time iterations and augmented intelligence through additional agentic computation.

About The Role

We are on the lookout for a highly skilled and experienced AI Infrastructure Operations Engineer to oversee and manage our state-of-the-art machine learning compute clusters. In this role, you will have the unique opportunity to work with the world’s largest computer chip, the Wafer-Scale Engine (WSE), and the systems that leverage its extraordinary power.

You will play a pivotal role in ensuring the health, performance, and availability of our infrastructure, maximizing compute capacity, and supporting our expanding AI initiatives. This position requires an in-depth understanding of Linux-based systems, expertise in containerization technologies, and experience in monitoring and troubleshooting complex distributed systems. The ideal candidate is a proactive problem-solver with a strong background in large-scale compute infrastructure who is reliable and committed to customer success.

About Cerebras Systems

Cerebras Systems is at the forefront of AI development, creating the largest AI chip in the world that offers unmatched performance and simplicity. Our innovative technology is reshaping how organizations approach machine learning and inference, enabling them to achieve remarkable efficiencies and capabilities.

Similar jobs

Browse all companies, explore by city & role, or SEO search pages. View directory listings: all jobs, search results, location & role pages.

Tailoring 0 resumes

We'll move completed jobs to Ready to Apply automatically.