About the job
Cerebras Systems is revolutionizing the AI industry by developing the world’s largest AI chip, 56 times the size of traditional GPUs. Our wafer-scale architecture delivers the computational power of dozens of GPUs on a single chip while remaining as simple to program as a single device. This unique approach enables Cerebras to achieve unmatched training and inference speeds, empowering machine learning professionals to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.
Our esteemed clientele includes leading model labs, global corporations, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aiming to leverage 750 megawatts of scale to transform critical workloads through ultra high-speed inference.
With our groundbreaking wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, achieving speeds over 10 times faster than GPU-based hyperscale cloud inference services. This unprecedented speed enhances the user experience of AI applications, enabling real-time iterations and increased intelligence through advanced computation capabilities.
About the Role
The AI Infrastructure Operations Engineer (SiteOps) is an entry-level role focused on the deployment, initialization, monitoring, and first-response troubleshooting of Cerebras AI infrastructure in data center environments. This role plays a critical part in supporting Cerebras systems, cluster server hardware, networking hardware, and monitoring tools.
Your responsibilities will include ensuring the reliable operation and scalability of Cerebras AI clusters by executing established hardware initialization and validation protocols, monitoring telemetry data, performing initial troubleshooting, and escalating issues according to predefined workflows.
