High-Performance Computing Engineer

Periodic Labs, Menlo Park
On-site, Full-time, $350K/yr to $450K/yr

Qualifications

The ideal candidate will have a strong background in high-performance computing and a deep understanding of system architecture. Proficiency in scripting languages (e.g., Python, Bash) and experience with cloud platforms are a plus. A Bachelor's degree in Computer Science, Engineering, or a related field is required.

About the job

About Periodic Labs

Periodic Labs is at the forefront of scientific innovation, combining artificial intelligence with the physical sciences to drive breakthroughs in materials, energy, and other fields. As a rapidly growing company backed by leading investors, we operate with the urgency and agility needed to tackle the challenges of tomorrow. Our team combines deep expertise with an unrelenting commitment to pushing the boundaries of scientific achievement.

About the Role

As an HPC Engineer, you will play a pivotal role in designing, building, and managing the high-performance computing infrastructure that powers our AI and scientific research. Our models require extensive compute: large GPU and CPU clusters, high-speed interconnects, low-latency parallel storage, and sophisticated workload schedulers that keep utilization high. You will collaborate closely with researchers and infrastructure engineers to ensure that our computing environment is fast, reliable, and optimized for cutting-edge scientific discovery.

This hands-on position will require you to architect and fine-tune systems, automate provisioning, troubleshoot performance bottlenecks, and design resilient solutions at scale. You will work alongside research and machine learning teams to understand their computational needs and create an HPC environment that enhances productivity and accelerates scientific progress.

What You’ll Do

  • Design, deploy, and manage large-scale GPU and CPU clusters tailored for AI training, scientific simulation, and research workloads

  • Optimize high-speed interconnect fabrics (InfiniBand, RoCE) and parallel filesystems (Lustre, GPFS, WEKA, or similar) for peak performance

  • Oversee workload scheduling and resource management using Slurm, Kubernetes, or comparable systems, focusing on throughput and fairness

  • Implement and maintain automated cluster provisioning and configuration management using tools like Ansible and Terraform

  • Monitor cluster health and performance; develop dashboards and alerts to proactively address issues

  • Collaborate with research and ML teams to analyze workloads, resolve performance challenges, and optimize systems

  • Create and manage backup, disaster recovery, and fault-tolerance strategies for critical research data and computational infrastructure

  • Assess and integrate new hardware, including GPUs and accelerators, to enhance computational capabilities
