About the job
About Our Team
At OpenAI, our Hardware organization is dedicated to pioneering silicon and system-level solutions tailored to meet the intricate demands of advanced AI workloads. Our team is at the forefront of developing next-generation AI-native silicon, collaborating closely with software engineers and research partners to co-design hardware that seamlessly integrates with sophisticated AI models. In addition to producing industry-leading silicon for our supercomputing infrastructure, we also create innovative design tools and methodologies that drive advancements and enable hardware specifically optimized for AI applications.
About the Position
We are on the lookout for a talented Networking Operating System Firmware Engineer to spearhead the development and scaling of the switching layer for our AI supercomputers. In this pivotal role, you will be responsible for designing and maintaining custom SONiC NOS images from the ground up, engaging with the Linux kernel, switch ASIC SAI/SDKs, platform drivers, control-plane services, and orchestration layers.
Your expertise will be critical in validating, configuring, and optimizing switch platforms utilized across our high-bandwidth cluster fabric. You will ensure top-notch performance, reliability, and availability while guaranteeing smooth integration with fleet automation. Collaboration with hardware and systems teams, as well as guiding vendors to meet stringent technical standards, will be essential to your success.
This position is based in San Francisco, CA, and follows a hybrid work model, requiring three days in the office each week. We also offer relocation assistance for new team members.
Key Responsibilities:
Design, develop, and maintain custom SONiC NOS images for expansive, cutting-edge AI fabrics.
Integrate and configure Linux kernel components, device drivers, switch ASIC SDKs, and SAI layers efficiently.
Bring up new switch platforms, overseeing thermal/fan control, power monitoring, transceiver management, watchdogs, OSFP CMIS, LEDs, CPLDs, and more.
Extend and customize SONiC services for routing, telemetry, control-plane state, and distributed automation.
Collaborate with hardware teams to validate ASIC configurations; manage link bring-up, SerDes tuning, and buffer profiles; and establish performance baselines.
Analyze switch silicon SDK releases, monitor vendor deliverables, and define platform requirements alongside vendors and ASIC partners.
Diagnose complex issues spanning kernel, platform drivers, SONiC dockers, routing agents, orchestration services, hardware signals, and network topology.
Integrate switches into fleet-wide monitoring, remote diagnostics, telemetry pipelines, and automated lifecycle workflows.

