About the role
Join the Nebius Revolution
Nebius is at the forefront of a transformative era in cloud computing, dedicated to powering the global AI economy. Our mission is to provide our clients with the essential tools and resources they need to tackle real-world challenges and revolutionize their industries, all while avoiding exorbitant infrastructure costs and the necessity of assembling large in-house AI/ML teams. At Nebius, you will work alongside some of the most talented and innovative leaders and engineers in the cutting-edge field of AI cloud infrastructure.
A Global Footprint
Headquartered in Amsterdam and publicly listed on Nasdaq, Nebius boasts a diverse and expansive presence, with R&D hubs across Europe, North America, and Israel. Our team comprises over 1,400 employees, including more than 400 highly skilled engineers specializing in both hardware and software engineering, complemented by a dedicated in-house AI R&D team.
Your Role
As a Senior Hardware Support Engineer, you will be accountable for the reliability of production hardware in large-scale, mission-critical data center environments. This role blends hardware engineering, operational excellence, and vendor collaboration to ensure fleet stability, timely root cause analysis, and continuous improvement of server and platform reliability.
You will serve as a senior escalation point for complex hardware and firmware issues affecting production systems, leading investigations from initial symptoms through to root cause and coordinating resolution with engineering teams, vendors, and on-site personnel. This position demands exceptional analytical skills, a structured problem-solving approach, and extensive hardware expertise within high-density, performance-critical infrastructure settings.
Key Responsibilities
- Lead root cause investigations for complex hardware and firmware failures affecting production fleets.
- Aggregate recurring problems and error patterns to pinpoint systemic reliability challenges.
- Act as the senior escalation point for hardware-related incidents that impact system availability or performance.
