Senior Site Reliability Engineer at Nebius | Amsterdam

Nebius | Amsterdam, Netherlands
On-site | Full-time


Experience Level

Senior

Qualifications

Required Qualifications:

  • Proficiency in Linux systems with strong skills in Python and Bash scripting.
  • Ability to troubleshoot complex issues across hardware, software, and networking.
  • Analytical mindset with a focus on optimizing system performance.
  • Fluency in English.

Preferred Qualifications:

  • Interest in backend development.
  • Experience in high-load distributed systems design and management.

About the job

Why Join Nebius?
Nebius is at the forefront of a transformative era in cloud computing, designed to empower the global AI economy. We provide tools and resources that enable our clients to tackle real-world challenges and revolutionize industries, while minimizing infrastructure costs and removing the need for extensive in-house AI/ML teams. Our workforce operates at the cutting edge of AI cloud infrastructure, collaborating with some of the industry’s most experienced and pioneering leaders and engineers.

Where We Operate
Based in Amsterdam and publicly listed on Nasdaq, Nebius boasts a worldwide presence with research and development hubs in Europe, North America, and Israel. Our team of over 1,400 professionals includes more than 400 highly skilled engineers, proficient in both hardware and software engineering, alongside a dedicated in-house AI research and development team.

The Role

Nebius is seeking a talented Senior Site Reliability Engineer to join our Hardware Infrastructure team. You will have the opportunity to work from our vibrant office in Amsterdam.

The Hardware Infrastructure team is responsible for designing, developing, and maintaining systems integral to the data center lifecycle:

  • Functional and load testing systems.
  • Monitoring engineering equipment in our data centers (power supply, air and water cooling, etc.).
  • Monitoring IT assets: racks, servers, JBODs, JBOGs, power shelves, network devices, etc.
  • Asset management and tracking.
  • Tracking hardware repair tasks.
  • Server production oversight.

Your Responsibilities Will Include:

  • Ensuring fault tolerance, scalability, and uninterrupted service operation.
  • Utilizing state-of-the-art technologies to address various infrastructure challenges.
  • Implementing and refining CI/CD processes.

We Expect You to Have:

  • Expertise in Linux systems, alongside proficiency in Python and Bash scripting for automation.
  • A proven track record of troubleshooting complex system issues, encompassing hardware, software, and networking.
  • Strong analytical skills and adept problem-solving capabilities, aimed at optimizing system performance.
  • Proficiency in English.

Bonus Skills:

  • An interest in backend development.
  • Experience in designing, developing, and managing high-load distributed systems.

About Nebius

Nebius is revolutionizing cloud computing to support the global AI economy, offering affordable solutions that streamline operations and reduce infrastructure costs. With a diverse team of over 1,400 professionals, including a robust in-house AI R&D team, we are committed to innovation and excellence in technology.
