About the job
At Crusoe, our mission is to accelerate the availability of energy and intelligence. We are developing the systems that empower a world where individuals can ambitiously leverage AI without compromising on scale, speed, or sustainability.
Join us at the forefront of the AI revolution with sustainable technology. Here, you will have the opportunity to drive significant innovation, make a real difference, and be part of a team that is leading the way in responsible, transformative cloud infrastructure.
About This Role:
As a Senior Cloud Support Engineer at Crusoe Cloud, you will be pivotal in transforming high-performance computing with our sustainable, low-cost GPU compute solutions. You will help our clients use this technology to drive advances in fields including AI/ML, physics simulations, and computational biology. You will serve as the primary point of contact for technical support, ensuring that our customers can use Crusoe Cloud effectively to meet their objectives. This position contributes directly to Crusoe's mission by enabling clients to accelerate their research and development efforts, furthering a sustainable future. You will engage in exciting projects, work with cutting-edge technologies, and collaborate with a talented team to tackle complex challenges. We are looking for a highly motivated and experienced technical professional who is passionate about customer success, has a deep understanding of cloud technologies, and aligns with Crusoe's values. This is a full-time position.
What You’ll Be Working On:
Customer Support: Deliver exceptional technical support to customers via Zendesk while consistently meeting SLAs and maintaining a customer satisfaction (CSAT) score of 95% or higher.
On-Call Rotation: Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues.
Troubleshooting: Diagnose and resolve issues related to virtual machines (VMs), hardware failures, and scaling tests using the command-line interface (CLI) and internal tools.
Alert Triage and Maintenance: Manage alert triage, prepare for maintenance windows, and conduct node delivery testing.
Collaboration: Collaborate closely with Site Reliability Engineering (SRE), Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery.
Global Teamwork: Follow the global team's collaboration and handoff processes for ticketing and on-call coverage.
Knowledge Sharing: Develop onboarding and training materials, knowledge base documentation, and standard operating procedures.
