Purpose

NVIDIA, in collaboration with Deutsche Telekom, is pioneering an industrial AI cloud platform for Europe. This advanced AI factory, situated in Germany, will be equipped with 10,000 GPUs utilizing NVIDIA DGX B200 systems and RTX Pro Servers. Deutsche Telekom is committed to providing a secure, sovereign, and rapid infrastructure, encompassing data centers, operational support, security, and innovative AI solutions.

As a Senior DevOps Engineer (Principal) and AI Consultant, you will play a pivotal role in guiding enterprise clients through the onboarding process, delivering training, and facilitating the early adoption of our AI platform. Your expertise will be crucial in understanding and addressing customer requirements, supporting solution design initiatives, executing Proofs of Concept (PoCs), and ensuring the seamless integration of customer workloads, including Large Language Models (LLMs), GPU computing, and AI workflows.

In this influential position, you will collaborate closely with clients to comprehend their unique technical and business needs, aid in designing customized architectures, and lead PoCs that demonstrate tangible real-world value. You will ensure the efficient integration of complex workloads, from LLM deployment and GPU performance optimization to constructing end-to-end AI/ML pipelines. As a trusted technical advisor, you will empower clients to maximize the utilization of their GPU clusters and AI toolchains, troubleshoot challenges, and adopt best practices that accelerate their AI journey.

Key Responsibilities:

- Provide expert consultation on all technical aspects related to GPU infrastructure, AI/ML model training, and platform utilization.
- Lead onboarding and training sessions, mentoring client specialists on optimal use of their GPU clusters and AI environments.
- Design and implement PoCs, including environment setups, data processing pipelines, and deployment workflows.
- Conduct requirements engineering, translating business needs into technical specifications.
- Assist clients with performance optimization, troubleshooting, fine-tuning, and validating delivered solutions.
- Serve as the primary technical liaison, coordinating cross-functional teams across infrastructure, networking, automation, security, and AI services.
- Propose and develop automation concepts to enhance service delivery, processes, and operational models.
- Ensure best practices in reliability, scalability, responsible AI, and security are applied throughout the customer lifecycle.
- Support monitoring, observability, and capacity planning for AI workloads.
Mar 12, 2026