Qualifications
Candidates should possess a minimum of 5 years of experience in large-scale Linux systems engineering, specifically in HPC, AI, or distributed infrastructure roles. Extensive knowledge in Linux system installation, performance tuning, and troubleshooting is essential, along with expertise in diagnosing distributed GPU workloads. A deep understanding of GPU optimization and performance enhancement is required, as well as proficiency in Python scripting and automation frameworks. Familiarity with NVIDIA technologies such as NCCL, GPUDirect RDMA, and NVLink is highly desirable. Experience with configuration management tools (e.g., Salt, Ansible, Puppet, Chef) and the ability to troubleshoot complex system issues at the hardware, OS, and network levels are essential. Strong communication and organizational skills are crucial for collaboration across diverse technical teams, and a passion for high-impact work in a fast-paced environment is a must.
About the job
Hudson River Trading (HRT) is seeking a talented Senior GPU Systems Engineer to join our dynamic Research and Development team. In this pivotal role, you will play a crucial part in scaling and advancing our sophisticated HPC/AI research infrastructure. Collaborating with a diverse team of experts, you will ensure our trading and research operations run seamlessly 24/7 across the globe. Your responsibilities will include the design and optimization of extensive GPU compute clusters, identifying and resolving performance bottlenecks, and automating system deployment and monitoring across thousands of nodes. This is a high-impact opportunity that requires innovative thinking and technical expertise in a fast-paced environment.
About Hudson River Trading (HRT)
Hudson River Trading (HRT) is a leading quantitative trading firm that leverages the power of technology and data to drive trading strategies in various financial markets. Our innovative approach combines cutting-edge research with advanced technology to create an environment where creativity and technical excellence thrive. We are committed to fostering a collaborative culture that empowers our team members to push boundaries and redefine what's possible in high-performance computing and artificial intelligence.