About H:

At H, we are at the forefront of advancing superintelligence through agentic AI. Our mission is to automate complex, multi-step tasks traditionally performed by humans, empowering individuals to realize their full potential. We are looking for exceptional AI talent committed to building technology that is both innovative and responsibly developed. Our culture encourages openness, continuous learning, and collaboration, ensuring that every team member's voice is valued.

About the Team:

The Inference team is dedicated to developing and refining the inference stack that powers our cutting-edge agent technology. We prioritize optimizing hardware utilization to achieve high throughput, low latency, and cost-effectiveness, ensuring a smooth user experience.

Key Responsibilities:

- Build scalable, cost-effective, low-latency inference pipelines.
- Improve model memory usage, throughput, and latency through advanced techniques, including distributed computing, model compression, quantization, and caching.
- Create specialized GPU kernels for critical operations, including attention mechanisms and matrix multiplications.
- Work closely with H's research teams to optimize model architectures for improved inference efficiency.
- Review and implement insights from state-of-the-art research to improve memory utilization and reduce latency (e.g., FlashAttention, PagedAttention, continuous batching).
- Prioritize and implement leading-edge inference techniques.

Requirements:

Technical Skills:
- Master's or PhD in Computer Science, Machine Learning, or a related field.
- Proficiency in at least one of the following languages: Python, Rust, or C/C++.
- Experience in GPU programming, such as CUDA, OpenAI Triton, or Metal.
- Familiarity with model compression and quantization techniques.

Soft Skills:
- A collaborative mindset, thriving in dynamic, multidisciplinary teams.
- Excellent communication and presentation skills.
- A passion for exploring new challenges.

Bonuses:
- Experience in related fields and a passion for AI advancements are advantageous.
Nov 13, 2025