Engineering Manager - AI Inference at Perplexity | San Francisco

Perplexity, San Francisco
On-site, Full-time

Experience Level

Manager

About the job

About the Role

We are seeking a talented Inference Engineering Manager to spearhead our AI Inference team at Perplexity. This is a remarkable opportunity to design and expand the infrastructure that drives Perplexity's innovative products and APIs, catering to millions of users with cutting-edge AI capabilities.

You will own the technical direction and implementation of our inference systems while building and leading a high-caliber team of inference engineers. Our technology stack includes Python, PyTorch, Rust, C++, and Kubernetes. You will play a crucial role in architecting and scaling the deployment of machine learning models for Perplexity's Comet, Sonar, Search, and Deep Research products.

Why Perplexity?

  • Develop state-of-the-art systems that are among the fastest in the industry using leading-edge technology.

  • Engage in high-impact work within a smaller team, enjoying considerable ownership and autonomy.

  • Seize the chance to create infrastructure from the ground up instead of maintaining outdated systems.

  • Work across the entire spectrum: minimizing costs, scaling traffic, and advancing the capabilities of inference.

  • Make a significant impact on the technical roadmap and team culture at a rapidly expanding company.

Responsibilities

  • Lead and nurture a high-performing team of AI inference engineers.

  • Develop APIs for AI inference utilized by both internal and external clients.

  • Design and scale our inference infrastructure for enhanced reliability and efficiency.

  • Benchmark and resolve bottlenecks across our inference stack.

  • Drive large sparse/MoE model inference at rack scale, including sharding strategies for very large models.

  • Develop inference systems that support sparse attention and disaggregated prefill/decode serving.

  • Enhance the reliability and observability of our systems and lead incident response efforts.

  • Make technical decisions regarding batching, throughput, latency, and GPU utilization.

  • Collaborate with ML research teams on model optimization and deployment.

  • Recruit, mentor, and develop engineering talent.

  • Establish team processes, engineering standards, and operational excellence.

Qualifications

  • 5+ years of engineering experience, with at least 2 years in a technical leadership or management capacity.

  • Proficiency in programming languages and tools such as Python, PyTorch, Rust, and C++.

  • Experience with Kubernetes and cloud infrastructure.

  • Strong understanding of machine learning model deployment and optimization.

  • Exceptional problem-solving and communication skills.

About Perplexity

Perplexity is at the forefront of AI technology, committed to delivering state-of-the-art solutions that empower users. With a dynamic and innovative work environment, we prioritize growth, collaboration, and the development of cutting-edge products that drive our success.
