
AI Researcher - Inference Optimization

Featherless AI · Remote (worldwide) · Full-time




About the job

Join Featherless AI as an AI Researcher specializing in inference optimization. In this role, you will design, evaluate, and implement high-performance inference systems for large-scale machine learning models. Your work will bridge model architecture, systems engineering, and hardware-aware optimization, with a focus on improving latency, throughput, and cost efficiency in real-world production settings.

Key Responsibilities

  • Research and develop strategies to improve inference performance for large neural networks.

  • Improve latency, throughput, memory efficiency, and cost per inference.

  • Design and evaluate model-level optimizations such as quantization, pruning, KV-cache optimization, and architecture-aware simplifications.

  • Implement systems-level optimizations such as dynamic batching, kernel fusion, multi-GPU inference, and prefill/decode optimization.

  • Benchmark inference workloads across diverse hardware accelerators.

  • Collaborate with engineering teams to ship optimized inference pipelines.

  • Translate research findings into production-ready improvements.

Required Qualifications

  • Extensive background in machine learning, deep learning, or AI systems.

  • Proven experience in optimizing inference for large-scale models.

  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).

  • Familiarity with inference tools such as Triton, TensorRT, vLLM, or ONNX Runtime.

  • Ability to design experiments and communicate results effectively.

Preferred Qualifications

  • Experience in deploying production inference systems at scale.

  • Understanding of distributed and multi-GPU inference.

  • Contributions to open-source ML or inference frameworks are a plus.

  • Authorship or co-authorship of peer-reviewed papers in machine learning, systems, or related domains.

About Featherless AI

Featherless AI is at the forefront of artificial intelligence research, focusing on innovative solutions to optimize machine learning systems. We pride ourselves on a collaborative culture that fosters creativity and innovation.

