About the job
Take ownership of the complete lifecycle for memory features, from initial research to final production. You will enhance models for extraction, updates, consolidation/forgetting, and conflict resolution. Transform customer challenges into actionable research hypotheses, implement and evaluate innovative ideas from academic papers, and collaborate with Engineering to achieve state-of-the-art latency, reliability, and cost. Additionally, you will develop large-scale evaluation strategies (including offline metrics and online A/B testing) and integrate real-world feedback to continuously elevate quality.
Your Responsibilities:
Refine and train models focused on memory extraction, updates, consolidation/forgetting, and conflict resolution; iterate based on data analysis and results.
Engage with research: swiftly prototype concepts from papers, benchmark against standards, and operationalize successful approaches.
Establish large-scale evaluations: build automated metrics for relevance, accuracy, and consistency, along with gold sets, online A/B tests, and user-friendly dashboards.
Collaborate with customers to identify pain points, convert them into research hypotheses, and validate solutions through field trials.
Work with Engineering for deployment: design APIs and data contracts, plan safe rollouts, and maintain state-of-the-art latency, reliability, and cost across operations.
Minimum Qualifications:
Proven experience in RAG or information retrieval (retrieval, ranking, and query understanding) for real-world applications.
Model training and fine-tuning expertise (LLMs/encoders) with a solid background in experimental design and iterative processes.
Proficiency in Python, with extensive experience in PyTorch and familiarity with vLLM and modern model serving frameworks.
Experience in evaluation of complex vision-and-language tasks (gold sets, offline metrics, online testing).
Ability to orchestrate data pipelines (both batch and streaming) for running models in production under low-latency service-level agreements.
Exceptional communication skills with stakeholders (engineering, product, go-to-market teams, and clients).
Preferred Qualifications:
Publications in reputable venues such as NeurIPS, ICML, or ACL.
