Research Engineer Entry Level At Genmo San Francisco jobs in San Francisco – Browse 11,542 openings on RoboApply Jobs

Research Engineer Entry Level At Genmo San Francisco jobs in San Francisco

Open roles matching “Research Engineer Entry Level At Genmo San Francisco” with location signals for San Francisco. 11,542 active listings on RoboApply Jobs.

1 - 20 of 11,542 Jobs
Genmo
Full-time|On-site|San Francisco HQ

At Genmo, we are pioneers in developing cutting-edge models for video generation, aiming to unlock the full potential of Artificial General Intelligence (AGI). Join our innovative team and play a vital role in redefining the landscape of AI technology.

About the Role

We are on the lookout for a talented Research Engineer to enhance our research team dedicated …

Feb 22, 2026
Genmo
Full-time|On-site|San Francisco HQ

Genmo is a pioneering research laboratory dedicated to advancing cutting-edge models for video generation, with the mission of unlocking the creative potential of Artificial General Intelligence (AGI). We invite you to be a part of our innovative team, where you can contribute to shaping the future of AI and expanding the horizons of video generation technology.

Role Overview:

We are on the lookout for a talented Research Scientist to join our dynamic team, specializing in alignment and post-training methodologies for large-scale video generation models. In this pivotal role, you will be instrumental in ensuring our diffusion-based video models consistently deliver high-quality, physically accurate, and safe outputs that align with human values and preferences.

Key Responsibilities:

- Lead groundbreaking research initiatives in alignment and post-training strategies for video generation models, prioritizing enhanced quality, reliability, and alignment with human intent.
- Design and implement supervised fine-tuning and reinforcement learning from human feedback (RLHF) pipelines for video generation models.
- Establish robust evaluation frameworks to assess model alignment, safety, and output quality.
- Create and optimize data collection pipelines for capturing human feedback and preferences.
- Conduct experiments to validate alignment techniques and their scalability.
- Collaborate with cross-functional teams to incorporate alignment enhancements into our production workflow.
- Stay abreast of the latest developments by reviewing academic literature in generative AI and alignment.
- Mentor junior researchers and promote a culture of responsible AI development.
- Partner closely with product teams to ensure that alignment methods enhance model capabilities.

Qualifications:

- Ph.D. in Computer Science, Artificial Intelligence, Machine Learning, or a closely related field.
- Demonstrated excellence with a strong publication record in top-tier conferences (e.g., NeurIPS, ICML, ICLR) focusing on reinforcement learning, alignment, or generative models.
- Extensive experience in implementing and optimizing large-scale training pipelines utilizing PyTorch.
- In-depth understanding of reinforcement learning techniques, especially RLHF.
- Proficient in distributed training systems and conducting large-scale experiments.
- Proven ability to design and implement robust evaluation strategies for models.

Feb 22, 2026
Genmo
Full-time|On-site|San Francisco HQ

At Genmo, we are pioneering advancements in video generation technology through our state-of-the-art research lab. Our mission is to develop open models that contribute to the evolution of Artificial General Intelligence (AGI). Join us as we redefine the capabilities of AI and explore the vast potential of video generation.

Role Overview:

We are on the lookout for an outstanding Research Scientist specializing in diffusion models to be a part of our innovative team. Your primary focus will be on creating advanced diffusion models aimed at transforming text into captivating video content. This role places you at the cutting edge of AI research, where you will devise new architectures and algorithms to generate visually appealing and coherent videos from textual descriptions.

Feb 22, 2026
Mercor
Full-time|On-site|San Francisco

About Mercor

Mercor sits at the forefront of labor markets and artificial intelligence research, collaborating with premier AI laboratories and enterprises to harness the human intelligence crucial for AI evolution.

Our expansive talent network empowers the training of cutting-edge AI models, akin to how educators impart knowledge to students: sharing insights, experiences, and contexts that transcend mere code. Currently, our network comprises over 30,000 experts, generating collective earnings exceeding $2 million daily.

At Mercor, we are pioneering a unique category of work where expertise fuels AI progress. Realizing this vision necessitates a bold, fast-paced, and deeply dedicated team. You will collaborate with researchers, operators, and AI firms that are at the vanguard of transforming systems that redefine society.

As a profitable Series C company, Mercor is valued at $10 billion and maintains an in-office presence five days a week at our new headquarters in San Francisco.

About the Role

In your capacity as a Research Engineer at Mercor, you will operate at the intersection of engineering and applied AI research. You will play a pivotal role in post-training and Reinforcement Learning with Verifiable Rewards (RLVR), synthetic data generation, and large-scale evaluation workflows essential for advancing frontier language models.

Your contributions will help train large language models to adeptly utilize tools, exhibit agentic behavior, and engage in real-world reasoning within production environments. You will be instrumental in shaping rewards, conducting post-training experiments, and constructing scalable systems to enhance model performance. Your responsibilities will also include designing and evaluating datasets, creating scalable data augmentation pipelines, and developing rubrics and evaluators that expand the learning potential of LLMs.

Dec 29, 2025
HUD
Full-time|On-site|San Francisco

HUD builds infrastructure for generating and evaluating reinforcement learning (RL) training data for advanced AI agents. The team is also developing a marketplace to connect leading labs with high-quality training data. HUD's platform serves frontier labs, Fortune 500 companies, and startups. The company is backed by $15M in funding from top venture capital firms and is part of Y Combinator's W25 cohort.

Role overview

HUD is seeking Research Engineers in San Francisco to help strengthen quality assurance for training data produced by partner organizations. This position centers on building systems that maintain and improve data quality as demand increases.

What you will do

- Set and uphold quality standards for training datasets.
- Develop tools and workflows for auditing datasets from suppliers, including sampling methods, validation pipelines (using rules and models), and feedback systems.
- Assess and refine human-in-the-loop review processes to support quality assurance.
- Collaborate with data vendors to resolve quality issues, share insights, and encourage better data generation practices.
- Integrate QA findings into internal tools and the data vendor portal to reduce anomalies, inconsistencies, and edge cases.

Requirements

- Strong skills in Python, Docker, and Linux environments.
- Experience working with large datasets.
- Ability to learn quickly and adapt in technical contexts, such as programming competitions.
- Background in early-stage tech startups and ability to work independently.
- Familiarity with modern AI tools and large language models (LLMs).
- Clear communication skills for collaborating remotely across time zones.

Preferred qualifications

- Understanding of common issues in training data.
- Background in building data validation pipelines or human-in-the-loop review systems.
- Strong attention to detail, with the ability to identify subtle data inconsistencies or edge cases.
- Experience designing metrics, experiments, and QA processes, not just executing them.

Apr 24, 2026
magic.dev
Full-time|$225K/yr - $550K/yr|On-site|San Francisco

At magic.dev, we are committed to advancing humanity by developing safe artificial general intelligence (AGI) that tackles the world's most pressing challenges. Our unique approach focuses on automating research and code generation to enhance model performance and alignment more effectively than traditional methods. By leveraging cutting-edge pre-training, domain-specific reinforcement learning, ultra-long context processing, and efficient inference-time computation, we aim to redefine the capabilities of AGI.

Role Overview

As a Research Engineer, you will play a pivotal role in training, evaluating, and deploying large-scale AI models alongside innovative inference-time computing methods. You will contribute to the creation of extensive internet-scale datasets and support the prototyping of groundbreaking research and product initiatives.

Key Responsibilities

- Enhance inference throughput for cutting-edge model architectures
- Develop and refine frameworks that underpin our research and production processes
- Train trillion-parameter models using large GPU clusters
- Curate post-training datasets to bolster specific capabilities
- Construct internet-scale data pipelines and web crawlers
- Design, prototype, and optimize innovative model architectures
- Contribute to cutting-edge research in long-context, inference-time computation, reinforcement learning, and additional domains

Qualifications

- Proven software engineering expertise
- In-depth understanding of deep learning literature
- Experience with both pre-training and post-training of large language models (LLMs)
- Strong capability to generate and assess research ideas
- Familiarity with large distributed systems
- Proficient in managing substantial ETL workloads

Compensation and Benefits

- Annual salary ranging from $225,000 to $550,000 based on experience
- Equity is a significant component of total compensation
- 401(k) plan with a 6% salary match
- Comprehensive health, dental, and vision insurance for you and your dependents
- Unlimited paid time off
- Visa sponsorship and relocation assistance available
- Be part of a small, dynamic, and focused team

Jan 24, 2024
Center for AI Safety
Full-time|On-site|San Francisco, CA

The Center for AI Safety (CAIS) is at the forefront of research and advocacy dedicated to addressing the societal-scale challenges posed by artificial intelligence. Our mission is to mitigate the risks associated with AI through innovative technical research, initiatives to foster the field, and strategic policy engagement. Together with our sister organization, the Center for AI Safety Action Fund, we tackle some of the most pressing issues in AI today. In the role of Senior Research Engineer, you will immerse yourself in the dynamic intersection of pioneering machine learning research and dependable engineering practices. You will own research projects from inception to publication, working autonomously with guidance from an advisor. Your responsibilities include designing and conducting experiments on large language models, developing the necessary tools for large-scale model training and evaluation, and transforming findings into research publications. You will collaborate closely with CAIS researchers, as well as external academic and commercial partners, utilizing our compute cluster for extensive training and evaluation. Your work will cover critical areas such as AI honesty, robustness, transparency, and the investigation of trojan/backdoor behaviors, all aimed at reducing the real-world risks posed by advanced AI systems.

Mar 31, 2026
Resolve AI
Full-time|On-site|San Francisco

About Resolve AI

At Resolve AI, we are redefining the role of software maintenance and production troubleshooting by creating a revolutionary, fully autonomous AI Production Engineer. Our technology is designed to diagnose and resolve intricate system issues from start to finish.

Founded by industry leaders Spiros Xanthos and Mayank Agarwal, who are the masterminds behind OpenTelemetry and have previously spearheaded initiatives at Splunk Observability, our team boasts two successful exits to Splunk and VMware.

Having successfully secured over $150M in funding from prestigious investors like Lightspeed, Greylock, and Unusual Ventures, alongside notable individuals such as Jeff Dean (Chief Scientist, Google DeepMind) and Fei-Fei Li (Professor, Stanford), we are well-positioned for growth.

Joining Resolve AI now presents a unique opportunity to be part of an AI-driven company that is at the forefront of transforming engineering workflows.

Sep 9, 2024
Preference Model
Full-time|On-site|San Francisco

Preference Model creates new types of training data to help artificial intelligence systems improve beyond their current limits. The team specializes in building reinforcement learning environments that test both research and engineering abilities, giving models the chance to learn from realistic feedback. Founded by former members of Anthropic’s data division, Preference Model draws on experience building data infrastructure, tokenizers, and datasets for Claude. The company partners with top AI labs and is backed by a16z.

Role overview

This entry-level machine learning engineer position is based in San Francisco and is intended for recent graduates. The focus is on building and maintaining the infrastructure that powers Preference Model’s reinforcement learning training pipeline. The team is small, so each engineer takes responsibility for their projects. Deep production experience is not required, but strong technical fundamentals, curiosity about reinforcement learning, and the ability to learn quickly are essential.

What you will do

- Develop and scale distributed training systems with PyTorch
- Design automation for monitoring, debugging, and recovery during large-scale training runs
- Collaborate with researchers to turn RL training experiments into dependable infrastructure
- Enhance performance and reliability for GPU and TPU workloads

Requirements

- Recent graduate (BS, MS, or PhD) in Computer Science, Machine Learning, or a related field
- Interest in reinforcement learning and AI infrastructure

Apr 21, 2026
Center for AI Safety (CAIS)
Full-time|On-site|San Francisco, CA

Join the Center for AI Safety (CAIS), a premier research and advocacy institution dedicated to minimizing large-scale societal risks associated with artificial intelligence. We tackle the most pressing challenges in AI through innovative technical research, community-building initiatives, and active policy engagement, alongside our sister organization, the Center for AI Safety Action Fund.

As a Research Engineer, you will operate at the forefront of advanced machine learning research and dependable engineering practices. Your role will involve designing and executing experiments on large language models, developing the necessary tools for extensive model training and evaluation, and translating findings into publishable research. You will work collaboratively with CAIS researchers and external academic and commercial partners, utilizing our compute cluster to conduct large-scale training and evaluations. Your work will focus on critical areas such as AI honesty, robustness, transparency, and the identification of trojan/backdoor behaviors, all aimed at mitigating real-world risks posed by sophisticated AI systems.

Oct 7, 2022
Liquid AI
Full-time|Remote|San Francisco

About Liquid Labs

At Liquid AI, research has always been at the forefront of our mission. Liquid Labs serves as a dedicated internal research accelerator, facilitating groundbreaking advancements in the development of intelligent, personalized, and adaptive machines.

Our roots extend back to MIT CSAIL, where pioneering work on Liquid Neural Networks established a new category of efficient sequence-processing architectures. This research laid the groundwork for our Liquid Foundation Models (LFMs), which are scalable, multimodal models designed for real-world applications in resource-constrained settings.

In Liquid Labs, we continue this legacy by advancing the realm of efficient, adaptive intelligence through both fundamental research and practical engineering efforts.

We collaborate closely with Liquid’s core foundation model and systems teams to turn theoretical concepts into deployable capabilities, setting the stage for a new era of powerful and efficient intelligent systems.

About The Role:

As a Research Engineer at Liquid Labs, you will be part of a dynamic, high-impact team pushing the boundaries of adaptive intelligence. You will be responsible for designing and implementing innovative architectures, training methodologies, and inference strategies to expand the potential of efficient AI.

Your work will blend research and engineering, as you translate scientific concepts into functional systems, publish findings that advance the field, and deploy solutions that redefine what is achievable.

While we prefer candidates from San Francisco and Boston, we welcome applications from other locations within the United States.

Dec 3, 2025
Anthropic
Full-time|On-site|San Francisco, CA

Join Anthropic as a Research Engineer focusing on Economic Research. In this role, you will leverage your analytical skills to conduct in-depth economic analysis and contribute to innovative projects aimed at enhancing our understanding of economic models and their implications.

Mar 12, 2026
Eragon
Full-time|On-site|San Francisco

Job Description

Embrace the future of competitive advantage with Eragon, where we create bespoke AI systems that are meticulously tailored to understand your unique business landscape.

At Eragon, we focus on developing AI models that leverage proprietary data, deployed directly within customer environments and continuously refined through real-world interactions. Our models not only respond but evolve, improving with each user engagement.

We utilize a cutting-edge reinforcement learning framework known as RLQF (Reinforcement Learning from Query Feedback) that transforms user interactions into valuable training signals, establishing a cycle of ongoing enhancement that surpasses traditional fine-tuning methods.

The Role

As an Applied Research Engineer, you will be responsible for designing, training, and deploying advanced models that drive real business operations.

This position is not about theoretical research; you will engage directly with customer data, constraints, and feedback, crafting solutions that excel in production settings. You will manage the entire lifecycle of the project, from defining the problem and designing data structures to training, evaluating, and iterating based on live performance.

What You’ll Do

- Train and adapt models: Fine-tune and post-train models on customer-specific data utilizing RLQF among other techniques.
- Close the loop: Convert real user interactions, corrections, and workflows into actionable training signals.
- Own end-to-end systems: Oversee the process from data ingestion and curation through to training, evaluation, and deployment.
- Evaluate in production: Create evaluation frameworks that accurately reflect real-world performance, rather than relying solely on benchmarks.
- Work with customers: Collaborate closely with users to comprehend their workflows and translate these into model functionalities.
- Ship and iterate: Focus on the continuous improvement of models based on live feedback and measurable outcomes.

What We’re Looking For

- Extensive hands-on experience in training, fine-tuning, or post-training machine learning models.
- Proficiency in handling messy, real-world data as opposed to only clean benchmarks.
- Familiarity with reinforcement learning techniques, feedback-driven training such as RLHF or RLAIF, and evaluation systems.
- Adeptness at quickly transitioning from problem identification to data management, model development, and iterative improvement.
- Strong engineering instincts with a comfort level in managing systems end-to-end.
- A proactive approach to shipping and enhancing systems, rather than solely focusing on research.

Mar 25, 2026
Cartesia
Full-time|On-site|*HQ - San Francisco, CA

About Cartesia

At Cartesia, our vision is to create the future of artificial intelligence: intelligent systems that are seamlessly integrated into daily life. We aim to overcome current limitations by enabling models to continuously understand and analyze vast streams of audio, video, and text data (ranging from 1 billion text tokens to 1 trillion video tokens) right on your device.

Our pioneering team, comprised of PhDs from the Stanford AI Lab, has developed State Space Models (SSMs), a groundbreaking approach to training efficient, large-scale foundation models. With a rich blend of expertise in model innovation and systems engineering, alongside a product-focused engineering team, we are committed to developing and delivering cutting-edge AI models and user experiences.

Supported by prominent investors including Index Ventures and Lightspeed Venture Partners, as well as many esteemed advisors and over 90 angel investors from diverse industries, we are at the forefront of AI advancements.

About The Role

In our quest to create truly global AI, we must train our models using datasets that represent the vast diversity of languages and cultures around the world. We are looking for a Research Engineer to take charge of the quality and comprehensiveness of the data that drives our models. As our in-house expert in global data, you will ensure that our models excel across multiple languages, leveraging your keen understanding of linguistic subtleties and your enthusiasm for building inclusive, large-scale datasets.

Your Impact

- Design and construct extensive datasets for model training, conducting controlled experiments to evaluate their effect on model performance.
- Develop assessments for speech models through both manual annotation and automated evaluation metrics.
- Utilize data generation techniques to enhance model intelligence and reduce biases.
- Create automated quality control systems to validate and filter the generated data.
- Collaborate with product teams to ensure optimal support for key languages and markets.

What You Bring

- Proven experience in developing or working with extensive multilingual datasets.
- Familiarity with generative models, including speech, text, or multimodal systems.
- Ability to guide human annotation and evaluation across various languages.
- Strong analytical skills and a passion for data-driven decision-making.

Jan 6, 2026
Center for AI Safety (CAIS)
Full-time|On-site|San Francisco, CA

Join the Center for AI Safety (CAIS), a pioneering research and advocacy organization dedicated to addressing the societal-scale risks posed by artificial intelligence. We tackle the most pressing challenges in AI through rigorous technical research, innovative field-building initiatives, and proactive policy engagement, in collaboration with our sister organization, the Center for AI Safety Action Fund.

As a Research Scientist, you will spearhead and conduct transformative research aimed at enhancing the safety and dependability of cutting-edge AI systems. Your responsibilities will include designing and executing experiments on large language models, developing the necessary tools for training and evaluating models at scale, and converting your findings into publishable research. You will work closely with CAIS researchers and external partners from academia and industry, utilizing our compute cluster for large-scale model training and evaluation. Your research will focus on critical areas such as AI honesty, robustness, transparency, and the detection of trojan/backdoor behaviors, all aimed at mitigating real-world risks associated with advanced AI technologies.

Nov 14, 2023
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our mission is to empower humanity by advancing collaborative general intelligence. We aspire to create a future where everyone can access the knowledge and tools necessary to harness AI for their individual needs and aspirations.

Our team consists of scientists, engineers, and innovators who have developed some of the most renowned AI products, including ChatGPT and Character.ai, as well as open-weight models such as Mistral. We are also contributors to popular open-source initiatives like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

We are seeking talented engineers to develop the libraries and tools that will expedite research at Thinking Machines. You will take charge of our internal infrastructure, which includes evaluation libraries, reinforcement learning training libraries, and experiment tracking platforms, all aimed at enhancing research velocity over time.

This position emphasizes collaboration; you will engage directly with researchers to pinpoint bottlenecks and challenges. Your success will be measured by the trust researchers place in your systems and their enjoyment of using them.

What You'll Do

- Design, develop, and manage research infrastructure, including evaluation frameworks, RL training systems, experiment tracking platforms, visualization tools, and shared utilities.
- Create high-throughput, scalable pipelines for distributed evaluation, reward modeling, and multimodal assessments.
- Establish systems for reproducibility, traceability, and stringent quality control throughout research experiments and model training processes. Implement monitoring and observability.
- Collaborate closely with researchers to identify obstacles and unlock new capabilities. Manage research tools like a product manager, actively seeking feedback and tracking user adoption.
- Work alongside infrastructure, data, and product teams to ensure seamless integration of tools across the technical stack.

Feb 3, 2026
xdof
Full-time|On-site|San Francisco On-site

Join xdof at a pivotal moment as we lead the charge in the development of general-purpose robotics. With frontier labs racing to create advanced robotic systems, high-quality training data is a critical challenge. Our mission is to build the essential infrastructure that supports foundational models, from data collection systems and operational capabilities to an exabyte-scale data warehouse and innovative software toolchains. This will empower our partners to advance the field of robotics.

As a Research Engineer, you will be at the forefront of designing, constructing, and deploying real-world robotic learning systems. Your work will encompass manipulation, locomotion, and control, transitioning robots from raw hardware to fully operational systems.

This hands-on position requires you to take ownership of systems from inception to deployment on actual robots. You will play a crucial role in establishing the technical foundations that facilitate large-scale robotic learning.

Dec 10, 2025
Prime Intellect
Full-time|On-site|San Francisco

Be Your Own Lab

At Prime Intellect, we are dedicated to constructing the foundational infrastructure that leading AI laboratories utilize internally, making it accessible to all. Our advanced platform, Lab, integrates environments, evaluations, sandboxes, and high-performance training into a cohesive full-stack system for post-training at the forefront of AI development. From Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) to tool utilization and agent workflows, we ensure every aspect is validated through our own rigorous testing, training cutting-edge models on the same robust stack we offer to our users. We seek individuals who are passionate about contributing at the intersection of pioneering research and tangible infrastructure.

Recently, we secured $15 million in funding (with a total of $20 million raised) led by Founders Fund, along with contributions from Menlo Ventures and esteemed investors such as Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Huggingface), Emad Mostaque (Stability AI), and many others.

About the Role

We are in search of a Forward-Deployed Research Engineer (FDRE) who will act as the key technical liaison between Prime Intellect and our most valued clients: AI companies, research institutions, and enterprises implementing post-training and agentic RL on our platform.

This role transcends traditional research; you will primarily engage directly with customers to gain insights into their models, workflows, and objectives. Your responsibility will be to convert these insights into actionable training runs, environment designs, evaluation harnesses, and deployment strategies using the Lab stack. You will be the catalyst for making our platform operate effectively for real-world applications.

Collaboration with our research, product, and infrastructure teams will be essential, as you will provide valuable field insights to inform future developments, ensuring we align our offerings with actual customer needs.

What You'll Do

Customer Engagement & Technical Delivery

- Work directly with key customers to comprehend their agent architectures, identify failure modes, and clarify product goals
- Create and develop tailored RL environments, evaluation tools, and verification methods that define success for each specific domain
- Design agent scaffolding (including tool usage, multi-step reasoning, memory functions, and sandbox execution) customized to match client workflows
- Set up and initiate training sessions on Lab, refining reward functions, rollout strategies, and evaluation standards
- Lead technical engagements from inception to deployment, ensuring seamless integration and functionality.

Feb 20, 2026
Thinking Machines Lab
Full-time|$350K/yr - $475K/yr|On-site|San Francisco

At Thinking Machines Lab, our ambition is to enhance human potential by advancing collaborative general intelligence. We envision a future where individuals have the tools and knowledge to harness AI for their distinct requirements and aspirations.

Our team comprises dedicated scientists, engineers, and innovators who have contributed to some of the most renowned AI products, including ChatGPT and Character.ai, along with open-weight models like Mistral, and influential open-source projects such as PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role

We are seeking an Infrastructure Research Engineer to architect, optimize, and sustain the computational frameworks that facilitate large-scale language model training. You will create high-performance machine learning kernels (e.g., CUDA, CuTe, Triton), enable effective low-precision arithmetic operations, and enhance the distributed computing infrastructure essential for training expansive models.

This position is ideal for an engineer who thrives in close collaboration with hardware and research disciplines. You will partner with researchers and systems architects to merge algorithmic design with hardware efficiency. Your responsibilities will include prototyping new kernel implementations, evaluating performance across various hardware generations, and helping to establish the numerical and parallelism strategies crucial for scaling next-generation AI systems.

Note: This is an evergreen role that remains open continuously for expressions of interest. We receive numerous applications, and there may not always be an immediate opportunity that aligns with your qualifications. However, we encourage you to apply, as we regularly assess applications and will reach out as new positions become available. You are also welcome to reapply after gaining additional experience, but please refrain from applying more than once every six months. Additionally, you may notice postings for specific roles catering to particular projects or team needs. In such cases, you are encouraged to apply directly alongside this evergreen listing.

What You’ll Do

- Design and develop custom ML kernels (e.g., CUDA, CuTe, Triton) for key LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for contemporary GPU and accelerator architectures.
- Conceptualize compute primitives aimed at alleviating memory bandwidth bottlenecks and enhancing kernel compute efficiency.
- Collaborate with research teams to synchronize kernel-level optimizations with model architecture and algorithmic objectives.
- Create and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.
- Contribute to the stability and scalability of our infrastructure, ensuring it meets the growing demands of AI development.

Nov 27, 2025
Judgment Labs
Full-time|On-site|San Francisco

At Judgment Labs, we are revolutionizing the monitoring of agent behavior through our innovative infrastructure for Agent Behavior Monitoring (ABM). Unlike traditional observability metrics focused solely on logging exceptions and latency, our approach identifies behavioral anomalies including instruction drifts and context retrieval losses within scaled production environments.

Numerous teams developing autonomous agents depend on Judgment Labs to gain insights into their systems' performance after deployment. Rather than merely reacting to incidents, they can cluster patterns across conversations and workflows, correlate regressions with specific interaction types, and accurately identify where reliability falters in their operational contexts.

We are proud to announce that we have raised over $30 million in two funding rounds over the last five months. Our esteemed investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, and notable individuals like Chris Manning and Michael Ovitz.

The Role:

We seek passionate Research Engineers to help us develop AI systems that utilize agent interaction data to enhance our understanding of agent behavior, facilitate large-scale evaluations, and drive improvements through iterative learning and feedback.

Your research will have a tangible impact. You will engage directly with real-world agent data, implement cutting-edge methodologies in production, and witness your contributions being deployed in real-time. By enhancing the measurability and debuggability of agent behavior, your work will empower teams across finance, legal, operations, and other critical domains. You will lead projects from inception to completion, enjoying substantial autonomy while collaborating closely with our team to create self-improving agent systems.

What You'll Do:

- Develop systems that aggregate, index, and analyze extensive agent interaction data to derive valuable evaluation metrics.
- Create agent-based systems for the analysis and evaluation of complex, long-term behaviors.
- Design and execute post-training and optimization workflows aimed at enhancing agent performance.
- Build internal tools and infrastructure that promote rapid experimentation, analysis, and training.

What We're Looking For:

You should resonate with at least one of the following:

- A strong focus on data quality, evaluation, and benchmarking, with a hands-on approach to working with complex datasets.
- Experience in developing agent systems and applying them in real-world or production environments.
- A robust background in machine learning or related fields, with an eagerness to advance agent technology.

Jan 11, 2026
