About the job
connecthum, a thriving AI infrastructure startup building the safety and control framework for large-scale AI systems, is hiring an Audio and Multimodal ML Engineer.
About connecthum:
An AI-native product company focused on enhancing AI safety and infrastructure.
Supported by leading international investors.
Manages substantial AI traffic across diverse enterprise environments.
Trains and fine-tunes proprietary models for performance and reliability.
A compact, highly skilled team that embraces a fast-paced development cycle.
We are committed to creating a robust control and evaluation layer for AI systems, empowering organizations to define, test, and enforce AI behavior in real-world scenarios.
Your Role:
Train and enhance large-scale audio and multimodal models.
Design and execute experiments focused on architecture, data mixtures, and training strategies.
Develop and optimize audio data pipelines to ensure efficiency.
Improve inference speed, reduce latency, and make models production-ready.
Deploy models end-to-end in low-latency environments.
Establish meaningful evaluation metrics that go beyond standard benchmark scores.
Collaborate closely with research and engineering teams to drive innovation.
This is a dynamic, hands-on position where research and production converge.
Technical Environment:
PyTorch-based training pipelines.
Large-scale distributed training.
Speech and audio modeling architectures.
Multimodal models.
Model optimization through quantization, distillation, and streaming inference.
Production deployment and serving systems.
(Full technical stack details will be shared during the interview process.)
Qualifications:
Minimum of 3 years of experience in training deep learning models, particularly in audio or speech domains.
Strong expertise in distributed training frameworks.
Deep understanding of audio signal processing fundamentals.
Proven experience in deploying models to production, with a focus on latency and performance.
Exceptional problem-solving skills and collaborative spirit.

