Technical Staff Member - Multimodal Speech

Hark, San Jose
On-site, Full-time


Experience Level

Mid to Senior

Qualifications

Responsibilities

Lead research and development efforts to enhance speech and audio capabilities in multimodal models, encompassing speech recognition, synthesis, and comprehension.
Create and refine large-scale speech and audio data pipelines, focusing on data collection, filtering, alignment, and synthetic data generation.
Design and implement advanced models for speech and audio, including end-to-end multimodal architectures and real-time systems.
Establish evaluation frameworks and internal benchmarks to assess speech quality, latency, robustness, and overall user experience.
Optimize models and systems for real-time performance, scalability, and deployment in production environments.
Work closely with product and engineering teams to translate research innovations into impactful, user-facing AI solutions.

Requirements

Demonstrated expertise in advancing speech or audio models through innovative data, modeling, or training approaches.
Extensive experience in speech/audio domains such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS), speech-to-speech translation, or audio foundation models.
Proficient in large-scale machine learning systems and distributed training methodologies.

About the job

About Hark

Hark is at the forefront of artificial intelligence, creating cutting-edge, personalized intelligence systems that are proactive and multimodal. Our technology interacts naturally with the world through speech, text, visual input, and persistent memory.

We are integrating this intelligence with next-generation hardware to establish a universal interface between humans and machines. While current AI primarily relies on outdated chat interfaces and devices, Hark is focused on pioneering the future: agentic systems capable of seamless interaction with individuals and their environments.

Our mission involves developing multimodal models alongside next-gen AI hardware, designed as a cohesive interface for a new era of intelligent systems.

About the Role

As a vital member of Hark's Omni team, you will contribute to the development of innovative AI experiences that transcend text, enabling models to comprehend and produce content across various modalities, including audio. Our objective is to forge real-time multimodal intelligence that facilitates intuitive and immersive user experiences.

Your role will entail advancing speech and audio functionalities within multimodal foundation models. You will work end to end, from data and modeling to training, evaluation, and real-time deployment, pushing the frontiers of speech intelligence and enhancing human-computer interaction.

About Hark

Hark is a pioneering artificial intelligence company dedicated to developing sophisticated, personalized intelligence solutions. Our focus on multimodal communication and next-generation hardware positions us as leaders in creating a seamless human-machine interface.
