About the job
About Hark
Hark is at the forefront of artificial intelligence, dedicated to creating sophisticated, personalized solutions that are proactive and multimodal. Our technology interacts with the world through speech, text, vision, and persistent memory.
We are integrating this intelligence with cutting-edge hardware to establish a universal interface for human-machine interaction. Where existing AI relies primarily on chat boxes and outdated devices, Hark is building agentic systems that engage naturally with users and their environments.
To realize this vision, we are developing multimodal models alongside next-generation AI hardware, purposefully designed as a cohesive interface for a new era of intelligent systems.
About the Role
The Omni team at Hark is developing the next generation of AI experiences that extend beyond traditional text-based interactions. Our aim is to enable models that comprehend and generate content across diverse modalities, including text and vision, fostering seamless and immersive user experiences.
As a member of the Omni team, you will play a pivotal role in advancing real-time audio, video, and multimodal world models. The role spans the full technology stack, from data management and modeling to training, serving, and product integration. You will contribute to both pretraining and posttraining initiatives, collaborating closely with product teams to enhance model capabilities and deliver outstanding end-to-end user experiences.
