About the job
Join Our Innovative Team
At 42dot, we are at the forefront of autonomous driving technology. Our Machine Learning (ML) Platform Engineers build the data platform and the ML training and evaluation systems that power our autonomous driving algorithms. We focus on scalable, distributed systems capable of handling vast datasets comprising millions of scenes, and on high-performance SDKs for ML model training and evaluation. Our platforms streamline the entire ML model development lifecycle, covering training, evaluation, deployment, and monitoring in cloud environments.
Key Responsibilities
Design and develop a highly scalable, reliable data platform to manage, visualize, search, and serve extensive datasets for ML model training, fine-tuning, and validation.
Create advanced autonomous driving data SDK functionalities, including scene data search, dataset preparation, and dataset loading.
Establish a data lakehouse for autonomous driving scene datasets, integrating sensor data, calibration data, and annotation data.
Identify and resolve performance bottlenecks across data processing pipelines, including data processing latency, search latency, and gaps in Test Procedure (TP) coverage.
Set up and maintain infrastructure components for the data platform, including data processing pipelines, databases, data lakehouses, and data serving mechanisms.
Collaborate with cross-functional teams, including the ML algorithm, ML application, and Cloud Infrastructure teams, to keep the ML platforms aligned with the overall architecture of the autonomous driving system.
Qualifications
A Bachelor's degree or higher in Computer Science, Engineering, Robotics, or a related technical discipline.
A minimum of 5 years of experience in Data Engineering or ML Platform roles.
Proficiency in Python with substantial experience in Python SDK development.
Solid experience with databases such as MongoDB and PostgreSQL.
Hands-on experience orchestrating data pipeline jobs using Databricks Workflows or Apache Airflow, along with integrating data pipelines with machine learning models.
Extensive knowledge of data technologies and architectures, including data warehouses (e.g., Hive) and lakehouses (e.g., Delta Lake).
Experience with Apache Spark or other big data computing engines.
