About the job
At sensmore, we are revolutionizing the automation of the world's largest machines, infusing them with unparalleled intelligence. Our proprietary Physical AI technology empowers heavy machinery, such as wheel loaders, to seamlessly adapt to dynamic environments and perform new tasks without prior training.
We are integrating state-of-the-art robotics into a comprehensive platform that powers our intelligence and automation products, enhancing productivity and safety for clients in mining, construction, and related sectors.
Join us to play a crucial role in redefining the automation landscape across heavy industries.
Role Overview:
As a Data Engineer, you will design, build, and maintain the data infrastructure behind sensmore's embodied AI and Vision-Language-Action Models (VLAMs). You will work closely with the Robotics, Machine Learning, and Software Engineering teams to ensure seamless, reliable data flows from our sensor arrays (radar, LiDAR, cameras, and IMUs) into training and inference pipelines. The role combines traditional data engineering practices (ETL/ELT, warehouse design, monitoring) with MLOps methodologies, with a focus on model versioning, data drift detection, and automated retraining.
Key Responsibilities:
Construct and manage data pipelines: Ingest, process, and convert multi-sensor telemetry (including radar point clouds, video frames, and log streams) into formats suitable for analytics and machine learning.
Design scalable storage solutions: Develop high-throughput, low-latency data lakes and warehouses (e.g., S3, Delta Lake, Redshift/Snowflake).
Facilitate MLOps workflows: Integrate tools such as DVC or MLflow, automate model training and retraining triggers, and track data and model lineage.
Ensure data integrity: Implement validation, monitoring, and alerting mechanisms to identify anomalies and schema changes promptly.
Collaborate effectively across teams: Work alongside Embedded Systems, Robotics, and Software teams to align on data schemas, APIs, and real-time data requirements.
Optimize system performance: Fine-tune distributed processing, queries, and storage configurations for cost-efficiency and throughput.
Document and advocate best practices: Maintain comprehensive documentation for data schemas, pipeline architectures, and MLOps practices to elevate team performance.
