About Us
At Hitachi Digital Services, we are pioneers in the realm of digital solutions and transformation. Our vision is to unlock the immense potential of our world, and we are driven by a people-centric approach that aims to create positive change. Every day, we innovate to future-proof urban spaces, conserve natural resources, protect vital ecosystems, and enhance lives. Our unique blend of innovation, technology, and expertise empowers us to lead both our company and clients into the future.
We believe that diverse experiences and perspectives are invaluable. We value your character, life experiences, and passion just as much as your qualifications.
Join Our Team
We are seeking a dedicated MLOps L2 Support Engineer to provide 24/7 production support for our machine learning (ML) and data pipelines. The role includes on-call support, including weekends, to ensure the high availability and reliability of our ML workflows. You will work with technologies such as Dataiku, AWS, CI/CD pipelines, and containerized deployments to maintain and troubleshoot ML models in production.
Key Responsibilities:
- Deliver L2 support for MLOps production environments, ensuring maximum uptime and reliability.
- Troubleshoot issues related to ML pipelines, data processing jobs, and APIs.
- Monitor logs, alerts, and performance metrics using tools such as Dataiku, Prometheus, Grafana, and AWS CloudWatch.
- Conduct root cause analysis (RCA) and resolve incidents within agreed SLAs.
- Escalate unresolved issues to L3 engineering teams as necessary.
Dataiku Platform Management:
- Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance.
- Monitor and support Dataiku plugins, APIs, and automation scenarios.
- Collaborate with Data Scientists and Data Engineers to debug ML model deployments.
- Maintain version control and keep documentation complete and up to date.

