About the job
Join our dynamic team as a Senior/Lead Platform Engineer and spearhead the design, implementation, and management of our cutting-edge data, analytics, and machine learning infrastructure. This multifaceted role spans platform architecture, DevSecOps, DataOps, and ML infrastructure, blending strategic vision with hands-on execution. You will use AWS and Databricks to develop, integrate, and maintain platforms that support scalable, secure, and production-ready ML/AI solutions.
Key Responsibilities
- Design and implement comprehensive data and ML platforms, including data lakes, warehouses, and both streaming and batch pipelines, utilizing AWS and Databricks.
- Lead the adoption of DevSecOps and DataOps methodologies, focusing on infrastructure as code (IaC), CI/CD pipelines for data and ML workflows, and secure multi-account/multi-region cloud operations.
- Integrate AWS services (such as S3, Redshift, Kinesis, Lambda, EKS/ECS) with Databricks Runtime, Delta Lake, and Unity Catalog to create efficient and scalable data pipelines.
- Establish and manage ML infrastructure, including training clusters, model versioning, and MLOps toolchains (such as MLflow), along with monitoring and observability tools that support automated retraining workflows.
- Set standards for data governance, lineage, quality, and observability across data pipelines and ML workflows.
- Mentor engineering teams, establish architectural best practices, and oversee the implementation of high-scale data/ML systems.
- Improve system performance, optimize costs, and ensure scalability, and diagnose and resolve large-scale production issues.
- Continuously assess and integrate new tools and technologies in cloud, data platform, DevSecOps, and ML infrastructure to drive innovation.
