About the job
Join our innovative team at Inetum Polska as a Data Engineer and apply your data engineering expertise in a fast-paced environment. Your role will be pivotal in ensuring smooth data migration and optimization for cutting-edge AI and ML projects. Don't miss the opportunity to contribute to our groundbreaking initiatives!
Key Responsibilities
Data Pipeline Development:
- Design, build, and maintain Python-based ETL/ELT pipelines to migrate data from on-premises MS SQL Server to our Databricks instance,
- Ensure reliable ingestion of historical Parquet datasets into Databricks.
Data Quality & Validation:
- Establish validation, reconciliation, and quality assurance protocols to guarantee the accuracy and completeness of migrated data,
- Manage schema mapping, field transformations, and metadata enrichment to standardize datasets,
- Integrate data governance, quality assurance, and compliance into all migration processes.
Performance Optimization:
- Optimize pipelines for speed and efficiency, leveraging Databricks capabilities, including Delta Lake where applicable,
- Oversee resource utilization and scheduling for large dataset transfers.
Collaboration:
- Coordinate closely with AI engineers, data scientists, and business stakeholders to define the data access patterns needed for upcoming AI proofs of concept (POCs),
- Work alongside infrastructure teams to ensure secure connections between legacy systems and Databricks.
Documentation & Governance:
- Maintain comprehensive technical documentation for all data pipelines,
- Adhere to best practices for data governance, compliance, and security throughout the migration process.

