About the job
Join Our Innovative Team at Constructor
Constructor is revolutionizing search and discovery in ecommerce with our next-generation platform designed to optimize key performance metrics such as revenue, conversion rates, and profitability. Our proprietary search engine, developed entirely in-house, leverages cutting-edge transformers and generative LLMs to enhance everything from search functionality to personalized recommendations and shopping agents.
Our engineering department, the largest in our company, is dedicated to maintaining the superiority of our engine, which consistently outperforms competitors in A/B testing. We are passionate about pushing the boundaries of AI technology to achieve remarkable results.
With the capability to handle more than 1 billion queries daily in 150 languages across approximately 100 countries, our engine is trusted by leading ecommerce brands like Sephora, Under Armour, and Petco.
At Constructor, we are a team of problem solvers committed to enhancing the experiences of both our customers and colleagues. We prioritize values such as empathy, transparency, curiosity, and continuous improvement, believing that empowering individuals leads to outstanding achievements.
Founded in 2019 by Eli Finkelshteyn and Dan McCormick, Constructor continues to thrive as a U.S.-based company.
About the Role
We are looking for a Senior Data Engineer to join our Data Lake Team, a crucial part of the Constructor Data Platform that supports all internal data and machine learning teams. This role involves managing the ingestion of over 2 TB of compressed events daily and overseeing more than 6 PB of data in our data lake.
The Data Platform Includes:
- A comprehensive suite of tools and infrastructure utilized daily by our data scientists and ML engineers.
- Public-facing APIs for event ingestion (FastAPI) and real-time analytics (ClickHouse, Cube).
- Data storage across S3, ClickHouse, and Delta Lake.
- Data processing with Python, Spark/Databricks, ClickHouse, AWS Lambda, and Kinesis.
- Robust monitoring solutions (Prometheus, OpenTelemetry, PagerDuty, Sentry).
- Automated testing for pipelines and data quality assurance.
- Cost observability and optimization functionalities.
- Developer tools for creating, running, testing, and scheduling data pipelines, backed by thorough support and documentation.
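To give a flavor of the ingestion side of the platform described above, here is a minimal sketch of validating an incoming event payload before it enters a pipeline. The event shape, field names, and checks are illustrative assumptions, not Constructor's actual schema or API.

```python
import json
from dataclasses import dataclass

# Hypothetical event shape; field names are illustrative, not Constructor's schema.
@dataclass(frozen=True)
class SearchEvent:
    user_id: str
    query: str
    timestamp_ms: int

def parse_event(raw: str) -> SearchEvent:
    """Validate a raw JSON payload before it enters the ingestion pipeline."""
    payload = json.loads(raw)
    missing = {"user_id", "query", "timestamp_ms"} - payload.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if not isinstance(payload["timestamp_ms"], int):
        raise ValueError("timestamp_ms must be an integer (epoch milliseconds)")
    return SearchEvent(
        user_id=str(payload["user_id"]),
        query=str(payload["query"]),
        timestamp_ms=payload["timestamp_ms"],
    )
```

At the volumes quoted above (over 2 TB of compressed events daily), rejecting malformed payloads at the edge like this keeps bad records out of downstream storage and processing.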
Your Responsibilities Will Include:
- Maintaining and enhancing our data pipeline job framework.
- Developing a Data Quality framework for validating internal and external data sources.
- Continuously improving our data processing and storage capabilities.
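As a rough illustration of the Data Quality framework responsibility above, the sketch below shows the kind of rule such a framework might run against an ingested batch. The rule names, columns, and thresholds are illustrative assumptions, not Constructor's actual implementation.

```python
# Minimal sketch of batch-level data-quality rules.
# Columns and thresholds here are hypothetical examples.

def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows)

def check_batch(rows: list[dict], min_rows: int = 1,
                max_null_rate: float = 0.05) -> list[str]:
    """Return human-readable violations for one ingested batch."""
    violations = []
    if len(rows) < min_rows:
        violations.append(f"row count {len(rows)} below minimum {min_rows}")
    for column in ("user_id", "query"):
        rate = null_rate(rows, column)
        if rate > max_null_rate:
            violations.append(
                f"{column} null rate {rate:.1%} exceeds {max_null_rate:.0%}")
    return violations
```

A real framework would express such rules declaratively and attach them to both internal pipelines and external data sources, but the core idea is the same: compute cheap batch statistics and alert when they cross a threshold.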
