About Us
Constructor is a cutting-edge platform revolutionizing search and discovery in e-commerce, designed to lift key metrics such as revenue, conversion rate, and profitability. Our proprietary search engine, developed entirely in-house, uses advanced transformers and generative LLMs to power everything from search to personalized recommendations and shopping agents. As the largest department at Constructor, our engineering team has built an engine that consistently outperforms competitors in A/B tests. We are dedicated to staying at the forefront of AI technology.
Built for extreme scale, our engine processes over 1 billion queries daily in multiple languages, serving clients in diverse countries, including industry giants such as Sephora, Under Armour, and Petco.
Our team thrives on solving complex problems and enhancing the experiences of our clients and colleagues alike. We prioritize values such as empathy, openness, curiosity, and continuous improvement, believing that empowering every team member leads to remarkable achievements.
Established in 2019 by Eli Finkelshteyn and Dan McCormick, Constructor is a U.S.-based company committed to innovation and excellence.
Job Description
The Constructor Data Platform is essential for our internal data and ML teams, managing the ingestion of over 1 TB of compressed events each day and overseeing more than 6 PB of data within our data lake.
Key Responsibilities:
- Develop and maintain comprehensive tools and infrastructure utilized daily by our data scientists and ML engineers.
- Create public-facing APIs for event ingestion (FastAPI) and real-time analytics (ClickHouse, Cube).
- Manage data storage in optimal formats (S3, ClickHouse, Delta).
- Facilitate data processing using technologies such as Python, Spark/Databricks, ClickHouse, AWS Lambda, and Kinesis.
- Implement robust monitoring solutions (Prometheus, OpenTelemetry, PagerDuty, Sentry).
- Ensure automated testing of data pipelines and maintain data quality.
- Provide cost observability and optimization capabilities.
- Offer comprehensive tools for developers to design, execute, test, and schedule data pipelines, along with necessary support and documentation.
This platform is developed by our dedicated Data Lake Team and Data Infrastructure Team.
About the Data Infrastructure Team
Join us as a Senior Data Engineer on our Data Infrastructure Team, which is responsible for:
- Job scheduling and orchestration for data pipelines.
- Deployment and management of BI tools.
- Real-time analytics infrastructure (ClickHouse, AWS services).
