About the job
DataHub is a pioneering AI & Data Context Platform used by over 3,000 leading enterprises, including Apple, CVS Health, Netflix, and Visa. Developed in collaboration with an open-source community of over 13,000 members, DataHub's metadata graph provides deep context about AI and data assets and is built for scalability and extensibility.
Our enterprise SaaS product, DataHub Cloud, provides a fully managed solution with AI-driven discovery, observability, and governance capabilities. Organizations rely on DataHub to accelerate the return on their data investments, keep AI systems reliable, and enforce unified governance, so that AI and data work together and data chaos gives way to order.
The Challenge
As AI and data products become integral to business operations, enterprises grapple with a metadata crisis:
- There is no unified approach to tracking the intricate data supply chain that supports AI systems.
- Engineering teams face hurdles in data discovery, lineage, and governance.
- Organizations require machine-scale metadata management beyond mere human-browsable catalogs.
Why This Matters
This is where infrastructure meets impact. The metadata layer you create will directly fuel the next generation of AI systems at an unprecedented scale. Your contributions will determine how securely and efficiently thousands of organizations deploy AI, influencing millions of users globally.
The Role
We seek an outstanding Backend Engineer to spearhead the development of DataHub's Platform framework, the backbone that connects various data systems and drives our metadata collection capabilities.
You’ll Build
- Scalable, fault-tolerant ingestion systems for enterprise-grade metadata.
- Clean, intuitive APIs for our connector ecosystem.
- Event-driven architectures for real-time metadata processing.
- Schema mapping between diverse systems and DataHub's unified model.
- Versioning systems for AI assets (training data, model weights, embeddings).
