About the job
P-186
At Databricks, we are passionate about empowering data teams to tackle some of the world’s most challenging problems, from security threat detection to cancer drug development. Our mission is to build and operate the leading data and AI infrastructure platform, enabling our customers to concentrate on the high-value challenges that are integral to their own objectives.
Founded in 2013 by the original creators of Apache Spark™, Databricks has rapidly grown from a small office in Berkeley, California, into a global company with over 1,000 employees. Trusted by thousands of organizations, from startups to Fortune 100 companies, we are recognized as one of the fastest-growing SaaS companies worldwide.
Our engineering teams create highly sophisticated products that address significant needs in the industry. We continuously push the limits of data and AI technology while maintaining the resilience, security, and scalability essential for our customers' success on our platform.
We manage one of the largest-scale software platforms, consisting of millions of virtual machines that generate terabytes of logs and process exabytes of data daily. At this scale, we frequently encounter cloud hardware, network, and operating system faults, and our software must effectively shield our customers from these challenges.
Modern data analysis leverages advanced techniques, such as machine learning, that far exceed the capabilities of traditional SQL query engines. As a Software Engineer on the Runtime team at Databricks, you will help develop the next generation of distributed data storage and processing systems: systems that outperform specialized SQL query engines at relational query performance, while providing the flexibility and programming abstractions to support a variety of workloads, from ETL to data science.
Examples of projects you may work on include:
- Apache Spark™: Contributing to the de facto open-source framework for big data.
- Data Plane Storage: Developing reliable, high-performance services and client libraries for storing and accessing vast amounts of data on cloud storage backends such as AWS S3 and Azure Blob Storage.
- Delta Lake: A storage management system that combines the scalability and cost-effectiveness of data lakes with the performance and reliability of data warehouses, and adds low-latency streaming. Its higher-level abstractions and guarantees, including ACID transactions and time travel, significantly reduce the complexity of real-world data engineering architectures.
- Delta Pipelines: A system that aims to simplify the management of data engineering pipelines.

