About the job
CloudLinux is revolutionizing the Linux infrastructure landscape by providing robust security and reliability for more than 500,000 servers globally. Our flagship products, CloudLinux OS, TuxCare, and Imunify360, set the standard in the hosting and Enterprise sectors.
We are in search of an innovative engineer to spearhead the advancement of our data platform. As we transition to an internal Database-as-a-Service (DBaaS) model by 2025, we require a professional who goes beyond conventional database administration. Your role will involve architecting resilient distributed systems, automating infrastructure through code, and transforming databases into dependable services for our product teams.
If you are ready to move beyond mundane ticket management and instead focus on building platforms capable of processing petabytes of data, this opportunity is perfect for you.
Your Responsibilities & Challenges:
- DBaaS Architecture: Design and implement a self-service platform utilizing Terraform and Ansible to deploy HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) across diverse environments (Bare Metal, OpenNebula, Kubernetes, and Public Clouds). You will be transforming infrastructure into a scalable product.
- Scaling ClickHouse: Manage rapidly expanding analytics clusters (12+ clusters, handling tens of terabytes of data). You'll address sharding, optimize table engines (ReplicatedMergeTree), and create reliable S3 backup pipelines under high load.
- Data Platform & Analytics Support: Maintain and expand the infrastructure for Apache Airflow and Redash. Ensure the reliability of ETL pipelines and visualization tools, serving as a bridge between raw infrastructure and the data analytics team.
- Reliability as Code: Implement Site Reliability Engineering (SRE) practices in data management. Transition from manual incident response to automated self-healing systems. Define and implement SLOs and SLIs for all databases.
- Stack Modernization: Lead the migration from legacy systems to contemporary cloud architectures. Participate in strategic decisions on implementing Kubernetes operators for stateful workloads.
- Expertise & Mentorship: Act as the technical authority for product teams, assisting them in optimizing data schemas and SQL queries for high-load environments.
Our Technological Ecosystem:
- Databases: PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka.
- Data & Analytics: Apache Airflow, Redash (Infrastructure & Integration).
- Infrastructure: Hybrid cloud spanning 3+ colocation data centers (OpenNebula, Kubernetes, Bare Metal), AWS, Google Cloud, Azure, and DigitalOcean.
- Automation & IaC: Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit.
- Observability: VictoriaMetrics, Grafana, Prometheus.
