Senior Database Reliability Engineer (DBRE) & Architect - Remote

CloudLinux, Podgorica, Podgorica Municipality, Montenegro
Remote, Full-time

Experience Level

Senior

Qualifications

  • Strong experience in database administration, specifically with PostgreSQL, ClickHouse, and MongoDB.
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform and Ansible.
  • Experience with cloud environments, including AWS and Google Cloud.
  • Knowledge of data pipeline management and analytics tools.
  • Expertise in automation frameworks and self-healing systems.
  • Strong coding skills in Python or Go.
  • Excellent problem-solving abilities.

About the job

CloudLinux is revolutionizing the Linux infrastructure landscape by providing robust security and reliability for more than 500,000 servers globally. Our flagship products, CloudLinux OS, TuxCare, and Imunify360, set the standard in the hosting and Enterprise sectors.

We are in search of an innovative engineer to spearhead the advancement of our data platform. As we transition to an Internal Database-as-a-Service (DBaaS) model by 2025, we require a professional who goes beyond conventional database administration. Your role will involve architecting resilient distributed systems, automating infrastructure through code, and transforming databases into dependable services for our product teams.

If you are ready to move beyond mundane ticket management and instead focus on building platforms capable of processing petabytes of data, this opportunity is perfect for you.

Your Responsibilities & Challenges:

  • DBaaS Architecture: Design and implement a self-service platform utilizing Terraform and Ansible to deploy HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) across diverse environments (Bare Metal, OpenNebula, Kubernetes, and Public Clouds). You will be transforming infrastructure into a scalable product.
  • Scaling ClickHouse: Manage rapidly expanding analytics clusters (12+ clusters, handling tens of terabytes of data). You'll address sharding, optimize table engines (ReplicatedMergeTree), and create reliable S3 backup pipelines under high load.
  • Data Platform & Analytics Support: Maintain and expand the infrastructure for Apache Airflow and Redash. Ensure the reliability of ETL pipelines and visualization tools, serving as a bridge between raw infrastructure and the data analytics team.
  • Reliability as Code: Apply Site Reliability Engineering (SRE) practices to data management. Transition from manual incident response to automated self-healing systems. Define and implement SLOs and SLIs for all databases.
  • Stack Modernization: Lead the migration from legacy systems to contemporary cloud architectures. Participate in strategic decisions on implementing Kubernetes operators for stateful workloads.
  • Expertise & Mentorship: Act as the technical authority for product teams, assisting them in optimizing data schemas and SQL queries for high-load environments.

Our Technological Ecosystem:

  • Databases: PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka.
  • Data & Analytics: Apache Airflow, Redash (Infrastructure & Integration).
  • Infrastructure: Hybrid cloud spanning 3+ colocation data centers (OpenNebula, Kubernetes, Bare Metal), AWS, Google Cloud, Azure, and DigitalOcean.
  • Automation & IaC: Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit.
  • Observability: VictoriaMetrics, Grafana, Prometheus.

About CloudLinux

At CloudLinux, we are committed to enhancing the security and reliability of Linux servers worldwide. Our innovative solutions cater to a diverse range of clients, from small hosting companies to large enterprises, making us a trusted partner in the industry.
