About the job
About Us
At TherapyNotes, we pride ourselves on being the leading provider of behavioral health Practice Management and EHR software. Our cutting-edge SaaS solution streamlines scheduling, billing, documentation, telehealth, and more, allowing clinicians to dedicate their time to providing exceptional patient care.
We are a vibrant team of innovators committed to advancing our software capabilities. Join us in transforming the behavioral health software landscape while making a significant impact!
About the Position
We are looking for a Senior Database Site Reliability Engineer with expertise in managing PostgreSQL. In this pivotal role, you will ensure the reliability and operability of our PostgreSQL services that power a 24x7 SaaS platform, focusing on availability, performance, observability, incident response, and automation. You will collaborate closely with cross-functional teams—developers, operations, and infrastructure—to ensure our database services operate seamlessly and efficiently. If you are driven by a passion for operational excellence and continuous improvement, we want to hear from you.
What You'll Do
- Design, implement, and maintain high-availability, high throughput, and data-intensive PostgreSQL database systems that support a 24x7 SaaS platform.
- Enhance database service reliability through effective monitoring/alerting, SLO-oriented metrics, and operational readiness.
- Lead incident response efforts, conduct root cause analyses, and implement corrective actions for database-related production events.
- Collaborate with technical leadership to ensure new systems are both supportable and maintainable by development and operations teams.
- Provide advanced technical support and guidance to other technology teams within the organization.
- Participate in on-call support duties as required.
- Ensure compliance with HIPAA security policies within the database platform.
- Adhere to the security and operational policies set forth by the organization.
- Continuously improve database observability using Datadog, creating actionable dashboards and alerts with tools like Prometheus, Grafana, or New Relic. Familiarity with PGAnalyze or Percona is a plus.
- Automate system maintenance tasks using Bash, Powershell, Python, or Ansible, and manage infrastructure as code (IaC) by writing Ansible playbooks. Some experience with Terraform is advantageous.

