Qualifications
What You'll Do
Proactively identify new audio data sources and integrate them into our ingestion pipeline.
Manage and enhance the cloud infrastructure for our ingestion pipeline, currently operating on Google Cloud Platform (GCP) and managed via Terraform.
Work alongside our Scientists to optimize cost, throughput, and quality, enabling richer data access for our next-generation models.
Collaborate with the AI Team and Speechify Leadership to develop the dataset roadmap that will empower our upcoming consumer and enterprise products.
An Ideal Candidate Should Have
A Bachelor's, Master's, or PhD degree in Computer Science or a related field.
A minimum of 5 years of experience in software development.
Expertise in bash/Python scripting within Linux environments.
Proficiency in Docker and Infrastructure-as-Code practices, with professional experience using at least one major cloud provider (GCP preferred).
Experience with web crawlers and large-scale data processing workflows is advantageous.
Strong multitasking abilities and adaptability to shifting priorities.
Excellent communication skills, both verbal and written.
About the job
At Speechify, our mission is to eliminate barriers to learning by ensuring that reading is accessible to everyone.
With over 50 million users, Speechify’s innovative text-to-speech technology transforms various reading materials (including PDFs, books, Google Docs, news articles, and websites) into engaging audio formats. Our suite of products includes apps for iOS, Android, and Mac, a Chrome Extension, and a Web App. Recently, Google recognized Speechify as the Chrome Extension of the Year, while Apple awarded us the 2025 Design Award for Inclusivity.
We are a fully distributed team of nearly 200 talented individuals from diverse backgrounds, including former employees of Amazon, Microsoft, and Google, graduates of prestigious institutions such as Stanford, and founders of successful startups like Stripe, Vercel, and Bolt.
Overview
We are seeking a passionate Software Engineer to join our AI team, focusing on data infrastructure and acquisition. This role will be critical in enhancing our data collection processes to support model training operations. You will work on building high-quality datasets at petabyte scale and low cost through close collaboration between infrastructure, engineering, and research.
About Speechify
Speechify is dedicated to making reading an effortless and inclusive experience for all. Our award-winning text-to-speech technology empowers millions to read faster and retain more information through audio formats. With a global team of experts and innovators, we strive to continuously enhance our offerings and reach even more users.