About the job
Speechify builds tools that remove barriers to reading and learning. More than 50 million people use our text-to-speech products to turn PDFs, books, Google Docs, news articles, and websites into audio. Our suite includes iOS, Android, Mac, and web apps, plus a Chrome extension. Google named us Chrome Extension of the Year, and Apple awarded us the 2025 Design Award for Inclusivity.
Our fully remote team includes nearly 200 professionals from a range of backgrounds. Engineers and AI researchers at Speechify have experience at Amazon, Microsoft, and Google, and many hold degrees from Stanford and other top universities. We value innovation and inclusivity in everything we do.
Role Overview
Speechify is hiring a Software Engineer focused on data infrastructure and acquisition for our AI team. This position centers on building and scaling high-quality datasets, and it brings together infrastructure, engineering, and research to support model training at petabyte scale.
What You Will Do
- Identify and source new audio data to strengthen our ingestion pipeline.
- Manage and expand cloud infrastructure on Google Cloud Platform (GCP) using Terraform.
- Work with scientists to reduce data processing cost and improve throughput and quality, supporting next-generation model development.
- Partner with the AI team and leadership to shape a strategic dataset roadmap for future consumer and enterprise products.
Qualifications
- BS, MS, or PhD in Computer Science or a related field.
- At least 5 years of professional software development experience.
- Strong skills in Bash and Python scripting within Linux environments.
- Hands-on experience with Docker, Infrastructure-as-Code, and at least one major cloud provider (GCP preferred).
- Familiarity with web crawlers and large-scale data processing workflows is a plus.
- Comfortable multitasking and adjusting to changing priorities.
- Excellent written and verbal communication skills.
Location
This position is based in Eugene, OR, USA.
