Data Engineer - Multimodal Systems

Zyphra, San Francisco
On-site, Full-time

Experience Level

Entry Level

Qualifications

To excel as a Data Engineer - Multimodal Systems at Zyphra, candidates should have strong implementation skills, a collaborative spirit, and the ability to learn rapidly in a fast-paced environment. They should be adept at turning concepts into experiments, pay meticulous attention to detail, and communicate effectively.

About the job

Zyphra is a cutting-edge artificial intelligence firm located in the heart of San Francisco, California, dedicated to advancing technology across various modalities.

About the Position:

We are seeking a Data Engineer - Multimodal Systems to play a pivotal role in enhancing and expanding Zyphra's datasets and data pipelines. This position offers an opportunity to collaborate with diverse teams and contribute to innovative data solutions. You will collect extensive datasets and develop and optimize high-performance parallel data pipelines.

Your Responsibilities Will Include:

  • Executing large-scale data collection across multiple modalities, including text, audio, and image.

  • Designing and implementing highly efficient, parallelized data processing pipelines that integrate various modalities.

  • Conducting rigorous experimental ablations to evaluate the effectiveness of new data enhancements.

Candidate Requirements:

  • Proven ability in implementation and prototyping.

  • Capability to transform ideas into experimental frameworks swiftly.

  • Strong collaborative skills, thriving in a dynamic research environment.

  • Eagerness to learn and apply new concepts effectively.

  • Exceptional communication and teamwork skills, capable of contributing to both research and large-scale engineering projects.

Preferred Qualifications:

  • Experience in the collection, management, and processing of large datasets.

  • Familiarity with parallel programming frameworks in Python, such as Dask.

  • In-depth understanding of state-of-the-art dataset curation practices.

  • A detail-oriented mindset with a passion for data integrity and verification.

  • Strong foundation in experimental methodologies for conducting thorough ablation studies and hypothesis testing.

  • Knowledge and interest in large-scale, highly parallel data processing systems.

  • Proficiency in PyTorch and Python.

  • Experience with large, complex codebases and the ability to quickly become productive within them.

  • Published research in respected machine learning venues.

  • Postgraduate degree in a relevant field is a plus.

About Zyphra

Zyphra is at the forefront of artificial intelligence innovation, striving to harness the power of data to drive advancements across various industries. Our San Francisco headquarters serves as a hub for creativity and technological development, where passionate individuals come together to push the boundaries of what’s possible.
