About the job
Join our team at the Allen Institute for AI (Ai2) in Seattle, where we are dedicated to pioneering advancements in artificial intelligence research. This position requires on-site collaboration, with specific arrangements varying by team. Ask your recruiter for details.
The salary range for this role is $126,000 to $189,000, supplemented by an attractive bonus structure.
About You:
We are seeking a talented Senior Data Engineer to enhance the data infrastructure that powers our AI research initiatives. You will significantly contribute to the Semantic Scholar corpus by expanding its scope and elevating the quality of existing data. This role involves creating scalable APIs and tools that support our AI agents in their exploration of scholarly literature.
Your work will bridge data engineering and applied machine learning. You will manage data pipelines, design schemas, and deploy production services, and you will apply practical machine learning techniques, such as entity resolution and text classification, to refine data quality and enrich metadata.
About Us:
The Agentic Applications team at the Allen Institute for AI (Ai2) builds robust, open-source systems that facilitate scientific discovery and large-scale AI research. We focus on developing high-quality structured datasets, integrating diverse content types, and enabling applications for search, citation analysis, and model training. Our team emphasizes strong engineering practices and close collaboration with Ai2’s product and research organizations to deliver tools and infrastructure used by millions of researchers and developers around the globe.
Your Responsibilities:
- Enhance the coverage and quality of the Semantic Scholar corpus, including academic papers, patents, and specialized datasets.
- Develop and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enhancement.
- Implement and deploy machine learning models for tasks such as entity disambiguation, author linking, and topic classification.
- Design and improve APIs that provide structured scholarly data for academic researchers and AI workflows.
- Contribute to the development of dashboards and tools that assess data quality and model performance.
- Work collaboratively with engineering and research teams to ensure code maintainability, test coverage, and reliable deployment.
