Storage Architect

Graphcore
Austin, Texas, United States; US - Milpitas
On-site Full-time

Responsibilities

Hardware Qualification: Spearhead the assessment, qualification, and lifecycle management of NVMe SSDs (e.g., E1.S, E3.S, and U.2/U.3 form factors) from top vendors for deployment in high-density AI servers and clusters.

Architecture & Topology: Design and enhance local server storage architectures, overseeing PCIe lane distribution, reducing NUMA node crossings, and ensuring optimal data paths between NVMe drives, CPUs, and GPUs.

Performance Tuning: Profile and tune the Linux kernel storage stack, block layer, and file systems to maximize IOPS and bandwidth while minimizing tail latency (e.g., at the 99.99th percentile).

AI Workload Optimization: Tailor storage configurations specifically for AI workloads, including tuning GPUDirect Storage to enable direct memory access between NVMe storage and GPU memory, bypassing the CPU.

Telemetry & Automation: Lead the telemetry strategy for the storage subsystem, including SSD health monitoring (e.g., wear leveling, DWPD, thermals) and latency anomaly detection. Collaborate with the automation team to provide requirements and technical guidance for the storage subsystem characterization tests in our AI Platforms. Additionally, manage firmware rollouts efficiently.

About the job

Graphcore is a globally acknowledged front-runner in Artificial Intelligence (AI) computing systems. Our company engineers cutting-edge semiconductors and data center hardware designed to deliver the specialized processing power essential for fostering AI innovation while ensuring the efficiency needed for widespread adoption.

As a proud member of the SoftBank Group, Graphcore stands among an elite group of companies responsible for some of the most groundbreaking technologies in the world. With the establishment of our new AI Engineering Campus in Austin, we are set to play a pivotal role in shaping the future of AI computing.

About the Role

We are looking for a talented Storage Architect to design, assess, and refine the high-performance storage solutions that power our AI data centers. In AI training and inference, keeping GPUs efficiently fed with data is paramount; even minimal I/O bottlenecks can create significant inefficiencies. You will become our go-to expert in solid-state storage, with a strong emphasis on NVMe SSDs, PCIe topologies, and the Linux storage stack. Your work will ensure our local and distributed storage tiers achieve the predictable, microsecond-scale latency and substantial throughput essential for large language model (LLM) checkpointing, extensive datasets, and rapid data loading pipelines.

About Graphcore

Graphcore is a pioneering leader in the field of Artificial Intelligence computing, dedicated to pushing the boundaries of innovation through advanced semiconductor and data center hardware design. Our commitment to excellence and efficiency is supported by our affiliation with the SoftBank Group, placing us at the forefront of transformative technologies worldwide.
