About the job
Axiado, a leading manufacturer of Trusted Control/Compute Unit (TCU) solutions, is seeking a Senior QA Engineer – Performance & Reliability to lead the performance characterization and reliability validation of our Secure TCU System. The ideal candidate will ensure the system meets stringent data center standards.
In this pivotal role, you will own the design, execution, and in-depth analysis of performance and reliability tests, working closely with development teams to pinpoint bottlenecks and resolve complex system-level issues.
Key Responsibilities:
Performance & Reliability Strategy
- Test Design & Execution: Design and execute comprehensive test plans for performance benchmarking, stress testing, longevity/endurance testing, and thermal/power characterization of TCU/BMC systems.
- Workload Analysis: Evaluate system behavior under heavy workloads to identify bottlenecks in throughput, latency, and resource utilization (CPU, memory, PCIe).
- Reliability Validation: Perform Mean Time Between Failures (MTBF) predictions, long-duration stability tests, and error injection campaigns to confirm system robustness.
Deep Dive & Issue Resolution
- Root Cause Analysis: Lead in-depth investigations into performance degradation and reliability failures. Use advanced debugging tools (oscilloscopes, logic analyzers, firmware traces) to isolate issues.
- Developer Collaboration: Partner with firmware and hardware engineers to reproduce complex bugs, analyze crash dumps, and validate fixes.
- Infrastructure Enhancement: Build and maintain automated performance testing frameworks and reporting dashboards to track regressions and trends over time.
Reporting & Leadership
- Reporting: Produce detailed performance assessment reports and reliability metrics for stakeholders.
- Mentorship: Guide junior engineers on performance testing methodologies and system debugging techniques.
