GPU Hardware Engineer, reputed company Services Engineering
Job title: GPU Hardware Engineer, reputed company Services Engineering in Seattle, WA at reputed company
Company: reputed company
Job description: People at reputed company don't just build products - they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join reputed company and help us leave the world better than we found it.

The reputed company Systems Engineering (ASE) Infrastructure team builds and provides systems and infrastructure that fuel reputed company's services (such as iTunes, iCloud, Siri, and Maps). We are the foundation on which reputed company's software developers build the products that our customers love. We are looking for passionate and talented GPU Hardware Engineers to continue our focus on providing our customers with the highest-quality reputed company Services experience. Our services have to scale globally, stay highly available, and "just work." If you love working with hardware and designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!

Description
The reputed company Services Engineering Systems organization is seeking a highly motivated and enthusiastic GPU engineer to join our team. As a GPU Hardware Engineer, you will be responsible for diagnosing and resolving hardware, firmware, and operating system issues, often under high-pressure and time-sensitive conditions, and for creating software tools to facilitate the design, debugging, optimization, and validation of GPU systems. Effective communication is essential in this collaborative, cross-functional role, as is a keen eye for opportunities to eliminate toil through code and process improvements.
The role is a blend of software development, systems engineering, and hardware management, focusing heavily on ensuring that GPU resources are utilized optimally for computationally intensive tasks like machine learning and high-performance computing. The ideal candidate will be self-driven, with a passion for excellence, quality, and attention to detail. In addition to supporting operations, the engineer will work closely with developers and architects to design and implement improvements that enhance the stability, performance, and scalability of the GPU hardware fleet.

Minimum Qualifications
- Familiarity with GPU hardware and its architecture (e.g., NVIDIA, AMD) and an understanding of how GPUs work in high-performance computing (HPC) and machine learning environments.
- Ability to write custom software tools that streamline the development process, enhance hardware performance analysis, and troubleshoot complex GPU hardware issues.
- Experience managing GPUs in data center or cloud-based environments (such as AWS, GCP, or Azure).
- Clear written and verbal communication skills to document findings and procedures and to collaborate across teams.
- Outstanding organizational and communication skills.
- Bachelor's Degree in Computer Science, an engineering-related field, or equivalent work experience.
- 5+ years in an Operations, DevOps, or Infrastructure-focused role.
- Understanding of TCP/IP, HTTP/S, and protocols specific to high-performance computing (HPC) systems, such as RDMA (Remote Direct Memory Access).
- Experience with GPU hardware (e.g., NVIDIA, AMD) and strong programming skills in languages like Python, Golang, or CUDA to create custom software tools, automation scripts, and diagnostic utilities.
- Understanding of GPU performance optimization, load balancing, and managing memory and compute resources for optimal performance.
- Understanding of core internet infrastructure services, including DNS, DHCP, LDAP, and server virtualization, in critical large-scale distributed systems.
- Experience with monitoring systems (e.g., Prometheus, Grafana) to track GPU, CPU, and memory utilization, as well as tools to diagnose and optimize hardware performance.
- Experience with hardware bootstrap and associated technologies (PXE, BIOS, TPM, secure boot, trusted computing).
- Familiarity with configuration management and fleet orchestration tools such as Chef, Ansible, or others.