Infrastructure/GPU Engineer

Work from home, full-time role

A reputed company is seeking a highly skilled, hands-on Infrastructure Engineer with proven experience in the physical and technical deployment of environments optimized for AI and machine learning workloads. This role focuses on NVIDIA DGX or similar systems, GPU-accelerated compute clusters, high-speed networking, and scalable storage solutions. The ideal candidate will have deep expertise in infrastructure design, deployment, workload orchestration, and performance optimization in these environments.

This is a remote role in the US. The salary range for this role is $99,000 to $116,000, depending on the candidate's skills and qualifications. Applications will be accepted until 10/21/2025.

Key Responsibilities

System Design & Deployment

  • Help right-size GPU investments.
  • Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.
  • Design and implement scalable parallel filesystems (e.g., Lustre, BeeGFS, GPFS).
  • Integrate high-speed interconnects using InfiniBand, RoCE, and RDMA.
  • Collaborate on data center planning and airflow optimization.

Cluster & Infrastructure Management

  • Configure and manage Slurm Workload Manager for job scheduling.
  • Deploy and maintain cluster orchestration tools.
  • Automate provisioning using PXE boot, Terraform, Redfish, and Kubernetes.
  • Perform firmware updates, BIOS/IPMI/BMC configuration, and OS provisioning.
  • Knowledge of Run.ai, ClearML, or a similar platform.

Networking & Performance Optimization

  • Design and validate network topologies including IPMI, internal/external networks, and InfiniBand fabrics.
  • Optimize RDMA and RoCE configurations for low-latency, high-throughput data transfers.
  • Conduct performance benchmarking using GPU-Burn, NCCL, and NVSM.

Monitoring & Troubleshooting

  • Implement system health checks and diagnostics across compute, storage, and network layers.
  • Troubleshoot hardware/software issues and ensure reliable infrastructure operation.

Required Skills & Qualifications

Technical Expertise

  • Deep understanding of NVIDIA DGX architecture, CUDA, and GPU compute.
  • Strong Linux system administration and scripting skills.
  • Experience with Slurm, parallel filesystems, and high-speed networking (InfiniBand/RDMA/RoCE).
  • Familiarity with containerization, orchestration (Kubernetes), and automation tools (Ansible, Redfish).

Preferred Qualifications

  • Experience with BBCM and DGX BasePOD/SuperPOD configuration.
  • Certifications from NVIDIA or an equivalent OEM.
