Regions
Location
  • Remote
Job types
  • Remote Work
Industry
  • Technology, Media, & Telecommunication
Salary

Market related

Functions
  • Engineer
Seniority
  • Mid-level
  • Senior
Technologies
  • C
Job reference

113424

Senior GPU Cluster Engineer

Location: remote (Europe / North America / Israel)

Industry: AI | HPC | Cloud Computing

 

Why Work With Us

We’re building the next generation of cloud infrastructure purpose-built for the global AI economy. Our platform enables organizations to tackle complex, real-world challenges without the cost of legacy infrastructure or the need to maintain large in-house AI/ML teams.

You’ll join a team of experienced engineers and innovators working at the forefront of high-performance computing, distributed systems, and AI cloud infrastructure.

Our Global Presence

Headquartered in Amsterdam and listed on Nasdaq, we operate R&D hubs across Europe, North America, and Israel. Our global team includes over 800 employees, more than 400 of whom are highly skilled engineers working across hardware design, systems software, networking, and AI/ML infrastructure.

The Role

We’re looking for a Senior GPU Cluster Engineer to join our GPU & InfiniBand Engineering Team. You’ll work on the core components of our hyperscale platform, with a focus on GPU computing, InfiniBand networking, and hardware virtualization technologies such as KVM/QEMU.

This role is highly technical and hands-on. You’ll be responsible for integrating new hardware, tuning performance, resolving complex system issues, and building automated monitoring and fault-resolution systems for large-scale GPU clusters.

Responsibilities

  • Optimize GPU cluster and InfiniBand network performance for HPC and AI workloads
  • Analyze and resolve low-level hardware/software issues in GPU and InfiniBand environments
  • Integrate new GPU hardware and support it through the Kubernetes, QEMU, and KVM stacks
  • Build and enhance automation tools for system monitoring, diagnostics, and fault recovery
  • Configure and maintain GPU devices and InfiniBand fabrics for reliability and scale

Requirements

  • 5+ years in system-level software engineering (performance, infrastructure, low-level development)
  • 3+ years hands-on with Linux systems (tuning, debugging, admin)
  • Solid understanding of server hardware architecture, including PCIe, NICs, and Linux internals
  • Proficiency in performance-oriented languages such as C/C++, Go, or Python

Preferred Qualifications

  • Experience with GPU cluster validation and testing over InfiniBand
  • Proven performance tuning of HPC or AI/ML workloads
  • Knowledge of RDMA, RoCE, and InfiniBand protocols
  • Familiarity with Software-Defined Networking (SDN) and high-performance networking
  • Understanding of QEMU/KVM, virtualization technologies, and driver integration
  • Experience with PyTorch, TensorFlow, or other deep learning frameworks
  • Familiarity with MPI, NCCL, or other collective communication libraries

What We Offer

  • Competitive compensation and full benefits package
  • Clear technical career path and professional development support
  • A collaborative, high-impact engineering environment focused on innovation and growth

     
