Regions

Remote

Location

Remote

Disciplines

Embedded Software & Electronics

Job types

Remote Work

Industry

Technology, Media, & Telecommunication

Salary

Market related

Functions

Engineer

Seniority

Mid-level
Senior

Job reference

113424

Senior GPU Cluster Engineer

Location: Remote (Europe / North America / Israel)

Industry: AI | HPC | Cloud Computing

Why Work With Us

We’re building the next generation of cloud infrastructure, purpose-built for the global AI economy. Our platform enables organizations to tackle complex, real-world challenges without the cost of legacy infrastructure or the need to maintain large in-house AI/ML teams.

You’ll join a team of experienced engineers and innovators working at the forefront of high-performance computing, distributed systems, and AI cloud infrastructure.

Our Global Presence

Headquartered in Amsterdam and listed on Nasdaq, we operate R&D hubs across Europe, North America, and Israel. Our global team includes over 800 employees, more than 400 of whom are highly skilled engineers working across hardware design, systems software, networking, and AI/ML infrastructure.

The Role

We’re looking for a Senior GPU Cluster Engineer to join our GPU & InfiniBand Engineering Team. You’ll work on the core components of our hyperscale platform, with a focus on GPU computing, InfiniBand networking, and hardware virtualization technologies such as KVM/QEMU.

This role is highly technical and hands-on. You’ll be responsible for integrating new hardware, tuning performance, resolving complex system issues, and building automated monitoring and fault-resolution systems for large-scale GPU clusters.

Responsibilities

Optimize GPU cluster and InfiniBand network performance for HPC and AI workloads
Analyze and resolve low-level hardware/software issues in GPU and InfiniBand environments
Integrate new GPU hardware and support it through the Kubernetes, QEMU, and KVM stacks
Build and enhance automation tools for system monitoring, diagnostics, and fault recovery
Configure and maintain GPU devices and InfiniBand fabrics for reliability and scale

Requirements

5+ years in system-level software engineering (performance, infrastructure, low-level development)
3+ years hands-on with Linux systems (tuning, debugging, admin)
Solid understanding of server hardware architecture, including PCIe, NICs, and Linux internals
Proficiency in performance-oriented languages such as C/C++, Go, or Python

Preferred Qualifications

Experience with GPU cluster validation and testing over InfiniBand
Proven performance tuning of HPC or AI/ML workloads
Knowledge of RDMA, RoCE, and InfiniBand protocols
Familiarity with Software-Defined Networking (SDN) and high-performance networking
Understanding of QEMU/KVM, virtualization technologies, and driver integration
Experience with PyTorch, TensorFlow, or other deep learning frameworks
Familiarity with MPI, NCCL, or other collective communication libraries

What We Offer

Competitive compensation and full benefits package
Clear technical career path and professional development support
A collaborative, high-impact engineering environment focused on innovation and growth
