Case Study | NIO Improves GPU Utilization for Autonomous Driving Workloads with HAMi

Discover how NIO adopted a hybrid GPU sharing strategy using HAMi to achieve 10× GPU utilization improvement in CI pipelines and 30% reduction in GPU hours for simulation workloads across 600 GPUs supporting autonomous driving AI infrastructure.

600 GPUs across ~80 nodes

10× GPU utilization improvement in CI

30% reduction in GPU hours for simulation

Company Overview

NIO operates large-scale cloud infrastructure to support autonomous driving workloads, including model training, simulation, CI/testing, and online inference. The team focuses on GPU performance optimization and participates in GPU and compute resource planning decisions for their comprehensive autonomous driving platform.

Large-scale GPU cluster: 600 GPUs across ~80 nodes

Diverse autonomous driving workloads: training, simulation, CI/testing, inference

Focus on GPU performance optimization and resource planning

Hybrid GPU sharing strategy for different workload types

NIO

Leading electric vehicle company with autonomous driving AI platform

Challenge: Low GPU Utilization Across Diverse Workloads

NIO's large-scale cloud infrastructure supports diverse autonomous driving workloads. This diversity caused a persistent mismatch between the GPU resources that workloads were allocated and what they actually used, with four specific pain points.

CI and Testing Tasks

Most execution time is spent on CPU-intensive operations, leaving GPU utilization at 5–10%

Simulation Workloads

Low compute requirements suitable for concurrent execution on shared GPUs

Online Inference

Many services require only ¼ or ½ of a GPU

Limited GPU Cluster

Time-based billing from public cloud providers makes low utilization costly

Hybrid GPU Sharing Strategy with HAMi

NIO adopted a hybrid GPU sharing strategy, selecting different GPU allocation mechanisms based on workload characteristics rather than enforcing a single approach. HAMi extends Kubernetes with fine-grained GPU sharing capabilities through scheduler extensions and device plugin integration.
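As a rough illustration of what fine-grained sharing looks like from a workload's point of view, the sketch below builds a Kubernetes Pod manifest that requests a slice of one GPU. The resource names `nvidia.com/gpu`, `nvidia.com/gpumem` (device memory in MiB) and `nvidia.com/gpucores` (percentage of compute) are the ones HAMi documents for NVIDIA devices. The function name, pod name, image, and sizes are hypothetical and are not taken from NIO's setup.

```python
import json


def gpu_slice_pod(name: str, image: str, mem_mib: int, core_pct: int) -> dict:
    """Build a Pod manifest that requests a slice of one GPU through HAMi.

    Illustrative only: the values a real workload needs must be measured.
    """
    if mem_mib <= 0:
        raise ValueError("mem_mib must be positive")
    if not 0 < core_pct <= 100:
        raise ValueError("core_pct must be in (0, 100]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "resources": {
                        "limits": {
                            "nvidia.com/gpu": 1,              # one (shared) physical GPU
                            "nvidia.com/gpumem": mem_mib,     # device memory cap in MiB
                            "nvidia.com/gpucores": core_pct,  # compute cap in percent
                        }
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical CI job that needs 4 GiB of VRAM and a quarter of the compute.
    pod = gpu_slice_pod("ci-job", "example.com/ci-runner:latest", 4096, 25)
    print(json.dumps(pod, indent=2))
```

The printed JSON can be submitted like any other manifest; placing the pod on a GPU with enough free memory and compute is then the job of the HAMi scheduler extension.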

Evaluated Approaches

NVIDIA MIG

Strong isolation but predefined partition sizes

Time-Slicing

Minimal overhead but lacks strict limits

HAMi (CNCF Sandbox)

Fine-grained control over memory and compute

Production Strategy: Combining Multiple Approaches

Rather than replacing existing mechanisms, NIO combined them for optimal efficiency across different workload types.

MIG: Algorithm development and strong isolation environments

HAMi: CI tasks and selected inference and simulation workloads

Time-slicing: Workloads that can tolerate resource contention
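The mapping above can be sketched as a small decision function. The `Workload` fields and the order of the checks are illustrative assumptions: the case study states which workload types ended up on which mechanism, not how NIO encodes that decision.

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    MIG = "MIG"
    HAMI = "HAMi"
    TIME_SLICING = "time-slicing"


@dataclass(frozen=True)
class Workload:
    name: str
    needs_strong_isolation: bool  # e.g. algorithm development environments
    tolerates_contention: bool    # can share a GPU without enforced limits


def choose_mechanism(w: Workload) -> Mechanism:
    """Pick a GPU sharing mechanism from coarse workload traits (hypothetical rule order)."""
    if w.needs_strong_isolation:
        return Mechanism.MIG           # hardware partitions, fixed sizes
    if w.tolerates_contention:
        return Mechanism.TIME_SLICING  # minimal overhead, no strict limits
    return Mechanism.HAMI              # fine-grained memory and compute limits


if __name__ == "__main__":
    for w in (
        Workload("algorithm-dev", needs_strong_isolation=True, tolerates_contention=False),
        Workload("ci-test", needs_strong_isolation=False, tolerates_contention=False),
        Workload("batch-eval", needs_strong_isolation=False, tolerates_contention=True),
    ):
        print(f"{w.name}: {choose_mechanism(w).value}")
```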

Implementation Details

HAMi deployed on approximately 50–70 active nodes with 400–560 GPUs

Resource Allocation Design

For simulation workloads, NIO treated GPU memory (VRAM) as the primary constraint: insufficient compute capacity only slows a task down and can be absorbed as scheduling delay, whereas memory exhaustion immediately triggers an out-of-memory (OOM) failure.

Finer partitioning is not always better. For certain simulation workloads, allocating approximately 1/6 of a GPU provided optimal efficiency. Smaller fractions (such as 1/8) introduced additional scheduling and virtualization overhead, reducing overall throughput.
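To make the memory-first reasoning concrete, here is a minimal sketch that sizes co-location by VRAM and caps how finely a GPU is split. All numbers in the example (an 80 GiB GPU, 8 to 12 GiB task footprints, a cap of 6 slices) are hypothetical. The cap stands in for the empirically found point beyond which finer slices stop paying off for a given workload.

```python
def slices_per_gpu(gpu_mem_gib: float, task_mem_gib: float, max_slices: int) -> int:
    """Number of tasks to co-locate on one GPU.

    Memory is the hard limit (exceeding it means OOM); max_slices is a tuned
    cap that avoids over-fragmenting the GPU.
    """
    if gpu_mem_gib <= 0 or task_mem_gib <= 0:
        raise ValueError("memory sizes must be positive")
    if max_slices < 1:
        raise ValueError("max_slices must be at least 1")
    by_memory = int(gpu_mem_gib // task_mem_gib)  # how many tasks fit without OOM
    return min(by_memory, max_slices)


if __name__ == "__main__":
    print(slices_per_gpu(80, 12, max_slices=6))   # memory allows 6, cap allows 6 -> 6
    print(slices_per_gpu(80, 8, max_slices=6))    # memory allows 10, cap limits to 6
    print(slices_per_gpu(80, 100, max_slices=6))  # task does not fit at all -> 0
```

Compute is deliberately absent from the calculation: under the reasoning above, a shortfall there shows up as longer run time rather than as a failure.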

Impact: Dramatic Efficiency Gains

The hybrid GPU sharing strategy delivered measurable improvements across multiple workload types, increasing overall system throughput while directly reducing GPU costs.

CI Workloads Utilization

5% → 30–50%

Up to 10× improvement in CI pipelines

Simulation GPU Hours

-30%

Reduction in GPU hours

Simulation Task Duration

3 days → 2 days

End-to-end time reduction

Deployment Scale

400–560 GPUs

Across 50–70 nodes using HAMi

Total Infrastructure

600 GPUs

Across ~80 nodes

Strategy

Hybrid

HAMi + MIG + Time-slicing
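The ratios behind these figures can be checked with a few lines of arithmetic. The sketch uses only numbers reported in this case study; the helper names are hypothetical, and the GPUs-per-node value is an inference from the deployment range, not a figure stated by NIO.

```python
def improvement_factor(before_pct: float, after_pct: float) -> float:
    """Ratio of utilization after the change to utilization before it."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return after_pct / before_pct


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# CI utilization: 5% before, 30-50% after.
ci_low = improvement_factor(5, 30)
ci_high = improvement_factor(5, 50)

# Simulation task duration: 3 days before, 2 days after.
duration_cut = relative_reduction(3, 2)

# HAMi deployment: 400-560 GPUs on 50-70 nodes.
gpus_per_node = (400 / 50, 560 / 70)

if __name__ == "__main__":
    print(f"CI utilization gain: {ci_low:.0f}x to {ci_high:.0f}x")
    print(f"Simulation duration cut: {duration_cut:.0%}")
    print(f"Implied GPUs per HAMi node: {gpus_per_node[0]:.0f} to {gpus_per_node[1]:.0f}")
```

The CI gain spans 6× to 10×, so the headline 10× corresponds to the upper end of the reported range. The duration cut of about one third is a separate measure from the 30% reduction in GPU hours.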

Lessons Learned

NIO's deployment of HAMi provided valuable insights for organizations implementing GPU virtualization at scale.

GPU Partitioning Optimization

Each workload has an optimal partition size. Over-fragmentation can reduce efficiency.

Performance Validation

Version upgrades require performance validation and a phased rollout strategy.

Production Safety

Device plugin upgrades follow a blue-green deployment process to protect online inference.

Toolchain Compatibility

Certain compiler features may conflict with GPU interception mechanisms.

“NIO's deployment of HAMi demonstrates how fine-grained GPU sharing can significantly improve infrastructure efficiency for autonomous driving workloads. By combining HAMi with Kubernetes and existing GPU allocation mechanisms such as MIG and time-slicing, NIO increased GPU utilization, reduced overall GPU hours, and improved workload throughput without compromising stability.”

NIO Infrastructure Team

Hybrid Strategy Success

NIO's hybrid resource management strategy enables the company to support diverse AI workloads—from CI pipelines to simulation and inference—more efficiently within the same Kubernetes environment. This approach demonstrates how combining HAMi with other GPU allocation mechanisms can deliver optimal results for different workload types.
