Case Study | NIO Improves GPU Utilization for Autonomous Driving Workloads with HAMi

Discover how NIO adopted a hybrid GPU sharing strategy using HAMi to achieve 10× GPU utilization improvement in CI pipelines and 30% reduction in GPU hours for simulation workloads across 600 GPUs supporting autonomous driving AI infrastructure.

600

GPUs across ~80 nodes

10×

GPU utilization improvement in CI

30%

reduction in GPU hours for simulation

Company Overview

NIO operates large-scale cloud infrastructure to support autonomous driving workloads, including model training, simulation, CI/testing, and online inference. The team focuses on GPU performance optimization and participates in GPU and compute resource planning decisions for their comprehensive autonomous driving platform.

Large-scale GPU cluster: 600 GPUs across ~80 nodes

Diverse autonomous driving workloads: training, simulation, CI/testing, inference

Focus on GPU performance optimization and resource planning

Hybrid GPU sharing strategy for different workload types

NIO

Leading electric vehicle company with autonomous driving AI platform

Challenge: Low GPU Utilization Across Diverse Workloads

NIO's large-scale cloud infrastructure supports diverse autonomous driving workloads. Because these workloads differ widely in how much GPU capacity they actually need, allocating whole GPUs to every task led to a persistent mismatch between workloads and resources. Four pain points stood out.

CI and Testing Tasks

CI tasks spend most of their execution time on CPU-intensive operations (compilation, file fetching, preprocessing). GPUs are used only intermittently, averaging just 5–10% utilization when each task is allocated a full GPU.

Simulation Workloads

Simulation pipelines process video streams, radar data, and inference validation. Individual tasks have relatively low compute requirements, making them suitable for concurrent execution on shared GPUs.

Online Inference

Many inference services require only ¼ or ½ of a GPU. Allocating an entire GPU is both inefficient and costly for these small workloads.

Limited GPU Cluster

The GPU cluster is limited in size, and part of its capacity is sourced from public cloud providers under time-based billing. Low GPU utilization therefore wastes paid GPU hours and increases operational costs.

Hybrid GPU Sharing Strategy with HAMi

NIO adopted a hybrid GPU sharing strategy, selecting different GPU allocation mechanisms based on workload characteristics rather than enforcing a single approach. HAMi extends Kubernetes with fine-grained GPU sharing capabilities through scheduler extensions and device plugin integration.

Evaluated Approaches

NVIDIA MIG

Provides strong isolation but supports only predefined partition sizes, making it difficult to match finer-grained requirements (e.g., 1/6 or 1/8 of a GPU).

Time-Slicing

Allows workloads to compete freely for GPU resources with minimal overhead. However, it lacks strict limits on memory and compute usage, making it unsuitable for certain production workloads.

HAMi (CNCF Sandbox)

Supports fine-grained control over both GPU memory and compute allocation, enabling proportional allocation based on actual workload requirements with minimal overhead.
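To make the memory-plus-compute model concrete, the following minimal sketch builds a Kubernetes Pod manifest that requests a fraction of one GPU. The resource names `nvidia.com/gpu`, `nvidia.com/gpumem` (MiB), and `nvidia.com/gpucores` (percent) are HAMi's documented defaults for NVIDIA devices, but they are configurable and should be checked against the deployed version. The helper names, Pod name, image, and numbers are hypothetical and are not taken from NIO's setup.

```python
import json


def fractional_gpu_limits(gpu_mem_mib, core_percent, gpus=1):
    """Return container resource limits for a fractional GPU request.

    Assumes HAMi's default NVIDIA resource names; verify them against
    the installed HAMi configuration before use.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    if gpu_mem_mib <= 0:
        raise ValueError("gpu_mem_mib must be positive")
    if not 0 < core_percent <= 100:
        raise ValueError("core_percent must be in (0, 100]")
    return {
        "nvidia.com/gpu": gpus,              # number of (virtual) GPUs
        "nvidia.com/gpumem": gpu_mem_mib,    # device memory cap, in MiB
        "nvidia.com/gpucores": core_percent, # share of GPU compute, in percent
    }


def pod_manifest(name, image, limits):
    """Wrap the limits in a minimal single-container Pod manifest."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical CI job asking for 10 GiB of VRAM and 25% of the compute.
    limits = fractional_gpu_limits(gpu_mem_mib=10240, core_percent=25)
    manifest = pod_manifest("ci-job", "example.invalid/ci-runner:latest", limits)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied like any other manifest. The point of the sketch is only that memory and compute are requested as separate, independently sized quantities.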

Production Strategy: Combining Multiple Approaches

Rather than replacing existing mechanisms, NIO combined them for optimal efficiency across different workload types.

MIG: Used for algorithm development and environments requiring strong isolation

HAMi: Used for CI tasks and selected inference and simulation workloads

Time-slicing: Used for workloads that can tolerate resource contention
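The mapping above can be read as a small decision rule. The sketch below is one possible encoding of it. The two boolean inputs and the order in which they are checked are assumptions made for illustration, not NIO's actual placement logic.

```python
from enum import Enum


class Mechanism(Enum):
    MIG = "mig"
    HAMI = "hami"
    TIME_SLICING = "time-slicing"


def choose_mechanism(needs_strong_isolation, tolerates_contention):
    """Pick a GPU sharing mechanism from two coarse workload traits.

    Assumed precedence: isolation requirements win, then contention
    tolerance; everything else gets bounded fractional sharing.
    """
    if needs_strong_isolation:
        return Mechanism.MIG           # e.g. algorithm development
    if tolerates_contention:
        return Mechanism.TIME_SLICING  # free competition, no hard limits
    return Mechanism.HAMI              # e.g. CI, selected inference/simulation


if __name__ == "__main__":
    print(choose_mechanism(needs_strong_isolation=False, tolerates_contention=False))
```

In practice the inputs would come from workload labels or team policy, and a real policy would likely consider more traits than these two.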

Implementation Details

HAMi is deployed on approximately 50–70 active nodes, covering 400–560 GPUs.

Resource Allocation Design

For simulation workloads, NIO treated GPU memory (VRAM) as the primary constraint: insufficient compute capacity only slows tasks down and can be absorbed as scheduling delay, whereas memory exhaustion immediately triggers an out-of-memory (OOM) failure.
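A memory-first sizing rule of this kind can be sketched as follows. The function name, the 10% headroom, and the example figures (an 80 GiB GPU and a 12,000 MiB peak per task) are hypothetical and chosen only to show the arithmetic.

```python
def max_tasks_per_gpu(gpu_mem_mib, task_peak_mem_mib, headroom_percent=10):
    """Return how many tasks fit on one GPU when memory is the hard limit.

    Compute is deliberately ignored: oversubscribed compute slows tasks
    down, while oversubscribed memory makes them fail with OOM.
    """
    if gpu_mem_mib <= 0 or task_peak_mem_mib <= 0:
        raise ValueError("memory sizes must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    # Integer arithmetic keeps the result exact.
    usable_mib = gpu_mem_mib * (100 - headroom_percent) // 100
    return usable_mib // task_peak_mem_mib


if __name__ == "__main__":
    # Hypothetical numbers: 80 GiB GPU, 12,000 MiB peak per simulation task.
    print(max_tasks_per_gpu(gpu_mem_mib=81920, task_peak_mem_mib=12000))
```

With these placeholder inputs the rule yields six tasks per GPU. Real sizing would start from measured peak memory per task rather than an estimate.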

Finer partitioning is not always better. For certain simulation workloads, allocating approximately 1/6 of a GPU provided optimal efficiency. Smaller fractions (such as 1/8) introduced additional scheduling and virtualization overhead, reducing overall throughput.
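One way to find such an optimum is to benchmark a workload at several partition sizes and keep the size with the highest measured throughput. The sketch below assumes the measurements already exist. The sample values are invented placeholders that merely mirror the shape described above and are not NIO's data.

```python
from fractions import Fraction
from typing import Mapping


def best_partition(throughput_by_slices: Mapping[int, float]) -> Fraction:
    """Return the GPU fraction with the highest measured throughput.

    Keys are the number of equal slices per GPU (6 means 1/6 of a GPU);
    values are measured per-GPU throughput for that slicing.
    """
    if not throughput_by_slices:
        raise ValueError("at least one measurement is required")
    if any(slices < 1 for slices in throughput_by_slices):
        raise ValueError("slice counts must be >= 1")
    best_slices = max(throughput_by_slices, key=throughput_by_slices.get)
    return Fraction(1, best_slices)


if __name__ == "__main__":
    # Placeholder measurements (tasks per hour per GPU), not real results.
    sample = {2: 1.9, 4: 3.5, 6: 4.8, 8: 4.3}
    print(best_partition(sample))
```

The selection itself is trivial. The expensive and important part is producing trustworthy measurements for each candidate size on representative workloads.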

Impact: Dramatic Efficiency Gains

The hybrid GPU sharing strategy delivered measurable improvements across multiple workload types, increasing overall system throughput while directly reducing GPU costs.

CI Workloads Utilization

5% → 30–50%

By partitioning GPUs into ¼ or smaller fractions, effective GPU utilization in CI pipelines rose from about 5% to 30–50%, an improvement of roughly 6× to 10×.

Simulation GPU Hours

-30%

Fine-grained GPU sharing reduced overall GPU hours by approximately 30%

Simulation Task Duration

3 days → 2 days

End-to-end simulation tasks reduced from about 3 days to about 2 days

Deployment Scale

400–560 GPUs

Across 50–70 nodes using HAMi

Total Infrastructure

600 GPUs

Across ~80 nodes supporting autonomous driving workloads

Strategy

Hybrid

HAMi + MIG + Time-slicing for optimal results

Lessons Learned

NIO's deployment of HAMi provided valuable insights for organizations implementing GPU virtualization at scale.

GPU Partitioning Optimization

GPU partitioning is not 'the finer, the better.' Each workload has an optimal partition size. Over-fragmentation can reduce efficiency.

Performance Validation

Version upgrades require performance validation, because new GPU virtualization components may introduce performance regressions. NIO adopted a phased upgrade strategy: performance benchmarks are validated at each phase, and different HAMi components are allowed to run at different versions while the rollout is in progress.
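A benchmark gate for such an upgrade can be as simple as comparing a candidate version's throughput against a recorded baseline. The sketch below is a generic illustration. The function name and the 5% tolerance are assumptions, not values reported by NIO.

```python
def passes_regression_gate(baseline, candidate, max_regression=0.05):
    """Return True if the candidate is within the allowed regression.

    `baseline` and `candidate` are throughput figures where higher is
    better; `max_regression` is the tolerated fractional drop.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if not 0 <= max_regression < 1:
        raise ValueError("max_regression must be in [0, 1)")
    return candidate >= baseline * (1 - max_regression)


if __name__ == "__main__":
    # Hypothetical benchmark results in tasks per hour.
    print(passes_regression_gate(baseline=100.0, candidate=96.0))
    print(passes_regression_gate(baseline=100.0, candidate=90.0))
```

A rollout would proceed to the next phase only when the gate passes for every benchmarked workload type.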

Production Safety

Operational changes must ensure production safety. For online inference workloads, device plugin upgrades follow a process similar to blue-green deployment: traffic is migrated first, new Pods are deployed, and old instances are gradually decommissioned.

Toolchain Compatibility

Toolchain compatibility is critical. Certain compiler or library features (such as pointer or address analysis) may conflict with GPU interception mechanisms. Careful trade-offs between functionality and stability are required.

“NIO's deployment of HAMi demonstrates how fine-grained GPU sharing can significantly improve infrastructure efficiency for autonomous driving workloads. By combining HAMi with Kubernetes and existing GPU allocation mechanisms such as MIG and time-slicing, NIO increased GPU utilization, reduced overall GPU hours, and improved workload throughput without compromising stability.”

NIO Infrastructure Team

Hybrid Strategy Success

NIO's hybrid resource management strategy enables the company to support diverse AI workloads, from CI pipelines to simulation and inference, more efficiently within the same Kubernetes environment. The approach shows how combining HAMi with other GPU allocation mechanisms lets each workload type use the sharing model that suits it best.
