Case Study | NIO Improves GPU Utilization for Autonomous Driving Workloads with HAMi
Discover how NIO adopted a hybrid GPU sharing strategy using HAMi to achieve up to 10× higher GPU utilization in CI pipelines and a 30% reduction in GPU hours for simulation workloads, on a 600-GPU cluster supporting its autonomous driving AI infrastructure.
Company Overview
NIO operates large-scale cloud infrastructure to support autonomous driving workloads, including model training, simulation, CI/testing, and online inference. The team focuses on GPU performance optimization and takes part in GPU and compute resource planning for the company's autonomous driving platform.
Large-scale GPU cluster: 600 GPUs across ~80 nodes
Diverse autonomous driving workloads: training, simulation, CI/testing, inference
Focus on GPU performance optimization and resource planning
Hybrid GPU sharing strategy for different workload types
NIO
Leading electric vehicle company with autonomous driving AI platform
Challenge: Low GPU Utilization Across Diverse Workloads
NIO's large-scale cloud infrastructure supports diverse autonomous driving workloads. This diversity caused persistent inefficiency because GPU allocations did not match what each workload actually needed. Four pain points stood out.
CI and Testing Tasks
Most execution time is spent on CPU-intensive operations, leaving GPU utilization at only 5–10%
Simulation Workloads
Low compute requirements per task, making them suitable for concurrent execution on shared GPUs
Online Inference
Many services require only ¼ or ½ of a GPU
Limited GPU Cluster
Time-based billing from public cloud providers makes low utilization costly
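To make the cost of low utilization concrete, the sketch below divides the billed price of a GPU hour by the fraction of that hour spent on useful work. The hourly price is a made-up placeholder, not a figure from NIO or any cloud provider, and the function name is purely illustrative.

```python
def effective_cost_per_useful_hour(price_per_gpu_hour: float, utilization: float) -> float:
    """Billed price divided by the fraction of the GPU doing useful work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_gpu_hour / utilization


# Hypothetical hourly price, used only for illustration.
PRICE = 2.0

for util in (0.05, 0.10, 0.50):
    cost = effective_cost_per_useful_hour(PRICE, util)
    print(f"utilization {util:.0%}: {cost:.2f} per useful GPU hour")
```

Under time-based billing the invoice is the same at any utilization, so the only lever is how much useful work is packed into each billed hour.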
Hybrid GPU Sharing Strategy with HAMi
NIO adopted a hybrid GPU sharing strategy, selecting different GPU allocation mechanisms based on workload characteristics rather than enforcing a single approach. HAMi extends Kubernetes with fine-grained GPU sharing capabilities through scheduler extensions and device plugin integration.
Evaluated Approaches
NVIDIA MIG
Strong isolation but predefined partition sizes
Time-Slicing
Minimal overhead but lacks strict limits
HAMi (CNCF Sandbox)
Fine-grained control over memory and compute
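As a rough illustration of what fine-grained control over memory and compute looks like from a workload's point of view, the sketch below builds a Kubernetes Pod manifest that requests a slice of one GPU. The resource names `nvidia.com/gpu`, `nvidia.com/gpumem` (device memory in MB), and `nvidia.com/gpucores` (percentage of GPU cores) follow HAMi's public documentation and should be verified against the installed version. The pod name, image, and values are hypothetical and do not reflect NIO's actual configuration.

```python
import json


def fractional_gpu_pod(name: str, image: str, gpu_mem_mb: int, gpu_core_pct: int) -> dict:
    """Build a Pod manifest that asks for a slice of one GPU via HAMi resource names."""
    if gpu_mem_mb <= 0:
        raise ValueError("gpu_mem_mb must be positive")
    if not 0 < gpu_core_pct <= 100:
        raise ValueError("gpu_core_pct must be in (0, 100]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "limits": {
                            "nvidia.com/gpu": 1,                   # one virtual GPU
                            "nvidia.com/gpumem": gpu_mem_mb,       # device memory cap, MB
                            "nvidia.com/gpucores": gpu_core_pct,   # compute cap, percent
                        }
                    },
                }
            ]
        },
    }


# Hypothetical CI job asking for 4000 MB of device memory and 25% of the cores.
manifest = fractional_gpu_pod("ci-job", "example.invalid/ci-runner:latest", 4000, 25)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.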
Production Strategy: Combining Multiple Approaches
Rather than replacing existing mechanisms, NIO combined them for optimal efficiency across different workload types.
MIG: Algorithm development and strong isolation environments
HAMi: CI tasks and selected inference and simulation workloads
Time-slicing: Workloads that can tolerate resource contention
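The mapping above can be read as a small decision rule. The sketch below is one possible encoding of it. The `Workload` fields and the order of the checks are assumptions made for illustration, not NIO's actual scheduling policy.

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    MIG = "MIG"
    HAMI = "HAMi"
    TIME_SLICING = "time-slicing"


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload descriptor; field names are illustrative."""
    name: str
    needs_strong_isolation: bool
    tolerates_contention: bool


def choose_mechanism(workload: Workload) -> Mechanism:
    """Pick a GPU sharing mechanism from coarse workload traits."""
    if workload.needs_strong_isolation:
        return Mechanism.MIG           # hardware partitions, fixed sizes
    if workload.tolerates_contention:
        return Mechanism.TIME_SLICING  # cheapest sharing, no strict limits
    return Mechanism.HAMI              # enforced memory and compute limits


for w in (
    Workload("algorithm-dev", needs_strong_isolation=True, tolerates_contention=False),
    Workload("ci-pipeline", needs_strong_isolation=False, tolerates_contention=False),
    Workload("batch-tooling", needs_strong_isolation=False, tolerates_contention=True),
):
    print(f"{w.name}: {choose_mechanism(w).value}")
```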
Implementation Details
HAMi is deployed on approximately 50–70 active nodes with 400–560 GPUs
Resource Allocation Design
For simulation workloads, NIO treated GPU memory (VRAM) as the primary constraint: a shortage of compute capacity only delays a task, whereas memory exhaustion immediately triggers an out-of-memory (OOM) failure.
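A memory-first sizing rule of this kind can be sketched as follows: the number of tasks that may share a GPU is bounded by how many peak memory footprints fit into device memory, with some headroom left over. The GPU size, per-task footprint, and headroom fraction are hypothetical values, not measurements from NIO.

```python
def tasks_per_gpu(gpu_mem_mib: int, task_peak_mem_mib: int, headroom: float = 0.1) -> int:
    """Upper bound on co-located tasks when memory is the hard constraint."""
    if gpu_mem_mib <= 0 or task_peak_mem_mib <= 0:
        raise ValueError("memory sizes must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_mib = gpu_mem_mib * (1 - headroom)  # reserve a safety margin against OOM
    return int(usable_mib // task_peak_mem_mib)


# Hypothetical: an 80 GiB GPU and simulation tasks peaking at 12000 MiB each.
print(tasks_per_gpu(gpu_mem_mib=81920, task_peak_mem_mib=12000))
```

Compute share is deliberately left out of the bound, because oversubscribing compute only slows tasks down while oversubscribing memory kills them.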
Finer partitioning is not always better. For certain simulation workloads, allocating approximately 1/6 of a GPU provided optimal efficiency. Smaller fractions (such as 1/8) introduced additional scheduling and virtualization overhead, reducing overall throughput.
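Because the best partition size is workload-specific, it is usually found by measurement rather than calculation. The sketch below picks the tasks-per-GPU setting with the highest measured whole-GPU throughput. The benchmark numbers are invented placeholders chosen only to show the shape of the trade-off, and are not NIO's results.

```python
def best_partition(throughput_by_tasks_per_gpu: dict[int, float]) -> int:
    """Return the tasks-per-GPU setting with the highest measured throughput."""
    if not throughput_by_tasks_per_gpu:
        raise ValueError("no measurements supplied")
    return max(throughput_by_tasks_per_gpu, key=throughput_by_tasks_per_gpu.get)


# Hypothetical benchmark: tasks sharing one GPU -> completed tasks per hour per GPU.
# Throughput rises with sharing, then falls once overhead outweighs the gain.
measured = {2: 10.0, 4: 18.0, 6: 24.0, 8: 21.0}

choice = best_partition(measured)
print(f"run {choice} tasks per GPU (1/{choice} of a GPU each)")
```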
Impact: Dramatic Efficiency Gains
The hybrid GPU sharing strategy delivered measurable improvements across multiple workload types, increasing overall system throughput while directly reducing GPU costs.
CI Workloads Utilization
5% → 30–50%
6–10× improvement in CI pipelines
Simulation GPU Hours
-30%
Reduction in GPU hours
Simulation Task Duration
3 days → 2 days
End-to-end time reduction
Deployment Scale
400–560 GPUs
Across 50–70 nodes using HAMi
Total Infrastructure
600 GPUs
Across ~80 nodes
Strategy
Hybrid
HAMi + MIG + Time-slicing
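The headline figures can be checked with simple arithmetic using only the numbers reported above. The helper names are illustrative.

```python
def improvement_factor(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# CI GPU utilization in percent: 5% before, 30-50% after.
low = improvement_factor(5, 30)
high = improvement_factor(5, 50)
print(f"CI utilization gain: {low:.0f}x to {high:.0f}x")

# Simulation task duration in days: 3 before, 2 after.
print(f"Task duration reduction: {relative_reduction(3, 2):.0%}")
```

Going from 5% to 30–50% utilization is a 6× to 10× gain, and cutting a task from 3 days to 2 removes about a third of its wall-clock time, in line with the reported 30% drop in GPU hours.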
Key Learnings
NIO's deployment of HAMi provided valuable insights for organizations implementing GPU virtualization at scale.
GPU Partitioning Optimization
Each workload has an optimal partition size. Over-fragmentation can reduce efficiency.
Performance Validation
Version upgrades require performance validation and a phased upgrade strategy.
Production Safety
Device plugin upgrades follow a blue-green deployment process for online inference.
Toolchain Compatibility
Certain compiler features may conflict with GPU interception mechanisms.
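The upgrade-related lessons above can be combined into a simple gate: a new version moves to the next rollout stage only if its measured throughput stays within a tolerated regression of the baseline. The stage fractions, the 5% tolerance, and the function names are assumptions for illustration, not NIO's actual process.

```python
# Hypothetical rollout stages: fraction of nodes running the new version.
ROLLOUT_STAGES = (0.1, 0.5, 1.0)


def passes_validation(baseline: float, candidate: float, max_regression: float = 0.05) -> bool:
    """True if candidate throughput is no more than max_regression below baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate >= baseline * (1 - max_regression)


def next_stage(current: int, baseline: float, candidate: float) -> int:
    """Return the next stage index, or -1 to signal a rollback."""
    if not passes_validation(baseline, candidate):
        return -1
    return min(current + 1, len(ROLLOUT_STAGES) - 1)


# Hypothetical measurements in tasks per hour on the nodes upgraded so far.
stage = next_stage(current=0, baseline=100.0, candidate=98.0)
print("rollback" if stage < 0 else f"advance to {ROLLOUT_STAGES[stage]:.0%} of nodes")
```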
“NIO's deployment of HAMi demonstrates how fine-grained GPU sharing can significantly improve infrastructure efficiency for autonomous driving workloads. By combining HAMi with Kubernetes and existing GPU allocation mechanisms such as MIG and time-slicing, NIO increased GPU utilization, reduced overall GPU hours, and improved workload throughput without compromising stability.”
Hybrid Strategy Success
NIO's hybrid resource management strategy lets the company run diverse AI workloads, from CI pipelines to simulation and inference, more efficiently within the same Kubernetes environment. It shows that combining HAMi with other GPU allocation mechanisms allows each workload type to use the mechanism that fits it best.