Case Study | NIO Improves GPU Utilization for Autonomous Driving Workloads with HAMi
Discover how NIO adopted a hybrid GPU sharing strategy using HAMi to achieve up to a 10× GPU utilization improvement in CI pipelines and a 30% reduction in GPU hours for simulation workloads, across 600 GPUs supporting its autonomous driving AI infrastructure.
Company Overview
NIO operates large-scale cloud infrastructure to support autonomous driving workloads, including model training, simulation, CI/testing, and online inference. The team focuses on GPU performance optimization and participates in GPU and compute resource planning decisions for their comprehensive autonomous driving platform.
Large-scale GPU cluster: 600 GPUs across ~80 nodes
Diverse autonomous driving workloads: training, simulation, CI/testing, inference
Focus on GPU performance optimization and resource planning
Hybrid GPU sharing strategy for different workload types
NIO
Leading electric vehicle company with autonomous driving AI platform
Challenge: Low GPU Utilization Across Diverse Workloads
NIO's large-scale cloud infrastructure supports diverse autonomous driving workloads. This diversity led to persistent efficiency challenges due to workload-resource mismatch, with several specific pain points.
CI and Testing Tasks
Most of the execution time for CI tasks is spent on CPU-intensive operations (compilation, file fetching, preprocessing). GPUs are used only intermittently, averaging just 5–10% utilization when a full GPU is allocated to each task.
Simulation Workloads
Simulation pipelines process video streams, radar data, and inference validation. Individual tasks have relatively low compute requirements, making them suitable for concurrent execution on shared GPUs.
Online Inference
Many inference services require only ¼ or ½ of a GPU. Allocating an entire GPU is both inefficient and costly for these small workloads.
Limited GPU Cluster
The GPU cluster size is limited, and part of the capacity is sourced from public cloud providers that bill by time. Low GPU utilization therefore wastes paid GPU hours and increases operational costs.
Hybrid GPU Sharing Strategy with HAMi
NIO adopted a hybrid GPU sharing strategy, selecting different GPU allocation mechanisms based on workload characteristics rather than enforcing a single approach. HAMi extends Kubernetes with fine-grained GPU sharing capabilities through scheduler extensions and device plugin integration.
Evaluated Approaches
NVIDIA MIG
Provides strong isolation but supports only predefined partition sizes, making it difficult to match finer-grained requirements (e.g., 1/6 or 1/8 of a GPU).
Time-Slicing
Allows workloads to compete freely for GPU resources with minimal overhead. However, it lacks strict limits on memory and compute usage, making it unsuitable for certain production workloads.
HAMi (CNCF Sandbox)
Supports fine-grained control over both GPU memory and compute allocation, enabling proportional allocation based on actual workload requirements with minimal overhead.
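As a rough illustration of what proportional allocation looks like from the workload side, the sketch below builds a Kubernetes Pod manifest that asks for a slice of one GPU. The resource names `nvidia.com/gpumem` (device memory in MiB) and `nvidia.com/gpucores` (percentage of compute) follow the HAMi documentation as we understand it and should be verified against the deployed HAMi version. The helper name, Pod name, image, and sizes are hypothetical and are not taken from NIO's configuration.

```python
import json


def fractional_gpu_pod(name: str, image: str, gpu_mem_mib: int, gpu_cores_pct: int) -> dict:
    """Build a Pod manifest requesting a slice of one GPU (hypothetical helper)."""
    if gpu_mem_mib <= 0:
        raise ValueError("gpu_mem_mib must be positive")
    if not 0 < gpu_cores_pct <= 100:
        raise ValueError("gpu_cores_pct must be in (0, 100]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "limits": {
                            # One virtual GPU device for this container.
                            "nvidia.com/gpu": 1,
                            # Assumed HAMi resource: device memory limit in MiB.
                            "nvidia.com/gpumem": gpu_mem_mib,
                            # Assumed HAMi resource: share of GPU compute, in percent.
                            "nvidia.com/gpucores": gpu_cores_pct,
                        }
                    },
                }
            ]
        },
    }


# Hypothetical CI job asking for roughly a quarter of a GPU.
pod = fractional_gpu_pod("ci-gpu-test", "example.com/ci-runner:latest", 10240, 25)
print(json.dumps(pod, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since Kubernetes accepts JSON manifests as well as YAML.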
Production Strategy: Combining Multiple Approaches
Rather than replacing existing mechanisms, NIO combined them for optimal efficiency across different workload types.
MIG: Used for algorithm development and environments requiring strong isolation
HAMi: Used for CI tasks and selected inference and simulation workloads
Time-slicing: Used for workloads that can tolerate resource contention
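The selection rule implied by the three categories above can be sketched as a small decision function. This is an illustrative simplification, not NIO's actual policy: the `Workload` fields and the order in which the rules are checked are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload's sharing requirements."""
    name: str
    needs_strong_isolation: bool
    tolerates_contention: bool


def choose_mechanism(workload: Workload) -> str:
    """Pick a GPU sharing mechanism from workload traits (assumed rule order)."""
    if workload.needs_strong_isolation:
        return "MIG"           # hardware partitions, predefined sizes
    if workload.tolerates_contention:
        return "time-slicing"  # no hard limits on memory or compute
    return "HAMi"              # fine-grained memory and compute limits


examples = [
    Workload("algorithm-dev", needs_strong_isolation=True, tolerates_contention=False),
    Workload("ci-pipeline", needs_strong_isolation=False, tolerates_contention=False),
    Workload("batch-experiment", needs_strong_isolation=False, tolerates_contention=True),
]
for w in examples:
    print(f"{w.name}: {choose_mechanism(w)}")
```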
Implementation Details
HAMi deployed on approximately 50–70 active nodes with 400–560 GPUs
Resource Allocation Design
For simulation workloads, NIO treated GPU memory (VRAM) as the primary constraint: insufficient compute capacity only delays a task, which scheduling can absorb, whereas memory exhaustion immediately triggers an out-of-memory (OOM) failure.
Finer partitioning is not always better. For certain simulation workloads, allocating approximately 1/6 of a GPU provided optimal efficiency. Smaller fractions (such as 1/8) introduced additional scheduling and virtualization overhead, reducing overall throughput.
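A memory-first sizing rule of this kind can be sketched as follows. The function gives only an upper bound on how many tasks fit on one GPU by memory. As noted above, the best fraction in practice still has to be found by benchmarking, because smaller slices add overhead. All sizes in the example are hypothetical and are not NIO's measurements.

```python
def max_tasks_per_gpu(gpu_mem_mib: int, task_peak_mem_mib: int, reserved_mib: int = 0) -> int:
    """Upper bound on concurrent tasks per GPU when memory is the hard constraint."""
    if task_peak_mem_mib <= 0:
        raise ValueError("task_peak_mem_mib must be positive")
    usable_mib = gpu_mem_mib - reserved_mib
    # Integer division: a task either fits completely or hits OOM.
    return max(usable_mib // task_peak_mem_mib, 0)


# Hypothetical numbers: an 80 GiB GPU, 2 GiB held back as headroom,
# and a simulation task whose peak memory use is about 13,000 MiB.
slots = max_tasks_per_gpu(gpu_mem_mib=81920, task_peak_mem_mib=13000, reserved_mib=2048)
print(slots)  # 6, i.e. each task gets roughly 1/6 of the GPU
```

Sizing by peak rather than average memory is the conservative choice here, since a single spike above the slice limit fails the task outright.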
Impact: Dramatic Efficiency Gains
The hybrid GPU sharing strategy delivered measurable improvements across multiple workload types, increasing overall system throughput while directly reducing GPU costs.
CI Workloads Utilization
5% → 30–50%
By partitioning GPUs into ¼ or smaller fractions, effective GPU utilization in CI pipelines rose from about 5% to 30–50%, an improvement of up to approximately 10×
Simulation GPU Hours
-30%
Fine-grained GPU sharing reduced overall GPU hours by approximately 30%
Simulation Task Duration
3 days → 2 days
End-to-end simulation tasks reduced from about 3 days to about 2 days
Deployment Scale
400–560 GPUs
Across 50–70 nodes using HAMi
Total Infrastructure
600 GPUs
Across ~80 nodes supporting autonomous driving workloads
Strategy
Hybrid
HAMi + MIG + Time-slicing for optimal results
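The headline ratios in the figures above can be re-derived with simple arithmetic. The sketch below uses only numbers stated in this case study; the function name is illustrative.

```python
def improvement_factor(before_pct: float, after_pct: float) -> float:
    """Ratio of utilization after the change to utilization before it."""
    return after_pct / before_pct


# CI utilization: about 5% before, 30-50% after.
ci_low = improvement_factor(5, 30)
ci_high = improvement_factor(5, 50)

# Simulation task duration: about 3 days before, about 2 days after.
duration_reduction = (3 - 2) / 3

print(f"CI utilization gain: {ci_low:.0f}x to {ci_high:.0f}x")
print(f"Simulation duration reduction: {duration_reduction:.0%}")
```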
Lessons Learned
NIO's deployment of HAMi provided valuable insights for organizations implementing GPU virtualization at scale.
GPU Partitioning Optimization
GPU partitioning is not 'the finer, the better.' Each workload has an optimal partition size. Over-fragmentation can reduce efficiency.
Performance Validation
Version upgrades require performance validation. New GPU virtualization components may introduce performance regressions. NIO adopted a phased upgrade strategy, validating performance benchmarks at each phase and allowing different HAMi components to run at different versions during the rollout.
Production Safety
Operational changes must ensure production safety. For online inference workloads, device plugin upgrades follow a process similar to blue-green deployment: new Pods are deployed first, traffic is then migrated to them, and old instances are gradually decommissioned.
Toolchain Compatibility
Toolchain compatibility is critical. Certain compiler or library features (such as pointer or address analysis) may conflict with GPU interception mechanisms. Careful trade-offs between functionality and stability are required.
“NIO's deployment of HAMi demonstrates how fine-grained GPU sharing can significantly improve infrastructure efficiency for autonomous driving workloads. By combining HAMi with Kubernetes and existing GPU allocation mechanisms such as MIG and time-slicing, NIO increased GPU utilization, reduced overall GPU hours, and improved workload throughput without compromising stability.”
Hybrid Strategy Success
NIO's hybrid resource management strategy enables the company to support diverse AI workloads—from CI pipelines to simulation and inference—more efficiently within the same Kubernetes environment. This approach demonstrates how combining HAMi with other GPU allocation mechanisms can deliver optimal results for different workload types.