Case Study | Ke Holdings Scales ML Infrastructure with GPU Virtualization using Kubernetes and HAMi
Discover how Ke Holdings built AIStudio based on HAMi and Kubernetes, achieving nearly 3x GPU utilization improvement (13% → 37%) while supporting 10,000+ pods and 10M+ daily requests across hybrid cloud environments.
Company Overview
Ke Holdings Inc. is an integrated online and offline platform for housing transactions and related services based in China. The centralized infrastructure team operates a shared machine learning platform used across all business units, providing end-to-end compute services for model development, training, and large-scale inference.
Leading housing transaction platform in China
Centralized machine learning platform for all business units
End-to-end compute services for AI workloads
Massive GPU infrastructure across hybrid clouds
Challenges in Scaling ML Infrastructure
As machine learning initiatives scaled, the infrastructure team faced significant challenges in GPU resource management across a complex hybrid-cloud environment.
Scale and Complexity
Five clusters across public and private clouds, with thousands of GPUs spanning diverse models (H200, H20, V100, 4090, H100, A100)
Hybrid-cloud Environment
Managing GPU resources across public clouds (Volcano Engine, Tencent Cloud, Alibaba Cloud) and a private cloud with ~1,000 NVIDIA GPUs
Diverse Workload Requirements
Large-scale model training requiring full GPU access vs. small model inference needing minimal GPU memory (1-2GB)
Low GPU Utilization
Only 13% initial utilization rate due to multi-cloud complexity and diverse workload requirements
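A back-of-the-envelope sketch helps show why whole-card allocation wastes capacity for small inference models. The 1-2 GB model sizes come from the workload description above; the 32 GB card size is an assumption chosen for illustration, and memory occupancy is only a rough proxy for the utilization figure the team reports.

```python
def memory_used_fraction(model_mem_gb: float, gpu_mem_gb: float) -> float:
    """Fraction of a card's memory in use when a single model owns the whole GPU."""
    if gpu_mem_gb <= 0:
        raise ValueError("gpu_mem_gb must be positive")
    return model_mem_gb / gpu_mem_gb


# Assumption: a 32 GB card. The case study does not state per-card memory sizes.
ASSUMED_GPU_MEM_GB = 32.0

for model_gb in (1.0, 2.0):
    frac = memory_used_fraction(model_gb, ASSUMED_GPU_MEM_GB)
    print(f"{model_gb:.0f} GB model on a {ASSUMED_GPU_MEM_GB:.0f} GB card: {frac:.1%} of memory in use")
```

Under these assumptions, a dedicated card leaves well over 90% of its memory idle, which is the gap that GPU sharing is meant to close.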
AIStudio Platform Built on Kubernetes and HAMi
Using the CNCF projects HAMi and Kubernetes as its foundation, Ke Holdings designed and implemented AIStudio, a computing platform that underpins the organization's machine learning infrastructure.
With HAMi providing GPU virtualization on Kubernetes, AIStudio acts as a unified layer between upper-layer SaaS services and the underlying compute resources.
Multi-scenario Support
Simultaneously supports inference services, A/B testing tasks, and training tasks on the same infrastructure
Advanced Optimization
Acceleration for inference frameworks, datasets, images, checkpoints, and models, along with fault tolerance
Multi-framework Support
PyTorch, DeepSpeed, Megatron, vLLM, RLHF, and SGLang
AI Asset Management
Centralized management of resource pools, model repositories, image repositories, queues, CubeFS volumes, and monitoring
Dual-Cluster Architecture for Different Workloads
GPU Clusters
Managed by the native NVIDIA device plugin for training workloads that require complete GPU resources.
vGPU Clusters
Managed by HAMi, which virtualizes GPU memory for small model inference.
Significant Results: Nearly 3x GPU Utilization Improvement
By building on open-source technologies including HAMi and Kubernetes, AIStudio raised GPU utilization from 13% to 37% while operating at production scale.
GPU Utilization
13% → 37%
Nearly 3x improvement
Platform Scale
10,000+ pods
Running simultaneously
Daily Requests
10M+
Processed per day
Cluster Coverage
5 clusters
Public and private cloud
Zero Downtime
100%
During transition and operation
Workload Types
Unified
Training and inference on the same platform
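The "nearly 3x" figure follows directly from the two utilization numbers reported above. A short calculation, using only those two values, makes the arithmetic explicit:

```python
before_pct = 13  # initial GPU utilization reported in the case study
after_pct = 37   # GPU utilization after adopting HAMi

factor = after_pct / before_pct                         # multiplicative improvement
gain_points = after_pct - before_pct                    # absolute gain in percentage points
relative_gain = (after_pct - before_pct) / before_pct   # relative increase

print(f"Improvement factor: {factor:.2f}x")
print(f"Absolute gain: {gain_points} percentage points")
print(f"Relative increase: {relative_gain:.0%}")
```

The factor works out to roughly 2.85x, which is why the result is described as nearly, rather than exactly, 3x.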
HAMi Enables GPU Multiplexing and Heterogeneous Scheduling
The HAMi integration shows how open-source technologies can raise infrastructure efficiency without replacing the existing Kubernetes foundation.
Kubernetes serves as the foundation for stable operations with robust scheduling and management capabilities
HAMi enables GPU multiplexing and heterogeneous scheduling optimization, increasing cluster GPU utilization by nearly 3x
Dual-cluster approach separates workloads based on resource requirements for optimal efficiency
Seamless integration between public and private cloud environments enables unified platform management
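The effect of GPU multiplexing can be illustrated with a simple first-fit placement of memory requests onto shared cards. This is a toy model for intuition only: it is not HAMi's scheduling algorithm, and the 24 GB card capacity and the request mix are assumptions.

```python
def first_fit(requests_mb: list[int], gpu_capacity_mb: int) -> list[int]:
    """Assign each request to the first GPU with enough free memory.

    Returns the GPU index chosen for each request; a new GPU is opened
    whenever no existing one has room.
    """
    free_mb: list[int] = []      # remaining memory per opened GPU
    placement: list[int] = []
    for req in requests_mb:
        if req > gpu_capacity_mb:
            raise ValueError(f"request of {req} MB exceeds card capacity")
        for idx, free in enumerate(free_mb):
            if free >= req:
                free_mb[idx] -= req
                placement.append(idx)
                break
        else:
            # No opened GPU can hold this request, so open a new one.
            free_mb.append(gpu_capacity_mb - req)
            placement.append(len(free_mb) - 1)
    return placement


# Assumed workload: ten 2 GB and four 1 GB inference services on 24 GB cards.
requests = [2048] * 10 + [1024] * 4
placement = first_fit(requests, gpu_capacity_mb=24576)

gpus_shared = len(set(placement))
gpus_exclusive = len(requests)  # one whole card per service without sharing
print(f"Shared: {gpus_shared} GPU(s); exclusive: {gpus_exclusive} GPU(s)")
```

In this assumed mix, the fourteen services fit on a single shared card, whereas exclusive allocation would occupy fourteen. Real schedulers also weigh compute share, card type, and topology, so actual gains will differ.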
Future Innovation Plans
Ke Holdings' infrastructure team continues to expand its platform on top of HAMi and Kubernetes.
Adopting heterogeneous devices: Plans to incorporate Huawei Ascend and other non-NVIDIA accelerators
Cloud expansion: Integration with Alibaba Cloud to complement existing Volcano Engine and Tencent Cloud deployments
Advanced scheduling policies: Network topology-awareness, card type specification, and UUID-based allocation
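Card type specification and UUID-based allocation from the plans above could be expressed as pod annotations. The sketch below is an assumption about how this might look, not the team's implementation: the annotation keys `nvidia.com/use-gputype` and `nvidia.com/use-gpuuuid` are taken from HAMi's documented scheduling annotations and should be verified against the deployed version, and the pod name, image, and UUID are hypothetical placeholders.

```python
import copy


def with_device_constraints(pod: dict, gpu_types=None, gpu_uuids=None) -> dict:
    """Return a copy of a pod manifest with device-selection annotations added."""
    constrained = copy.deepcopy(pod)
    annotations = constrained.setdefault("metadata", {}).setdefault("annotations", {})
    if gpu_types:
        # Assumed HAMi annotation: restrict scheduling to the listed card types.
        annotations["nvidia.com/use-gputype"] = ",".join(gpu_types)
    if gpu_uuids:
        # Assumed HAMi annotation: restrict scheduling to the listed device UUIDs.
        annotations["nvidia.com/use-gpuuuid"] = ",".join(gpu_uuids)
    return constrained


# Hypothetical base pod requesting a 2 GB slice of one GPU.
base_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "small-model"},
    "spec": {
        "containers": [
            {
                "name": "main",
                "image": "example.registry/infer:latest",
                "resources": {"limits": {"nvidia.com/gpu": 1, "nvidia.com/gpumem": 2048}},
            }
        ]
    },
}

typed_pod = with_device_constraints(base_pod, gpu_types=["A100", "V100"])
pinned_pod = with_device_constraints(
    base_pod, gpu_uuids=["GPU-00000000-0000-0000-0000-000000000000"]  # placeholder UUID
)
print(typed_pod["metadata"]["annotations"])
```

Returning a deep copy keeps the base manifest reusable, so the same template can be specialized for different card types or devices.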
Open-Source Success Story
Ke Holdings has successfully demonstrated how leveraging HAMi and Kubernetes can dramatically improve GPU utilization while supporting massive-scale AI workloads. The AIStudio platform serves as a model for organizations seeking to optimize their machine learning infrastructure.