Case Study | Ke Holdings Scales ML Infrastructure with GPU Virtualization Using Kubernetes and HAMi
Discover how Ke Holdings built AIStudio based on HAMi and Kubernetes, achieving nearly 3x GPU utilization improvement (13% → 37%) while supporting 10,000+ pods and 10M+ daily requests across hybrid cloud environments.
Company Overview
Ke Holdings Inc. is an integrated online and offline platform for housing transactions and related services based in China. The centralized infrastructure team operates a shared machine learning platform used across all business units, providing end-to-end compute services for model development, training, and large-scale inference.
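China's leading housing transaction services platform
Centralized machine learning platform
AI infrastructure across business units
Large-scale GPU cluster requirements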
Ke Holdings
Leading real estate transaction platform in China
Challenges in Scaling ML Infrastructure
As machine learning initiatives scaled, the infrastructure team faced significant challenges in GPU resource management across a complex hybrid-cloud environment.
Scale and Complexity
5 clusters across public and private clouds, thousands of GPU cards
Hybrid-cloud Environment
Managing GPU resources across multiple cloud providers
Diverse Workload Requirements
Training vs inference with different resource needs
Low GPU Utilization
Only 13% initial utilization rate
AIStudio Platform Built on Kubernetes and HAMi
Using the CNCF projects HAMi and Kubernetes as its foundation, Ke Holdings designed and implemented AIStudio, a smart computing platform that underpins the organization's machine learning infrastructure.
Leveraging Kubernetes and HAMi for GPU virtualization, AIStudio provides a unified platform bridging upper-layer SaaS services with underlying compute resources.
Multi-scenario Support
Supports inference, A/B testing, and training tasks on the same infrastructure
Advanced Optimization
Acceleration for inference frameworks, datasets, models, and fault tolerance
Multi-framework Support
PyTorch, DeepSpeed, Megatron, vLLM, RLHF, SGLang
AI Asset Management
Unified management of resource pools, models, images, queues, and monitoring
Dual-Cluster Architecture for Different Workloads
GPU Cluster
Managed by the native NVIDIA Device Plugin for training workloads
vGPU Cluster
Managed by HAMi for GPU memory virtualization
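To make the split between the two clusters concrete, here is a minimal sketch of how a pod's resource request might differ between them. It is an illustration under stated assumptions, not Ke Holdings' actual configuration: the helper `build_pod_manifest`, the `cluster` labels `"gpu"` and `"vgpu"`, and the default memory size are hypothetical. The resource names `nvidia.com/gpu` (used by both the NVIDIA device plugin and HAMi) and `nvidia.com/gpumem` (HAMi's per-vGPU device memory limit, in megabytes) follow the upstream projects' documented conventions.

```python
from typing import Any


def build_pod_manifest(name: str, image: str, cluster: str,
                       gpus: int = 1, gpu_mem_mb: int = 8192) -> dict[str, Any]:
    """Build a Pod manifest for the 'gpu' or 'vgpu' cluster (hypothetical helper)."""
    if cluster == "gpu":
        # GPU cluster: whole cards are handed out by the NVIDIA device plugin.
        limits = {"nvidia.com/gpu": gpus}
    elif cluster == "vgpu":
        # vGPU cluster: HAMi caps the device memory each vGPU may use, so
        # several pods can share one physical card.
        limits = {"nvidia.com/gpu": gpus, "nvidia.com/gpumem": gpu_mem_mb}
    else:
        raise ValueError(f"unknown cluster type: {cluster!r}")

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ]
        },
    }


if __name__ == "__main__":
    import json

    # Placeholder names and images, for illustration only.
    print(json.dumps(build_pod_manifest("train-job", "example/train:latest", "gpu", gpus=8), indent=2))
    print(json.dumps(build_pod_manifest("infer-svc", "example/infer:latest", "vgpu", gpu_mem_mb=4096), indent=2))
```

The resulting dictionaries can be serialized to JSON or YAML and submitted with standard Kubernetes tooling. Only the `limits` section changes between the two cluster types.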
Significant Results: Nearly 3x GPU Utilization Improvement
By leveraging open-source technologies including HAMi and Kubernetes, AIStudio has achieved remarkable results at massive scale.
GPU Utilization
13% → 37%
Nearly 3x improvement
Platform Scale
10,000+ pods
Running simultaneously
Daily Requests
10M+
Processed per day
Cluster Coverage
5 clusters
Public and private cloud
Zero Downtime
100%
During transition and operation
Workload Types
Unified
Training and inference on the same platform
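The "nearly 3x" figure follows directly from the two utilization numbers reported above. The short calculation below spells out the arithmetic. The function name `utilization_gain` is illustrative, and the inputs are the only values taken from the case study.

```python
def utilization_gain(before_pct: float, after_pct: float) -> dict[str, float]:
    """Summarize the change between two utilization percentages."""
    return {
        # How many times higher the new utilization is than the old one.
        "ratio": round(after_pct / before_pct, 2),
        # Absolute change, in percentage points.
        "percentage_points": round(after_pct - before_pct, 1),
        # Relative change, as a percentage of the starting value.
        "relative_increase_pct": round((after_pct - before_pct) / before_pct * 100, 1),
    }


if __name__ == "__main__":
    print(utilization_gain(13, 37))
    # {'ratio': 2.85, 'percentage_points': 24, 'relative_increase_pct': 184.6}
```

In other words, utilization rose by 24 percentage points, which is about 2.85 times the starting level.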
HAMi Enables GPU Multiplexing and Heterogeneous Scheduling
The successful integration of HAMi demonstrates how open-source technologies enable organizations to achieve remarkable infrastructure efficiency.
Kubernetes serves as the foundation for stable operations with robust scheduling
HAMi enables GPU multiplexing and heterogeneous scheduling optimization
Dual-cluster approach separates workloads based on resource requirements
Seamless integration between public and private cloud environments
Future Innovation Plans
Ke Holdings' infrastructure team continues to innovate and expand its platform on top of HAMi and Kubernetes.
Adopting heterogeneous devices: Huawei Ascend and other non-NVIDIA accelerators
Cloud expansion: Integration with Alibaba Cloud
Advanced scheduling policies: network topology-awareness, card type specification, UUID-based allocation
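As a rough sketch of what card type specification and UUID-based allocation could look like at the pod level, the helper below attaches HAMi's documented pod annotations `nvidia.com/use-gputype` and `nvidia.com/use-gpuuuid` to a manifest. The case study does not describe how Ke Holdings plans to implement these policies, so the function `with_device_constraints`, the card types, and the UUID placeholders are assumptions for illustration only.

```python
import copy
from typing import Any, Optional, Sequence


def with_device_constraints(manifest: dict[str, Any],
                            gpu_types: Optional[Sequence[str]] = None,
                            gpu_uuids: Optional[Sequence[str]] = None) -> dict[str, Any]:
    """Return a copy of a Pod manifest with HAMi device-selection annotations."""
    result = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    annotations = result.setdefault("metadata", {}).setdefault("annotations", {})
    if gpu_types:
        # Restrict scheduling to the listed card types (comma-separated).
        annotations["nvidia.com/use-gputype"] = ",".join(gpu_types)
    if gpu_uuids:
        # Restrict scheduling to specific devices by UUID (comma-separated).
        annotations["nvidia.com/use-gpuuuid"] = ",".join(gpu_uuids)
    return result


if __name__ == "__main__":
    pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "demo"}, "spec": {}}
    # "A100" and the UUIDs are placeholders, not values from the case study.
    pinned = with_device_constraints(pod, gpu_types=["A100"], gpu_uuids=["GPU-AAA", "GPU-BBB"])
    print(pinned["metadata"]["annotations"])
```

Expressing the constraints as annotations keeps the container's resource request unchanged, so the same workload definition can be pinned or unpinned without touching its `limits`.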
Open-Source Success Story
Ke Holdings has successfully demonstrated how leveraging HAMi and Kubernetes can dramatically improve GPU utilization while supporting massive-scale AI workloads. The AIStudio platform serves as a model for organizations seeking to optimize their machine learning infrastructure.