Case Study | Ke Holdings Scales ML Infrastructure with GPU Virtualization Using Kubernetes and HAMi

Discover how Ke Holdings built AIStudio based on HAMi and Kubernetes, achieving nearly 3x GPU utilization improvement (13% → 37%) while supporting 10,000+ pods and 10M+ daily requests across hybrid cloud environments.

Nearly 3x

GPU utilization improvement

10,000+

pods running simultaneously

10M+

daily requests processed

Company Overview

Ke Holdings Inc. is an integrated online and offline platform for housing transactions and related services based in China. The centralized infrastructure team operates a shared machine learning platform used across all business units, providing end-to-end compute services for model development, training, and large-scale inference.

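China's leading housing transaction services platform

Centralized machine learning platform

AI infrastructure across business units

Large-scale GPU cluster requirements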

Ke Holdings

Leading real estate transaction platform in China

Challenges in Scaling ML Infrastructure

As machine learning initiatives scaled, the infrastructure team faced significant challenges in GPU resource management across a complex hybrid-cloud environment.

Scale and Complexity

5 clusters across public and private clouds, thousands of GPU cards

Hybrid-cloud Environment

Managing GPU resources across multiple cloud providers

Diverse Workload Requirements

Training vs inference with different resource needs

Low GPU Utilization

Only 13% initial utilization rate

AIStudio Platform Built on Kubernetes and HAMi

Using the CNCF projects HAMi and Kubernetes as the foundation, Ke Holdings designed and implemented AIStudio, a smart computing platform that serves as the basis for the organization's machine learning infrastructure.

Leveraging Kubernetes and HAMi for GPU virtualization, AIStudio provides a unified platform bridging upper-layer SaaS services with underlying compute resources.

Multi-scenario Support

Supports inference, A/B testing, and training tasks on the same infrastructure

Advanced Optimization

Acceleration for inference frameworks, datasets, models, and fault tolerance

Multi-framework Support

PyTorch, DeepSpeed, Megatron, vLLM, RLHF, SGLang

AI Asset Management

Unified management of resource pools, models, images, queues, and monitoring

Dual-Cluster Architecture for Different Workloads

GPU Cluster

Managed by the native NVIDIA device plugin for training workloads:

Native NVIDIA device plugin

High-performance GPUs (H200, H100)

Dedicated for LLM training

Full GPU resource allocation

vGPU Cluster

Managed by HAMi for GPU memory virtualization:

HAMi GPU memory virtualization

GPUs (H20, V100, A100, 4090)

Fine-grained allocation (1-2 GB)

Small model inference
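As a rough illustration of what fine-grained allocation can look like from the workload side, the sketch below builds a Kubernetes Pod manifest that requests one GPU with a 2 GB memory slice. The resource names `nvidia.com/gpu` and `nvidia.com/gpumem` (device memory in megabytes) follow HAMi's documented conventions as we understand them and should be checked against the HAMi version in use. The helper `vgpu_pod_manifest`, the pod name, and the image are hypothetical and are not taken from AIStudio.

```python
import json


def vgpu_pod_manifest(name: str, image: str, gpu_mem_mb: int) -> dict:
    """Build a Pod manifest that requests one vGPU slice of `gpu_mem_mb` megabytes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "inference",
                    "image": image,
                    "resources": {
                        "limits": {
                            # One (virtual) GPU device.
                            "nvidia.com/gpu": 1,
                            # Device memory for this slice, in megabytes (assumed unit).
                            "nvidia.com/gpumem": gpu_mem_mb,
                        }
                    },
                }
            ]
        },
    }


# Hypothetical small-model inference pod asking for a 2 GB slice.
manifest = vgpu_pod_manifest(
    "small-model-inference", "example.com/inference:latest", 2000
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.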

Significant Results: Nearly 3x GPU Utilization Improvement

By leveraging open-source technologies including HAMi and Kubernetes, AIStudio has achieved remarkable results at massive scale.

GPU Utilization

13% → 37%

Nearly 3x improvement
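The "nearly 3x" figure follows directly from the two utilization numbers reported here. A short check of the arithmetic, using only the 13% and 37% values from this case study:

```python
# Utilization figures as reported in the case study.
before_pct = 13.0
after_pct = 37.0

ratio = after_pct / before_pct        # relative improvement factor
gain_points = after_pct - before_pct  # absolute gain in percentage points

print(f"improvement factor: {ratio:.2f}x")                    # 2.85x
print(f"absolute gain: {gain_points:.0f} percentage points")  # 24 percentage points
```

In other words, utilization rose by 24 percentage points, which is a factor of about 2.85.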

Platform Scale

10,000+ pods

Running simultaneously

Daily Requests

10M+

Processed per day

Cluster Coverage

5 clusters

Public and private cloud

Zero Downtime

100%

During transition and operation

Workload Types

Unified

Training and inference on the same platform

HAMi Enables GPU Multiplexing and Heterogeneous Scheduling

The HAMi integration shows how open-source technologies can raise infrastructure efficiency: at Ke Holdings, GPU utilization rose from 13% to 37%.

Kubernetes serves as the foundation for stable operations with robust scheduling

HAMi enables GPU multiplexing and heterogeneous scheduling optimization

Dual-cluster approach separates workloads based on resource requirements

Seamless integration between public and private cloud environments
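The dual-cluster split can be pictured as a simple routing rule: training jobs and workloads that need a whole card go to the GPU cluster, while small inference workloads go to the vGPU cluster. The sketch below is only an illustration of that idea. The `Workload` type, the `choose_cluster` function, the cluster labels, and the 16 GB threshold are all hypothetical; the case study does not describe AIStudio's actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str          # "training" or "inference"
    gpu_mem_gb: float  # GPU memory needed per replica


def choose_cluster(w: Workload, full_card_threshold_gb: float = 16.0) -> str:
    """Pick a cluster for a workload (hypothetical rule and threshold)."""
    # Training and large-memory workloads get whole cards.
    if w.kind == "training" or w.gpu_mem_gb >= full_card_threshold_gb:
        return "gpu-cluster"
    # Small inference workloads share cards through memory virtualization.
    return "vgpu-cluster"


jobs = [
    Workload("llm-pretrain", "training", 80.0),
    Workload("ranking-model", "inference", 2.0),
    Workload("large-llm-serving", "inference", 40.0),
]
for job in jobs:
    print(job.name, "->", choose_cluster(job))
```

A real platform would likely weigh more signals (queue priority, card type, cloud location), but the separation by resource requirement is the core of the approach described above.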

Future Innovation Plans

Ke Holdings' infrastructure team continues to innovate and expand their platform on top of HAMi and Kubernetes.

Adopting heterogeneous devices: Huawei Ascend and other non-NVIDIA accelerators

Cloud expansion: Integration with Alibaba Cloud

Advanced scheduling policies: network topology-awareness, card type specification, UUID-based allocation
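For card type specification and UUID-based allocation, HAMi's documentation describes pod annotations that restrict which devices a pod may land on. The sketch below assembles such annotations. The keys `nvidia.com/use-gputype` and `nvidia.com/use-gpuuuid` are our reading of the HAMi documentation and should be verified against the deployed version; the helper `scheduling_annotations` and the example UUID are hypothetical, and nothing here reflects how Ke Holdings plans to implement these policies.

```python
from typing import Dict, Optional, Sequence


def scheduling_annotations(
    gpu_types: Optional[Sequence[str]] = None,
    gpu_uuids: Optional[Sequence[str]] = None,
) -> Dict[str, str]:
    """Build pod annotations that constrain device selection (assumed keys)."""
    annotations: Dict[str, str] = {}
    if gpu_types:
        # Comma-separated list of acceptable card types.
        annotations["nvidia.com/use-gputype"] = ",".join(gpu_types)
    if gpu_uuids:
        # Comma-separated list of acceptable device UUIDs.
        annotations["nvidia.com/use-gpuuuid"] = ",".join(gpu_uuids)
    return annotations


# Restrict a pod to A100 or V100 cards.
print(scheduling_annotations(gpu_types=["A100", "V100"]))

# Pin a pod to one specific device (placeholder UUID).
print(scheduling_annotations(gpu_uuids=["GPU-00000000-0000-0000-0000-000000000000"]))
```

The returned dictionary would be placed under `metadata.annotations` in the pod manifest.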

Open-Source Success Story

Ke Holdings has successfully demonstrated how leveraging HAMi and Kubernetes can dramatically improve GPU utilization while supporting massive-scale AI workloads. The AIStudio platform serves as a model for organizations seeking to optimize their machine learning infrastructure.
