Case Study | Ke Holdings Scales ML Infrastructure with GPU Virtualization using Kubernetes and HAMi

Discover how Ke Holdings built AIStudio on HAMi and Kubernetes, raising GPU utilization nearly 3x (from 13% to 37%) while supporting 10,000+ concurrent pods and 10M+ daily requests across a hybrid-cloud environment.

Nearly 3x GPU utilization improvement

10,000+ pods running simultaneously

10M+ daily requests processed

Company Overview

Ke Holdings Inc. is a China-based integrated online and offline platform for housing transactions and related services. Its centralized infrastructure team operates a shared machine learning platform used by all business units, providing end-to-end compute services for model development, training, and large-scale inference.

Leading housing transaction platform in China

Centralized machine learning platform for all business units

End-to-end compute services for AI workloads

Thousands of GPUs across public and private clouds


Challenges in Scaling ML Infrastructure

As machine learning initiatives scaled, the infrastructure team faced significant challenges in GPU resource management across a complex hybrid-cloud environment.

Scale and Complexity

Five clusters across public and private clouds, with thousands of GPUs spanning diverse models (H200, H100, H20, A100, V100, and RTX 4090)

Hybrid-cloud Environment

Managing GPU resources across public clouds (Volcano Engine, Tencent Cloud, Alibaba Cloud) and a private cloud with roughly 1,000 NVIDIA GPUs

Diverse Workload Requirements

Large-scale model training needs whole GPUs, while small-model inference needs only 1-2 GB of GPU memory

Low GPU Utilization

Initial GPU utilization was only 13%, a consequence of multi-cloud complexity and the diversity of workload requirements

AIStudio Platform Built on Kubernetes and HAMi

Using the CNCF projects HAMi and Kubernetes as its foundation, Ke Holdings designed and implemented AIStudio, an intelligent computing platform that underpins the organization's machine learning infrastructure.

Built on Kubernetes, with HAMi providing GPU virtualization, AIStudio is a unified platform that bridges upper-layer SaaS services and the underlying compute resources.

Multi-scenario Support

Simultaneously supports inference services, A/B testing tasks, and training tasks on the same infrastructure

Advanced Optimization

Acceleration for inference frameworks, datasets, container images, checkpoints, and models, plus fault tolerance

Multi-framework Support

PyTorch, DeepSpeed, Megatron, vLLM, and SGLang, as well as RLHF training

AI Asset Management

Centralized management of resource pools, model repositories, image repositories, queues, CubeFS volumes, and monitoring

Dual-Cluster Architecture for Different Workloads

GPU Clusters

Managed by the native NVIDIA device plugin, for training workloads that require whole GPUs:

Native NVIDIA device plugin

High-performance GPUs (H200, H100)

Dedicated for LLM training

Full GPU resource allocation

vGPU Clusters

Managed by HAMi, which virtualizes GPU memory for small-model inference:

HAMi GPU memory virtualization

Mixed GPU models (H20, V100, A100, RTX 4090)

Fine-grained GPU memory allocation (1-2 GB)

Small model inference
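To make the split concrete, here is a minimal Python sketch that builds a Pod manifest (as a plain dict) for each cluster type. It assumes the `nvidia.com/gpu` resource exposed by the NVIDIA device plugin, plus HAMi's `nvidia.com/gpumem` (device memory in MiB) and `nvidia.com/gpucores` (percentage of a card's compute) extended resources; check those names against the HAMi version you run. The `build_pod` helper, pod names, image references, and the 20% core share are hypothetical.

```python
import json
from typing import Optional


def build_pod(name: str, image: str, cluster: str, gpus: int = 1,
              gpumem_mib: Optional[int] = None,
              gpucores: Optional[int] = None) -> dict:
    """Return a minimal Pod manifest for the 'gpu' or 'vgpu' cluster (hypothetical helper)."""
    limits = {"nvidia.com/gpu": str(gpus)}  # number of whole GPUs or HAMi vGPUs
    if cluster == "gpu":
        # Native NVIDIA device plugin: whole cards only, no memory or core slicing.
        if gpumem_mib is not None or gpucores is not None:
            raise ValueError("the full-GPU cluster does not support slicing")
    elif cluster == "vgpu":
        # HAMi: cap device memory (MiB) and, optionally, the compute share (%).
        # Requiring an explicit memory cap is a policy of this sketch.
        if gpumem_mib is None:
            raise ValueError("this sketch requires a memory limit for vGPU pods")
        limits["nvidia.com/gpumem"] = str(gpumem_mib)
        if gpucores is not None:
            limits["nvidia.com/gpucores"] = str(gpucores)
    else:
        raise ValueError(f"unknown cluster type: {cluster!r}")

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ]
        },
    }


if __name__ == "__main__":
    # LLM training on the GPU cluster: eight whole cards.
    train = build_pod("llm-train", "registry.example.com/train:latest", "gpu", gpus=8)
    # Small-model inference on the vGPU cluster: one slice capped at 2 GB.
    infer = build_pod("small-model-infer", "registry.example.com/infer:latest", "vgpu",
                      gpus=1, gpumem_mib=2048, gpucores=20)
    print(json.dumps(train["spec"]["containers"][0]["resources"]))
    print(json.dumps(infer["spec"]["containers"][0]["resources"]))
```

A 2 GB inference slice becomes `nvidia.com/gpumem: 2048`, while the training pod simply asks for whole cards, which is the only unit the native device plugin hands out.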

Significant Results: Nearly 3x GPU Utilization Improvement

Built on the open-source projects HAMi and Kubernetes, AIStudio has delivered the following results at scale.

GPU utilization: 13% → 37%, a nearly 3x improvement

Platform scale: 10,000+ pods running simultaneously

Daily requests: 10M+ processed per day

Cluster coverage: 5 clusters across public and private clouds

Zero downtime: 100% availability during transition and operation

Workload types: training and inference unified on the same platform
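The "nearly 3x" headline is a ratio, not a percentage-point change. A short Python check of the arithmetic, using only the 13% and 37% utilization figures reported in this case study:

```python
# GPU utilization before and after AIStudio, in percent (figures from the case study).
before_pct, after_pct = 13, 37

ratio = after_pct / before_pct      # multiplicative improvement
delta_pp = after_pct - before_pct   # absolute change in percentage points

print(f"{ratio:.2f}x improvement, +{delta_pp} percentage points")
# prints: 2.85x improvement, +24 percentage points
```

Utilization rose by 24 percentage points, which is about 2.85 times the starting level; "nearly 3x" rounds the latter.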

HAMi Enables GPU Multiplexing and Heterogeneous Scheduling

Ke Holdings' integration of HAMi shows how open-source building blocks can substantially raise infrastructure efficiency:

Kubernetes serves as the foundation for stable operations with robust scheduling and management capabilities

HAMi enables GPU multiplexing and heterogeneous scheduling, raising cluster GPU utilization nearly threefold

The dual-cluster approach separates workloads by resource requirement, giving training whole GPUs and small-model inference memory slices

Seamless integration between public and private cloud environments enables unified platform management
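GPU multiplexing pays off because many small inference workloads can share one card. The Python sketch below illustrates the idea with first-fit-decreasing bin packing on memory requests. It is a simplified, hypothetical model, not HAMi's scheduling algorithm; the `Gpu` class, the `pack` function, the pod names, and the 24 GiB card size are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    mem_gib: int                                   # total device memory
    used_gib: int = 0                              # memory already promised to pods
    pods: list[str] = field(default_factory=list)

    def fits(self, req_gib: int) -> bool:
        return self.used_gib + req_gib <= self.mem_gib


def pack(requests: dict[str, int], gpus: list[Gpu]) -> dict[str, str]:
    """First-fit decreasing: place the largest requests first, each on the first GPU with room."""
    placement: dict[str, str] = {}
    for pod, req in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        for gpu in gpus:
            if gpu.fits(req):
                gpu.used_gib += req
                gpu.pods.append(pod)
                placement[pod] = gpu.name
                break
        else:  # no GPU had enough free memory
            raise RuntimeError(f"no GPU has {req} GiB free for {pod}")
    return placement


if __name__ == "__main__":
    gpus = [Gpu("gpu-0", 24), Gpu("gpu-1", 24)]
    # Ten hypothetical inference pods, alternating 1 GiB and 2 GiB requests.
    requests = {f"infer-{i}": 1 + i % 2 for i in range(10)}
    pack(requests, gpus)
    for gpu in gpus:
        print(gpu.name, f"{gpu.used_gib}/{gpu.mem_gib} GiB", len(gpu.pods), "pods")
```

The ten pods fit in 15 GiB of a single card and leave the second card free; with whole-card allocation they would tie up ten GPUs. A production scheduler also has to weigh compute share, node-level placement policy, and isolation, which this sketch ignores.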

Future Innovation Plans

Ke Holdings' infrastructure team continues to innovate and expand its platform on top of HAMi and Kubernetes.

Adopting heterogeneous devices: Plans to incorporate Huawei Ascend and other non-NVIDIA accelerators

Cloud expansion: Integration with Alibaba Cloud to complement existing Volcano Engine and Tencent Cloud deployments

Advanced scheduling policies: Network topology-awareness, card type specification, and UUID-based allocation
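Card type specification and UUID-based allocation amount to filtering the candidate devices before placement. The Python sketch below shows that matching logic. The annotation keys `nvidia.com/use-gputype` and `nvidia.com/use-gpuuuid` are, to our understanding, the ones HAMi documents for this purpose, but treat them as assumptions and verify them against your HAMi version; the comma-separated UUID list, the case-insensitive substring rule, and the `Device`, `selection_annotations`, and `eligible` names are hypothetical and do not reproduce HAMi's implementation.

```python
from dataclasses import dataclass
from typing import Optional

TYPE_KEY = "nvidia.com/use-gputype"   # assumed HAMi annotation: allowed card types
UUID_KEY = "nvidia.com/use-gpuuuid"   # assumed HAMi annotation: allowed device UUIDs


@dataclass(frozen=True)
class Device:
    uuid: str
    model: str  # product name as reported by the driver


def selection_annotations(types: Optional[list[str]] = None,
                          uuids: Optional[list[str]] = None) -> dict[str, str]:
    """Build pod annotations restricting placement by card type and/or UUID."""
    ann: dict[str, str] = {}
    if types:
        ann[TYPE_KEY] = ",".join(types)
    if uuids:
        ann[UUID_KEY] = ",".join(uuids)
    return ann


def eligible(devices: list[Device], annotations: dict[str, str]) -> list[Device]:
    """Keep devices whose model contains an allowed type and whose UUID is allowed."""
    types = [t for t in annotations.get(TYPE_KEY, "").split(",") if t]
    uuids = {u for u in annotations.get(UUID_KEY, "").split(",") if u}
    return [
        d for d in devices
        if (not types or any(t.upper() in d.model.upper() for t in types))
        and (not uuids or d.uuid in uuids)
    ]


if __name__ == "__main__":
    fleet = [Device("GPU-aaa", "NVIDIA A100-SXM4-80GB"),
             Device("GPU-bbb", "NVIDIA H20"),
             Device("GPU-ccc", "Tesla V100-SXM2-32GB")]
    ann = selection_annotations(types=["A100", "V100"])
    print(ann, [d.uuid for d in eligible(fleet, ann)])
```

With the substring matching used in this sketch, a request for "H20" would also match "H200" cards, which matters in a fleet that contains both; an exact-match rule avoids that ambiguity.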

Open-Source Success Story

Ke Holdings has shown that HAMi and Kubernetes can nearly triple GPU utilization while supporting AI workloads at the scale of 10,000+ concurrent pods and 10M+ daily requests. AIStudio offers a model for organizations seeking to optimize their machine learning infrastructure.
