Case Study | Building Flexible GPU Clouds with HAMi at DaoCloud

Discover how DaoCloud operates two major cloud native platforms for AI workloads—D.run Compute Cloud and DaoCloud Enterprise—using HAMi to achieve >80% GPU utilization across 10,000+ GPUs spanning 10+ data centers.

10,000+

GPUs across platforms

>80%

average GPU utilization

20-30%

reduction in operating costs

Company Overview

DaoCloud operates two major cloud native platforms for AI workloads. D.run Compute Cloud is a public GPU cloud serving individual developers and small teams, while DaoCloud Enterprise (DCE) is a private Kubernetes platform for enterprise customers running both training and inference.

Two major cloud native platforms

D.run Compute Cloud for public GPU cloud

DaoCloud Enterprise (DCE) for private K8s

10+ data centers across China and Hong Kong

DaoCloud

Cloud native platform provider for AI workloads

Challenges in GPU Resource Management

As GPU demand grew rapidly across both platforms, several challenges emerged that required a flexible GPU virtualization solution.

Whole-card Allocation

Many inference and lightweight workloads used only a fraction of GPU resources, leaving significant portions of compute and memory underutilized and limiting how DaoCloud could package GPU SKUs.

Heterogeneous Hardware

DaoCloud needed to support mainstream NVIDIA GPUs while also integrating domestic accelerators from multiple vendors. Proprietary vGPU solutions increased licensing costs.

Multi-tenant Governance

On DCE, enterprise customers wanted shared GPU pools with department-level quotas, queue-based resource allocation, and clear isolation across teams.

Cloud Native Alignment

DaoCloud's core strategy revolves around Kubernetes and open-source technologies. Any GPU sharing solution had to stay fully cloud native, vendor-agnostic, and compatible with existing CNCF tooling.

HAMi as the Unified GPU Layer

DaoCloud adopted HAMi, a CNCF Incubating project for heterogeneous AI computing virtualization, as the unified GPU layer across both D.run and DCE. HAMi provides device virtualization, vGPU partitioning, and scheduling for heterogeneous accelerators in Kubernetes clusters.

D.run Compute Cloud: vGPU SKUs for Public GPU Users

On D.run, DaoCloud integrated HAMi into each regional Kubernetes cluster to enable fine-grained GPU sharing and higher utilization.

vGPU Slicing

Physical GPUs are partitioned into multiple vGPU slices, each with a defined share of compute and memory, so lightweight inference jobs can run on fractional GPUs.
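As a rough illustration of what a fractional GPU request can look like, the sketch below builds a Kubernetes Pod manifest in Python. The resource names `nvidia.com/gpu`, `nvidia.com/gpumem` (MiB), and `nvidia.com/gpucores` (percent of a card) follow HAMi's public documentation for NVIDIA devices and should be checked against the HAMi version in use. The helper `vgpu_pod_manifest`, the pod name, the image, and the sizes are hypothetical and do not describe DaoCloud's actual configuration.

```python
import json


def vgpu_pod_manifest(name, image, gpu_mem_mib, gpu_core_percent):
    """Build a Pod manifest that requests one fractional (vGPU) slice.

    Hypothetical helper for illustration; the values are assumptions.
    """
    if not 0 < gpu_core_percent <= 100:
        raise ValueError("gpu_core_percent must be in (0, 100]")
    if gpu_mem_mib <= 0:
        raise ValueError("gpu_mem_mib must be positive")

    # Resource names as documented by HAMi for NVIDIA devices (verify for your version).
    limits = {
        "nvidia.com/gpu": 1,                      # number of vGPUs requested
        "nvidia.com/gpumem": gpu_mem_mib,         # device memory per vGPU, in MiB
        "nvidia.com/gpucores": gpu_core_percent,  # share of compute, in percent
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": "main", "image": image, "resources": {"limits": limits}}
            ]
        },
    }


# Example: a lightweight inference job asking for 4 GiB and a quarter of a card.
manifest = vgpu_pod_manifest(
    "light-inference", "example.com/inference:latest", 4096, 25
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.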

SKU-based Marketplace

vGPU slices are exposed as standardized SKUs in a central marketplace. Users select GPU SKUs based on workload size.
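One way to picture SKU selection by workload size is a smallest-fit lookup over a catalog. The sketch below is only an assumption about how such a lookup could work: the SKU names, sizes, and the `pick_sku` function are invented for illustration and are not D.run's actual marketplace catalog or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VgpuSku:
    """A hypothetical marketplace SKU describing one vGPU slice."""
    name: str
    mem_gib: int        # device memory in GiB
    core_percent: int   # share of one physical GPU's compute


# Invented catalog for illustration only.
CATALOG: List[VgpuSku] = [
    VgpuSku("vgpu-small", 4, 10),
    VgpuSku("vgpu-medium", 12, 30),
    VgpuSku("vgpu-large", 24, 50),
    VgpuSku("gpu-full", 80, 100),
]


def pick_sku(need_mem_gib: int, need_core_percent: int,
             catalog: List[VgpuSku] = CATALOG) -> Optional[VgpuSku]:
    """Return the smallest SKU that satisfies both requirements, or None."""
    fitting = [
        sku for sku in catalog
        if sku.mem_gib >= need_mem_gib and sku.core_percent >= need_core_percent
    ]
    # Prefer the least memory, then the least compute, among SKUs that fit.
    return min(fitting, key=lambda s: (s.mem_gib, s.core_percent), default=None)


choice = pick_sku(need_mem_gib=10, need_core_percent=20)
print(choice.name if choice else "no SKU fits")
```

Picking the smallest fitting slice leaves the remainder of the physical card available for other tenants, which is the point of selling fractional SKUs.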

Multi-region Deployment

HAMi powers 7 active D.run regions across Mainland China and Hong Kong, covering over 10 data centers.

Domestic Accelerator Support

DaoCloud extended HAMi to support domestic GPU vendors, ensuring consistent management under a unified abstraction layer.

DaoCloud Enterprise (DCE): Shared GPU Pool for Enterprise

On DCE, DaoCloud built a centralized GPU resource pool using HAMi, unifying GPU capacity for multiple enterprise tenants.

Unified GPU Pool

Enterprise users contribute and consume GPUs from a central pool that serves both training and inference workloads.

Quotas & RBAC

HAMi's vGPU resources are integrated with DaoCloud's existing quota and role-based access systems.
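To make the quota idea concrete, here is a minimal sketch of department-level admission against a GPU memory budget. It assumes a simple in-memory counter; the `DepartmentQuota` class, the department name, and the limits are hypothetical, and DCE's real quota and RBAC integration is not described in enough detail here to reproduce.

```python
from dataclasses import dataclass


@dataclass
class DepartmentQuota:
    """Hypothetical per-department budget for vGPU memory, in MiB."""
    mem_limit_mib: int
    mem_used_mib: int = 0

    def try_admit(self, request_mib: int) -> bool:
        """Reserve memory if the request fits within the remaining quota."""
        if request_mib <= 0:
            raise ValueError("request_mib must be positive")
        if self.mem_used_mib + request_mib > self.mem_limit_mib:
            return False  # over quota: reject without changing usage
        self.mem_used_mib += request_mib
        return True

    def release(self, request_mib: int) -> None:
        """Return memory to the quota when a workload finishes."""
        self.mem_used_mib = max(0, self.mem_used_mib - request_mib)


# Example: a department with a 16 GiB budget submitting 8 GiB jobs.
quotas = {"research": DepartmentQuota(mem_limit_mib=16384)}
results = [quotas["research"].try_admit(8192) for _ in range(3)]
print(results)  # third request exceeds the budget
```

In a real cluster this check would live in the platform's admission or scheduling path rather than in application code.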

Simplified Experience

Algorithm engineers request GPU resources through the platform without worrying about underlying hardware differences.

Co-developing HAMi with the Community

DaoCloud has been one of HAMi's earliest and most active contributors.

Contributed real-world insights from D.run and DCE back to the open-source community

Collaborated upstream to improve GPU over-subscription mechanisms, node configuration management, and heterogeneous hardware handling

Helped maintain documentation and deployment guides to support production adoption by other cloud providers

Significant Results: Cost Reduction and Improved Efficiency

By integrating HAMi, DaoCloud consolidated previously fragmented GPU resources into a more unified, efficient, and scalable GPU layer across both public and private clouds.

GPU Utilization

>80%

Average utilization per card after HAMi deployment

Cost Reduction

20-30%

Reduction in GPU-related operating costs

Unified Abstraction

Single Layer

Across NVIDIA and domestic GPUs

Deployment Scale

10,000+ GPUs

Across 10+ data centers

Multi-region

7 Regions

Active D.run regions across Mainland China and Hong Kong

Open Collaboration

Active

Contributing improvements upstream

“HAMi is more than compatible with DaoCloud's business, it's something we've built together. As one of HAMi's earliest contributors, we've witnessed its evolution from inception to maturity. HAMi now runs across both D.run and DCE, and our real-world improvements continuously flow back to the community. HAMi and DaoCloud share the same open-source DNA, and we'll continue contributing to HAMi to bring true vGPU technology to the world.”

Captain, AI/LLM Infra Product Lead, DaoCloud

Open-Source Partnership Success

DaoCloud has integrated HAMi across both its public and private GPU cloud platforms, raising average GPU utilization above 80% and cutting GPU-related operating costs by 20-30% while contributing improvements back to the open-source community.
