HAMi v2.9.0 Deep Dive: Ascend User-Space Partitioning, DRA Production-Ready, and Scheduling Ecosystem Expansion

11. Mai 2026

HAMi v2.9.0 has been officially released! Since v2.8, the project has made significant progress in heterogeneous device virtualization depth, Kubernetes native standards adoption, and scheduler ecosystem expansion. v2.9 brings systematic enhancements in Ascend user-space partitioning, DRA production readiness, pluggable scheduler integration, and observability/security, evolving HAMi from a "GPU sharing tool" into a unified heterogeneous computing management and scheduling infrastructure platform.

Highlights at a Glance

Key feature highlights of the v2.9 release:

Ascend 910C User-Space Virtualization: Introduces HAMi-core mode with LD_PRELOAD-based user-space interception, enabling fine-grained memory and compute sharing without modifying application code, validated in China Merchants Bank's production environment.
HAMi-DRA Production-Ready: HAMi-DRA, the independent implementation based on Kubernetes DRA standards, bumps to v0.2.0, officially reaching production-ready status across NVIDIA / Ascend / Enflame platforms.
Security & Stability Enhancements: Scheduler routes add DoS protection, NodeLock optimizes to exponential backoff, Webhook adds resource quota checking, with fixes for multiple critical production stability issues.
Heterogeneous Device Coverage Expansion: New Vastai (Hanbo Semiconductor) device support further enriches the domestic heterogeneous computing management landscape.

Ascend 910C User-Space Partitioning — HAMi-core Mode (Key Feature)

Huawei Ascend 910C is one of the primary domestic AI computing chips. In production environments, users have long faced a critical question: How can multiple inference/training tasks share a single Ascend card?

Traditional device management approaches have two extremes:

Exclusive mode: One Pod occupies the entire card, resulting in extremely low resource utilization
SR-IOV hardware partitioning: Requires specific hardware support, fixed partitioning granularity, insufficient flexibility

HAMi v2.9.0 introduces HAMi-core mode, implementing user-space soft partitioning of memory and compute without modifying application code or requiring specific hardware support. This is the most important feature in this release.

Why User-Space Partitioning?

Before HAMi-core, Ascend device sharing primarily relied on SR-IOV hardware virtualization, which had several fundamental limitations:

Coarse granularity: SR-IOV splits a physical card into fixed Virtual Functions (VFs), typically only supporting fixed ratios like 1/2, 1/4 of the entire card
Inflexible: Partitioning ratios are preset at the hardware level and cannot be dynamically adjusted based on actual workloads
Hardware dependency: Not all Ascend hardware versions support SR-IOV, and firmware coordination is required

HAMi-core fundamentally changes this landscape — through pure software interception and management of ACL calls in user space, it achieves MB-level memory and percentage-level compute fine-grained partitioning. A single Ascend 910C can simultaneously serve multiple inference or training tasks with different specifications.

Technical Architecture

How HAMi-core works:

LD_PRELOAD Interception: At container startup, an interception library is injected via LD_PRELOAD, intercepting the application's calls to Ascend Computing Language (ACL)
Memory Isolation: Each Pod's memory allocation is strictly limited to its declared quota at MB-level granularity, preventing one task from exhausting the entire card's memory
Compute Throttling: Based on the Pod's declared compute quota, ACL compute calls are scheduled via time-slice round-robin, ensuring fair compute resource access for all tasks
Passthrough Execution: Calls within quota limits are passed directly to the hardware driver without additional latency, ensuring near-native performance

Partitioning Granularity Comparison

Comparison of HAMi-core with traditional approaches:

Dimension	Exclusive Mode	SR-IOV	HAMi-core (v2.9)
Memory Partitioning	Not partitionable	Fixed per VF allocation	MB-level precise control
Compute Partitioning	Not partitionable	Proportional per VF	Percentage-level flexible configuration
Number of Partitions	1 Pod/card	Usually 2-4 VF/card	10+ Pods/card
Hardware Support Required	No	Yes	No
Application Code Changes Required	No	No	No
Dynamic Adjustment	Not supported	Not supported	Supported

For example, a 64GB Ascend 910C can be allocated to multiple tasks as follows:

# Task 1: Large model inference, 32GB memory + 50% compute
resources:
  limits:
    hami.io/vnpu-core: "50"
    hami.io/vnpu-core-memory: "32768"  # 32GB = 32768MB

# Task 2: Model fine-tuning, 16GB memory + 30% compute
resources:
  limits:
    hami.io/vnpu-core: "30"
    hami.io/vnpu-core-memory: "16384"  # 16GB = 16384MB

# Task 3: Lightweight inference, 8GB memory + 20% compute
resources:
  limits:
    hami.io/vnpu-core: "20"
    hami.io/vnpu-core-memory: "8192"  # 8GB = 8192MB

Core Capabilities

Ascend 910C SuperPod Support

For SuperPod environments, HAMi implements module-pair level resource allocation. In distributed training scenarios, multiple Ascend chips are interconnected via HCCS/RoCE to form super nodes. HAMi can identify and manage this topology structure, fully leveraging the hardware advantages of super nodes.

vNPU-Core Virtualization

A new hami-vnpu-core resource type supports more flexible compute partitioning strategies:

apiVersion: v1
kind: Pod
metadata:
  name: ascend-inference
  annotations:
    hami.io/vnpu-core: "ascend910c"
spec:
  containers:
    - name: inference
      image: ascend-inference:latest
      resources:
        limits:
          hami.io/vnpu-core: "1"
          hami.io/vnpu-core-memory: "16384"

Annotation-based node filtering and multi-device request capabilities:

# Node filtering: Only schedule on nodes with specific annotations
metadata:
  annotations:
    hami.io/vnpu-core-node-filter: "ascend910c-module-0"

Enabling HAMi-core

Set ascend.hamiVnpuCore to true in Helm values to enable this feature:

# values.yaml
ascend:
  hamiVnpuCore: true

It can also be enabled individually in ascend-device-plugin node configuration, supporting partial node enablement within the same cluster.

Important: In v2.9, Pods must explicitly declare huawei.com/vnpu-mode: 'hami-core' in annotations to use HAMi-core mode:

apiVersion: v1
kind: Pod
metadata:
  name: ascend-inference
  annotations:
    huawei.com/vnpu-mode: "hami-core"
spec:
  containers:
    - name: inference
      image: ascend-inference:latest
      resources:
        limits:
          hami.io/vnpu-core: "1"
          hami.io/vnpu-core-memory: "16384"

Pods without this annotation will continue using the legacy template-based vNPU partitioning. If no suitable nodes are available, the task will remain in pending state.

Production Validation

This feature has been validated in China Merchants Bank's production environment. Based on the HAMi-vNPU-Core soft partitioning solution, China Merchants Bank achieved 100% resource pooling of Ascend 910C computing power and high-performance communication for large models, significantly improving domestic computing resource utilization.

Thanks to Huawei Cloud Canada Lab and China Merchants Bank @ashergaga for their contributions to this feature.

This release also updates the HAMi-core performance benchmark data. For detailed benchmark procedures, refer to the project documentation.

HAMi-DRA — Lightweight HAMi Based on Kubernetes Standards

In HAMi v2.9.0, HAMi-DRA officially reaches production-ready status. HAMi-DRA is an independent implementation project based on Kubernetes Dynamic Resource Assignment (DRA) standards, positioned as a "lightweight HAMi".

From Device Plugin to DRA

DRA is the next-generation device resource declaration and allocation mechanism being advanced by the Kubernetes community. The traditional Device Plugin model has the following limitations:

Inflexible resource declaration: Device resources are hard-coded via limits[nvidia.com/gpu], unable to express complex requirements like memory/compute separation or topology constraints
Scattered scheduling logic: Each device plugin needs to implement its own scheduling logic, making unified management difficult
Difficult resource composition: Cannot express compound requirements like "multiple GPUs with specific topology"

DRA standardizes device resource declaration, allocation, and management by introducing new APIs like ResourceClaim and DeviceClass.

Figure 1: Device Plugin vs DRA Model Comparison

HAMi-DRA Design Philosophy

HAMi-DRA adopts a Mutating Webhook architecture. The core philosophy can be summarized in three points:

Don't change user habits: Continue using Device Plugin syntax, automatically converting to DRA resource models underneath
Internalize complexity: Webhook, Driver, and lifecycle management are all handled by the system
Drive evolution through community collaboration: Contributors from different companies validate solutions in real production environments

Platform Support

HAMi-DRA has bumped to v0.2.0, supporting three major platforms:

Platform	Virtualization Method	Status
NVIDIA	HAMi-core time-slicing + memory soft limits	Production-ready
Ascend	vNPU-Core user-space partitioning	Production-ready
Enflame	Compute/memory partitioning	Production-ready

Installing HAMi-DRA:

Prerequisites:

Kubernetes >= 1.34 with DRA Consumable Capacity featuregate enabled
Container runtime with CDI (Container Device Interface) enabled
NVIDIA GPU Driver 440+
cert-manager installed

# Clone the repository and install
git clone https://github.com/Project-HAMi/HAMi-DRA.git
cd HAMi-DRA
helm install hami-dra ./charts/hami-dra

# If not using gpu-operator's containerd driver:
helm install hami-dra ./charts/hami-dra --set drivers.nvidia.containerDriver=false

# If monitoring components are not needed:
helm install hami-dra ./charts/hami-dra --set monitor.enabled=false

After installation, usage is the same as HAMi.

Submitting workloads (automatically converted from Device Plugin syntax):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference
  annotations:
    hami.io/gpu-memory: "4096"
    hami.io/gpu-core: "50"
spec:
  containers:
    - name: inference
      image: pytorch:latest
      resources:
        limits:
          nvidia.com/gpu: "1"

HAMi-DRA Project: https://github.com/Project-HAMi/HAMi-DRA

Volcano vGPU Upgraded to v0.19 + CDI

HAMi v2.9.0 synchronizes the built-in Volcano vGPU Device Plugin to v0.19, maintaining consistency with the Volcano upstream.

CDI Mode Support

Container Device Interface (CDI) is the standard device injection interface for container runtimes defined by the CNCF CDI specification. Compared to traditional approaches:

Dimension	Traditional Approach	CDI Approach
Device Injection	Manual /dev mounts, environment variable setup	Declarative CDI device list
Runtime Coupling	Tightly coupled	Loosely coupled
Multi-device Support	Manual management required	Automatic aggregation
MIG Support	Complex configuration management	Standardized declaration

Enable CDI mode:

# values.yaml
devicePlugin:
  cdIEnabled: true
  nvidia:
    cdIEnabled: true

This upgrade also fixes MIG allocation issues under CDI mode, further enhancing NVIDIA GPU flexible partitioning capabilities.

Observability Enhancements

v2.9.0 includes multiple observability improvements:

vGPU Monitor adds --metrics-bind-address parameter for custom metrics exposure address
Helm Chart adds Prometheus ServiceMonitor covering scheduler and device plugin
Prometheus metrics and label naming aligned with community best practices
New device type label support in metrics
Optimized log level control with related unit tests added

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hami-scheduler
  namespace: hami-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: hami-scheduler
  endpoints:
    - port: metrics
      interval: 15s

Security & Stability

Security Hardening

DoS Protection

Scheduler routes add io.LimitReader to limit HTTP request body size, preventing OOM caused by malicious or abnormally large request bodies.

Webhook Resource Quota Checking

v2.9.0 adds Resource Quota checking capability in the Webhook. GPU resource requests can be validated against quota limits at the Pod submission stage, avoiding post-scheduling rollbacks and improving overall scheduling efficiency.

Critical Bug Fixes

v2.9.0 fixes multiple critical issues affecting production stability:

Issue	Impact	Fix
NodeLock contention	Large-scale cluster scheduling performance degradation	Exponential backoff strategy
Leader election nil pointer	HA deployment occasional crashes	Null check
Scheduler score division by zero	Scoring anomalies	Safe division
Multi-container Pod device allocation	Init container device allocation conflicts	Lifecycle-aware allocation
Kernel 6.17 NVIDIA health check	GPU status check handshake failure	Boundary condition handling
Global image tag override	Component-level image tags ignored	Tag priority fix
Stale Deleted handshake	Node scheduling interruption	State cleanup
Device filter ineffective	filter device config not working	Filter logic fix
Device Plugin/Scheduler annotation mismatch	Device allocation anomalies	Annotation alignment

Heterogeneous Device Ecosystem Expansion

New Vastai (Hanbo Semiconductor) Device Support

Vastai Technologies is a leading domestic general-purpose GPU chip design company, with chips widely used in AI inference, graphics rendering, video processing, and other scenarios. HAMi v2.9.0 adds support for Vastai devices, further expanding HAMi's heterogeneous computing management into the domestic GPU space.

Two Allocation Modes

Vastai devices support two resource allocation modes:

Mode	Description	Use Case
Full-Card Mode	Each Pod exclusively occupies an entire GPU	Large model training, performance-sensitive inference
Die Mode	Partitioned by chip Die, supports topology-aware scheduling	Multi-task sharing, resource utilization optimization

In Die mode, the scheduler is aware of the AIC (Accelerator Interface Card) topology, attempting to allocate multiple resources requested by the same Pod to the same AIC, reducing cross-Die communication overhead.

Configuration and Usage

Node Labeling:

Before installing HAMi, label nodes with Vastai devices:

kubectl label node <node-name> vastai=on

Helm values configuration:

vastai:
  enabled: true
  customresources:
    - vastaitech.com/va

Full-Card mode example:

apiVersion: v1
kind: Pod
metadata:
  name: vastai-inference
spec:
  containers:
    - name: inference
      image: vastai-inference:latest
      resources:
        limits:
          vastaitech.com/va: "1"

Die mode example:

Specify device selection strategy via annotations:

apiVersion: v1
kind: Pod
metadata:
  name: vastai-die-inference
  annotations:
    vastaitech.com/use-va: "0"
    vastaitech.com/nouse-va: "1"
spec:
  containers:
    - name: inference
      image: vastai-inference:latest
      resources:
        limits:
          vastaitech.com/va: "2"

With Vastai device support, HAMi now covers NVIDIA, Huawei Ascend, Cambricon, Hygon DCU, Biren, Enflame, MetaX, Kunlunxin, AMD, Iluvatar, AWS Neuron, and Vastai — over a dozen heterogeneous computing devices, making it one of the most broadly covering heterogeneous device virtualization and scheduling projects in the Kubernetes ecosystem.

DRA Ecosystem Alliance

DRA is becoming the next-generation device management model for Kubernetes, but there are implementation uncertainties on the vendor side and high usage barriers on the user side. To address this, the HAMi community announced the launch of the DRA Ecosystem Alliance at the 3rd HAMi Meetup in Shenzhen.

Goals of the DRA Ecosystem Alliance:

Connect device vendors with users to drive DRA adoption in real-world scenarios
Promote DRA standards evolution to reduce the engineering cost of heterogeneous device integration
Unify the scheduling layer to abstract underlying hardware differences, enabling unified heterogeneous computing management

Upgrade Guide

Upgrade to v2.9.0 via Helm:

helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update
helm upgrade hami hami-charts/hami -n hami-system

For complete installation documentation, visit: https://project-hami.io/docs/usage/install

Upgrade Notes:

If using Volcano vGPU mode, note the CDI-related configuration changes

If using Ascend devices and want to enable HAMi-core mode, refer to the Ascend configuration section in the latest documentation

It's recommended to verify compatibility in a test environment before upgrading

Community Updates

During the v2.9.0 release cycle, the HAMi community remained active in technical advocacy, product ecosystem, and user practices. Here are the noteworthy community developments.

Community Events

KubeCon EU 2026: HAMi appeared in Amsterdam as a CNCF Sandbox project, setting up a Project Pavilion booth and taking the main stage Keynote Demo to showcase the latest Kubernetes GPU virtualization advancements to developers worldwide. Read recap
KCD Beijing 2026: Over 1,000 registrations, breaking KCD Beijing records. The HAMi community was invited to share "From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice." Read recap
3rd HAMi Meetup Shenzhen: Seven experts from CNCF, SF Technology, China Merchants Bank, Enflame Technology, and others shared insights on the future of AI computing cloud-native. Read recap
HAMi WebUI Official Release: The HAMi community launched the open-source GPU monitoring dashboard HAMi WebUI v1.1.0, presenting the entire GPU cluster in a single visual interface, achieving a complete closed loop from GPU scheduling to visual observability. Read blog post

Website and Documentation Overhaul

Since the v2.8.0 release, the HAMi website and documentation have undergone the largest-scale restructuring in project history. Approximately 195 PRs were merged into the website repository, covering the following major areas:

Website Redesign: Homepage redesign, architecture diagram redraws, unified blog styling, mobile optimization, enhanced footer, migration from external search to built-in search
New Documentation: GPU virtualization principles page, HAMi quick start guide, real-time GPU monitoring guide, upgrade and uninstallation guide, HAMi WebUI user and developer guides, Vastai device documentation
i18n Sync: Continuous Chinese-English documentation alignment, sidebar label localization, announcement banner multilingual support
Community Content: New blog posts including KubeCon EU 2026 recap, KCD Beijing 2026 DRA sharing, HAMi WebUI release, Meetup Shenzhen recap; adopter information updates for Keike, NIO, SNOW Corp., Bovi Wisdom
Quality Governance: Site-wide copy de-marketing, grammar corrections, code block language annotations, format standardization, contributor guide and governance documentation improvements

Thanks to @mesutoezdil for contributions to HAMi official documentation optimization.

Website: https://project-hami.io

CNCF Case Studies

An increasing number of enterprises are using HAMi in production environments to build GPU virtualization and heterogeneous computing scheduling capabilities. The following case studies have been published on the CNCF website:

Keike Holdings: Built AIStudio intelligent computing platform based on Kubernetes + HAMi, improving GPU utilization from 13% to 37% (nearly 3x), supporting 10,000+ concurrent Pods, processing tens of millions of daily business requests. Read full article
NIO: Adopted HAMi hybrid GPU sharing strategy for autonomous driving workloads, improving CI pipeline GPU utilization by approximately 10x, reducing simulation workload GPU time by approximately 30%, covering a production cluster of approximately 600 GPUs. Read full article
SNOW Corp.: Korea's NAVER subsidiary SNOW Corp. manages 1,000+ A100 GPUs, achieving GPU sharing through HAMi to handle 700% traffic spikes, halving GPU requirements with estimated savings of $17.4 million. Read full article

New Contributors

The v2.9.0 release welcomed 19 new contributors to the HAMi project from different countries and organizations:

@maishivamhoo123, @hoteye, @jsl9208, @ashergaga, @Atroxgod, @MyoungHaSong, @charford, @jcustenborder, @Nov11, @ilia-medvedev, @Yonsun-w, @CFH2436, @kenwoodjw, @anandj91, @ManishSharma1609, @maverick123123, @almazkhalikov, @lin121291, @mesutoezdil

Thank you to every contributor!

Related Links:

GitHub Release: https://github.com/Project-HAMi/HAMi/releases/tag/v2.9.0
HAMi-DRA: https://github.com/Project-HAMi/HAMi-DRA
Volcano vGPU Device Plugin: https://github.com/Project-HAMi/volcano-vgpu-device-plugin
Project Documentation: https://project-hami.io
Community Portal: https://project-hami.io/community
Community Discord (recommended): https://discord.gg/Amhy7XmbNq
Community CNCF Slack: https://cloud-native.slack.com/archives/C08844T5WBQ

Artikel teilen

Zurueck zum Blog