Case Study | SNOW Corp Scales GenAI for 200M Users with HAMi GPU Sharing and KEDA Autoscaling

Discover how SNOW Corp orchestrates 1,000+ GPUs to handle 700% viral traffic spikes, achieving 91% MTTR reduction, 85% fewer surge errors, and USD 17.4M in estimated cost savings using HAMi and KEDA on Kubernetes.

1,000+ A100 GPUs orchestrated
200M+ global users across 3 apps
700% viral traffic spikes handled

Company Overview

SNOW Corp., a subsidiary of NAVER from South Korea, operates a fleet of 1,000+ A100 GPUs serving GenAI features for 200M+ global users across three top-ranked apps (SNOW, EPIK, B612). The infrastructure serves 1,200+ AI workflows and 400+ models, handling extreme traffic volatility from viral AI trends.

Subsidiary of NAVER, South Korea

Three top-ranked GenAI apps: SNOW, EPIK, B612

1,200+ AI workflows and 400+ models in production

Multi-region on-premise Kubernetes platform

SNOW Corp.

NAVER subsidiary serving 200M+ GenAI users globally

Challenge: GPU Scheduling at Extreme Scale

Kubernetes' native GPU scheduling treats GPUs as atomic resources — a pod either gets a full GPU or nothing. This model broke down under SNOW's heterogeneous workload demands and unpredictable viral traffic spikes.

Heterogeneous Workloads

Training, inference, and batch processing have vastly different GPU utilization profiles

Traffic Unpredictability

700% viral traffic spikes with no GPU-level observability

Scheduling Blindness

2-3 containers competing for GPU resources with no coordination

Cost Explosion

~2x over-provisioning to handle peak loads

Solution: CNCF-Powered GPU Orchestration Stack

SNOW migrated to a multi-region on-premise Kubernetes platform with the CNCF ecosystem underpinning the entire stack: Cilium for CNI, Helm for GitOps-based deployment, Traefik for ingress, Prometheus/Loki/Grafana for observability, HAMi for GPU sharing, and KEDA for autoscaling.

GPU Sharing with HAMi

Kubernetes' default scheduler enforces strict GPU isolation, which blocked migration of SNOW's sequential Train-to-Inference pipelines — where a trainer and inference engine must share a single GPU. HAMi resolves this by virtualizing GPU resources (vGPU), enabling multiple containers within the same pod to share a single GPU concurrently.

Native kube-scheduler integration

Autoscaling ecosystem compatible

Half as many GPUs for train+inference
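
As a rough illustration of what a shared-GPU request could look like, the sketch below builds a pod manifest in which a trainer and an inference container each ask for a slice of GPU memory. The resource names `nvidia.com/gpu` and `nvidia.com/gpumem` follow HAMi's published examples; the function name, image names, memory sizes, and the `device_mem_mb` sanity check are hypothetical and not taken from SNOW's setup. Whether both slices land on the same physical GPU depends on HAMi's scheduler configuration, which this case study does not describe.

```python
import json


def shared_gpu_pod(name, trainer_image, inference_image,
                   trainer_mem_mb, inference_mem_mb, device_mem_mb):
    """Build a pod manifest where two containers each request a GPU memory slice.

    All arguments are illustrative placeholders, not SNOW's real values.
    """
    # Hypothetical sanity check: both slices must fit on one device.
    if trainer_mem_mb + inference_mem_mb > device_mem_mb:
        raise ValueError("requested slices exceed the memory of one GPU")

    def container(cname, image, mem_mb):
        return {
            "name": cname,
            "image": image,
            "resources": {
                "limits": {
                    "nvidia.com/gpu": 1,          # one vGPU for this container
                    "nvidia.com/gpumem": mem_mb,  # device memory slice (MB in HAMi's examples)
                }
            },
        }

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                container("trainer", trainer_image, trainer_mem_mb),
                container("inference", inference_image, inference_mem_mb),
            ]
        },
    }


if __name__ == "__main__":
    pod = shared_gpu_pod(
        "train-to-inference",
        "example.invalid/trainer:latest",    # placeholder image
        "example.invalid/inference:latest",  # placeholder image
        trainer_mem_mb=24000,
        inference_mem_mb=12000,
        device_mem_mb=40000,                 # assumed GPU memory, not stated in the source
    )
    print(json.dumps(pod, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -` on a cluster where HAMi is installed. On a cluster without HAMi, the `nvidia.com/gpumem` resource would not be recognized and the pod would stay unschedulable.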

Proactive GPU Orchestration with KEDA

Standard metrics (CPU/RAM, DCGM utilization) proved unreliable for SNOW's heterogeneous workloads. KEDA's built-in RabbitMQ scaler functioned as a lagging indicator — given a ~60-second model warm-up time, scaling triggered after a queue backlog formed was consistently too late.

Custom Metric Server for KEDA

Proactive scaling before saturation

Smart scale-in with cooldown
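
The case study does not publish the formula behind the custom metric server, so the following is only a minimal sketch of the general idea: project the request rate forward by the ~60-second warm-up time so that new replicas are ready before the queue backs up, and scale in only after a cooldown. Every name and parameter here (`desired_replicas`, `rate_slope`, `headroom`, `cooldown_s`, the linear extrapolation itself) is an assumption made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalerState:
    replicas: int
    last_scale_out_at: float = float("-inf")


def desired_replicas(arrival_rate, rate_slope, backlog, per_replica_rate,
                     warmup_s=60.0, headroom=0.8):
    """Replicas needed for the load expected once new pods have warmed up.

    arrival_rate     current requests per second
    rate_slope       change in arrival rate per second (requests/s^2)
    backlog          requests already waiting in the queue
    per_replica_rate requests per second one warm replica can serve
    headroom         target utilization per replica (0 < headroom <= 1)
    """
    # Linear extrapolation of the arrival rate to the end of the warm-up window.
    projected_rate = max(0.0, arrival_rate + rate_slope * warmup_s)
    # Extra throughput needed to drain the current backlog within one window.
    needed_rate = projected_rate + backlog / warmup_s
    return max(1, math.ceil(needed_rate / (per_replica_rate * headroom)))


def next_replicas(state, target, now, cooldown_s=300.0):
    """Scale out immediately; scale in only after the cooldown has elapsed."""
    if target > state.replicas:
        state.replicas = target
        state.last_scale_out_at = now
    elif target < state.replicas and now - state.last_scale_out_at >= cooldown_s:
        state.replicas = target
    return state.replicas


if __name__ == "__main__":
    reactive = desired_replicas(100, 0.0, 0, per_replica_rate=10)
    proactive = desired_replicas(100, 2.0, 0, per_replica_rate=10)
    print(reactive, proactive)
```

With a flat 100 requests/s the sketch asks for 13 replicas; when the rate is climbing by 2 requests/s every second it asks for 28, because it sizes for the 220 requests/s expected a minute later. A value like this could be exposed to KEDA as an external metric; how SNOW wires its metric server into KEDA is not stated in the source.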

Hybrid Cloud Bursting for Viral Spikes

When viral trends like the 'Ghibli Filter' tripled traffic within 3 hours, SNOW expanded dynamically into cloud service provider (CSP) regions using a unified GitOps pipeline that deployed identical Helm charts across all clusters.

Unified GitOps pipeline

CSP nodes consume from central queue

Zero service interruption
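
The "CSP nodes consume from central queue" point is an instance of the competing-consumers pattern: burst capacity adds throughput without any change to how work is submitted. The toy sketch below uses Python's in-process `queue.Queue` and threads as a stand-in for the real message broker and GPU workers; the function name, the location labels, and the worker counts are all hypothetical.

```python
import queue
import threading
from collections import Counter


def run_workers(jobs, worker_locations):
    """Drain one shared queue with workers from several locations.

    Returns a Counter of how many jobs each location handled.
    """
    central = queue.Queue()
    for job in jobs:
        central.put(job)

    handled = Counter()
    lock = threading.Lock()

    def worker(location):
        while True:
            try:
                central.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            # A real worker would run inference here.
            with lock:
                handled[location] += 1

    threads = [threading.Thread(target=worker, args=(loc,))
               for loc in worker_locations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


if __name__ == "__main__":
    # Two on-premise workers plus one burst worker in a CSP region.
    result = run_workers(range(1000), ["on-prem", "on-prem", "csp"])
    print(dict(result))
```

How the jobs split between locations varies from run to run, but every job is handled exactly once, and producers never need to know which cluster picked it up.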

Impact and Results

MTTR Improvement

91%

Reduced from ~2 hrs to ~10 min

Surge Error Reduction

85%

During peak traffic

Cost Savings

USD 17.4M

vs. on-demand cloud GPU

SNOW's cloud-native transformation demonstrates how combining Kubernetes with the broader CNCF ecosystem can overcome fundamental limitations in GPU scheduling and observability at extreme scale. By introducing HAMi for GPU sharing and augmenting KEDA with proactive, custom metrics, SNOW shifted from reactive, manual operations to predictive, automated orchestration.
SNOW Corp. Infrastructure Team

A Replicable Blueprint for Large-Scale GenAI

SNOW's approach highlights a replicable blueprint for running large-scale GenAI workloads — where intelligent resource utilization, autoscaling precision, and deep observability are critical to turning infrastructure challenges into competitive advantage.