In a significant move to address the computational bottlenecks of large language models (LLMs), Huawei Data Storage and Zhongke Hongyun have jointly unveiled a new AI inference acceleration solution. Announced on December 23, 2025, this collaboration aims to tackle the high latency and resource inefficiency often associated with processing long-context prompts, a common challenge in enterprise AI applications. The solution promises to deliver smarter, faster, and more accessible AI inference capabilities directly to business operations.
A Fusion of Storage and Compute for Optimized AI
The core of the joint solution is the deep integration of Huawei's OceanStor A-series storage systems with Zhongke Hongyun's HyperCN intelligent computing internet cloud platform. This partnership creates a unified data and compute fabric designed specifically for AI workloads. Huawei's storage acts as a high-performance data foundation that handles the large volumes of data generated during AI inference, while HyperCN provides the orchestration layer for diverse computing resources.
Key Innovations Driving Performance Gains
The solution introduces several technical advancements to accelerate inference. A central feature is Huawei's UCM (Unified Cache Management) technology, which persists the KV cache, the intermediate attention state built up during LLM inference, directly to OceanStor storage. This "inference memory" avoids recomputing results for previously seen prompts, significantly speeding up subsequent responses. In addition, algorithms such as Prefix Cache and GSA sparse acceleration specifically target time-to-first-token (TTFT), the initial delay a user experiences when querying a model.
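The companies have not published UCM's programming interface, so the following Python sketch is purely illustrative: it shows the general pattern of persisting KV-cache entries to shared storage, keyed by a hash of the prompt prefix, so a repeated prefix can skip the prefill compute. All names here (CACHE_DIR, load_kv_cache, store_kv_cache) are hypothetical.

```python
import hashlib
import pickle
from pathlib import Path

# Hypothetical illustration of KV-cache persistence: compute once,
# store the attention key/value tensors on external storage, and
# reuse them when the same prompt prefix is seen again.
# A production system would use a tensor-aware format, not pickle.
CACHE_DIR = Path("/mnt/oceanstor/kv_cache")  # assumed mount point


def prefix_key(prompt: str) -> str:
    """Derive a stable cache key from the prompt prefix."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def load_kv_cache(prompt: str):
    """Return persisted KV tensors for this prefix, or None on a miss."""
    path = CACHE_DIR / f"{prefix_key(prompt)}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    return None


def store_kv_cache(prompt: str, kv_tensors) -> None:
    """Persist KV tensors so later requests skip the prefill compute."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{prefix_key(prompt)}.pkl"
    path.write_bytes(pickle.dumps(kv_tensors))
```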
Technical Compatibility & Features:
- Supported AI Hardware: NVIDIA GPUs, Huawei Ascend, Cambricon.
- Supported Frameworks: MindSpore, vLLM, SGLang.
- Core Technologies: Huawei UCM for KV Cache persistence, Prefix Cache, GSA Sparse Acceleration.
- Platform Integration: Kubernetes-based orchestration with Huawei OceanStor A-series storage.
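Neither company has documented how the joint solution configures these frameworks, but vLLM, one of the listed engines, does expose prefix caching through a public constructor flag. A minimal sketch follows, assuming a generic open-weight model; the model name is only a placeholder.

```python
from vllm import LLM, SamplingParams

# Prefix caching in vLLM: with enable_prefix_caching=True, KV-cache
# blocks for a shared prompt prefix are reused across requests,
# cutting the time-to-first-token for later queries.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)

# A long shared prefix (e.g., a document under discussion) followed
# by several short questions is the case that benefits most.
shared_prefix = (
    "You are an assistant answering questions about the manual below.\n"
    "<long document text>\n"
)
questions = ["What does chapter 3 cover?", "Summarize the safety notes."]

params = SamplingParams(temperature=0.2, max_tokens=256)
# The second request reuses the cached KV blocks of the shared prefix.
outputs = llm.generate([shared_prefix + q for q in questions], params)
for out in outputs:
    print(out.outputs[0].text)
```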
Delivering Measurable Results in Real-World Tests
According to performance benchmarks released by the companies, the solution delivers substantial improvements. In a standard intelligent Q&A scenario, the time-to-first-token was reduced by 57.5%. The benefits scale with context length: in a long-document reasoning test with a sequence length of 39,000 tokens, the solution achieved an 86% increase in concurrent processing capability and a 36% boost in overall inference throughput. These metrics translate to more responsive AI assistants and the ability to process complex documents much faster.
Reported Performance Improvements:
- Time-to-First-Token (TTFT) in Q&A: Reduced by 57.5%.
- Long-Document Reasoning (39K tokens):
  - Concurrent capability: Increased by 86%.
  - Inference throughput: Increased by 36%.
Designed for Heterogeneous and Enterprise-Ready Deployment
Recognizing the diverse landscape of AI hardware, the solution is built for flexibility. It can orchestrate a mix of AI accelerators from NVIDIA, Huawei's own Ascend line, and Cambricon. It is also compatible with mainstream AI frameworks such as MindSpore, vLLM, and SGLang, and integrates with Kubernetes for containerized deployment. This hardware-agnostic approach lets enterprises leverage existing infrastructure investments. The platform also includes a full AI toolchain covering data management, model development, training, and inference, enabling centralized management of AI assets.
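The exact orchestration mechanics have not been disclosed. As a rough illustration of how mixed accelerators are typically scheduled on Kubernetes, the sketch below uses the official Kubernetes Python client and the extended-resource names commonly advertised by each vendor's device plugin; the image name and resource names are assumptions that may differ per plugin version.

```python
from kubernetes import client, config

# Sketch of heterogeneous scheduling on Kubernetes: each accelerator
# vendor ships a device plugin that advertises an extended resource,
# and an inference pod requests the one it needs. Resource names
# below are the commonly used ones, not confirmed for this solution.
ACCELERATORS = {
    "nvidia": "nvidia.com/gpu",
    "ascend": "huawei.com/Ascend910",
    "cambricon": "cambricon.com/mlu",
}


def inference_pod(name: str, image: str, accel: str) -> client.V1Pod:
    """Build a pod spec that requests one accelerator of the given vendor."""
    resource = ACCELERATORS[accel]
    container = client.V1Container(
        name="inference",
        image=image,
        resources=client.V1ResourceRequirements(limits={resource: "1"}),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )


config.load_kube_config()  # assumes a local kubeconfig
pod = inference_pod("llm-worker-ascend", "example.com/llm-server:latest", "ascend")
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```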
Current Status: The joint solution is in the pilot application phase, with deployments underway in the energy & power, smart manufacturing, and national laboratory sectors.
Pilot Programs Signal Broad Industry Application
The Huawei-Zhongke Hongyun solution is not merely a theoretical offering. Pilot deployments in energy and power, smart manufacturing, and national laboratory research are already validating its performance in demanding, real-world environments and refining its capabilities ahead of a broader market release. This positions the joint offering as a practical tool for accelerating AI adoption in mission-critical industries.
