China Unicom proposes MeanCache, a new framework that sets a fresh benchmark for multimodal generative model inference acceleration.


(Source: Machine Heart Pro)

Author and team introduction: The first author of this article is Gao Huanlin; the corresponding authors are Zhao Fang and Lian Shiguo. The authors are from the Yuanjing Large Model R&D team of China Unicom Data Intelligence Co., Ltd. (China Unicom Data Science and Artificial Intelligence Research Institute) and from Nanjing University, and focus on the development of the Unicom Yuanjing large model.

The inference speed of multimodal generative models such as FLUX and Qwen-Image has long been a pain point for industrial-grade deployment. Traditional feature-caching approaches, when pushed toward high speedup ratios, often suffer trajectory drift caused by severe fluctuations in the instantaneous velocity.
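To make the failure mode concrete, here is a minimal sketch of naive feature caching in an Euler sampling loop. The `velocity` function is a toy stand-in for the diffusion model, not anything from the paper; real caches reuse internal features rather than only the final output, but the staleness-induced drift is the same:

```python
import numpy as np

def velocity(x, t):
    # Toy instantaneous velocity field (stand-in for the diffusion model).
    return -x * (1.0 + t)

def sample(x0, steps=20, reuse=1):
    """Euler sampling with naive feature caching: the model output is
    recomputed only every `reuse` steps and reused unchanged in between."""
    x, dt = x0, 1.0 / steps
    cached = None
    for i in range(steps):
        if i % reuse == 0:
            cached = velocity(x, i * dt)   # genuine model call
        x = x + dt * cached                # stale velocity on skipped steps
    return x

exact = sample(np.ones(1))                 # full-cost baseline
fast = sample(np.ones(1), reuse=4)         # ~4x fewer model calls
drift = float(abs(fast - exact)[0])        # trajectory drift from staleness
```

The reused velocity is only valid at the step where it was computed; the faster the velocity field changes, the larger `drift` grows, which is exactly the fluctuation problem the article describes.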

To address this pain point, research teams from the China Unicom Data Science and Artificial Intelligence Research Institute and Nanjing University, building on their earlier work LeMiCa (NeurIPS 2025 Spotlight), have continued to deepen their efforts and released the advanced acceleration framework MeanCache.

This work not only builds on the team’s deep experience in diffusion-model acceleration but also marks a technical breakthrough: inspired by MeanFlow, MeanCache is the first to bring an “average velocity” perspective into cached inference. By using JVP (Jacobian-vector product) computations to accurately correct generation trajectories, it delivers more than 4x faster inference. The work has been accepted at ICLR 2026, a top AI conference, and both the paper and code are open source.

Technical innovation: a new caching paradigm driven by average velocity

The core contribution of MeanCache is shifting the basis of caching acceleration from “instantaneous velocity” to “average velocity,” built on two key techniques:

JVP-driven average velocity

This modeling choice widens the caching perspective from a single “point” to an “interval”: the averaged velocity provides a more stable guidance signal, which effectively corrects trajectory deviations under aggressive speedups.
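A minimal PyTorch sketch of the idea follows. It estimates the average velocity over an interval [r, t] with a first-order JVP correction to the instantaneous velocity, in the spirit of MeanFlow; `v_fn` is a toy stand-in for the flow model, and the exact correction MeanCache uses may differ:

```python
import torch
from torch.func import jvp

def v_fn(z, t):
    # Toy instantaneous velocity field (stand-in for the flow model).
    return -z * (1.0 + t)

def mean_velocity(z, t, r):
    """First-order estimate of the average velocity over [r, t].

    A JVP with tangents (v, 1) gives the total derivative along the
    trajectory, dv/dt = (dv/dz) * v + dv/dt_partial; the average
    velocity is then approximated as u = v - ((t - r) / 2) * dv/dt.
    """
    t_ = torch.tensor(t)
    v, dv_dt = jvp(v_fn, (z, t_), (v_fn(z, t_), torch.tensor(1.0)))
    return v - 0.5 * (t - r) * dv_dt
```

Because the correction uses the trajectory's total derivative rather than the raw instantaneous output, the cached quantity changes much more smoothly across steps, which is what makes it safer to reuse.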

Trajectory stability scheduling strategy

When should we cache? Earlier methods typically relied on fixed step sizes or hand-tuned thresholds; MeanCache instead casts the question as an optimization problem on a multigraph.

It treats each time step as a node and defines the edge weight as the stability deviation between the predicted average velocity and the true value.

These nodes and edges form a multigraph, over which a Peak-Suppressed Shortest Path algorithm computes the optimal caching schedule under a given computational budget and rule set.
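The scheduling idea can be sketched as a small dynamic program over a step graph. This is an illustrative stand-in for the paper's Peak-Suppressed Shortest Path, with a hypothetical deviation matrix `dev` rather than the paper's actual stability metric:

```python
def best_cache_schedule(dev, budget):
    """Edge i -> j means 'call the model at step i and reuse its cached
    output for steps i+1..j-1'; the edge weight accumulates dev[i][k],
    the deviation of reusing step i's output at step k.  We minimize
    total deviation subject to at most `budget` model calls."""
    n = len(dev)
    INF = float("inf")
    # best[j][b]: (cost, schedule) for reaching step j with b calls,
    # the most recent call having been made at step j itself.
    best = [[(INF, None)] * (budget + 1) for _ in range(n + 1)]
    best[0][1] = (0.0, [0])            # the first step is always computed
    for i in range(n):
        for b in range(1, budget + 1):
            cost, sched = best[i][b]
            if sched is None:
                continue
            for j in range(i + 1, n + 1):
                w = cost + sum(dev[i][k] for k in range(i + 1, j))
                if j == n:             # reached the end of the trajectory
                    if w < best[n][b][0]:
                        best[n][b] = (w, sched)
                elif b < budget and w < best[j][b + 1][0]:
                    best[j][b + 1] = (w, sched + [j])
    return min(best[n][1:], key=lambda t: t[0])
```

For example, with four steps where staleness grows linearly with the gap and a budget of two calls, the cheapest schedule recomputes at steps 0 and 2 rather than spacing the calls uniformly from the start.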

Experimental results: new SOTA acceleration performance

Text-to-image

On the commercial-grade text-to-image models Qwen-Image and FLUX.1 [dev], MeanCache achieves up to 4x acceleration and reaches SOTA performance on ImageReward and perceptual metrics.

Visually, as the speedup ratio increases, the images generated by MeanCache hold up better in content consistency than those of competing methods.

Text-to-video

On the video generation model HunyuanVideo, MeanCache likewise achieves 3.6x acceleration while improving on SOTA metrics.

In qualitative video comparisons, MeanCache also degrades more gracefully under acceleration, in both image quality and content consistency.

Semantic consistency: going a step further, on tests with difficult rare-word prompts (such as the prompt “Peristeronic”), MeanCache demonstrates stronger semantic robustness.

Recognition from top industry teams

MeanCache already supports Alibaba Tongyi’s latest Z-Image and Qwen-Image-2512 text-to-image models and has been recommended on the Z-Image team’s official homepage. Community support for ComfyUI is also available.

Summary and outlook

As a lightweight, training-free acceleration framework for Flow Matching, MeanCache introduces “average velocity caching” and “trajectory stability scheduling,” significantly improving large-model inference efficiency while preserving high-fidelity image quality and content consistency. Building on this foundation, the Unicom Yuanjing large-model team will continue to explore model inference acceleration and generation in complex scenarios, aiming to bring more diverse technical perspectives to the industry and to further lower the usage barrier and compute cost of industrial-grade generative models.
