With the rise of real-world human-AI interaction applications, such as AI assistants, the need for Streaming Video Dialogue is critical. To address this need, we introduce STREAMMIND, a video LLM framework that achieves ultra-FPS streaming video processing (100 fps on a single A100) and enables proactive, always-on responses in real time, without explicit user intervention. To resolve the key challenge, the mismatch between the linear growth of streaming video and the quadratic computation cost of transformers, we propose a novel perception-cognition interleaving paradigm named "event-gated LLM invocation", in contrast to the existing per-time-step LLM invocation. A Cognition Gate network between the video encoder and the LLM ensures that the LLM is invoked only when relevant events occur. To extract event features at constant per-frame cost, we propose the Event-Preserving Feature Extractor (EPFE), a state-space model that distills spatiotemporal features into a single perception token. Together, these techniques equip the video LLM with full-FPS perception and real-time cognition. Experiments on Ego4D and SoccerNet streaming tasks, as well as standard offline benchmarks, demonstrate state-of-the-art performance in both model capability and real-time efficiency, paving the way for ultra-high-FPS applications, such as Game AI and interactive media. The code and data are available at https://aka.ms/StreamMind.
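To make the perception-cognition interleaving concrete, the following is a minimal sketch of the event-gated loop described above. The module names (EPFE, CognitionGate), the tanh recurrence, and the gating threshold are illustrative assumptions for exposition, not the released StreamMind implementation; the key points it shows are the constant per-frame perception cost and the conditional LLM invocation.

```python
import torch
import torch.nn as nn


class EPFE(nn.Module):
    """Sketch of an Event-Preserving Feature Extractor: a state-space-style
    recurrence that folds each frame into a single perception token at
    constant cost per frame (independent of history length)."""

    def __init__(self, dim: int):
        super().__init__()
        self.A = nn.Linear(dim, dim, bias=False)  # state transition
        self.B = nn.Linear(dim, dim, bias=False)  # input projection
        self.state: torch.Tensor | None = None

    def forward(self, frame_feat: torch.Tensor) -> torch.Tensor:
        if self.state is None:
            self.state = torch.zeros_like(frame_feat)
        # O(1) recurrent update; the actual EPFE recurrence may differ.
        self.state = torch.tanh(self.A(self.state) + self.B(frame_feat))
        return self.state  # single perception token for this time step


class CognitionGate(nn.Module):
    """Sketch of the Cognition Gate: scores each perception token and
    decides whether the LLM should be invoked at this time step."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, token: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.score(token))


def stream(frames, encoder, epfe, gate, invoke_llm, threshold: float = 0.5):
    """Event-gated loop: perception runs at full FPS, while cognition
    (the LLM) runs only when the gate fires on an event-bearing token."""
    for frame in frames:
        token = epfe(encoder(frame))        # full-FPS perception, O(1)/frame
        if gate(token).item() > threshold:  # event-gated LLM invocation
            yield invoke_llm(token)         # cognition only on relevant events
```

Under this sketch, the per-frame cost is dominated by the encoder and the constant-cost EPFE update, so the expensive quadratic LLM computation is amortized over only the frames that actually carry events.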