Inventi Impact: Multimedia


Inventi:emm/115095/26

Audiovisual Fusion Technique for Detecting Sensitive Content in Videos

01-Jul-2026 Research 2026: July-September

Daniel Povedano Álvarez, Ana Lucila Sandoval Orozco, Luis Javier García Villalba

The detection of sensitive content in online videos is a key challenge for ensuring digital safety and effective content moderation. This work proposes Multimodal Audiovisual Attention (MAV-Att), a deep learning framework that jointly exploits audio and visual cues to improve detection accuracy. The model was evaluated on the LSPD dataset, which comprises 52,427 video segments of 20 s each, using optimized keyframe extraction. MAV-Att consists of dual audio and image branches enhanced by attention mechanisms that capture both temporal and cross-modal dependencies. Trained with a joint optimization loss, the system achieved F1-scores of 94.9% on segments and 94.5% on entire videos, surpassing previous state-of-the-art models by 6.75%.
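The abstract does not specify MAV-Att's layer configuration. As a rough illustration of what cross-modal attention between an image branch and an audio branch can mean, the sketch below uses standard scaled dot-product attention in plain Python: keyframe embeddings attend over audio embeddings and vice versa, and the pooled results are concatenated into one segment vector that a classifier head could consume. The function names (`cross_modal_attention`, `fuse_segment`), the bidirectional design, mean pooling, and concatenation are all illustrative assumptions, not the authors' architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: each query vector from one modality
    (e.g. a keyframe embedding) attends over keys/values from the other
    modality (e.g. audio frame embeddings)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted sum of value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def fuse_segment(visual, audio):
    """Hypothetical fusion step: attend in both directions, pool, concatenate."""
    visual_to_audio = cross_modal_attention(visual, audio, audio)
    audio_to_visual = cross_modal_attention(audio, visual, visual)
    return mean_pool(visual_to_audio) + mean_pool(audio_to_visual)


if __name__ == "__main__":
    # Toy embeddings: 3 keyframes and 5 audio frames, all of dimension 4.
    visual = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
    audio = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0], [0.2, 0.2, 0.2, 0.2]]
    segment_vec = fuse_segment(visual, audio)
    print(len(segment_vec))  # 8
```

Attending in both directions lets each modality weight the other's frames by relevance, which is one common way to model cross-modal dependencies; a real system would add learned projections for queries, keys, and values, which this sketch omits.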

How to Cite this Article
Attribution / CC-Compliant Citation: Povedano Álvarez, D.; Sandoval Orozco, A.L.; García Villalba, L.J. Audiovisual Fusion Technique for Detecting Sensitive Content in Videos. Eng. Proc. 2026, 123, 11. https://doi.org/10.3390/engproc2026123011 http://creativecommons.org/licenses/by/4.0/ Some formatting elements (header, footer, logos, dates, and pagination) were modified while adapting this article.

(+91) 89626 12340
Inventi Journals Pvt. Ltd.
SDX 82, Minal Residency, JK Road,
BHOPAL, 462023, MP, India