Vol. 90, p. 103103

A deep learning approach for music visualization: From audio features to descriptive video generation

Publication type: Journal Article
Publication date: 2025-12-01
Scimago: Q2
WoS: Q2
White List level: БС1
SJR: 0.665
CiteScore: 6.1
Impact Factor: 3.4
ISSN: 0141-9382, 1872-7387
Abstract
This paper proposes a deep learning-based audio visualization method that generates video content synchronized with the audio's style and rhythm through comprehensive analysis of multi-modal features, including emotional semantics, stylistic patterns, rhythmic structures, and instrumental signatures. Conventional audio visualization approaches generate videos primarily from low-level signal features such as spectral frequency and beat tracking, but fail to interpret high-level auditory semantics such as emotional context and stylistic complexity, resulting in a mismatch between the visual content and the audio's emotion. The innovation of this paper lies in its multi-dimensional audio analysis, which, combined with a Large Language Model, generates precise visual descriptions; a Text-to-Image model then creates images that align with the audio's style. The synthesized images are subsequently temporally aligned with the audio stream via a frame interpolation model, ensuring time alignment and dynamic consistency between the audio and video content. Experimental results demonstrate that the proposed method effectively ensures the quality of audio visualization, making the generated videos align more closely with the emotional and rhythmic changes in the audio.
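The temporal-alignment step described above can be sketched in miniature: if one keyframe image is generated per detected beat, the frame interpolation model must fill each beat interval with enough in-between frames to keep the video locked to the audio's rhythm at a fixed frame rate. The beat times, frame rate, and function name below are illustrative assumptions, not the authors' implementation.

```python
def frames_per_interval(beat_times, fps):
    """For consecutive beat-aligned keyframes, return how many
    interpolated frames are needed to fill each beat interval
    at the given frame rate (excluding the keyframe itself)."""
    counts = []
    for start, end in zip(beat_times, beat_times[1:]):
        total = round((end - start) * fps)  # frames spanning the interval
        counts.append(max(total - 1, 0))    # subtract the keyframe
    return counts

# Assumed beat-tracker output, in seconds:
beat_times = [0.0, 0.5, 1.0, 1.6]
print(frames_per_interval(beat_times, fps=24))  # → [11, 11, 13]
```

In a full pipeline these counts would parameterize the interpolation model so that each synthesized segment lands exactly on the next beat, which is what keeps the video dynamically consistent with the audio.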

ГОСТ
Huang F. et al. A deep learning approach for music visualization: From audio features to descriptive video generation // Displays. 2025. Vol. 90. p. 103103.
ГОСТ with all authors (up to 50)
Huang F., Xu Z., Min X., Song S. A deep learning approach for music visualization: From audio features to descriptive video generation // Displays. 2025. Vol. 90. p. 103103.
RIS
TY - JOUR
DO - 10.1016/j.displa.2025.103103
UR - https://linkinghub.elsevier.com/retrieve/pii/S0141938225001404
TI - A deep learning approach for music visualization: From audio features to descriptive video generation
T2 - Displays
AU - Huang, Fan
AU - Xu, Zhixin
AU - Min, Xiongkuo
AU - Song, Song
PY - 2025
DA - 2025/12/01
PB - Elsevier
SP - 103103
VL - 90
SN - 0141-9382
SN - 1872-7387
ER -
BibTeX
@article{2025_Huang,
author = {Fan Huang and Zhixin Xu and Xiongkuo Min and Song Song},
title = {A deep learning approach for music visualization: From audio features to descriptive video generation},
journal = {Displays},
year = {2025},
volume = {90},
publisher = {Elsevier},
month = {dec},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0141938225001404},
pages = {103103},
doi = {10.1016/j.displa.2025.103103}
}