A deep learning approach for music visualization: From audio features to descriptive video generation
Publication type: Journal Article
Publication date: 2025-12-01
Scimago: Q2
WoS: Q2
White List level: БС1
SJR: 0.665
CiteScore: 6.1
Impact factor: 3.4
ISSN: 0141-9382, 1872-7387
Abstract
This paper proposes a deep learning-based audio visualization method that generates video content synchronized with the audio's style and rhythm through a comprehensive analysis of multi-modal features, including emotional semantics, stylistic patterns, rhythmic structures, and instrumental signatures. Conventional audio visualization approaches generate videos primarily from low-level signal features such as spectral frequency and beat tracking; they fail to interpret high-level auditory semantics such as emotional context and stylistic complexity, resulting in a mismatch between the visual content and the audio's emotion. The innovation of this paper lies in its multi-dimensional audio analysis, which, combined with a Large Language Model, produces precise visual descriptions; a Text-to-Image model then creates images that match the audio's style. The synthesized images are subsequently aligned with the audio stream by a frame interpolation model, ensuring temporal alignment and dynamic consistency between the audio and video content. Experimental results demonstrate that the proposed method effectively preserves visualization quality, making the generated videos track the emotional and rhythmic changes in the audio more closely.
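The first stage of the pipeline the abstract describes (mapping multi-modal audio features to a descriptive prompt that would drive a text-to-image model) can be sketched in miniature. This is an illustrative assumption, not the authors' code: the feature fields and prompt template below are hypothetical stand-ins for the paper's multi-dimensional analysis and LLM-generated descriptions.

```python
# Minimal sketch, assuming a prior analysis stage has already extracted
# emotion, style, tempo, and instrument features from the audio.

def build_visual_prompt(features: dict) -> str:
    """Turn extracted audio features (emotional semantics, stylistic
    patterns, rhythmic structure, instrumental signature) into a
    descriptive prompt for a text-to-image model."""
    return (
        f"{features['style']} style, {features['emotion']} mood, "
        f"motion synchronized to {features['tempo_bpm']} BPM, "
        f"evoking the timbre of {features['instrument']}"
    )

# Example feature set, as the analysis stage might produce it.
features = {
    "emotion": "melancholic",
    "style": "impressionist",
    "tempo_bpm": 72,
    "instrument": "solo piano",
}
prompt = build_visual_prompt(features)
print(prompt)
```

In the paper's full pipeline, a prompt like this would be refined by the LLM, rendered by the text-to-image model, and the resulting frames interpolated to match the beat structure of the audio.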
Metrics
Total citations: 0
Cite
GOST
Huang F. et al. A deep learning approach for music visualization: From audio features to descriptive video generation // Displays. 2025. Vol. 90. p. 103103.
GOST (all authors, up to 50)
Huang F., Xu Z., Min X., Song S. A deep learning approach for music visualization: From audio features to descriptive video generation // Displays. 2025. Vol. 90. p. 103103.
RIS
TY - JOUR
DO - 10.1016/j.displa.2025.103103
UR - https://linkinghub.elsevier.com/retrieve/pii/S0141938225001404
TI - A deep learning approach for music visualization: From audio features to descriptive video generation
T2 - Displays
AU - Huang, Fan
AU - Xu, Zhixin
AU - Min, Xiongkuo
AU - Song, Song
PY - 2025
DA - 2025/12/01
PB - Elsevier
SP - 103103
VL - 90
SN - 0141-9382
SN - 1872-7387
ER -
BibTeX (up to 50 authors)
@article{2025_Huang,
author = {Fan Huang and Zhixin Xu and Xiongkuo Min and Song Song},
title = {A deep learning approach for music visualization: From audio features to descriptive video generation},
journal = {Displays},
year = {2025},
volume = {90},
publisher = {Elsevier},
month = {dec},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0141938225001404},
pages = {103103},
doi = {10.1016/j.displa.2025.103103}
}