Vol. 90, p. 103103

A deep learning approach for music visualization: From audio features to descriptive video generation

Publication type: Journal Article
Publication date: 2025-12-01
Scimago: Q2
WoS: Q2
White List level: БС1
SJR: 0.665
CiteScore: 6.1
Impact Factor: 3.4
ISSN: 0141-9382, 1872-7387
Abstract
This paper proposes a deep learning-based audio visualization method that generates video content synchronized with the audio's style and rhythm through comprehensive analysis of multi-modal features, including emotional semantics, stylistic patterns, rhythmic structures, and instrumental signatures. Conventional audio visualization approaches generate videos primarily from low-level signal features such as spectral frequency and beat tracking, but fail to interpret high-level auditory semantics such as emotional context and stylistic complexity, resulting in a mismatch between the visual content and the audio's emotion. The innovation of this paper lies in its multi-dimensional audio analysis, which, combined with a Large Language Model, generates precise visual descriptions; a Text-to-Image model then creates images that align with the audio's style. The synthesized images are subsequently temporally aligned with the audio stream via a frame interpolation model, ensuring time alignment and dynamic consistency between the audio and video content. Experimental results demonstrate that the proposed method effectively ensures the quality of audio visualization, making the generated videos align more closely with the emotional and rhythmic changes in the audio.
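The temporal-alignment step described above can be sketched in miniature: if one keyframe image is generated per detected beat, the frame interpolation model must fill each beat interval with enough in-between frames to keep the video locked to the audio's rhythm at a fixed frame rate. The beat times, frame rate, and function name below are illustrative assumptions, not the authors' implementation.

```python
def frames_per_interval(beat_times, fps):
    """For consecutive beat-aligned keyframes, return how many
    interpolated frames are needed to fill each beat interval
    at the given frame rate (excluding the keyframe itself)."""
    counts = []
    for start, end in zip(beat_times, beat_times[1:]):
        total = round((end - start) * fps)  # frames spanning the interval
        counts.append(max(total - 1, 0))    # subtract the keyframe
    return counts

# Assumed beat-tracker output, in seconds:
beat_times = [0.0, 0.5, 1.0, 1.6]
print(frames_per_interval(beat_times, fps=24))  # → [11, 11, 13]
```

In a full pipeline these counts would parameterize the interpolation model so that each synthesized segment lands exactly on the next beat, which is what keeps the video dynamically consistent with the audio.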

ГОСТ
Huang F. et al. A deep learning approach for music visualization: From audio features to descriptive video generation // Displays. 2025. Vol. 90. p. 103103.
ГОСТ with all authors (up to 50)
Huang F., Xu Z., Min X., Song S. A deep learning approach for music visualization: From audio features to descriptive video generation // Displays. 2025. Vol. 90. p. 103103.
RIS
TY - JOUR
DO - 10.1016/j.displa.2025.103103
UR - https://linkinghub.elsevier.com/retrieve/pii/S0141938225001404
TI - A deep learning approach for music visualization: From audio features to descriptive video generation
T2 - Displays
AU - Huang, Fan
AU - Xu, Zhixin
AU - Min, Xiongkuo
AU - Song, Song
PY - 2025
DA - 2025/12/01
PB - Elsevier
SP - 103103
VL - 90
SN - 0141-9382
SN - 1872-7387
ER -
BibTeX
@article{2025_Huang,
author = {Fan Huang and Zhixin Xu and Xiongkuo Min and Song Song},
title = {A deep learning approach for music visualization: From audio features to descriptive video generation},
journal = {Displays},
year = {2025},
volume = {90},
publisher = {Elsevier},
month = {dec},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0141938225001404},
pages = {103103},
doi = {10.1016/j.displa.2025.103103}
}