Open Access

Applied Sciences (Switzerland), volume 13, issue 13, page 7579

Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths

Publication type: Journal Article

Publication date: 2023-06-27

Publisher: MDPI

Journal: Applied Sciences (Switzerland)

SCImago Q2

WOS Q2

Russian White List: level 2

SJR: 0.555

CiteScore: 5.5

Impact factor: 2.5

ISSN: 2076-3417 (Electronic)

DOI: 10.3390/app13137579

Computer Science Applications

Process Chemistry and Technology

General Materials Science

Instrumentation

General Engineering

Fluid Flow and Transfer Processes

Abstract

Speech is critical for interpersonal communication, but not everyone has fluent communication skills. Speech disfluency, including stuttering and interruptions, affects not only emotional expression but also clarity of expression for people who stutter. Existing methods for detecting speech disfluency rely heavily on annotated data, which can be costly. Additionally, these methods have not considered the issue of variable-length disfluent speech, which limits the scalability of detection methods. To address these limitations, this paper proposes an automated method for detecting speech disfluency that can improve communication skills for individuals and assist therapists in tracking the progress of stuttering patients. The proposed method focuses on detecting four types of disfluency features using single-task detection and utilizes embeddings from the pre-trained wav2vec2.0 model, as well as convolutional neural network (CNN) and Transformer models for feature extraction. The model’s scalability is improved by considering the issue of variable-length disfluent speech and modifying the model based on the entropy invariance of attention mechanisms. The proposed automated method for detecting speech disfluency has the potential to assist individuals in overcoming speech disfluency, improve their communication skills, and aid therapists in tracking the progress of stuttering patients. Additionally, the model’s scalability across languages and lengths enhances its practical applicability. The experiments demonstrate that the model outperforms baseline models in both English and Chinese datasets, proving its universality and scalability in real-world applications.
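The abstract says the model is modified "based on the entropy invariance of attention mechanisms" but does not state the formula. A minimal sketch of one commonly cited length-aware formulation follows, assuming the usual 1/sqrt(d) scale is multiplied by log(n) / log(train_len), where n is the number of keys. The function names and the `train_len` parameter are hypothetical and are not taken from the paper, whose actual modification may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, train_len=512, length_aware=True):
    """Attention weights of one query over n keys.

    Assumption (not from the paper): the length-aware variant multiplies
    the standard 1/sqrt(d) scale by log(n) / log(train_len), so longer
    inputs get sharper scores and the attention entropy drifts less.
    """
    d = len(query)
    n = len(keys)
    scale = 1.0 / math.sqrt(d)
    if length_aware:
        scale *= math.log(n) / math.log(train_len)
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    query = [1.0, 0.5, -0.5, 2.0]
    # Deterministic toy keys; 64 keys stands in for a "long" utterance.
    keys = [[math.sin(i + j) for j in range(4)] for i in range(64)]
    plain = attention_weights(query, keys, train_len=16, length_aware=False)
    aware = attention_weights(query, keys, train_len=16, length_aware=True)
    print(f"entropy, standard scale:     {entropy(plain):.4f}")
    print(f"entropy, length-aware scale: {entropy(aware):.4f}")
```

When the input length equals `train_len`, the extra factor is exactly 1 and the function reduces to standard scaled dot-product attention. For longer inputs the factor exceeds 1, which sharpens the softmax and counteracts the entropy growth that comes from attending over more positions.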

Top 30

Journals

Lecture Notes in Computer Science: 2 publications, 22.22%
Expert Systems with Applications: 1 publication, 11.11%
Clinical Linguistics and Phonetics: 1 publication, 11.11%
Lecture Notes in Networks and Systems: 1 publication, 11.11%
Computers in Biology and Medicine: 1 publication, 11.11%
Intelligent Systems with Applications: 1 publication, 11.11%

Publishers

Elsevier: 3 publications, 33.33%
Springer Nature: 3 publications, 33.33%
Taylor & Francis: 1 publication, 11.11%
Cold Spring Harbor Laboratory: 1 publication, 11.11%
Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 11.11%

Publications without a DOI are not counted.
Publication statistics are updated weekly.

Cite

GOST

Liu J. et al. Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths // Applied Sciences (Switzerland). 2023. Vol. 13. No. 13. p. 7579.

GOST with all authors (up to 50)

Liu J., Wumaier A., Wei D., Guo S. Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths // Applied Sciences (Switzerland). 2023. Vol. 13. No. 13. p. 7579.

RIS

TY  - JOUR
DO  - 10.3390/app13137579
UR  - https://doi.org/10.3390/app13137579
TI  - Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths
T2  - Applied Sciences (Switzerland)
AU  - Liu, Jiajun
AU  - Wumaier, Aishan
AU  - Wei, Dongping
AU  - Guo, Shen
PY  - 2023
DA  - 2023/06/27
PB  - MDPI
SP  - 7579
IS  - 13
VL  - 13
SN  - 2076-3417
ER  -

BibTeX (up to 50 authors)

@article{2023_Liu,
  author = {Jiajun Liu and Aishan Wumaier and Dongping Wei and Shen Guo},
  title = {Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths},
  journal = {Applied Sciences (Switzerland)},
  year = {2023},
  volume = {13},
  publisher = {MDPI},
  month = {jun},
  url = {https://doi.org/10.3390/app13137579},
  number = {13},
  pages = {7579},
  doi = {10.3390/app13137579}
}

MLA

Liu, Jiajun, et al. “Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths.” Applied Sciences (Switzerland), vol. 13, no. 13, Jun. 2023, p. 7579. https://doi.org/10.3390/app13137579.
