International Journal of Machine Learning and Cybernetics, volume 15, issue 10, pages 4787-4799
Pronunciation guided copy and correction model for ASR error correction
Ling Dong (1, 2), Wenjun Wang (1, 2), Zhengtao Yu (1, 2), Yuxin Huang (1, 2), Junjun Guo (1, 2), Guojiang Zhou (1, 2)
(2) Yunnan Key Laboratory of Artificial Intelligence, Kunming, China
Publication type: Journal Article
Publication date: 2024-06-10
Q1
Q2
SJR: 0.988
CiteScore: 7.9
Impact factor: 3.1
ISSN: 1868-8071, 1868-808X
Abstract
Error correction has proven to be an effective means of refining the mistakes produced by Automatic Speech Recognition (ASR) models, notably reducing the Word Error Rate (WER) at the ASR post-editing stage. Existing ASR error correction methods built on sequence-to-sequence architectures may suffer from over-correction, introducing new mistakes or altering portions of the hypothesis that were already correct. In this paper, we propose a Pronunciation Guided Copy and Correction (PGCC) model for ASR error correction. Leveraging the facts that ASR hypotheses overlap substantially with the correct text and that their errors are frequently homophone errors, our approach incorporates a copy module into the encoder-decoder structure of the pre-trained BART model; this module decides whether to retain a token from the source input (by copying it) or to generate a modified one through the decoder. Furthermore, a hierarchical phonetic feature encoder is designed to guide both the copy module and the BART decoder, implicitly identifying the positions of homophone errors and generating precise corrections. Experiments on two public datasets demonstrate the effectiveness of the proposed method, which reduces the character error rate (CER) by 18.18% and 44.84%, respectively, and outperforms strong baseline models.
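The abstract describes the copy module only at a high level: at each output position it chooses between keeping the source token and generating a correction. The sketch below illustrates that choice with a generic pointer-generator-style soft gate, which is an assumption and not the PGCC implementation. The function `copy_or_generate`, the scalar `gate_logit`, and the toy probabilities are all hypothetical; the hierarchical phonetic encoder is reduced to whatever signal would produce `gate_logit`.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_or_generate(
    source_tokens: List[str],
    attention_scores: List[float],
    vocab_probs: Dict[str, float],
    gate_logit: float,
) -> Dict[str, float]:
    """Pointer-generator-style mixture for one decoding step (illustrative only).

    p_gen = sigmoid(gate_logit) weights the decoder's vocabulary distribution;
    1 - p_gen weights the attention mass copied from the source hypothesis.
    gate_logit is a hypothetical stand-in for whatever signal (e.g. phonetic
    features) the real model uses to decide between copying and correcting.
    """
    p_gen = sigmoid(gate_logit)
    attention = softmax(attention_scores)
    final = {token: p_gen * p for token, p in vocab_probs.items()}
    for token, weight in zip(source_tokens, attention):
        # Repeated source tokens accumulate copy probability.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# ASR hypothesis "i red the book"; decoding the position aligned with "red".
source = ["i", "red", "the", "book"]
scores = [0.0, 4.0, 0.0, 0.0]          # attention concentrated on "red"
decoder = {"read": 0.9, "red": 0.1}    # decoder prefers the homophone "read"

keep = copy_or_generate(source, scores, decoder, gate_logit=-2.0)
fix = copy_or_generate(source, scores, decoder, gate_logit=2.0)
print(max(keep, key=keep.get))  # red  (copy dominates)
print(max(fix, key=fix.get))    # read (generation dominates)
```

Because the gate mixes two normalized distributions, the result is itself a distribution over tokens. A soft gate of this kind lets a model pass most of an ASR hypothesis through unchanged, which is one way to limit over-correction, while still allowing a homophone such as "red" to be replaced when the gate favors generation.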
Cite this
GOST
Dong L. et al. Pronunciation guided copy and correction model for ASR error correction // International Journal of Machine Learning and Cybernetics. 2024. Vol. 15. No. 10. pp. 4787-4799.
GOST all authors (up to 50)
Dong L., Wang W., Yu Z., Huang Y., Guo J., Zhou G. Pronunciation guided copy and correction model for ASR error correction // International Journal of Machine Learning and Cybernetics. 2024. Vol. 15. No. 10. pp. 4787-4799.
RIS
TY  - JOUR
DO  - 10.1007/s13042-024-02191-7
UR  - https://link.springer.com/10.1007/s13042-024-02191-7
TI  - Pronunciation guided copy and correction model for ASR error correction
T2  - International Journal of Machine Learning and Cybernetics
AU  - Dong, Ling
AU  - Wang, Wenjun
AU  - Yu, Zhengtao
AU  - Huang, Yuxin
AU  - Guo, Junjun
AU  - Zhou, Guojiang
PY  - 2024
DA  - 2024/06/10
PB  - Springer Nature
SP  - 4787
EP  - 4799
IS  - 10
VL  - 15
SN  - 1868-8071
SN  - 1868-808X
ER  - 
BibTex (up to 50 authors)
@article{2024_Dong,
author = {Ling Dong and Wenjun Wang and Zhengtao Yu and Yuxin Huang and Junjun Guo and Guojiang Zhou},
title = {Pronunciation guided copy and correction model for ASR error correction},
journal = {International Journal of Machine Learning and Cybernetics},
year = {2024},
volume = {15},
publisher = {Springer Nature},
month = {jun},
url = {https://link.springer.com/10.1007/s13042-024-02191-7},
number = {10},
pages = {4787--4799},
doi = {10.1007/s13042-024-02191-7}
}
MLA
Dong, Ling, et al. “Pronunciation guided copy and correction model for ASR error correction.” International Journal of Machine Learning and Cybernetics, vol. 15, no. 10, June 2024, pp. 4787-4799. https://link.springer.com/10.1007/s13042-024-02191-7.