International Journal of Machine Learning and Cybernetics, volume 15, issue 10, pages 4787-4799

Pronunciation guided copy and correction model for ASR error correction

Ling Dong ^{1, 2}

Wenjun Wang ^{1, 2}

Zhengtao Yu ^{1, 2}

Yuxin Huang ^{1, 2}

Junjun Guo ^{1, 2}

Guojiang Zhou ^{1, 2}

Affiliations:

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China

2. Yunnan Key Laboratory of Artificial Intelligence, Kunming, China

Publication type: Journal Article

Publication date: 2024-06-10

Publisher: Springer Nature

Journal: International Journal of Machine Learning and Cybernetics

SCImago: Q1

WoS: Q2

SJR: 0.988

CiteScore: 7.9

Impact factor: 3.1

ISSN: 1868-8071 (Print), 1868-808X (Electronic)

DOI: 10.1007/s13042-024-02191-7

Abstract

Error correction has proven to be an effective means of refining the mistakes produced by Automatic Speech Recognition (ASR) models, contributing to a notable reduction in Word Error Rate (WER) at the ASR post-editing stage. Existing ASR error correction methods built on sequence-to-sequence architectures may suffer from over-correction, introducing new mistakes or altering portions that were already correct. In this paper, we propose a Pronunciation Guided Copy and Correction (PGCC) model for ASR error correction. Because ASR hypotheses overlap substantially with the correct text and frequently contain homophone errors, our approach incorporates a copy module into the encoder-decoder structure of the pre-trained BART model. This module decides whether to retain a token from the source input (by copying it) or to generate a modified one through the decoder. Furthermore, a hierarchical phonetic feature encoder is designed to guide the copy module and the BART decoder, implicitly identifying the positions of homophone errors and generating precise corrections. Experiments on two public datasets demonstrate the effectiveness of the proposed method, showing reductions of 18.18% and 44.84% in character error rate and outperforming strong baseline models.
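The abstract does not give the equations behind PGCC's copy module. As a point of reference only, the sketch below shows the generic pointer-generator-style mixture that copy mechanisms of this kind commonly use: a gate p_copy = sigmoid(score) blends the decoder's vocabulary distribution with attention mass scattered onto the vocabulary ids of the source tokens, P(w) = p_copy * sum over positions i with x_i = w of a_i, plus (1 - p_copy) * P_vocab(w). The function names, the single scalar gate logit, and the toy numbers are hypothetical; the sketch leaves out BART and the hierarchical phonetic feature encoder entirely and should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a gate logit into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow in exp(-x) for large negative x
    return z / (1.0 + z)


def copy_gated_distribution(
    vocab_probs: Sequence[float],
    attention: Sequence[float],
    source_ids: Sequence[int],
    gate_score: float,
) -> List[float]:
    """Mix a decoder vocabulary distribution with a copy distribution.

    Hypothetical, generic pointer-generator-style sketch (not PGCC itself).
    vocab_probs: decoder softmax over the vocabulary (sums to 1).
    attention:   attention weights over source positions (sums to 1).
    source_ids:  vocabulary id of the token at each source position.
    gate_score:  hypothetical scalar logit; sigmoid(gate_score) is the
                 probability of copying from the ASR hypothesis.
    """
    if len(attention) != len(source_ids):
        raise ValueError("attention and source_ids must have the same length")
    p_copy = sigmoid(gate_score)
    # Generation path: scale the decoder distribution by (1 - p_copy).
    mixed = [(1.0 - p_copy) * p for p in vocab_probs]
    # Copy path: scatter-add attention mass onto the ids of the source tokens.
    for weight, token_id in zip(attention, source_ids):
        mixed[token_id] += p_copy * weight
    return mixed
```

With gate_score = 0.0 the gate equals 0.5, so `copy_gated_distribution([0.1, 0.2, 0.3, 0.4], [0.5, 0.5], [0, 2], 0.0)` yields approximately [0.3, 0.1, 0.4, 0.2]. Ids 0 and 2, which appear in the source hypothesis, gain probability mass, illustrating how a copy path can favor retaining tokens that are already present rather than rewriting them.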

Cite this

GOST

Dong L. et al. Pronunciation guided copy and correction model for ASR error correction // International Journal of Machine Learning and Cybernetics. 2024. Vol. 15. No. 10. pp. 4787-4799.

GOST (all authors)

Dong L., Wang W., Yu Z., Huang Y., Guo J., Zhou G. Pronunciation guided copy and correction model for ASR error correction // International Journal of Machine Learning and Cybernetics. 2024. Vol. 15. No. 10. pp. 4787-4799.

RIS

TY  - JOUR
DO  - 10.1007/s13042-024-02191-7
UR  - https://link.springer.com/10.1007/s13042-024-02191-7
TI  - Pronunciation guided copy and correction model for ASR error correction
T2  - International Journal of Machine Learning and Cybernetics
AU  - Dong, Ling
AU  - Wang, Wenjun
AU  - Yu, Zhengtao
AU  - Huang, Yuxin
AU  - Guo, Junjun
AU  - Zhou, Guojiang
PY  - 2024
DA  - 2024/06/10
PB  - Springer Nature
SP  - 4787
EP  - 4799
IS  - 10
VL  - 15
SN  - 1868-8071
SN  - 1868-808X
ER  - 

BibTeX

@article{2024_Dong,
  author = {Ling Dong and Wenjun Wang and Zhengtao Yu and Yuxin Huang and Junjun Guo and Guojiang Zhou},
  title = {Pronunciation guided copy and correction model for ASR error correction},
  journal = {International Journal of Machine Learning and Cybernetics},
  year = {2024},
  volume = {15},
  publisher = {Springer Nature},
  month = {jun},
  url = {https://link.springer.com/10.1007/s13042-024-02191-7},
  number = {10},
  pages = {4787--4799},
  doi = {10.1007/s13042-024-02191-7}
}

MLA

Dong, Ling, et al. “Pronunciation guided copy and correction model for ASR error correction.” International Journal of Machine Learning and Cybernetics, vol. 15, no. 10, Jun. 2024, pp. 4787-4799. https://link.springer.com/10.1007/s13042-024-02191-7.
