Open Access
EURASIP Journal on Audio, Speech, and Music Processing, volume 2024, issue 1, publication number 64
Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification
Han Wang¹, Mingrui He¹, Mingjun Zhang¹, Changzhi Luo², Longting Xu¹
² Matrixtime Robotics Co., Ltd, Shanghai, China
Publication type: Journal Article
Publication date: 2024-12-20
SCImago Q2
SJR: 0.414
CiteScore: 4.1
Impact factor: 1.7
ISSN: 1687-4714, 1687-4722
Abstract
Transfer learning has been shown to improve speaker verification performance in low-resource conditions. However, adding datasets from other domains can introduce domain mismatch, and a mismatch between data volume and model complexity during fine-tuning can also degrade speaker verification performance. In this paper, we propose a domain-weighted allocation fine-tuning strategy (DWA-FT) that uses the Kernel Mean Matching (KMM) algorithm to reduce the distribution difference between the in-domain and out-of-domain datasets. KMM assigns a weight to each sample in the source datasets, and the maximum mean discrepancy (MMD) distance is used to measure how well the distributions have been adapted. In this way, DWA-FT mitigates domain mismatch during model training. We also propose two back-end embedding transformation methods based on canonical correlation analysis (CCA), CCA embedding fusion and CCA embedding constraint, both of which aim to improve the quality of speaker embeddings. Experimental results show that the proposed methods improve the performance of the speaker verification system in low-resource scenarios. Compared with the baseline, our methods achieve relative improvements of 51.03% with PLDA scoring and 46.02% with cosine similarity scoring on the HI-MIA dataset.
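The KMM sample weighting and the MMD check described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian (RBF) kernel, its bandwidth `gamma`, the weight bound `upper` (B), the tolerance `eps`, and the projected-gradient solver (used here in place of the quadratic-programming solver usually paired with KMM) are all assumptions, and the helper names are hypothetical. `kmm_weights` minimizes 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= B and |sum(b) - n_s| <= n_s * eps, which equals the squared MMD between the reweighted source sample and the target sample up to a scale factor and a constant. `mmd2` evaluates that quantity so the effect of the weights can be checked.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    # Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(xs, xt, gamma, beta=None):
    # Squared MMD ||(1/ns) sum_i beta_i phi(xs_i) - (1/nt) sum_j phi(xt_j)||^2.
    # beta=None gives the ordinary unweighted empirical MMD.
    ns, nt = len(xs), len(xt)
    ws = (np.ones(ns) if beta is None else beta) / ns
    wt = np.full(nt, 1.0 / nt)
    return float(ws @ rbf_kernel(xs, xs, gamma) @ ws
                 + wt @ rbf_kernel(xt, xt, gamma) @ wt
                 - 2.0 * ws @ rbf_kernel(xs, xt, gamma) @ wt)


def _project(v, upper, lo, hi):
    # Euclidean projection onto {0 <= b_i <= upper, lo <= sum(b) <= hi}:
    # clip to the box; if the sum is out of range, shift v by lambda (found by
    # bisection) so that the clipped sum lands on the violated bound.
    b = np.clip(v, 0.0, upper)
    s = b.sum()
    if lo <= s <= hi:
        return b
    target = hi if s > hi else lo
    left, right = v.min() - upper, v.max()  # clipped sums there: len(v)*upper and 0
    for _ in range(100):
        lam = 0.5 * (left + right)
        if np.clip(v - lam, 0.0, upper).sum() > target:
            left = lam
        else:
            right = lam
    return np.clip(v - 0.5 * (left + right), 0.0, upper)


def kmm_weights(xs, xt, gamma=1.0, upper=1000.0, eps=None, iters=500):
    # KMM: minimize 0.5 * b^T K b - kappa^T b under the box and sum constraints,
    # solved here by projected gradient descent (assumed solver, not a QP package).
    ns, nt = len(xs), len(xt)
    if eps is None:
        eps = (np.sqrt(ns) - 1.0) / np.sqrt(ns)  # assumed default tolerance
    K = rbf_kernel(xs, xs, gamma)
    kappa = (ns / nt) * rbf_kernel(xs, xt, gamma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K).max()  # 1/L for this L-smooth objective
    lo, hi = ns * (1.0 - eps), ns * (1.0 + eps)
    beta = np.ones(ns)  # uniform weights are feasible and give the plain MMD
    for _ in range(iters):
        beta = _project(beta - step * (K @ beta - kappa), upper, lo, hi)
    return beta


rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(200, 2))  # toy "out-of-domain" source sample
xt = rng.normal(1.0, 1.0, size=(150, 2))  # toy "in-domain" target sample
beta = kmm_weights(xs, xt, gamma=0.5)
print(mmd2(xs, xt, 0.5), mmd2(xs, xt, 0.5, beta))  # second value should be smaller
```

Because the descent starts from uniform weights and each projected step with step size 1/L cannot increase the objective, the weighted MMD never exceeds the unweighted one. In a fine-tuning setting, `xs` would hold out-of-domain features or embeddings and `xt` in-domain ones, and the weights could, for example, scale each source sample's loss. How DWA-FT actually allocates the weights is not specified in this abstract.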
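The abstract does not describe how CCA embedding fusion or CCA embedding constraint work, so the following is only a generic illustration of the underlying tool: regularized linear CCA between two paired sets of embeddings (for example, embeddings of the same utterances from two systems), followed by concatenation of the two canonical projections as one possible fusion rule. The function names, the ridge term `reg`, the concatenation rule, and the toy data are assumptions for illustration, not the paper's method.

```python
import numpy as np


def _inv_sqrt(c):
    # Inverse matrix square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T


def fit_cca(x, y, k, reg=1e-4):
    # Regularized linear CCA between paired rows of x (n, p) and y (n, q).
    # Returns the view means, projections a (p, k) and b (q, k), and the
    # top-k canonical correlations.
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = len(x)
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])  # ridge keeps cxx invertible
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # SVD of the whitened cross-covariance: singular values are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    a = wx @ u[:, :k]
    b = wy @ vt[:k].T
    return mx, my, a, b, s[:k]


def fuse(x, y, params):
    # Hypothetical fusion rule: concatenate the two views' canonical projections.
    mx, my, a, b, _ = params
    return np.hstack([(x - mx) @ a, (y - my) @ b])


# Toy data: two "embedding" views driven by a shared 3-D latent factor.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 3))
x = z @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(500, 8))
y = z @ rng.normal(size=(3, 6)) + 0.05 * rng.normal(size=(500, 6))
params = fit_cca(x, y, k=3)
fused = fuse(x, y, params)
print(params[4], fused.shape)
```

Whitening each view with its inverse covariance square root and taking the SVD of the whitened cross-covariance yields canonical directions ordered by correlation, so keeping the first `k` retains the most strongly correlated components shared by the two views.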