Open Access
EURASIP Journal on Audio, Speech, and Music Processing, volume 2024, issue 1, publication number 64

Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification

Han Wang 1
Mingrui He 1
Mingjun Zhang 1
Changzhi Luo 2
Longting Xu 1
Publication type: Journal Article
Publication date: 2024-12-20
SCImago quartile: Q2
SJR: 0.414
CiteScore: 4.1
Impact factor: 1.7
ISSN: 1687-4714, 1687-4722
Abstract
Transfer learning has been shown to be effective in enhancing speaker verification performance in low-resource conditions. However, the inclusion of additional datasets may cause domain mismatch. Additionally, a mismatch between data volume and model complexity during fine-tuning can degrade speaker verification performance. In this paper, we propose a domain-weighted allocation fine-tuning strategy (DWA-FT) that employs the Kernel Mean Matching (KMM) algorithm to reduce the distribution difference between the in-domain and out-of-domain datasets. KMM assigns a weight to each sample in the source datasets, and the maximum mean discrepancy (MMD) distance is used to measure the effectiveness of the distribution adaptation. DWA-FT effectively mitigates domain mismatch during model training. We also propose two backend canonical correlation analysis (CCA) embedding transformation methods, CCA embedding fusion and CCA embedding constraint, which aim to enhance the quality of speaker embeddings. The experimental results demonstrate that the proposed methods effectively enhance the performance of the speaker verification system in low-resource scenarios. Compared to the baseline, our methods achieve relative improvements of 51.03% in PLDA scoring and 46.02% in cosine similarity scoring on the Himia dataset.
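To make the KMM and MMD ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes an RBF kernel, synthetic 2-D points standing in for speaker embeddings, and a simplified solver (projected gradient descent with only the box constraint on the weights, omitting the usual constraint on their sum). The names `rbf_kernel`, `weighted_mmd2`, and `kmm_weights`, and the parameters `gamma`, `upper`, and `steps`, are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def weighted_mmd2(xs, xt, beta, gamma):
    # Squared MMD between the beta-weighted source set and the target set.
    ns, nt = len(xs), len(xt)
    k_ss = rbf_kernel(xs, xs, gamma)
    k_st = rbf_kernel(xs, xt, gamma)
    k_tt = rbf_kernel(xt, xt, gamma)
    return ((beta @ k_ss @ beta) / ns**2
            - 2.0 * (beta @ k_st.sum(axis=1)) / (ns * nt)
            + k_tt.sum() / nt**2)

def kmm_weights(xs, xt, gamma, upper=10.0, steps=500):
    # Simplified KMM: minimize 0.5 * b^T K b - kappa^T b subject to 0 <= b <= upper.
    # Assumption: the sum constraint of standard KMM is omitted for brevity.
    ns, nt = len(xs), len(xt)
    k_ss = rbf_kernel(xs, xs, gamma)
    kappa = (ns / nt) * rbf_kernel(xs, xt, gamma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(k_ss)[-1]  # 1 / largest eigenvalue of K
    beta = np.ones(ns)                         # start from uniform weights
    for _ in range(steps):
        grad = k_ss @ beta - kappa
        beta = np.clip(beta - step * grad, 0.0, upper)  # project onto the box
    return beta

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(200, 2))  # synthetic out-of-domain (source) samples
xt = rng.normal(0.7, 1.0, size=(100, 2))  # synthetic in-domain (target) samples

beta = kmm_weights(xs, xt, gamma=0.5)
mmd_before = weighted_mmd2(xs, xt, np.ones(len(xs)), gamma=0.5)
mmd_after = weighted_mmd2(xs, xt, beta, gamma=0.5)
print(f"MMD^2 uniform weights: {mmd_before:.4f}, KMM weights: {mmd_after:.4f}")
```

In this sketch the per-sample weights `beta` could then scale each source sample's loss during fine-tuning. How the paper actually applies the weights is not specified in the abstract.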