Open Access

Eurasip Journal on Audio, Speech, and Music Processing, volume 2024, issue 1, publication number 64

Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification

Han Wang ¹

Mingrui He ¹

Mingjun Zhang ¹

Changzhi Luo ²

Longting Xu ¹

Hide authors affiliations

¹ School of Information Science and Technology, Donghua University, Shanghai, China

² Matrixtime Robotics Co., Ltd., Shanghai, China

Publication type: Journal Article

Publication date: 2024-12-20

Springer Nature

Journal: Eurasip Journal on Audio, Speech, and Music Processing

SCImago Q2

SJR: 0.414

CiteScore: 4.1

Impact factor: 1.7

ISSN: 1687-4714 (Print), 1687-4722 (Electronic)

DOI: 10.1186/s13636-024-00385-z


Abstract

Transfer learning has been shown to improve speaker verification performance in low-resource conditions. However, incorporating additional datasets can introduce domain mismatch, and a mismatch between the volume of fine-tuning data and the model complexity can further degrade speaker verification performance. In this paper, we propose a domain-weighted allocation fine-tuning strategy (DWA-FT) that employs the Kernel Mean Matching (KMM) algorithm to reduce the distribution differences between the in-domain and out-of-domain datasets. DWA-FT assigns a weight to each sample in the source datasets and uses the maximum mean discrepancy (MMD) distance to measure the effectiveness of the distribution adaptation, which effectively mitigates domain mismatch during model training. We also propose two back-end embedding transformation methods based on canonical correlation analysis (CCA), CCA embedding fusion and CCA embedding constraint, which aim to improve the quality of speaker embeddings. Experimental results demonstrate that the proposed methods effectively improve the performance of the speaker verification system in low-resource scenarios. Compared with the baseline, our methods achieve relative improvements of 51.03% with PLDA scoring and 46.02% with cosine similarity scoring on the HI-MIA dataset.
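The abstract's core step, weighting each out-of-domain (source) sample with KMM and checking the adaptation with MMD, can be illustrated with the standard KMM quadratic program. This is a minimal sketch under stated assumptions, not the authors' implementation: the RBF kernel, the bandwidth `gamma`, the weight bound `B`, the sum tolerance `eps`, the SLSQP solver, and the synthetic 2-D points standing in for speaker embeddings are all illustrative choices, and the function names `rbf_kernel`, `kmm_weights`, and `weighted_mmd2` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kmm_weights(Xs, Xt, gamma=0.5, B=10.0, eps=0.1):
    """Kernel Mean Matching: one weight per source sample so that the weighted
    source mean embedding approaches the target (in-domain) mean embedding.

    Solves  min_beta 0.5 * beta^T K beta - kappa^T beta
    s.t.    0 <= beta_i <= B  and  |sum(beta) - n_s| <= n_s * eps.
    gamma, B and eps are assumed hyperparameters, not values from the paper.
    """
    n_s, n_t = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, gamma)
    kappa = (n_s / n_t) * rbf_kernel(Xs, Xt, gamma).sum(axis=1)
    ones = np.ones(n_s)

    def objective(b):
        return 0.5 * b @ K @ b - kappa @ b

    def gradient(b):
        return K @ b - kappa

    constraints = [
        # sum(beta) <= n_s * (1 + eps)
        {"type": "ineq", "fun": lambda b: n_s * (1 + eps) - b.sum(), "jac": lambda b: -ones},
        # sum(beta) >= n_s * (1 - eps)
        {"type": "ineq", "fun": lambda b: b.sum() - n_s * (1 - eps), "jac": lambda b: ones},
    ]
    # Uniform weights are feasible, so they serve as the starting point.
    res = minimize(objective, x0=ones.copy(), jac=gradient, method="SLSQP",
                   bounds=[(0.0, B)] * n_s, constraints=constraints)
    return np.clip(res.x, 0.0, B)


def weighted_mmd2(Xs, Xt, beta, gamma=0.5):
    """Biased squared MMD between beta-weighted source samples and target samples."""
    n_s, n_t = len(Xs), len(Xt)
    Kss = rbf_kernel(Xs, Xs, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    return (beta @ Kss @ beta / n_s**2
            - 2.0 * beta @ Kst.sum(axis=1) / (n_s * n_t)
            + Ktt.sum() / n_t**2)


# Synthetic stand-ins: "out-of-domain" source points and shifted "in-domain" targets.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 2))
Xt = rng.normal([1.0, 0.0], 0.5, size=(50, 2))

beta = kmm_weights(Xs, Xt)
mmd_before = weighted_mmd2(Xs, Xt, np.ones(len(Xs)))
mmd_after = weighted_mmd2(Xs, Xt, beta)
print(f"MMD^2 with uniform weights: {mmd_before:.4f}")
print(f"MMD^2 with KMM weights:     {mmd_after:.4f}")
```

Dividing the KMM objective by n_s²/2 gives the weighted MMD² minus a constant target-only term. Since uniform weights are feasible, a converged solver should therefore never report a larger MMD after reweighting, and this is why MMD works as a sanity check on the adaptation. How the resulting weights enter fine-tuning (for example, by scaling per-sample losses) is not specified here and would be a further assumption.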
