Open Access
Open access
Eurasip Journal on Audio, Speech, and Music Processing, volume 2024, issue 1, publication number 63

Variants of LSTM cells for single-channel speaker-conditioned target speaker extraction

Ragini Sinha 1
Christian Rollwage 1
Simon Doclo 1, 2
1
 
Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg Branch for Hearing, Speech and Audio Technology HSA, Oldenburg, Germany
2
 
Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
Publication typeJournal Article
Publication date2024-12-02
scimago Q2
SJR0.414
CiteScore4.1
Impact factor1.7
ISSN16874714, 16874722
Abstract

Speaker-conditioned target speaker extraction aims at estimating the target speaker from a mixture of speakers utilizing auxiliary information about the target speaker. In this paper, we consider a single-channel target speaker extraction system consisting of a speaker embedder network and a speaker separator network. Instead of using standard long short-term memory (LSTM) cells in the separator network, we propose two variants of LSTM cells that are customized for speaker-conditioned target speaker extraction. The first variant customizes both the forget gate and input gate of the LSTM cell, aiming at retaining only relevant features related to target speaker and disregarding the interfering speakers by simultaneously resetting and updating the cell state using the speaker embedding. For the second variant, we introduce a new gate within the LSTM cell, referred to as auxiliary-modulation gate. This gate modulates the information processing during cell state reset, aiming at learning the long-term and short-term discriminative features of the target speaker. Both in unidirectional and bidirectional mode, experimental results on 2-speaker mixtures, 3-speaker mixtures, and noisy mixtures (containing 1, 2, or 3 speakers) show that both proposed variants of LSTM cells outperform the standard LSTM cells for target speaker extraction, where the best performance is obtained using the auxiliary-gated LSTM cells.

Found 
Found 

Are you a researcher?

Create a profile to get free access to personal recommendations for colleagues and new articles.
Share
Cite this
GOST | RIS | BibTex
Found error?