Open access
Volume 13, pages 36701-36713

Hybrid Convolutional Neural Network-Transformer Model for End-to-End Binaural Sound Source Localization in Reverberant Environments

Publication type: Journal Article
Publication date: 2025-02-24
Scimago quartile: Q1
WoS quartile: Q2
SJR: 0.849
CiteScore: 9.0
Impact factor: 3.6
ISSN: 2169-3536
Abstract
The end-to-end binaural sound source localization model can implicitly extract features from the original signal waveforms and take full advantage of neural networks. In this study, we propose a new deep learning-based end-to-end binaural sound source localization model called WavLocT, which employs a gammatone filter bank to simulate the frequency properties of the human ear and decomposes the signals received by microphones into subband signals. The WavLocT model has a unique feature extraction block that incorporates a convolutional neural network (CNN) and transformer structure, in which binaural localization features are extracted from different subband signal waveforms via the CNN structure. Additionally, we use a selective attention mechanism for different frequency subbands via the transformer encoder. In the training phase, we simulated three reverberant rooms of different sizes and trained two models separately for diffuse and directional noise environments. In the testing phase, we applied the trained models to both binaural room impulse response (BRIR)-matched and BRIR-mismatched environments. We selected the root mean square error (RMSE) and accuracy (Acc) as evaluation metrics to evaluate model performance and compared the results with those of two recent CNN-based end-to-end binaural sound source localization models. The results of the simulation experiments demonstrated that the proposed WavLocT model effectively estimated the azimuth of the desired speech signal in both diffuse and directional noise environments. Specifically, in the diffuse noise environment, WavLocT achieved an average RMSE of 6.45° and Acc of 70.02% across all mismatched rooms, outperforming WavLoc (6.72° RMSE, 56.10% Acc) and WavLocEC (7.46° RMSE, 65.86% Acc). In the directional noise environment, WavLocT achieved an average RMSE of 7.61° and Acc of 63.38% across all mismatched rooms, outperforming WavLoc (8.34° RMSE, 52.38% Acc) and WavLocEC (8.19° RMSE, 58.05% Acc).
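To make the pipeline described in the abstract concrete (gammatone subband decomposition, a per-subband CNN feature extractor, and a transformer encoder attending across frequency subbands), the sketch below shows one plausible PyTorch realization. It is a minimal illustration based only on the abstract: the class name, layer sizes, number of subbands, and the azimuth-class head are all assumptions, not the paper's actual configuration, and the gammatone filter bank is assumed to run upstream and is not shown.

import torch
import torch.nn as nn

class WavLocTSketch(nn.Module):
    # Hypothetical sketch of a subband-CNN + transformer localizer,
    # loosely following the abstract; all sizes are assumptions.
    def __init__(self, n_subbands=32, d_model=64, n_azimuths=72):
        super().__init__()
        # CNN extracts one feature vector from each two-channel
        # (left/right ear) subband waveform.
        self.cnn = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=64, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
            nn.Flatten(),
            nn.Linear(32, d_model),
        )
        # Transformer encoder attends across the subband tokens,
        # i.e. selective attention over frequency subbands.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_azimuths)  # azimuth classes

    def forward(self, x):
        # x: (batch, subbands, 2, time) binaural subband waveforms,
        # assumed to come from an upstream gammatone filter bank.
        b, s, c, t = x.shape
        tokens = self.cnn(x.reshape(b * s, c, t)).reshape(b, s, -1)
        tokens = self.encoder(tokens)          # attention across subbands
        return self.head(tokens.mean(dim=1))   # pooled azimuth logits

# Example: a batch of 4 half-second clips at 16 kHz, 32 subbands.
logits = WavLocTSketch()(torch.randn(4, 32, 2, 8000))
print(logits.shape)  # torch.Size([4, 72])

Pooling over subband tokens before the linear head reflects the idea in the abstract that attention weights the informative frequency subbands before a single azimuth decision is made.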

Top-30

Journals
1. Applied Sciences (Switzerland): 1 publication, 100%

Publishers
1. MDPI: 1 publication, 100%

  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Cite this

GOST
Chen X. et al. Hybrid Convolutional Neural Network-Transformer Model for End-to-End Binaural Sound Source Localization in Reverberant Environments // IEEE Access. 2025. Vol. 13. pp. 36701-36713.
GOST (all authors, up to 50)
Chen X., Zhao L., Cui J., Li H., Wang X. Hybrid Convolutional Neural Network-Transformer Model for End-to-End Binaural Sound Source Localization in Reverberant Environments // IEEE Access. 2025. Vol. 13. pp. 36701-36713.
RIS
TY - JOUR
DO - 10.1109/access.2025.3545065
UR - https://ieeexplore.ieee.org/document/10902008/
TI - Hybrid Convolutional Neural Network-Transformer Model for End-to-End Binaural Sound Source Localization in Reverberant Environments
T2 - IEEE Access
AU - Chen, Xinyi
AU - Zhao, Lijia
AU - Cui, Jie
AU - Li, Hua
AU - Wang, Xiaodong
PY - 2025
DA - 2025/02/24
PB - Institute of Electrical and Electronics Engineers (IEEE)
SP - 36701
EP - 36713
VL - 13
SN - 2169-3536
ER -
BibTeX
@article{2025_Chen,
author = {Xinyi Chen and Lijia Zhao and Jie Cui and Hua Li and Xiaodong Wang},
title = {Hybrid Convolutional Neural Network-Transformer Model for End-to-End Binaural Sound Source Localization in Reverberant Environments},
journal = {IEEE Access},
year = {2025},
volume = {13},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
month = {feb},
url = {https://ieeexplore.ieee.org/document/10902008/},
pages = {36701--36713},
doi = {10.1109/access.2025.3545065}
}