Open Access
Volume 2025, issue 1, publication number 7

Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization

Publication type: Journal Article
Publication date: 2025-02-12
Scimago: Q2
WoS: Q2
SJR: 0.417
CiteScore: 4.5
Impact factor: 1.9
ISSN: 1687-4714, 1687-4722
Abstract

In this paper, a detailed investigation of deep learning-based speaker detection and localization (SDL) with higher-order Ambisonics signals is conducted. Different spherical harmonic (SH) input features such as the higher-order pseudointensity vector (HO-PIV), relative harmonic coefficients (RHCs), and the spatially-localized pseudointensity vector (SL-PIV), a feature proposed here for the first time as an input feature for deep learning-based SDL, are examined using first- to fourth-order SH signals. The trained neural networks, optimized with a single loss function for the combined tasks of detection and localization, are then evaluated in detail for overall SDL performance as well as for their performance in the sub-tasks of detection and, particularly, localization. The results are further analyzed as a function of room reverberation, signal-to-interference ratio (SIR), and the number of and distances between multiple simultaneously active speakers, using both simulated and measured data. The findings indicate an overall improvement in SDL performance up to third-order Ambisonics for all investigated features, while fourth-order signals yield no further improvement and sometimes even deliver worse results. Notably, the HO-PIV and the SL-PIV, both extensions of the first-order pseudointensity vector (FO-PIV), prove to be suitable input features. In particular, the newly proposed SL-PIV is found to be the best of the investigated features on third- and fourth-order Ambisonics signals, especially in the most demanding scenarios on measured data, with multiple closely located speakers and poor SIR.
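
For orientation, the first-order pseudointensity vector that the HO-PIV and SL-PIV extend can be sketched for a single time-frequency bin as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the channel normalization is ignored, and it assumes an encoding convention in which the dipole channels X, Y, Z have positive gain toward the source, so that the vector points toward the source (other conventions flip the sign). The channel count (N + 1)^2 per Ambisonics order N is standard.

```python
import math


def num_sh_channels(order):
    """Number of spherical harmonic channels for Ambisonics order N: (N + 1)^2."""
    return (order + 1) ** 2


def fo_piv(w, x, y, z):
    """First-order pseudointensity vector for one time-frequency bin.

    w is the complex omnidirectional coefficient; x, y, z are the complex
    dipole coefficients. Assumption: dipoles have positive gain toward the
    source, so the result points toward the source.
    """
    wc = w.conjugate()
    return ((wc * x).real, (wc * y).real, (wc * z).real)


def direction_from_piv(piv):
    """Convert a pseudointensity vector to (azimuth, elevation) in degrees."""
    ix, iy, iz = piv
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return math.degrees(azimuth), math.degrees(elevation)


if __name__ == "__main__":
    # Channel counts for first- to fourth-order signals.
    print([num_sh_channels(n) for n in range(1, 5)])

    # Idealized single plane wave from the +y axis (hypothetical test signal).
    s = 0.5 + 0.5j
    print(direction_from_piv(fo_piv(s, 0j, s, 0j)))
```

In practice such vectors would be computed per time-frequency bin of the STFT of the SH signals and stacked as network input; the exact feature layout used in the paper is not specified in this abstract.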

GOST
Poschadel N., Preihs S., Peissig J. Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization // Eurasip Journal on Audio, Speech, and Music Processing. 2025. Vol. 2025. No. 1. 7
RIS
TY - JOUR
DO - 10.1186/s13636-025-00393-7
UR - https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-025-00393-7
TI - Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization
T2 - Eurasip Journal on Audio, Speech, and Music Processing
AU - Poschadel, Nils
AU - Preihs, Stephan
AU - Peissig, Jürgen
PY - 2025
DA - 2025/02/12
PB - Springer Nature
IS - 1
VL - 2025
SN - 1687-4714
SN - 1687-4722
ER -
BibTeX
@article{2025_Poschadel,
author = {Nils Poschadel and Stephan Preihs and Jürgen Peissig},
title = {Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization},
journal = {Eurasip Journal on Audio, Speech, and Music Processing},
year = {2025},
volume = {2025},
publisher = {Springer Nature},
month = {feb},
url = {https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-025-00393-7},
number = {1},
pages = {7},
doi = {10.1186/s13636-025-00393-7}
}