Open Access
EURASIP Journal on Audio, Speech, and Music Processing, volume 2024, issue 1, publication number 65

Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array

Takao Kawamura 1, Yuma Kinoshita 1,2, Nobutaka Ono 1, Robin Scheibler 3

1 Department of Computer Science, Tokyo Metropolitan University, Hino-shi, Japan
3 Music Processing Team, LY Corporation, Chiyoda-ku, Japan
Publication type: Journal Article
Publication date: 2024-12-21
Scimago: Q2
SJR: 0.414
CiteScore: 4.1
Impact factor: 1.7
ISSN: 1687-4714, 1687-4722
Abstract

In this study, we investigate the effectiveness of spatial features in acoustic scene classification using distributed microphone arrays. Under the assumption that multiple subarrays, each equipped with multiple microphones, are synchronized, we investigate two types of spatial features: intra- and inter-subarray generalized cross-correlation phase transforms (GCC-PHATs), derived from channel pairs within the same subarray and between different subarrays, respectively. Our approach uses the log-Mel spectrogram as a spectral feature and the intra- and/or inter-subarray GCC-PHAT as spatial features. We propose two methods for integrating the spectral and spatial features: (a) middle integration, which fuses the embeddings obtained from the spectral and spatial features, and (b) late integration, which fuses the decisions estimated from the spectral and spatial features. Evaluation experiments showed that, when using only spectral features, employing all channels did not markedly improve the F1-score over the single-channel case. In contrast, integrating spectral and spatial features improved the F1-score compared with using spectral features alone. We also confirmed that the F1-score of late integration was slightly higher than that of middle integration.
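The abstract names GCC-PHAT but does not define it. As a minimal illustrative sketch (not the authors' implementation), the frame-wise GCC-PHAT between two time-synchronized channels can be computed in NumPy as below; the function name gcc_phat and the parameter values n_fft, hop, and max_lag are assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat(x1, x2, n_fft=1024, hop=512, max_lag=64):
    """Frame-wise GCC-PHAT between two synchronized channels.

    Returns an array of shape (n_frames, 2 * max_lag + 1): the
    phase-transform-weighted cross-correlation around lag zero
    for each STFT frame.
    """
    n_frames = 1 + (len(x1) - n_fft) // hop
    window = np.hanning(n_fft)
    feats = np.empty((n_frames, 2 * max_lag + 1))
    for t in range(n_frames):
        seg1 = x1[t * hop : t * hop + n_fft] * window
        seg2 = x2[t * hop : t * hop + n_fft] * window
        X1 = np.fft.rfft(seg1)
        X2 = np.fft.rfft(seg2)
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12                # PHAT weighting: keep phase only
        cc = np.fft.irfft(cross, n=n_fft)             # back to the lag domain
        cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))  # center lag 0
        feats[t] = cc
    return feats

if __name__ == "__main__":
    # Synthetic example: two 2-channel subarrays at 16 kHz (random noise stand-ins).
    rng = np.random.default_rng(0)
    sub_a = rng.standard_normal((2, 16000))
    sub_b = rng.standard_normal((2, 16000))
    intra = gcc_phat(sub_a[0], sub_a[1])  # intra-subarray: pair within subarray A
    inter = gcc_phat(sub_a[0], sub_b[0])  # inter-subarray: pair spanning A and B
    print(intra.shape, inter.shape)
```

In the paper's terminology, intra-GCC-PHAT would be computed from channel pairs within one subarray and inter-GCC-PHAT from pairs spanning two subarrays. Note that inter-subarray pairs can exhibit much larger time differences of arrival than closely spaced intra-subarray pairs; the single max_lag above is a placeholder for both cases, not a choice taken from the paper.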
