Open Access
EURASIP Journal on Audio, Speech, and Music Processing, volume 2024, issue 1, publication number 65

Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array

Takao Kawamura 1, Yuma Kinoshita 1,2, Nobutaka Ono 1, Robin Scheibler 3

1 Department of Computer Science, Tokyo Metropolitan University, Hino-shi, Japan
3 Music Processing Team, LY Corporation, Chiyoda-ku, Japan
Publication type: Journal Article
Publication date: 2024-12-21
Scimago: Q2
SJR: 0.414
CiteScore: 4.1
Impact factor: 1.7
ISSN: 1687-4714, 1687-4722
Abstract

In this study, we investigate the effectiveness of spatial features in acoustic scene classification using distributed microphone arrays. Under the assumption that multiple subarrays, each equipped with multiple microphones, are synchronized, we investigate two types of spatial features: intra- and inter-subarray generalized cross-correlation phase transforms (GCC-PHATs), derived from channel pairs within the same subarray and between different subarrays, respectively. Our approach uses the log-Mel spectrogram as a spectral feature and the intra- and/or inter-subarray GCC-PHAT as spatial features. We propose two methods for integrating the spectral and spatial features: (a) middle integration, which fuses the embeddings obtained from the spectral and spatial features, and (b) late integration, which fuses the decisions estimated from the spectral and spatial features. Evaluation experiments showed that, when using only spectral features, employing all channels did not markedly improve the F1-score over the single-channel case. In contrast, integrating spectral and spatial features improved the F1-score compared with using spectral features alone. We also confirmed that the F1-score of late integration was slightly higher than that of middle integration.
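The abstract names GCC-PHAT but does not define it. As a minimal illustrative sketch (not the authors' implementation), the frame-wise GCC-PHAT between two time-synchronized channels can be computed in NumPy as below; the function name gcc_phat and the parameter values n_fft, hop, and max_lag are assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat(x1, x2, n_fft=1024, hop=512, max_lag=64):
    """Frame-wise GCC-PHAT between two synchronized channels.

    Returns an array of shape (n_frames, 2 * max_lag + 1): the
    phase-transform-weighted cross-correlation around lag zero
    for each STFT frame.
    """
    n_frames = 1 + (len(x1) - n_fft) // hop
    window = np.hanning(n_fft)
    feats = np.empty((n_frames, 2 * max_lag + 1))
    for t in range(n_frames):
        seg1 = x1[t * hop : t * hop + n_fft] * window
        seg2 = x2[t * hop : t * hop + n_fft] * window
        X1 = np.fft.rfft(seg1)
        X2 = np.fft.rfft(seg2)
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12                # PHAT weighting: keep phase only
        cc = np.fft.irfft(cross, n=n_fft)             # back to the lag domain
        cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))  # center lag 0
        feats[t] = cc
    return feats

if __name__ == "__main__":
    # Synthetic example: two 2-channel subarrays at 16 kHz (random noise stand-ins).
    rng = np.random.default_rng(0)
    sub_a = rng.standard_normal((2, 16000))
    sub_b = rng.standard_normal((2, 16000))
    intra = gcc_phat(sub_a[0], sub_a[1])  # intra-subarray: pair within subarray A
    inter = gcc_phat(sub_a[0], sub_b[0])  # inter-subarray: pair spanning A and B
    print(intra.shape, inter.shape)
```

In the paper's terminology, intra-GCC-PHAT would be computed from channel pairs within one subarray and inter-GCC-PHAT from pairs spanning two subarrays. Note that inter-subarray pairs can exhibit much larger time differences of arrival than closely spaced intra-subarray pairs; the single max_lag above is a placeholder for both cases, not a choice taken from the paper.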
