TR2026-098

The MERL Systems for DCASE 2026 Challenge Task 4

- Saijo, K., Masuyama, Y., Boeddeker, C., Wichern, G., Richter, J., Edo, T., Le Roux, J., "The MERL Systems for DCASE 2026 Challenge Task 4," Tech. Rep. TR2026-098, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE Challenge), June 2026.
  BibTeX:
  @techreport{Saijo2026jun,
    author = {Saijo, Kohei and Masuyama, Yoshiki and Boeddeker, Christoph and Wichern, Gordon and Richter, Julius and Edo, Takahiro and Le Roux, Jonathan},
    title = {{The MERL Systems for DCASE 2026 Challenge Task 4}},
    institution = {IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE Challenge)},
    year = 2026,
    month = jun,
    url = {https://www.merl.com/publications/TR2026-098}
  }
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

This technical report describes our spatial semantic segmentation of sound scenes (S5) systems for DCASE 2026 Challenge Task 4. Inspired by the top-ranked system in DCASE 2025 Task 4, we adopt a cascaded framework consisting of universal sound separation (USS) with source counting, source classification, and class-aware refinement. In the first stage, a TF-Locoformer-based USS model separates multi-channel mixtures into single-channel foreground and interference signals. Then, each separated signal is classified into one of 18 foreground classes or as interference. The separated foreground signals are further refined by another TF-Locoformer-based model conditioned on the predicted class labels and the observed mixture. Our best system achieves a CA-PI-SDRi of 14.95 dB and a mixture accuracy of 78.11% on the dev test set.
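
The control flow of the three-stage cascade described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `run_cascade`, `Segment`, `INTERFERENCE`, and the `toy_*` stand-ins are hypothetical, the real stages are neural models (TF-Locoformer for separation and refinement), and dropping signals classified as interference before refinement is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]      # single-channel waveform (toy representation)
Mixture = List[Signal]    # multi-channel mixture: one waveform per channel

INTERFERENCE = "interference"  # hypothetical label for non-foreground sources


@dataclass
class Segment:
    label: str       # predicted foreground class
    signal: Signal   # refined single-channel estimate


def run_cascade(
    mixture: Mixture,
    separate: Callable[[Mixture], List[Signal]],
    classify: Callable[[Signal], str],
    refine: Callable[[Signal, str, Mixture], Signal],
) -> List[Segment]:
    """Chain separation, classification, and class-aware refinement."""
    # Stage 1: separation with source counting yields a variable number
    # of single-channel signals.
    separated = separate(mixture)
    results = []
    for sig in separated:
        # Stage 2: label each separated signal.
        label = classify(sig)
        if label == INTERFERENCE:
            continue  # assumption: interference is not refined or returned
        # Stage 3: refine, conditioned on the label and the observed mixture.
        results.append(Segment(label, refine(sig, label, mixture)))
    return results


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_separate(mixture: Mixture) -> List[Signal]:
    return [list(channel) for channel in mixture]


def toy_classify(sig: Signal) -> str:
    return "class_a" if max(abs(x) for x in sig) > 0.5 else INTERFERENCE


def toy_refine(sig: Signal, label: str, mixture: Mixture) -> Signal:
    return list(sig)
```

With the toy stand-ins, `run_cascade([[0.9, -0.8], [0.1, 0.0]], toy_separate, toy_classify, toy_refine)` keeps only the first signal, labeled `class_a`, and discards the low-amplitude one as interference.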

Related News & Events

AWARD MERL Team Wins DCASE 2026 Challenge on Anomalous Sound Detection for Machine Condition Monitoring
Date: June 30, 2026
Awarded to: Takuya Fujimura, Gordon Wichern, Yoshiki Masuyama, Christoph Boeddeker, Kohei Saijo, Julius Richter, Takahiro Edo, and Jonathan Le Roux
MERL Contacts: Christoph Boeddeker; Takahiro Edo; Jonathan Le Roux; Yoshiki Masuyama; Julius Richter; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL's Speech & Audio team ranked 1st out of 51 teams in the DCASE 2026 Challenge’s Task 2, “Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring.” The team was led by MERL intern Takuya Fujimura, and also included Gordon Wichern, Yoshiki Masuyama, Christoph Boeddeker, Kohei Saijo, Julius Richter, Takahiro Edo, and Jonathan Le Roux.
  
  The IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE Challenge), started in 2013, has been organized yearly since 2016, and gathers challenges on multiple tasks related to the detection, analysis, and generation of sound events. This year, the DCASE 2026 Challenge received 421 submissions from 135 teams across seven tasks.
  
  The MERL team won Task 2, Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring, which aims at building noise-robust systems that automatically detect machine failure via microphones when only normal machine operating data is available for system development. Task 2 was by far the most popular of the seven DCASE 2026 tasks, with 51 teams submitting 168 entries. The MERL team's system was built around MERL’s recently proposed paradigm of noise-aware self-supervised learning, which extracts noise-robust features by leveraging two-channel recordings in which one microphone is used to capture noise. Anomaly detection is then performed in the extracted denoised feature space using advanced score normalization. The team's best submission obtained a composite score of 70.24% on five evaluation machines, well ahead of the 2nd best team's 65.45%.
  
  MERL also participated in Task 4, Spatial Semantic Segmentation of Sound Scenes (S5), and placed 3rd out of 10 teams in separation performance. The team's cascaded system consists of universal sound separation with source counting, source classification, and class-aware refinement, where the separation and refinement modules are built upon MERL's TF-Locoformer separation technology. Notably, the team's best submission obtained a label prediction accuracy of 76.92% on the evaluation set, well ahead of the 2nd best team's 65.54%.

MERL Contacts: Yoshiki Masuyama; Christoph Boeddeker; Gordon Wichern; Julius Richter; Jonathan Le Roux
