TR2026-034

FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement


    •  Masuyama, Y., Saijo, K., Paissan, F., Han, J., Delcroix, M., Aihara, R., Germain, F.G., Wichern, G., Le Roux, J., "FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026.
      BibTeX:
      @inproceedings{Masuyama2026may2,
        author = {Masuyama, Yoshiki and Saijo, Kohei and Paissan, Francesco and Han, Jiangyu and Delcroix, Marc and Aihara, Ryo and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
        title = {{FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement}},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2026,
        month = may,
        url = {https://www.merl.com/publications/TR2026-034}
      }
  • Research Areas:

    Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

Speech separation and enhancement (SSE) has advanced remarkably and achieved promising results in controlled settings, such as a fixed number of speakers and a fixed array configuration. Towards a universal SSE system, single-channel systems have been extended to deal with a variable number of speakers (i.e., outputs). Meanwhile, multi-channel systems accommodating various array configurations (i.e., inputs) have been developed. However, these attempts have been pursued separately. In this paper, we propose a flexible input and output SSE system, named FlexIO. It performs conditional separation using prompt vectors, one per speaker as a condition, allowing separation of an arbitrary number of speakers. Multi-channel mixtures are processed together with the prompt vectors via an array-agnostic channel communication mechanism. Our experiments demonstrate that FlexIO successfully covers diverse conditions with one to five microphones and one to three speakers. We also confirm the robustness of FlexIO on CHiME-4 real data.
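
The input and output flexibility described in the abstract can be illustrated with a toy sketch. It is not the FlexIO architecture: the function names, the mean-based channel communication, and the sigmoid gating are all assumptions chosen only to show how one routine can accept any number of channels and return one output per prompt vector.

```python
import math
from typing import List

Vector = List[float]


def channel_mean(channels: List[Vector]) -> Vector:
    """Average features over the channel axis; works for any channel count."""
    n = len(channels)
    dim = len(channels[0])
    return [sum(ch[i] for ch in channels) / n for i in range(dim)]


def communicate(channels: List[Vector]) -> List[Vector]:
    """Toy channel communication (assumption): add the cross-channel mean
    to every channel, so the step does not depend on array geometry."""
    mean = channel_mean(channels)
    return [[x + m for x, m in zip(ch, mean)] for ch in channels]


def separate(channels: List[Vector], prompts: List[Vector]) -> List[Vector]:
    """Return one output vector per prompt, whatever the number of channels."""
    pooled = channel_mean(communicate(channels))
    outputs = []
    for prompt in prompts:
        # Hypothetical conditioning: an element-wise sigmoid gate derived
        # from the prompt stands in for a learned conditional network.
        gate = [1.0 / (1.0 + math.exp(-p)) for p in prompt]
        outputs.append([g * x for g, x in zip(gate, pooled)])
    return outputs


# One microphone, one speaker prompt.
single = separate([[2.0, 4.0]], [[0.0, 0.0]])

# Five microphones, three speaker prompts.
multi = separate([[1.0, 2.0]] * 5, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(len(single), len(multi))
```

Because the channels are only ever combined through a mean, the sketch is indifferent to channel count and ordering, and the number of outputs is set purely by the number of prompts supplied.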

 

  • Related Publication

  •  Masuyama, Y., Saijo, K., Paissan, F., Han, J., Delcroix, M., Aihara, R., Germain, F.G., Wichern, G., Le Roux, J., "FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement", arXiv, October 2025.
    BibTeX:
    @article{Masuyama2025oct2,
      author = {Masuyama, Yoshiki and Saijo, Kohei and Paissan, Francesco and Han, Jiangyu and Delcroix, Marc and Aihara, Ryo and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
      title = {{FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement}},
      journal = {arXiv},
      year = 2025,
      month = oct,
      url = {https://arxiv.org/abs/2510.21485}
    }