Software & Data Downloads — CFS

Cocktail Fork Separation (CFS): code for training and using the Multi Resolution CrossNet (MRX) model.

This package provides a PyTorch implementation of the Multi Resolution CrossNet (MRX) model proposed in our ICASSP 2022 paper, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks." It includes the weights of a model pre-trained on the Divide and Remaster (DnR) dataset, which can separate the audio of a soundtrack (e.g., a movie or commercial) into individual speech, music, and sound effects stems. A pytorch_lightning script for training the model on the DnR dataset is also included.
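The three-stem formulation described above can be illustrated with a small toy sketch. It is not the MRX model and does not use the released code: the stem names, the sine-tone "stems", the additive mixing model, and the function `placeholder_separator` are all assumptions made up for illustration. It only shows the shape of the problem, namely one mixture in and three time-aligned stems out.

```python
import math

# Stem names follow the description (speech, music, sound effects);
# the short keys themselves are an illustrative choice.
STEMS = ("speech", "music", "sfx")


def make_toy_stems(num_samples=8000, sample_rate=8000):
    """Build three synthetic 'stems' (plain sine tones), purely for illustration."""
    freqs = {"speech": 220.0, "music": 440.0, "sfx": 880.0}
    return {
        name: [0.3 * math.sin(2 * math.pi * f * n / sample_rate)
               for n in range(num_samples)]
        for name, f in freqs.items()
    }


def mix(stems):
    """Assumed mixing model: the soundtrack is the sample-wise sum of its stems."""
    return [sum(samples) for samples in zip(*(stems[name] for name in STEMS))]


def placeholder_separator(mixture):
    """Hypothetical stand-in for a trained separator (NOT MRX).

    It splits the mixture evenly across the three stems, so the outputs
    have the right structure but carry no real separation.
    """
    return {name: [x / len(STEMS) for x in mixture] for name in STEMS}


stems = make_toy_stems()
mixture = mix(stems)
estimates = placeholder_separator(mixture)
```

A trained separator would take the place of `placeholder_separator`, returning one estimate per stem with the same length as the input mixture.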

MERL Contact
- Jonathan Le Roux

Related Publications
Petermann, D., Wichern, G., Wang, Z.-Q., Le Roux, J., "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9746005, April 2022, pp. 526-530.
@inproceedings{Petermann2022apr,
  author = {Petermann, Darius and Wichern, Gordon and Wang, Zhong-Qiu and {Le Roux}, Jonathan},
  title = {{The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {526--530},
  month = apr,
  doi = {10.1109/ICASSP43922.2022.9746005},
  url = {https://www.merl.com/publications/TR2022-022}
}

Access software at https://github.com/merlresearch/cocktail-fork-separation.
