Software & Data Downloads — CFS

Cocktail Fork Separation (CFS): code for training and using the Multi Resolution CrossNet (MRX) model.

PyTorch implementation of the Multi Resolution CrossNet (MRX) model proposed in our ICASSP 2022 paper, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks." We include weights for a model pre-trained on the Divide and Remaster (DnR) dataset; the model can separate the audio from a soundtrack (e.g., a movie or commercial) into individual speech, music, and sound effects stems. A pytorch_lightning script for training the model on the DnR dataset is also included.
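As a rough illustration of the three-stem task described above, the sketch below shows only the shape of the problem: one mixture goes in, and three time-aligned stems come out. Everything in it is hypothetical. `separate_three_stems`, `STEM_NAMES`, and `equal_split_separator` are illustrative names, not the repository's API, and the placeholder separator splits each sample evenly instead of running MRX. Consult the repository for the actual interface.

```python
from typing import Callable, Dict, List, Sequence

# Stem labels follow the description above; the identifiers are illustrative
# assumptions, not names taken from the released code.
STEM_NAMES = ("speech", "music", "sfx")

Waveform = List[float]
Separator = Callable[[Sequence[float]], Sequence[Sequence[float]]]


def equal_split_separator(mixture: Sequence[float]) -> List[Waveform]:
    """Placeholder standing in for a trained model (this is NOT MRX).

    It assigns one third of every sample to each stem, which is enough to
    exercise the interface but performs no real separation.
    """
    return [[x / 3.0 for x in mixture] for _ in STEM_NAMES]


def separate_three_stems(
    mixture: Sequence[float],
    separator: Separator = equal_split_separator,
) -> Dict[str, Waveform]:
    """Run a separator and return one waveform per stem, keyed by stem name."""
    estimates = separator(mixture)
    if len(estimates) != len(STEM_NAMES):
        raise ValueError("separator must return exactly three stems")
    stems: Dict[str, Waveform] = {}
    for name, estimate in zip(STEM_NAMES, estimates):
        # Each stem should stay time-aligned with the input mixture.
        if len(estimate) != len(mixture):
            raise ValueError(f"stem '{name}' has the wrong length")
        stems[name] = list(estimate)
    return stems


if __name__ == "__main__":
    toy_mixture = [0.3, -0.6, 0.9, 0.0]  # stand-in for real audio samples
    for name, samples in separate_three_stems(toy_mixture).items():
        print(name, samples)
```

In practice the placeholder would be replaced by a call into the pre-trained model, and the waveforms would be tensors loaded from an audio file instead of Python lists.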

  •  Petermann, D., Wichern, G., Wang, Z.-Q., Le Roux, J., "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9746005, April 2022, pp. 526-530.
    @inproceedings{Petermann2022apr,
      author = {Petermann, Darius and Wichern, Gordon and Wang, Zhong-Qiu and Le Roux, Jonathan},
      title = {The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks},
      booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      year = 2022,
      pages = {526--530},
      month = apr,
      doi = {10.1109/ICASSP43922.2022.9746005},
      url = {https://www.merl.com/publications/TR2022-022}
    }

Access software at https://github.com/merlresearch/cocktail-fork-separation.