TR2020-136

Hierarchical Musical Instrument Separation


Abstract:

Many sounds that humans encounter are hierarchical in nature; a piano note is one of many played during a performance, which is one of many instruments in a band, which might be playing in a bar with other noises occurring. Inspired by this, we re-frame the musical source separation problem as hierarchical, combining similar instruments together at certain levels and separating them at other levels. This allows us to deconstruct the same mixture in multiple ways, depending on the appropriate level of the hierarchy for a given application. In this paper, we present various methods for hierarchical musical instrument separation, with some methods focusing on separating specific instruments (like guitars) and other methods that determine what to separate based on a user-supplied audio example. We additionally show that separating all hierarchy levels is possible even when training data is limited at fine-grained levels of the hierarchy

 

  • Software Download

  • Related News & Events

    •  NEWS   Jonathan Le Roux discusses MERL's audio source separation work on popular machine learning podcast
      Date: January 24, 2022
      Where: The TWIML AI Podcast
      MERL Contact: Jonathan Le Roux
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • MERL Speech & Audio Senior Team Leader Jonathan Le Roux was featured in an extended interview on the popular TWIML AI Podcast, presenting MERL's work towards solving the "cocktail party problem". Humans have the extraordinary ability to focus on particular sounds of interest within a complex acoustic scene, such as a cocktail party. MERL's Speech & Audio Team has been at the forefront of the field's effort to develop algorithms giving machines similar abilities. Jonathan talked with host Sam Charrington about the group's decade-long journey on this topic, from early pioneering work using deep learning for speech enhancement and speech separation, to recent works on weakly-supervised separation, hierarchical sound separation, as well as the separation of real-world soundtracks into speech, music, and sound effects (aka the "cocktail fork problem").

        The TWIML AI podcast, formerly known as This Week in Machine Learning & AI, was created in 2016 and is followed by more than 10,000 subscribers on Youtube and Twitter. Jonathan's interview marks the 555th episode of the podcast.
    •  
    •  AWARD   Best Poster Award and Best Video Award at the International Society for Music Information Retrieval Conference (ISMIR) 2020
      Date: October 15, 2020
      Awarded to: Ethan Manilow, Gordon Wichern, Jonathan Le Roux
      MERL Contacts: Jonathan Le Roux; Gordon Wichern
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • Former MERL intern Ethan Manilow and MERL researchers Gordon Wichern and Jonathan Le Roux won Best Poster Award and Best Video Award at the 2020 International Society for Music Information Retrieval Conference (ISMIR 2020) for the paper "Hierarchical Musical Source Separation". The conference was held October 11-14 in a virtual format. The Best Poster Awards and Best Video Awards were awarded by popular vote among the conference attendees.

        The paper proposes a new method for isolating individual sounds in an audio mixture that accounts for the hierarchical relationship between sound sources. Many sounds we are interested in analyzing are hierarchical in nature, e.g., during a music performance, a hi-hat note is one of many such hi-hat notes, which is one of several parts of a drumkit, itself one of many instruments in a band, which might be playing in a bar with other sounds occurring. Inspired by this, the paper re-frames the audio source separation problem as hierarchical, combining similar sounds together at certain levels while separating them at other levels, and shows on a musical instrument separation task that a hierarchical approach outperforms non-hierarchical models while also requiring less training data. The paper, poster, and video can be seen on the paper page on the ISMIR website.
    •