Gordon Wichern

- Phone: 617-621-7574
Position:
Research / Technical Staff
Principal Research Scientist
Education:
Ph.D., Arizona State University, 2010
Biography
Gordon's research interests are at the intersection of signal processing and machine learning applied to speech, music, and environmental sounds. Prior to joining MERL, Gordon worked at iZotope, Inc., developing audio signal processing software, and at MIT Lincoln Laboratory, where he worked on radar target tracking.
-
Recent News & Events
-
NEWS MERL presenting 13 papers and an industry talk at ICASSP 2020
Date: May 4, 2020 - May 8, 2020
Where: Virtual Barcelona
MERL Contacts: Karl Berntorp; Petros Boufounos; Chiori Hori; Takaaki Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Yanting Ma; Hassan Mansour; Niko Moritz; Philip Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief: MERL researchers are presenting 13 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held virtually from May 4-8, 2020. Petros Boufounos is also presenting a talk on the Computational Sensing Revolution in Array Processing (video) in ICASSP’s Industry Track, and Siheng Chen is co-organizing and chairing a special session on a Signal-Processing View of Graph Neural Networks.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, array processing, and parameter estimation. Videos for all talks are available on MERL's YouTube channel, with corresponding links in the references below.
This year again, MERL is a sponsor of the conference and will be participating in the Student Job Fair; please join us to learn about our internship program and career opportunities.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year. Originally planned to be held in Barcelona, Spain, ICASSP has moved to a fully virtual setting due to the COVID-19 crisis, with free registration for participants not covering a paper.
-
NEWS MERL Speech & Audio Researchers Presenting 7 Papers and a Tutorial at Interspeech 2019
Date: September 15, 2019 - September 19, 2019
Where: Graz, Austria
MERL Contacts: Chiori Hori; Takaaki Hori; Jonathan Le Roux; Niko Moritz; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Speech & Audio Team researchers will be presenting 7 papers at the 20th Annual Conference of the International Speech Communication Association INTERSPEECH 2019, which is being held in Graz, Austria from September 15-19, 2019. Topics to be presented include recent advances in end-to-end speech recognition, speech separation, and audio-visual scene-aware dialog. Takaaki Hori is also co-presenting a tutorial on end-to-end speech processing.
Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing. It gathers around 2000 participants from all over the world.
-
Awards
-
AWARD Best Poster Award and Best Video Award at the International Society for Music Information Retrieval Conference (ISMIR) 2020
Date: October 15, 2020
Awarded to: Ethan Manilow, Gordon Wichern, Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: Former MERL intern Ethan Manilow and MERL researchers Gordon Wichern and Jonathan Le Roux won Best Poster Award and Best Video Award at the 2020 International Society for Music Information Retrieval Conference (ISMIR 2020) for the paper "Hierarchical Musical Instrument Separation". The conference was held October 11-14 in a virtual format. The Best Poster Awards and Best Video Awards were awarded by popular vote among the conference attendees.
The paper proposes a new method for isolating individual sounds in an audio mixture that accounts for the hierarchical relationship between sound sources. Many sounds we are interested in analyzing are hierarchical in nature, e.g., during a music performance, a hi-hat note is one of many such hi-hat notes, which is one of several parts of a drumkit, itself one of many instruments in a band, which might be playing in a bar with other sounds occurring. Inspired by this, the paper re-frames the audio source separation problem as hierarchical, combining similar sounds together at certain levels while separating them at other levels, and shows on a musical instrument separation task that a hierarchical approach outperforms non-hierarchical models while also requiring less training data. The paper, poster, and video can be seen on the paper page on the ISMIR website.
-
-
Internships with Gordon
-
SA1469: Audio source separation and sound event detection
We are seeking multiple graduate students interested in helping advance the fields of source separation, speech enhancement, and sound event detection/localization in challenging multi-source and far-field scenarios. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. The ideal candidate would be a senior Ph.D. student with experience in audio signal processing, microphone array processing, probabilistic modeling, and deep learning techniques requiring minimal supervision (e.g., unsupervised, weakly-supervised, self-supervised, or few-shot learning). The expected duration of the internship is 3-6 months, and the start date is flexible.
-
-
MERL Publications
- "Transcription is All You Need: Learning to Separate Musical Mixtures with Score as Supervision", arXiv, November 2020.
- "All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection", Annual Conference of the International Speech Communication Association (Interspeech), DOI: 10.21437/Interspeech.2020-2757, October 2020, pp. 3112-3116. TR2020-138
- @inproceedings{Moritz2020oct,
- author = {Moritz, Niko and Wichern, Gordon and Hori, Takaaki and Le Roux, Jonathan},
- title = {All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection},
- booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
- year = 2020,
- pages = {3112--3116},
- month = oct,
- doi = {10.21437/Interspeech.2020-2757},
- issn = {1990-9772},
- url = {https://www.merl.com/publications/TR2020-138}
- }
- "Hierarchical Musical Instrument Separation", International Society for Music Information Retrieval (ISMIR) Conference, October 2020, pp. 376-383. TR2020-136
- @inproceedings{Manilow2020oct,
- author = {Manilow, Ethan and Wichern, Gordon and Le Roux, Jonathan},
- title = {Hierarchical Musical Instrument Separation},
- booktitle = {International Society for Music Information Retrieval (ISMIR) Conference},
- year = 2020,
- pages = {376--383},
- month = oct,
- isbn = {978-0-9813537-0-8},
- url = {https://www.merl.com/publications/TR2020-136}
- }
- "Autoclip: Adaptive Gradient Clipping For Source Separation Networks", IEEE International Workshop on Machine Learning for Signal Processing (MLSP), DOI: 10.1109/MLSP49062.2020.9231926, September 2020. TR2020-132
- @inproceedings{Seetharaman2020sep,
- author = {Seetharaman, Prem and Wichern, Gordon and Pardo, Bryan and Le Roux, Jonathan},
- title = {Autoclip: Adaptive Gradient Clipping For Source Separation Networks},
- booktitle = {IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
- year = 2020,
- month = sep,
- publisher = {IEEE},
- doi = {10.1109/MLSP49062.2020.9231926},
- url = {https://www.merl.com/publications/TR2020-132}
- }
- "Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision", IEEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2020.3013105, Vol. 28, pp. 2386-2399, September 2020. TR2020-126
- @article{Pishdadian2020sep,
- author = {Pishdadian, Fatemeh and Wichern, Gordon and Le Roux, Jonathan},
- title = {Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision},
- journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
- year = 2020,
- volume = 28,
- pages = {2386--2399},
- month = sep,
- doi = {10.1109/TASLP.2020.3013105},
- url = {https://www.merl.com/publications/TR2020-126}
- }
-
Other Publications
- "Low-Latency approximation of bidirectional recurrent networks for speech denoising", 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2017, pp. 66-70.
- @Inproceedings{8169996,
- author = {Wichern, G. and Lukin, A.},
- title = {Low-Latency approximation of bidirectional recurrent networks for speech denoising},
- booktitle = {2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
- year = 2017,
- pages = {66--70},
- month = {Oct}
- }
- "Quantitative Analysis of Masking in Multitrack Mixes Using Loudness Loss", Sep 2016, Audio Engineering Society Convention 141.
- @Conference{wichern2016quantitative,
- author = {Wichern, G. and Robertson, H. and Wishnick, A.},
- title = {Quantitative Analysis of Masking in Multitrack Mixes Using Loudness Loss},
- booktitle = {Audio Engineering Society Convention 141},
- year = 2016,
- month = {Sep},
- url = {http://www.aes.org/e-lib/browse.cfm?elib=18450}
- }
- "Comparison of Loudness Features for Automatic Level Adjustment in Mixing", Oct 2015, Audio Engineering Society Convention 139.
- @Conference{wichern2015comparison,
- author = {Wichern, G. and Wishnick, A. and Lukin, A. and Robertson, H.},
- title = {Comparison of Loudness Features for Automatic Level Adjustment in Mixing},
- booktitle = {Audio Engineering Society Convention 139},
- year = 2015,
- month = {Oct},
- url = {http://www.aes.org/e-lib/browse.cfm?elib=17928}
- }
- "Noise adaptive optimization of matrix initialization for frequency-domain independent component analysis", Digital Signal Processing, Vol. 23, No. 1, pp. 1-8, 2013.
- @Article{yamada2013noise,
- author = {Yamada, M. and Wichern, G. and Kondo, K. and Sugiyama, M. and Sawada, H.},
- title = {Noise adaptive optimization of matrix initialization for frequency-domain independent component analysis},
- journal = {Digital Signal Processing},
- year = 2013,
- volume = 23,
- number = 1,
- pages = {1--8},
- publisher = {Academic Press}
- }
- "Improving the accuracy of least-squares probabilistic classifiers", IEICE transactions on information and systems, Vol. 94, No. 6, pp. 1337-1340, 2011.
- @Article{yamada2011improving,
- author = {Yamada, M. and Sugiyama, M. and Wichern, G. and Simm, J.},
- title = {Improving the accuracy of least-squares probabilistic classifiers},
- journal = {IEICE transactions on information and systems},
- year = 2011,
- volume = 94,
- number = 6,
- pages = {1337--1340},
- publisher = {The Institute of Electronics, Information and Communication Engineers}
- }
- "Segmentation, Indexing, and Retrieval for Environmental and Natural Sounds", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 3, pp. 688-707, March 2010.
- @Article{5410056,
- author = {Wichern, G. and Xue, J. and Thornburg, H. and Mechtley, B. and Spanias, A.},
- title = {Segmentation, Indexing, and Retrieval for Environmental and Natural Sounds},
- journal = {IEEE Transactions on Audio, Speech, and Language Processing},
- year = 2010,
- volume = 18,
- number = 3,
- pages = {688--707},
- month = mar
- }
- "Direct importance estimation with probabilistic principal component analyzers", 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 1962-1965.
- @Inproceedings{5495290,
- author = {Yamada, M. and Sugiyama, M. and Wichern, G.},
- title = {Direct importance estimation with probabilistic principal component analyzers},
- booktitle = {2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
- year = 2010,
- pages = {1962--1965},
- month = mar
- }
- "Acceleration of sequence kernel computation for real-time speaker identification", 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 1626-1629.
- @Inproceedings{5495542,
- author = {Yamada, M. and Sugiyama, M. and Wichern, G. and Matsui, T.},
- title = {Acceleration of sequence kernel computation for real-time speaker identification},
- booktitle = {2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
- year = 2010,
- pages = {1626--1629},
- month = mar
- }
- "Automatic audio tagging using covariate shift adaptation", 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 253-256.
- @Inproceedings{5495973,
- author = {Wichern, G. and Yamada, M. and Thornburg, H. and Sugiyama, M. and Spanias, A.},
- title = {Automatic audio tagging using covariate shift adaptation},
- booktitle = {2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
- year = 2010,
- pages = {253--256},
- month = mar
- }
- "Combining semantic, social, and acoustic similarity for retrieval of environmental sounds", 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 2402-2405.
- @Inproceedings{5496225,
- author = {Mechtley, B. and Wichern, G. and Thornburg, H. and Spanias, A.},
- title = {Combining semantic, social, and acoustic similarity for retrieval of environmental sounds},
- booktitle = {2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
- year = 2010,
- pages = {2402--2405},
- month = mar
- }
- "Audio content-based feature extraction algorithms using J-DSP for arts, media and engineering courses", 2010 IEEE Frontiers in Education Conference (FIE), Oct 2010, pp. T1F-1-T1F-6.
- @Inproceedings{5673157,
- author = {Shah, M. and Wichern, G. and Spanias, A. and Thornburg, H.},
- title = {Audio content-based feature extraction algorithms using J-DSP for arts, media and engineering courses},
- booktitle = {2010 IEEE Frontiers in Education Conference (FIE)},
- year = 2010,
- pages = {T1F-1--T1F-6},
- month = {Oct}
- }
- "Re-Sonification of Geographic Sound Activity using Acoustic, Semantic and Social Information", Proceedings of the 16th International Conference on Auditory Display (ICAD2010), 2010.
- @Inproceedings{fink2010re,
- author = {Fink, A. and Mechtley, B. and Wichern, G. and Liu, J. and Thornburg, H. and Spanias, A. and Coleman, G.},
- title = {Re-Sonification of Geographic Sound Activity using Acoustic, Semantic and Social Information},
- booktitle = {Proceedings of the 16th International Conference on Auditory Display (ICAD2010)},
- year = 2010,
- organization = {Georgia Institute of Technology}
- }
- "An ontological framework for retrieving environmental sounds using semantics and acoustic content", EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2010, No. 1, pp. 192363, 2010.
- @Article{wichern2010ontological,
- author = {Wichern, G. and Mechtley, B. and Fink, A. and Thornburg, H. and Spanias, A.},
- title = {An ontological framework for retrieving environmental sounds using semantics and acoustic content},
- journal = {EURASIP Journal on Audio, Speech, and Music Processing},
- year = 2010,
- volume = 2010,
- number = 1,
- pages = 192363,
- publisher = {Springer International Publishing}
- }
- "Direct importance estimation with a mixture of probabilistic principal component analyzers", IEICE Transactions on Information and Systems, Vol. 93, No. 10, pp. 2846-2849, 2010.
- @Article{yamada2010direct,
- author = {Yamada, M. and Sugiyama, M. and Wichern, G. and Simm, J.},
- title = {Direct importance estimation with a mixture of probabilistic principal component analyzers},
- journal = {IEICE Transactions on Information and Systems},
- year = 2010,
- volume = 93,
- number = 10,
- pages = {2846--2849},
- publisher = {The Institute of Electronics, Information and Communication Engineers}
- }
- "Multi-channel audio segmentation for continuous observation and archival of large spaces", 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, April 2009, pp. 237-240.
- @Inproceedings{4959564,
- author = {Wichern, G. and Thornburg, H. and Spanias, A.},
- title = {Multi-channel audio segmentation for continuous observation and archival of large spaces},
- booktitle = {2009 IEEE International Conference on Acoustics, Speech and Signal Processing},
- year = 2009,
- pages = {237--240},
- month = apr
- }
- "Continuous observation and archival of acoustic scenes using wireless sensor networks", 2009 16th International Conference on Digital Signal Processing, July 2009, pp. 1-6.
- @Inproceedings{5201082,
- author = {Wichern, G. and Kwon, H. and Spanias, A. and Fink, A. and Thornburg, H.},
- title = {Continuous observation and archival of acoustic scenes using wireless sensor networks},
- booktitle = {2009 16th International Conference on Digital Signal Processing},
- year = 2009,
- pages = {1--6},
- month = jul
- }
- "Unifying semantic and content-based approaches for retrieval of environmental sounds", 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct 2009, pp. 13-16.
- @Inproceedings{5346493,
- author = {Wichern, G. and Thornburg, H. and Spanias, A.},
- title = {Unifying semantic and content-based approaches for retrieval of environmental sounds},
- booktitle = {2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics},
- year = 2009,
- pages = {13--16},
- month = {Oct}
- }
- "Fast query by example of environmental sounds via robust and efficient cluster-based indexing", 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2008, pp. 5-8.
- @Inproceedings{4517532,
- author = {Xue, J. and Wichern, G. and Thornburg, H. and Spanias, A.},
- title = {Fast query by example of environmental sounds via robust and efficient cluster-based indexing},
- booktitle = {2008 IEEE International Conference on Acoustics, Speech and Signal Processing},
- year = 2008,
- pages = {5--8},
- month = mar
- }
- "Robust Multi-Features Segmentation and Indexing for Natural Sound Environments", 2007 International Workshop on Content-Based Multimedia Indexing, June 2007, pp. 69-76.
- @Inproceedings{4275057,
- author = {Wichern, G. and Thornburg, H. and Mechtley, B. and Fink, A. and Tu, K. and Spanias, A.},
- title = {Robust Multi-Features Segmentation and Indexing for Natural Sound Environments},
- booktitle = {2007 International Workshop on Content-Based Multimedia Indexing},
- year = 2007,
- pages = {69--76},
- month = jun
- }
- "An Operationally Adaptive System for Rapid Acoustic Transmission Loss Prediction", 2007 International Joint Conference on Neural Networks, Aug 2007, pp. 2262-2267.
- @Inproceedings{4371310,
- author = {McCarron, M. and Azimi-Sadjadi, M. R. and Wichern, G. and Mungiole, M.},
- title = {An Operationally Adaptive System for Rapid Acoustic Transmission Loss Prediction},
- booktitle = {2007 International Joint Conference on Neural Networks},
- year = 2007,
- pages = {2262--2267},
- month = {Aug}
- }
- "Distortion-Aware Query-by-Example for Environmental Sounds", 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct 2007, pp. 335-338.
- @Inproceedings{4393051,
- author = {Wichern, G. and Xue, J. and Thornburg, H. and Spanias, A.},
- title = {Distortion-Aware Query-by-Example for Environmental Sounds},
- booktitle = {2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics},
- year = 2007,
- pages = {335--338},
- month = {Oct}
- }
- "Environmentally adaptive acoustic transmission loss prediction in turbulent and nonturbulent atmospheres", Neural Networks, Vol. 20, No. 4, pp. 484-497, 2007.
- @Article{WICHERN2007484,
- author = {Wichern, G. and Azimi-Sadjadi, M. R. and Mungiole, M.},
- title = {Environmentally adaptive acoustic transmission loss prediction in turbulent and nonturbulent atmospheres},
- journal = {Neural Networks},
- year = 2007,
- volume = 20,
- number = 4,
- pages = {484--497},
- note = {Computational Intelligence in Earth and Environmental Sciences},
- url = {http://www.sciencedirect.com/science/article/pii/S089360800700055X}
- }
- "An Environmentally Adaptive System for Rapid Acoustic Transmission Loss Prediction", The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006, pp. 5118-5125.
- @Inproceedings{1716812,
- author = {Wichern, G. and Azimi-Sadjadi, M. R. and Mungiole, M.},
- title = {An Environmentally Adaptive System for Rapid Acoustic Transmission Loss Prediction},
- booktitle = {The 2006 IEEE International Joint Conference on Neural Network Proceedings},
- year = 2006,
- pages = {5118--5125}
- }
- "Properties of randomly distributed sparse arrays", Proc. SPIE, 2006, vol. 6201.
- @Inproceedings{azimi2006properties,
- author = {Azimi-Sadjadi, MR and Jiang, Y and Wichern, G},
- title = {Properties of randomly distributed sparse arrays},
- booktitle = {Proc. SPIE},
- year = 2006,
- volume = 6201
- }
- "Unattended sparse acoustic array configurations and beamforming algorithms", Proc. SPIE, 2005, vol. 5796, pp. 40-51.
- @Inproceedings{azimi2005unattended,
- author = {Azimi-Sadjadi, MR and Pezeshki, A and Scharf, LL and Wichern, G},
- title = {Unattended sparse acoustic array configurations and beamforming algorithms},
- booktitle = {Proc. SPIE},
- year = 2005,
- volume = 5796,
- pages = {40--51}
- }
-
MERL Issued Patents
-
Title: "Methods and Systems for Enhancing Audio Signals Corrupted by Noise"
Inventors: Le Roux, Jonathan; Watanabe, Shinji; Hershey, John R.; Wichern, Gordon P
Patent No.: 10,726,856
Issue Date: Jul 28, 2020
Title: "Methods and Systems for End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction"
Inventors: Le Roux, Jonathan; Hershey, John R.; Wang, Zhongqiu; Wichern, Gordon P
Patent No.: 10,529,349
Issue Date: Jan 7, 2020