John R. Hershey

  • Biography

    Prior to joining MERL in 2010, John spent 5 years at IBM's T.J. Watson Research Center in New York, leading a Noise Robust Speech Recognition team. He also spent a year as a visiting researcher in the speech group at Microsoft Research after obtaining his Ph.D. from UCSD. He is currently working on machine learning for signal separation, speech recognition, language processing, and adaptive user interfaces.

  • Awards

    •  AWARD   MERL's Speech Team Achieves World's 2nd Best Performance at the Third CHiME Speech Separation and Recognition Challenge
      Date: December 15, 2015
      Awarded to: John R. Hershey, Takaaki Hori, Jonathan Le Roux and Shinji Watanabe
      MERL Contacts: John R. Hershey; Takaaki Hori; Jonathan Le Roux; Shinji Watanabe
      Research Areas: Multimedia, Speech & Audio
      Brief
      • The results of the third 'CHiME' Speech Separation and Recognition Challenge were publicly announced on December 15 at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015) held in Scottsdale, Arizona, USA. MERL's Speech and Audio Team, in collaboration with SRI, ranked 2nd out of 26 teams from Europe, Asia and the US. The task this year was to recognize speech recorded using a tablet in real environments such as cafes, buses, or busy streets. Due to the high levels of noise and the distance from the speaker's mouth to the microphones, this is a very challenging task, on which the baseline system achieved only a 33.4% word error rate. The MERL/SRI system featured state-of-the-art techniques including a multi-channel front-end, noise-robust feature extraction, and deep learning for speech enhancement, acoustic modeling, and language modeling, leading to a dramatic 73% reduction in word error rate, down to 9.1%. The core of the system has since been released as a new official challenge baseline for the community to use.
    •  AWARD   Awaya Prize Young Researcher Award
      Date: March 11, 2014
      Awarded to: Yuuki Tachioka
      Awarded for: "Effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge"
      Awarded by: Acoustical Society of Japan (ASJ)
      MERL Contacts: Shinji Watanabe; Jonathan Le Roux; John R. Hershey
      Research Areas: Multimedia, Speech & Audio
      Brief
      • MELCO researcher Yuuki Tachioka received the Awaya Prize Young Researcher Award from the Acoustical Society of Japan (ASJ) for "effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge", which was based on joint work with MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John R. Hershey.
    •  AWARD   CHiME 2012 Speech Separation and Recognition Challenge Best Performance
      Date: June 1, 2013
      Awarded to: Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux and John R. Hershey
      Awarded for: "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark"
      Awarded by: International Workshop on Machine Listening in Multisource Environments (CHiME)
      MERL Contacts: Shinji Watanabe; Jonathan Le Roux; John R. Hershey
      Research Areas: Multimedia, Speech & Audio
      Brief
      • The results of the 2nd 'CHiME' Speech Separation and Recognition Challenge are out! The team formed by MELCO researcher Yuuki Tachioka and MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John Hershey obtained the best results in the continuous speech recognition task (Track 2). This very challenging task consisted of recognizing speech corrupted by highly non-stationary noises recorded in a real living room. Our proposal, which also included a simple yet extremely efficient denoising front-end, focused on investigating and developing state-of-the-art automatic speech recognition back-end techniques: feature transformation methods, as well as discriminative training methods for acoustic and language modeling. Our system significantly outperformed those of the other participants. Our code has since been released as an improved baseline for the community to use.
  • Other Publications

    •  Cui, Xiaodong; Xue, Jian; Chen, Xin; Olsen, Peder A; Dognin, Pierre L; Chaudhari, Upendra V; Hershey, John R; Zhou, Bowen, "Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 8, pp. 2252-2264, 2012.
      BibTeX
      • @Article{Cui2012,
      • author = {Cui, Xiaodong and Xue, Jian and Chen, Xin and Olsen, Peder A and Dognin, Pierre L and Chaudhari, Upendra V and Hershey, John R and Zhou, Bowen},
      • title = {Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages},
      • journal = {IEEE Transactions on Audio, Speech, and Language Processing},
      • year = 2012,
      • volume = 20,
      • number = 8,
      • pages = {2252--2264},
      • publisher = {IEEE}
      • }
    •  Chen, Xin; Cui, Xiaodong; Xue, Jian; Olsen, Peder; Hershey, John R; Zhou, Bowen; Zhao, Yunxin, "Clustering of bootstrapped acoustic model with full covariance", IEEE Conf. Acoust. Speech Signal Processing, 2011, pp. 4496-4499.
      BibTeX
      • @Inproceedings{Chen2011,
      • author = {Chen, Xin and Cui, Xiaodong and Xue, Jian and Olsen, Peder and Hershey, John R and Zhou, Bowen and Zhao, Yunxin},
      • title = {Clustering of bootstrapped acoustic model with full covariance},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2011,
      • pages = {4496--4499},
      • organization = {IEEE}
      • }
    •  Cui, Xiaodong; Chen, Xin; Xue, Jian; Olsen, Peder A; Hershey, John R; Zhou, Bowen, "Acoustic modeling with bootstrap and restructuring based on full covariance", Proc. Interspeech, 2011.
      BibTeX
      • @Inproceedings{Cui2011,
      • author = {Cui, Xiaodong and Chen, Xin and Xue, Jian and Olsen, Peder A and Hershey, John R and Zhou, Bowen},
      • title = {Acoustic modeling with bootstrap and restructuring based on full covariance},
      • booktitle = {Proc. Interspeech},
      • year = 2011
      • }
    •  Hershey, John R.; Olsen, Peder A.; Rennie, Steven J.; Aaron, Andy, "Audio Alchemy: Getting Computers to Understand Overlapping Speech", Scientific American Online, April 2011.
      BibTeX
      • @Article{Hershey2011,
      • author = {Hershey, John R. and Olsen, Peder A. and Rennie, Steven J. and Aaron, Andy},
      • title = {Audio Alchemy: Getting Computers to Understand Overlapping Speech},
      • journal = {Scientific American Online},
      • year = 2011,
      • month = apr,
      • organization = {Scientific American Online Article},
      • url = {http://www.scientificamerican.com/article.cfm?id=speech-getting-computers-understand-overlapping}
      • }
    •  Olsen, Peder A.; Hershey, John R.; Rennie, Steven J.; Goel, Vaibhava, "A speech recognition solution to an ancient cryptography problem," Tech. Rep. RC25109, IBM, 2011.
      BibTeX
      • @Techreport{Olsen2011,
      • author = {Olsen, Peder A. and Hershey, John R. and Rennie, Steven J. and Goel, Vaibhava},
      • title = {A speech recognition solution to an ancient cryptography problem},
      • institution = {IBM},
      • year = 2011,
      • number = {RC25109},
      • address = {Yorktown Heights, New York, USA}
      • }
    •  Cooke, Martin; Hershey, John R.; Rennie, Steven J., "Monaural Speech Separation and Recognition Challenge", Computer Speech and Language, Vol. 24, No. 1, pp. 1-15, January 2010.
      BibTeX
      • @Article{Cooke2010,
      • author = {Cooke, Martin and Hershey, John R. and Rennie, Steven J.},
      • title = {Monaural Speech Separation and Recognition Challenge},
      • journal = {Computer Speech and Language},
      • year = 2010,
      • volume = 24,
      • number = 1,
      • pages = {1--15},
      • month = jan
      • }
    •  Dognin, Pierre L.; Hershey, John R.; Goel, Vaibhava; Olsen, Peder A., "Restructuring Acoustic Models for Client and Server-based Automatic Speech Recognition", SQ2010, March 2010.
      BibTeX
      • @Inproceedings{Dognin2010,
      • author = {Dognin, Pierre L. and Hershey, John R. and Goel, Vaibhava and Olsen, Peder A.},
      • title = {Restructuring Acoustic Models for Client and Server-based Automatic Speech Recognition},
      • booktitle = {SQ2010},
      • year = 2010,
      • month = mar
      • }
    •  Dognin, Pierre L; Hershey, John R; Goel, Vaibhava; Olsen, Peder A, "Restructuring exponential family mixture models.", Proc. Interspeech, 2010, pp. 62-65.
      BibTeX
      • @Inproceedings{Dognin2010a,
      • author = {Dognin, Pierre L and Hershey, John R and Goel, Vaibhava and Olsen, Peder A},
      • title = {Restructuring exponential family mixture models.},
      • booktitle = {Proc. Interspeech},
      • year = 2010,
      • pages = {62--65}
      • }
    •  Hershey, John R.; Olsen, Peder A.; Rennie, Steven J., "Signal Interaction and the Devil Function", Proc. Interspeech, September 2010.
      BibTeX
      • @Inproceedings{Hershey2010,
      • author = {Hershey, John R. and Olsen, Peder A. and Rennie, Steven J.},
      • title = {Signal Interaction and the Devil Function},
      • booktitle = {Proc. Interspeech},
      • year = 2010,
      • address = {Makuhari, Japan},
      • month = sep,
      • organization = {ISCA}
      • }
    •  Hershey, John R.; Rennie, Steven J.; Olsen, Peder A.; Kristjansson, Trausti T., "Super-human multi-talker speech recognition: A graphical modeling approach", Computer Speech and Language, Vol. 24, No. 1, pp. 45-66, January 2010.
      BibTeX
      • @Article{Hershey2010a,
      • author = {Hershey, John R. and Rennie, Steven J. and Olsen, Peder A. and Kristjansson, Trausti T.},
      • title = {Super-human multi-talker speech recognition: A graphical modeling approach},
      • journal = {Computer Speech and Language},
      • year = 2010,
      • volume = 24,
      • number = 1,
      • pages = {45--66},
      • month = jan
      • }
    •  Marks, Tim K; Hershey, John R; Movellan, Javier R, "Tracking motion, deformation, and texture using conditionally gaussian processes", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 2, pp. 348-363, 2010.
      BibTeX
      • @Article{Marks2010,
      • author = {Marks, Tim K and Hershey, John R and Movellan, Javier R},
      • title = {Tracking motion, deformation, and texture using conditionally gaussian processes},
      • journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
      • year = 2010,
      • volume = 32,
      • number = 2,
      • pages = {348--363},
      • publisher = {IEEE}
      • }
    •  Olsen, Peder A; Goel, Vaibhava; Micchelli, Charles A; Hershey, John R, "Modeling posterior probabilities using the linear exponential family.", Proc. Interspeech, 2010, pp. 2994-2997.
      BibTeX
      • @Inproceedings{Olsen2010,
      • author = {Olsen, Peder A and Goel, Vaibhava and Micchelli, Charles A and Hershey, John R},
      • title = {Modeling posterior probabilities using the linear exponential family.},
      • booktitle = {Proc. Interspeech},
      • year = 2010,
      • pages = {2994--2997}
      • }
    •  Rennie, Steven J.; Hershey, John R.; Olsen, Peder A., "Graphical Models for Single-Channel Multitalker Speech Recognition", IEEE Signal Processing Magazine, Vol. 27, No. 6, pp. 66-80, 2010.
      BibTeX
      • @Article{Rennie2010,
      • author = {Rennie, Steven J. and Hershey, John R. and Olsen, Peder A.},
      • title = {Graphical Models for Single-Channel Multitalker Speech Recognition},
      • journal = {IEEE Signal Processing Magazine},
      • year = 2010,
      • volume = 27,
      • number = 6,
      • pages = {66--80}
      • }
    •  Dognin, Pierre L.; Goel, Vaibhava; Hershey, John R.; Olsen, Peder A., "A Fast, Accurate Approximation to Log Likelihood of Gaussian Mixture Models", IEEE Conf. Acoust. Speech Signal Processing, April 2009.
      BibTeX
      • @Inproceedings{Dognin2009,
      • author = {Dognin, Pierre L. and Goel, Vaibhava and Hershey, John R. and Olsen, Peder A.},
      • title = {A Fast, Accurate Approximation to Log Likelihood of Gaussian Mixture Models},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2009,
      • month = apr
      • }
    •  Dognin, Pierre L.; Hershey, John R.; Goel, Vaibhava; Olsen, Peder A., "Refactoring Acoustic Models using Variational Density Approximation", IEEE Conf. Acoust. Speech Signal Processing, April 2009, pp. 4473-4476.
      BibTeX
      • @Inproceedings{Dognin2009a,
      • author = {Dognin, Pierre L. and Hershey, John R. and Goel, Vaibhava and Olsen, Peder A.},
      • title = {Refactoring Acoustic Models using Variational Density Approximation},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2009,
      • pages = {4473--4476},
      • month = apr
      • }
    •  Dognin, Pierre L.; Hershey, John R.; Goel, Vaibhava; Olsen, Peder A., "Refactoring Acoustic Models using Variational Expectation-Maximization", Proc. Interspeech, April 2009, pp. 4473-4476.
      BibTeX
      • @Inproceedings{Dognin2009b,
      • author = {Dognin, Pierre L. and Hershey, John R. and Goel, Vaibhava and Olsen, Peder A.},
      • title = {Refactoring Acoustic Models using Variational Expectation-Maximization},
      • booktitle = {Proc. Interspeech},
      • year = 2009,
      • pages = {4473--4476},
      • month = apr
      • }
    •  Rennie, Steven J.; Hershey, John R.; Olsen, Peder A., "Hierarchical Variational Loopy Belief Propagation for Multi-talker Speech Recognition", Proc. ASRU, December 2009, pp. 176-181.
      BibTeX
      • @Inproceedings{Rennie2009,
      • author = {Rennie, Steven J. and Hershey, John R. and Olsen, Peder A.},
      • title = {Hierarchical Variational Loopy Belief Propagation for Multi-talker Speech Recognition},
      • booktitle = {Proc. ASRU},
      • year = 2009,
      • pages = {176--181},
      • month = dec
      • }
    •  Rennie, Steven J.; Hershey, John R.; Olsen, Peder A., "Variational Loopy Belief Propagation for Multi-talker Speech Recognition", Proc. Eurospeech, September 2009, pp. 1331-1334.
      BibTeX
      • @Inproceedings{Rennie2009a,
      • author = {Rennie, Steven J. and Hershey, John R. and Olsen, Peder A.},
      • title = {Variational Loopy Belief Propagation for Multi-talker Speech Recognition},
      • booktitle = {Proc. Eurospeech},
      • year = 2009,
      • pages = {1331--1334},
      • address = {Brighton, UK},
      • month = sep
      • }
    •  Rennie, Steven J.; Hershey, John R.; Olsen, Peder A., "Single-Channel Speech Separation and Recognition using Loopy Belief Propagation", IEEE Conf. Acoust. Speech Signal Processing, April 2009, pp. 3845-3848.
      BibTeX
      • @Inproceedings{Rennie2009b,
      • author = {Rennie, Steven J. and Hershey, John R. and Olsen, Peder A.},
      • title = {Single-Channel Speech Separation and Recognition using Loopy Belief Propagation},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2009,
      • pages = {3845--3848},
      • address = {Taipei, Taiwan},
      • month = apr,
      • organization = {IEEE}
      • }
    •  Chen, Jia-Yu; Hershey, John R; Olsen, Peder A; Yashchin, Emmanuel, "Accelerated monte carlo for kullback-leibler divergence between gaussian mixture models", IEEE Conf. Acoust. Speech Signal Processing, 2008, pp. 4553-4556.
      BibTeX
      • @Inproceedings{Chen2008,
      • author = {Chen, Jia-Yu and Hershey, John R and Olsen, Peder A and Yashchin, Emmanuel},
      • title = {Accelerated monte carlo for kullback-leibler divergence between gaussian mixture models},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2008,
      • pages = {4553--4556},
      • organization = {IEEE}
      • }
    •  Hershey, John R; Olsen, Peder A, "Variational bhattacharyya divergence for hidden markov models", IEEE Conf. Acoust. Speech Signal Processing, 2008, pp. 4557-4560.
      BibTeX
      • @Inproceedings{Hershey2008,
      • author = {Hershey, John R and Olsen, Peder A},
      • title = {Variational bhattacharyya divergence for hidden markov models},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2008,
      • pages = {4557--4560},
      • organization = {IEEE}
      • }
    •  Mohanty, Binit; Hershey, John R.; Olsen, Peder A.; Kozat, Suleyman; Goel, Vaibhava, "Optimizing speech recognition grammars using a measure of similarity between hidden Markov models", IEEE Conf. Acoust. Speech Signal Processing, 2008, pp. 4953-4956.
      BibTeX
      • @Inproceedings{Mohanty2008,
      • author = {Mohanty, Binit and Hershey, John R. and Olsen, Peder A. and Kozat, Suleyman and Goel, Vaibhava},
      • title = {Optimizing speech recognition grammars using a measure of similarity between hidden Markov models},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2008,
      • pages = {4953--4956},
      • organization = {IEEE}
      • }
    •  Rennie, Steven J.; Hershey, John R.; Olsen, Peder A., "Efficient model-based speech separation and denoising using non-negative subspace analysis", IEEE Conf. Acoust. Speech Signal Processing, April 2008, pp. 1833-1836.
      BibTeX
      • @Inproceedings{Rennie2008,
      • author = {Rennie, Steven J. and Hershey, John R. and Olsen, Peder A.},
      • title = {Efficient model-based speech separation and denoising using non-negative subspace analysis},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2008,
      • pages = {1833--1836},
      • address = {Las Vegas, Nevada},
      • month = apr
      • }
    •  Chen, Jia-Yu; Olsen, Peder; Hershey, John R., "Word Confusability - Measuring Hidden Markov Model Similarity", Proc. Interspeech, August 2007.
      BibTeX
      • @Inproceedings{Chen2007,
      • author = {Chen, Jia-Yu and Olsen, Peder and Hershey, John R.},
      • title = {Word Confusability - Measuring Hidden Markov Model Similarity},
      • booktitle = {Proc. Interspeech},
      • year = 2007,
      • month = aug
      • }
    •  Hershey, J. R.; Olsen, P. A., "Approximating the Kullback Leibler divergence between Gaussian mixture models", IEEE Conf. Acoust. Speech Signal Processing, 2007, pp. 317-320.
      BibTeX
      • @Inproceedings{Hershey2007,
      • author = {Hershey, J. R. and Olsen, P. A.},
      • title = {Approximating the Kullback Leibler divergence between Gaussian mixture models},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2007,
      • pages = {317--320}
      • }
    •  Hershey, John R.; Kristjansson, Trausti T.; Rennie, Steven J.; Olsen, Peder A., "Single Channel Speech Separation Using Factorial Dynamics" in Advances in Neural Information Processing Systems, Scholkopf, B. and Platt, J. and Hoffman, T., Eds., pp. 593-600, MIT Press, 2007.
      BibTeX
      • @Incollection{Hershey2007a,
      • author = {Hershey, John R. and Kristjansson, Trausti T. and Rennie, Steven J. and Olsen, Peder A.},
      • title = {Single Channel Speech Separation Using Factorial Dynamics},
      • booktitle = {Advances in Neural Information Processing Systems},
      • year = 2007,
      • editor = {Scholkopf, B. and Platt, J. and Hoffman, T.},
      • pages = {593--600},
      • address = {Cambridge, Massachusetts},
      • publisher = {MIT Press}
      • }
    •  Hershey, John R.; Olsen, Peder A.; Rennie, Steven J., "Variational Kullback Leibler Divergence For Hidden Markov Models", Proc. ASRU, December 2007.
      BibTeX
      • @Inproceedings{Hershey2007b,
      • author = {Hershey, John R. and Olsen, Peder A. and Rennie, Steven J.},
      • title = {Variational Kullback Leibler Divergence For Hidden Markov Models},
      • booktitle = {Proc. ASRU},
      • year = 2007,
      • address = {Kyoto, Japan},
      • month = dec
      • }
    •  Hershey, John R; Olsen, Peder A; Gopinath, Ramesh A, "Variational sampling approaches to word confusability", Information Theory and Applications Workshop, 2007, pp. 1-119.
      BibTeX
      • @Inproceedings{Hershey2007c,
      • author = {Hershey, John R and Olsen, Peder A and Gopinath, Ramesh A},
      • title = {Variational sampling approaches to word confusability},
      • booktitle = {Information Theory and Applications Workshop},
      • year = 2007,
      • pages = {1--119},
      • organization = {IEEE}
      • }
    •  Hershey, John R; Olsen, Peder A, "Approximating the Kullback Leibler divergence between Gaussian mixture models", IEEE Conf. Acoust. Speech Signal Processing, 2007, vol. 4, pp. IV-317.
      BibTeX
      • @Inproceedings{Hershey2007d,
      • author = {Hershey, John R and Olsen, Peder A},
      • title = {Approximating the Kullback Leibler divergence between Gaussian mixture models},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2007,
      • volume = 4,
      • pages = {IV--317},
      • organization = {IEEE}
      • }
    •  Olsen, Peder A.; Hershey, John R., "Bhattacharyya Error and Divergence using Variational Importance Sampling", Proc. Interspeech, August 2007.
      BibTeX
      • @Inproceedings{Olsen2007,
      • author = {Olsen, Peder A. and Hershey, John R.},
      • title = {Bhattacharyya Error and Divergence using Variational Importance Sampling},
      • booktitle = {Proc. Interspeech},
      • year = 2007,
      • month = aug,
      • note = {to appear}
      • }
    •  Kristjansson, T. T.; Hershey, J. R.; Olsen, P. A.; Rennie, S. J.; Gopinath, R., "Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System", Proc. Interspeech, 2006.
      BibTeX
      • @Inproceedings{Kristjansson2006,
      • author = {Kristjansson, T. T. and Hershey, J. R. and Olsen, P. A. and Rennie, S. J. and Gopinath, R.},
      • title = {Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System},
      • booktitle = {Proc. Interspeech},
      • year = 2006
      • }
    •  Rennie, Steven J.; Olsen, Peder A.; Hershey, John R.; Kristjansson, Trausti T., "Separating multiple speakers using temporal constraints", ISCA Workshop on Statistical And Perceptual Audition (SAPA), 2006.
      BibTeX
      • @Inproceedings{Rennie2006,
      • author = {Rennie, Steven J. and Olsen, Peder A. and Hershey, John R. and Kristjansson, Trausti T.},
      • title = {Separating multiple speakers using temporal constraints},
      • booktitle = {ISCA Workshop on Statistical And Perceptual Audition (SAPA)},
      • year = 2006
      • }
    •  Rennie, Steven J.; Olsen, Peder A.; Hershey, John R.; Kristjansson, Trausti T., "The Iroquois model: Using temporal dynamics to separate speakers", Workshop on Statistical and Perceptual Audio Processing (SAPA), 2006.
      BibTeX
      • @Inproceedings{Rennie2006a,
      • author = {Rennie, Steven J. and Olsen, Peder A. and Hershey, John R. and Kristjansson, Trausti T.},
      • title = {The Iroquois model: Using temporal dynamics to separate speakers},
      • booktitle = {Workshop on Statistical and Perceptual Audio Processing (SAPA)},
      • year = 2006
      • }
    •  Hershey, John R, "Perceptual inference in generative models", 2005, University of California, San Diego.
      BibTeX
      • @Phdthesis{Hershey2005,
      • author = {Hershey, John R},
      • title = {Perceptual inference in generative models},
      • school = {University of California, San Diego},
      • year = 2005
      • }
    •  Marks, Tim K.; Hershey, John R.; Roddey, J. Cooper; Movellan, Javier R., "Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters" in Advances in Neural Information Processing Systems, Saul, Lawrence K. and Weiss, Yair and Bottou, Leon, Eds., pp. 889-896, MIT Press, 2005.
      BibTeX
      • @Incollection{Marks2005,
      • author = {Marks, Tim K. and Hershey, John R. and Roddey, J. Cooper and Movellan, Javier R.},
      • title = {Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters},
      • booktitle = {Advances in Neural Information Processing Systems},
      • year = 2005,
      • editor = {Saul, Lawrence K. and Weiss, Yair and Bottou, Leon},
      • pages = {889--896},
      • address = {Cambridge, MA},
      • publisher = {MIT Press}
      • }
    •  Fasel, I; Dahl, R; Hershey, J R; Fortenberry, B; Susskind, J; Movellan, J R, "The machine perception toolbox", 2004.
      BibTeX
      • @Misc{Fasel2004,
      • author = {Fasel, I and Dahl, R and Hershey, J R and Fortenberry, B and Susskind, J and Movellan, J R},
      • title = {The machine perception toolbox},
      • year = 2004,
      • volume = 10,
      • number = 07,
      • pages = 2004
      • }
    •  Hershey, J. R.; Attias, H.; Jojic, N.; Kristjansson, T., "Audio Visual Graphical Models for Speech Detection and Enhancement", IEEE Conf. Acoust. Speech Signal Processing, May 17-21 2004.
      BibTeX
      • @Inproceedings{Hershey2004,
      • author = {Hershey, J. R. and Attias, H. and Jojic, N. and Kristjansson, T.},
      • title = {Audio Visual Graphical Models for Speech Detection and Enhancement},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2004,
      • address = {Montreal Canada},
      • month = {May 17-21},
      • organization = {IEEE}
      • }
    •  Hershey, John R.; Attias, Hagai; Jojic, Nebojsa; Kristjansson, Trausti, "Audio-visual graphical models for speech processing", IEEE Conf. Acoust. Speech Signal Processing, 2004, vol. 5.
      BibTeX
      • @Inproceedings{Hershey2004a,
      • author = {Hershey, John R. and Attias, Hagai and Jojic, Nebojsa and Kristjansson, Trausti},
      • title = {Audio-visual graphical models for speech processing},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2004,
      • volume = 5,
      • organization = {IEEE}
      • }
    •  Hershey, John R.; Kristjansson, Trausti; Zhang, Zhengyou, "Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition", ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA), 2004.
      BibTeX
      • @Inproceedings{Hershey2004b,
      • author = {Hershey, John R. and Kristjansson, Trausti and Zhang, Zhengyou},
      • title = {Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition},
      • booktitle = {ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA)},
      • year = 2004
      • }
    •  Kristjansson, Trausti T.; Attias, Hagai; Hershey, John R., "Single Microphone Source Separation using High Resolution Signal Reconstruction", IEEE Conf. Acoust. Speech Signal Processing, May 2004, pp. 817-820.
      BibTeX
      • @Inproceedings{Kristjansson2004,
      • author = {Kristjansson, Trausti T. and Attias, Hagai and Hershey, John R.},
      • title = {Single Microphone Source Separation using High Resolution Signal Reconstruction},
      • booktitle = {IEEE Conf. Acoust. Speech Signal Processing},
      • year = 2004,
      • pages = {817--820},
      • month = may
      • }
    •  Kristjansson, Trausti T.; Attias, Hagai; Hershey, John R., "Stereo based 3d tracking and scene learning, employing particle filtering within em", European Conference on Computer Vision (ECCV), 2004, pp. 546-559.
      BibTeX
      • @Inproceedings{Kristjansson2004a,
      • author = {Kristjansson, Trausti T. and Attias, Hagai and Hershey, John R.},
      • title = {Stereo based 3d tracking and scene learning, employing particle filtering within em},
      • booktitle = {European Conference on Computer Vision (ECCV)},
      • year = 2004,
      • pages = {546--559},
      • publisher = {Springer Berlin Heidelberg}
      • }
    •  Marks, Tim K.; Hershey, John R.; Roddey, J. Cooper; Movellan, Javier R., "3D Tracking of Morphable Objects Using Conditionally Gaussian Nonlinear Filters", Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Workshop on Generative Model Based Vision (GMBV), 2004.
      BibTeX
      • @Inproceedings{Marks2004,
      • author = {Marks, Tim K. and Hershey, John R. and Roddey, J. Cooper and Movellan, Javier R.},
      • title = {3D Tracking of Morphable Objects Using Conditionally Gaussian Nonlinear Filters},
      • booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Workshop on Generative Model Based Vision (GMBV)},
      • year = 2004
      • }
    •  Movellan, Javier R.; Hershey, John R.; Susskind, Josh, "Large Scale Convolutional HMMs for Real Time Video Tracking", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2004.
      BibTeX
      • @Inproceedings{Movellan2004,
      • author = {Movellan, Javier R. and Hershey, John R. and Susskind, Josh},
      • title = {Large Scale Convolutional HMMs for Real Time Video Tracking},
      • booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
      • year = 2004
      • }
    •  Movellan, Javier; Hershey, John R.; Susskind, Josh, "Realtime video tracking using convolution HMMs", Proc. Conf. Computer Vision and Pattern Recognition, 2004.
      BibTeX
      • @Inproceedings{Movellan2004a,
      • author = {Movellan, Javier and Hershey, John R. and Susskind, Josh},
      • title = {Realtime video tracking using convolution HMMs},
      • booktitle = {Proc. Conf. Computer Vision and Pattern Recognition},
      • year = 2004,
      • address = {Washington DC}
      • }
    •  Susskind, Josh; Hershey, John R.; Movellan, Javier R., "Exact inference in robots using topographical uncertainty maps", Proceedings of the Second International Conference on Development and Learning (ICDL), 2004.
      BibTeX
      • @Inproceedings{Susskind2004,
      • author = {Susskind, Josh and Hershey, John R. and Movellan, Javier R.},
      • title = {Exact inference in robots using topographical uncertainty maps},
      • booktitle = {Proceedings of the Second International Conference on Development and Learning (ICDL)},
      • year = 2004,
      • address = {The Salk Institute, San Diego}
      • }
    •  Kristjansson, Trausti T.; Hershey, John R., "High Resolution Signal Reconstruction", Proc. ASRU, 2003.
      BibTeX
      • @Inproceedings{Kristiansson2003,
      • author = {Kristjansson, Trausti T. and Hershey, John R.},
      • title = {High Resolution Signal Reconstruction},
      • booktitle = {Proc. ASRU},
      • year = 2003
      • }
    •  Marks, Tim K.; Roddey, J. Cooper; Hershey, John R.; Movellan, Javier R., "G-flow: A Generative Framework for Non-Rigid 3D Tracking", 10th Joint Symposium on Neural Computation, 2003.
      BibTeX
      • @Inproceedings{Marks2003,
      • author = {Marks, Tim K. and Roddey, J. Cooper and Hershey, John R. and Movellan, Javier R.},
      • title = {G-flow: A Generative Framework for Non-Rigid 3D Tracking},
      • booktitle = {10th Joint Symposium on Neural Computation},
      • year = 2003
      • }
    •  Movellan, Javier R.; Hershey, John R.; Marks, Tim K.; Roddey, J. Cooper, "GFlow: A Generative Model for Fast Tracking using 3D Deformable Models", DARPA Symposium on Human ID, 2003.
      BibTeX
      • @Inproceedings{Movellan2003,
      • author = {Movellan, Javier R. and Hershey, John R. and Marks, Tim K. and Roddey, J. Cooper},
      • title = {GFlow: A Generative Model for Fast Tracking using 3D Deformable Models},
      • booktitle = {DARPA Symposium on Human ID},
      • year = 2003,
      • address = {Washington DC}
      • }
    •  Hershey, J. R.; Casey, M., "Audio-visual sound separation via hidden Markov models" in Advances in Neural Information Processing Systems, vol. 2, pp. 1173-1180, MIT Press, 2002.
      BibTeX
      • @Incollection{Hershey2002,
      • author = {Hershey, J. R. and Casey, M.},
      • title = {Audio-visual sound separation via hidden Markov models},
      • booktitle = {Advances in Neural Information Processing Systems},
      • year = 2002,
      • volume = 2,
      • pages = {1173--1180},
      • publisher = {MIT Press}
      • }
    •  Bartlett, M. S.; Braathen, B.; Littlewort-Ford, G.; Hershey, J. R.; Fasel, I.; Marks, T.; Smith, E.; Sejnowski, T. J.; Movellan, J. R., "Automatic Analysis of Spontaneous Facial Behavior: A Final Project Report," Tech. Rep. UCSD MPLab TR 2001.08, University of California, San Diego, 2001.
      BibTeX
      • @Techreport{Bartlett2001,
      • author = {Bartlett, M. S. and Braathen, B. and Littlewort-Ford, G. and Hershey, J. R. and Fasel, I. and Marks, T. and Smith, E. and Sejnowski, T. J. and Movellan, J. R.},
      • title = {Automatic Analysis of Spontaneous Facial Behavior: A Final Project Report},
      • institution = {University of California, San Diego},
      • year = 2001,
      • number = {UCSD MPLab TR 2001.08}
      • }
    •  Fasel, Ian R.; Hershey, John R.; Movellan, Javier R., "Active Sampling in High Dimensions for Face Detection", Proceedings of the 8th Symposium on Neural Computation, 2001.
      BibTeX
      • @Inproceedings{Fasel2001,
      • author = {Fasel, Ian R. and Hershey, John R. and Movellan, Javier R.},
      • title = {Active Sampling in High Dimensions for Face Detection},
      • booktitle = {Proceedings of the 8th Symposium on Neural Computation},
      • year = 2001,
      • address = {La Jolla, CA}
      • }
    •  Hershey, John R.; Casey, Michael, "Audio-Visual Sound Separation Via Hidden Markov Models", Advances in Neural Information Processing Systems, Dietterich, Thomas Glen and Becker, Suzanna and Ghahramani, Zoubin, Eds., 2001, pp. 1173-1180.
      BibTeX
      • @Inproceedings{Hershey2001,
      • author = {Hershey, John R. and Casey, Michael},
      • title = {Audio-Visual Sound Separation Via Hidden Markov Models},
      • booktitle = {Advances in Neural Information Processing Systems},
      • year = 2001,
      • editor = {Dietterich, Thomas Glen and Becker, Suzanna and Ghahramani, Zoubin},
      • pages = {1173--1180},
      • address = {Cambridge, Massachusetts},
      • publisher = {MIT Press}
      • }
    •  Gorodnitsky, Irina; Hershey, John R., "A low-level Cortical perception model with applications to image analysis", IEEE International Conference on Image Processing (ICIP), 2000.
      BibTeX
      • @Inproceedings{Gorodnitsky2000,
      • author = {Gorodnitsky, Irina and Hershey, John R.},
      • title = {A low-level Cortical perception model with applications to image analysis},
      • booktitle = {IEEE International Conference on Image Processing (ICIP)},
      • year = 2000
      • }
    •  Hershey, John R.; Movellan, Javier R., "Audio Vision: Using Audio-Visual Synchrony to Locate Sounds", Advances in Neural Information Processing Systems, Solla, S.A. and Leen, T.K. and Muller, K.-R., Eds., 2000, pp. 813-819.
      BibTeX
      • @Inproceedings{Hershey2000,
      • author = {Hershey, John R. and Movellan, Javier R.},
      • title = {Audio Vision: Using Audio-Visual Synchrony to Locate Sounds},
      • booktitle = {Advances in Neural Information Processing Systems},
      • year = 2000,
      • editor = {Solla, S.A. and Leen, T.K. and Muller, K.-R.},
      • pages = {813--819},
      • publisher = {MIT Press}
      • }
    •  Hershey, John R.; Movellan, Javier R., "Audio-vision: seeing sounds", Proceedings of the 6th Annual Joint Symposium on Neural Computation, 1999.
      BibTeX
      • @Inproceedings{Hershey1999,
      • author = {Hershey, John R. and Movellan, Javier R.},
      • title = {Audio-vision: seeing sounds},
      • booktitle = {Proceedings of the 6th Annual Joint Symposium on Neural Computation},
      • year = 1999
      • }
    •  Hershey, John R.; Movellan, Javier R., "Looking for Sounds: Using Audio-Visual Mutual Information to Locate Sound Sources", Proceedings of the 5th Annual Joint Symposium on Neural Computation, 1998.
      BibTeX
      • @Inproceedings{Hershey1998,
      • author = {Hershey, John R. and Movellan, Javier R.},
      • title = {Looking for Sounds: Using Audio-Visual Mutual Information to Locate Sound Sources},
      • booktitle = {Proceedings of the 5th Annual Joint Symposium on Neural Computation},
      • year = 1998,
      • publisher = {Institute for Neural Computation}
      • }
  • MERL Issued Patents

    • Title: "Flat-Panel Acoustic Apparatus"
      Inventors: Le Roux, Jonathan; Hershey, John R.; Yerazunis, William S.; Boufounos, Petros T.; Daudet, Laurent
      Patent No.: 9,661,414
      Issue Date: May 23, 2017
    • Title: "Method for Processing Speech Signals Using an Ensemble of Speech Enhancement Procedures"
      Inventors: Le Roux, Jonathan; Watanabe, Shinji; Hershey, John R.
      Patent No.: 9,601,130
      Issue Date: Mar 21, 2017
    • Title: "Neural Networks for Transforming Signals"
      Inventors: Hershey, John R.; Le Roux, Jonathan; Weninger, Felix
      Patent No.: 9,582,753
      Issue Date: Feb 28, 2017
    • Title: "Method and System for Detecting Events in an Acoustic Signal Subject to Cyclo-Stationary Noise"
      Inventors: Hershey, John R.; Potluru, Vamsi K.; Le Roux, Jonathan
      Patent No.: 9,477,895
      Issue Date: Oct 25, 2016
    • Title: "Actions Prediction for Hypothetical Driving Conditions"
      Inventors: Harsham, Bret A.; Hershey, John R.; Le Roux, Jonathan; Nikovski, Daniel N.; Esenther, Alan W.
      Patent No.: 9,434,389
      Issue Date: Sep 6, 2016
    • Title: "Method for Distinguishing Components of an Acoustic Signal"
      Inventors: Hershey, John R.; Le Roux, Jonathan; Watanabe, Shinji; Chen, Zhuo
      Patent No.: 9,368,110
      Issue Date: Jun 14, 2016
    • Title: "Denoising Noisy Speech Signals using Probabilistic Model"
      Inventors: Le Roux, Jonathan; Hershey, John R.; Simsekli, Umut
      Patent No.: 9,324,338
      Issue Date: Apr 26, 2016
    • Title: "Method and System for Autonomously Delivering Information to Drivers"
      Inventors: Nikovski, Daniel N.; Harsham, Bret A.; Hershey, John R.; Brinkman, Dirk
      Patent No.: 9,305,306
      Issue Date: Apr 5, 2016
    • Title: "Method and Apparatus for Processing Text with Variations in Vocabulary Usage"
      Inventors: Hershey, John R.; Le Roux, Jonathan; Heaukulani, Creighton K.
      Patent No.: 9,251,250
      Issue Date: Feb 2, 2016
    • Title: "Method for Localizing Sources of Signals in Reverberant Environments Using Sparse Optimization"
      Inventors: Boufounos, Petros T.; Le Roux, Jonathan; Kang, Kang; Hershey, John R.
      Patent No.: 9,251,436
      Issue Date: Feb 2, 2016
    • Title: "Determining Word Sequence Constraints for Low Cognitive Speech Recognition"
      Inventors: Harsham, Bret A.; Hershey, John R.
      Patent No.: 9,196,246
      Issue Date: Nov 24, 2015
    • Title: "Method and System for Dynamically Adapting User Interfaces in Vehicle Navigation Systems to Minimize Interaction Complexity"
      Inventors: Nikovski, Daniel N.; Hershey, John R.; Harsham, Bret A.; Le Roux, Jonathan
      Patent No.: 9,170,119
      Issue Date: Oct 27, 2015
    • Title: "System and Method for Recognizing Speech"
      Inventors: Harsham, Bret A.; Hershey, John R.
      Patent No.: 9,159,317
      Issue Date: Oct 13, 2015
    • Title: "Method of Text Classification Using Discriminative Topic Transformation"
      Inventors: Hershey, John R.; Le Roux, Jonathan
      Patent No.: 9,069,798
      Issue Date: Jun 30, 2015
    • Title: "Method and System for Reducing Interference and Noise in Speech Signals"
      Inventors: Hershey, John R.; Yu, Meng
      Patent No.: 9,048,942
      Issue Date: Jun 2, 2015
    • Title: "Indirect Model-Based Speech Enhancement"
      Inventors: Hershey, John R.; Le Roux, Jonathan
      Patent No.: 8,880,393
      Issue Date: Nov 4, 2014
    • Title: "Method and System for Registering an Object with a Probe Using Entropy-Based Motion Selection and Rao-Blackwellized Particle Filtering"
      Inventors: Taguchi, Yuichi; Marks, Tim; Hershey, John R.
      Patent No.: 8,510,078
      Issue Date: Aug 13, 2013