Sameer Khurana

  • Biography

    Sameer's research interests include multimodal, transfer, and self-supervised learning applied to the speech and audio domains. He conducted his Ph.D. research in the Spoken Language Systems Lab at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), where he developed transfer learning methods for spoken language processing applications.

  • Recent News & Events

    •  TALK    [MERL Seminar Series 2024] Greta Tuckute presents a talk titled "Computational models of human auditory and language processing"
      Date & Time: Wednesday, January 31, 2024; 12:00 PM
      Speaker: Greta Tuckute, MIT
      MERL Host: Sameer Khurana
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Abstract
      • Advances in machine learning have led to powerful models for audio and language, proficient in tasks like speech recognition and fluent language generation. Beyond their immense utility in engineering applications, these models offer valuable tools for cognitive science and neuroscience. In this talk, I will demonstrate how these artificial neural network models can be used to understand how the human brain processes language. The first part of the talk will cover how audio neural networks serve as computational accounts for brain activity in the auditory cortex. The second part will focus on the use of large language models, such as those in the GPT family, to non-invasively control brain activity in the human language system.
  • Awards

    •  AWARD    MERL team wins the Audio-Visual Speech Enhancement (AVSE) 2023 Challenge
      Date: December 16, 2023
      Awarded to: Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux
      MERL Contacts: François Germain; Chiori Hori; Sameer Khurana; Jonathan Le Roux; Zexu Pan; Gordon Wichern
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • MERL's Speech & Audio team ranked 1st out of 12 teams in the 2nd COG-MHEAR Audio-Visual Speech Enhancement (AVSE) Challenge. The team was led by Zexu Pan and also included Gordon Wichern, Yoshiki Masuyama, François Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux.

        The AVSE challenge aims to design better speech enhancement systems by harnessing the visual aspects of speech (such as lip movements and gestures) in a manner similar to the brain’s multi-modal integration strategies. MERL’s system was a scenario-aware audio-visual TF-GridNet that incorporates the face recording of a target speaker as a conditioning factor and also recognizes whether the predominant interference signal is speech or background noise. In addition to outperforming all competing systems by a wide margin on objective metrics, MERL’s model achieved the best overall word intelligibility score in a listening test: 84.54%, compared to 57.56% for the baseline and 80.41% for the next best team. Fisher’s least significant difference (LSD) was 2.14%, so the 4.13-point margin over the next best team indicates that MERL’s model offered statistically significant speech intelligibility improvements over all other systems.
  • MERL Publications

    •  Pan, Z., Wichern, G., Masuyama, Y., Germain, F.G., Khurana, S., Hori, C., Le Roux, J., "Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction", IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), DOI: 10.1109/ASRU57964.2023.10389618, December 2023.
      BibTeX TR2023-152 PDF
      • @inproceedings{Pan2023dec2,
      • author = {Pan, Zexu and Wichern, Gordon and Masuyama, Yoshiki and Germain, François G and Khurana, Sameer and Hori, Chiori and Le Roux, Jonathan},
      • title = {Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction},
      • booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)},
      • year = 2023,
      • month = dec,
      • doi = {10.1109/ASRU57964.2023.10389618},
      • isbn = {979-8-3503-0689-7},
      • url = {https://www.merl.com/publications/TR2023-152}
      • }
    •  Pan, Z., Wichern, G., Germain, F.G., Khurana, S., Le Roux, J., "NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection", arXiv, December 2023.
      BibTeX arXiv
      • @article{Pan2023dec,
      • author = {Pan, Zexu and Wichern, Gordon and Germain, François G and Khurana, Sameer and Le Roux, Jonathan},
      • title = {NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection},
      • journal = {arXiv},
      • year = 2023,
      • month = dec,
      • url = {https://arxiv.org/abs/2312.07513}
      • }
    •  Bralios, D., Wichern, G., Germain, F.G., Pan, Z., Khurana, S., Hori, C., Le Roux, J., "Generation or Replication: Auscultating Audio Latent Diffusion Models", arXiv, October 2023.
      BibTeX arXiv
      • @article{Bralios2023oct,
      • author = {Bralios, Dimitrios and Wichern, Gordon and Germain, François G and Pan, Zexu and Khurana, Sameer and Hori, Chiori and Le Roux, Jonathan},
      • title = {Generation or Replication: Auscultating Audio Latent Diffusion Models},
      • journal = {arXiv},
      • year = 2023,
      • month = oct,
      • url = {https://arxiv.org/abs/2310.10604}
      • }
    •  Khurana, S., Moritz, N., Hori, T., Le Roux, J., "Unsupervised Domain Adaptation For Speech Recognition via Uncertainty Driven Self-Training", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP39728.2021.9414299, June 2021, pp. 6553-6557.
      BibTeX TR2021-039 PDF
      • @inproceedings{Khurana2021jun,
      • author = {Khurana, Sameer and Moritz, Niko and Hori, Takaaki and Le Roux, Jonathan},
      • title = {Unsupervised Domain Adaptation For Speech Recognition via Uncertainty Driven Self-Training},
      • booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      • year = 2021,
      • pages = {6553--6557},
      • month = jun,
      • doi = {10.1109/ICASSP39728.2021.9414299},
      • url = {https://www.merl.com/publications/TR2021-039}
      • }
  • Other Publications

    •  Yuan Gong, Sameer Khurana, Leonid Karlinsky and James Glass, "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers", Interspeech 2023, 2023.
      BibTeX
      • @Article{gong2023whisper,
      • author = {Gong, Yuan and Khurana, Sameer and Karlinsky, Leonid and Glass, James},
      • title = {Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers},
      • journal = {Interspeech 2023},
      • year = 2023
      • }
    •  Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote and James Glass, "Improved Cross-Lingual Transfer Learning For Automatic Speech Translation", Preprint 2023, 2023.
      BibTeX
      • @Article{khurana2023improved,
      • author = {Khurana, Sameer and Dawalatabad, Nauman and Laurent, Antoine and Vicente, Luis and Gimeno, Pablo and Mingote, Victoria and Glass, James},
      • title = {Improved Cross-Lingual Transfer Learning For Automatic Speech Translation},
      • journal = {Preprint 2023},
      • year = 2023
      • }
    •  Sameer Khurana, "Transfer Learning For Spoken Language Processing", 2023, Massachusetts Institute of Technology.
      BibTeX
      • @Phdthesis{khurana2023transfer,
      • author = {Khurana, Sameer},
      • title = {Transfer Learning For Spoken Language Processing},
      • school = {Massachusetts Institute of Technology},
      • year = 2023
      • }
    •  Antoine Laurent, Souhir Gahbiche, Ha Nguyen, Haroun Elleuch, Fethi Bougares, Antoine Thiol, Hugo Riguidel, Salima Mdhaffar, Gaëlle Laperrière, Lucas Maison and others, "ON-TRAC consortium systems for the IWSLT 2023 dialectal and low-resource speech translation tasks", IWSLT 2023, 2023.
      BibTeX
      • @Inproceedings{laurent2023trac,
      • author = {Laurent, Antoine and Gahbiche, Souhir and Nguyen, Ha and Elleuch, Haroun and Bougares, Fethi and Thiol, Antoine and Riguidel, Hugo and Mdhaffar, Salima and Laperri{\`e}re, Ga{\"e}lle and Maison, Lucas and others},
      • title = {ON-TRAC consortium systems for the IWSLT 2023 dialectal and low-resource speech translation tasks},
      • booktitle = {IWSLT 2023},
      • year = 2023
      • }
    •  Victoria Mingote, Pablo Gimeno, Luis Vicente, Sameer Khurana, Antoine Laurent and Jarod Duret, "Direct Text to Speech Translation System Using Acoustic Units", IEEE Signal Processing Letters 2023, 2023.
      BibTeX
      • @Article{mingote2023direct,
      • author = {Mingote, Victoria and Gimeno, Pablo and Vicente, Luis and Khurana, Sameer and Laurent, Antoine and Duret, Jarod},
      • title = {Direct Text to Speech Translation System Using Acoustic Units},
      • journal = {IEEE Signal Processing Letters 2023},
      • year = 2023,
      • publisher = {IEEE}
      • }
    •  Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury and James Glass, "Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages", Interspeech 2023, 2023.
      BibTeX
      • @Article{rouditchenko2023comparison,
      • author = {Rouditchenko, Andrew and Khurana, Sameer and Thomas, Samuel and Feris, Rogerio and Karlinsky, Leonid and Kuehne, Hilde and Harwath, David and Kingsbury, Brian and Glass, James},
      • title = {Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages},
      • journal = {Interspeech 2023},
      • year = 2023
      • }
    •  Nauman Dawalatabad, Yuan Gong, Sameer Khurana, Rhoda Au and James Glass, "Detecting Dementia from Long Neuropsychological Interviews", EMNLP 2022, 2022.
      BibTeX
      • @Inproceedings{dawalatabad2022detecting,
      • author = {Dawalatabad, Nauman and Gong, Yuan and Khurana, Sameer and Au, Rhoda and Glass, James},
      • title = {Detecting Dementia from Long Neuropsychological Interviews},
      • booktitle = {EMNLP 2022},
      • year = 2022
      • }
    •  Nauman Dawalatabad, Sameer Khurana, Antoine Laurent and James Glass, "On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration", ICASSP 2023, 2022.
      BibTeX
      • @Article{dawalatabad2022unsupervised,
      • author = {Dawalatabad, Nauman and Khurana, Sameer and Laurent, Antoine and Glass, James},
      • title = {On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration},
      • journal = {ICASSP 2023},
      • year = 2022
      • }
    •  Yuan Gong, Sameer Khurana, Andrew Rouditchenko and James Glass, "CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification", Preprint 2022, 2022.
      BibTeX
      • @Article{gong2022cmkd,
      • author = {Gong, Yuan and Khurana, Sameer and Rouditchenko, Andrew and Glass, James},
      • title = {CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification},
      • journal = {Preprint 2022},
      • year = 2022
      • }
    •  Sameer Khurana, Antoine Laurent and James Glass, "Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0", ICASSP 2022, 2022.
      BibTeX
      • @Inproceedings{khurana2022magic,
      • author = {Khurana, Sameer and Laurent, Antoine and Glass, James},
      • title = {Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0},
      • booktitle = {ICASSP 2022},
      • year = 2022,
      • organization = {IEEE}
      • }
    •  Sameer Khurana, Antoine Laurent and James Glass, "SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation", IEEE Journal of Selected Topics in Signal Processing 2022, 2022.
      BibTeX
      • @Article{khurana2022samu,
      • author = {Khurana, Sameer and Laurent, Antoine and Glass, James},
      • title = {SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation},
      • journal = {IEEE Journal of Selected Topics in Signal Processing 2022},
      • year = 2022
      • }
    •  Anthony Larcher, Yannick Estève, Mickael Rouvier, Natalia Tomashenko, Jarod Duret, Gaëlle Laperrière, Santosh Kesiraju, Marek Sarvas, Renata Kohlova, Henry Li and others, "Multi-lingual Speech to Speech Translation for Under-Resourced Languages", 2022.
      BibTeX
      • @Inproceedings{larcher2022multi,
      • author = {Larcher, Anthony and Est{\`e}ve, Yannick and Rouvier, Mickael and Tomashenko, Natalia and Duret, Jarod and Laperri{\`e}re, Ga{\"e}lle and Kesiraju, Santosh and Sarvas, Marek and Kohlova, Renata and Li, Henry and others},
      • title = {Multi-lingual Speech to Speech Translation for Under-Resourced Languages},
      • year = 2022,
      • organization = {Jelinek Summer Workshop on Speech and Language Technology 2022}
      • }
    •  Sameer Khurana, Niko Moritz, Takaaki Hori and Jonathan Le Roux, "Unsupervised domain adaptation for speech recognition via uncertainty driven self-training", ICASSP 2021, 2021.
      BibTeX
      • @Inproceedings{khurana2021unsupervised,
      • author = {Khurana, Sameer and Moritz, Niko and Hori, Takaaki and Le Roux, Jonathan},
      • title = {Unsupervised domain adaptation for speech recognition via uncertainty driven self-training},
      • booktitle = {ICASSP 2021},
      • year = 2021,
      • organization = {IEEE}
      • }
    •  Cheng-I Jeff Lai, Yang Zhang, Alexander H Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox and Jim Glass, "PARP: Prune, adjust and re-prune for self-supervised speech recognition", NeurIPS 2021, 2021.
      BibTeX
      • @Article{lai2021parp,
      • author = {Lai, Cheng-I Jeff and Zhang, Yang and Liu, Alexander H and Chang, Shiyu and Liao, Yi-Lun and Chuang, Yung-Sung and Qian, Kaizhi and Khurana, Sameer and Cox, David and Glass, Jim},
      • title = {PARP: Prune, adjust and re-prune for self-supervised speech recognition},
      • journal = {NeurIPS 2021},
      • year = 2021
      • }
    •  Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer and James Glass, "A convolutional deep markov model for unsupervised speech representation learning", Interspeech 2020, 2020.
      BibTeX
      • @Article{khurana2020convolutional,
      • author = {Khurana, Sameer and Laurent, Antoine and Hsu, Wei-Ning and Chorowski, Jan and Lancucki, Adrian and Marxer, Ricard and Glass, James},
      • title = {A convolutional deep markov model for unsupervised speech representation learning},
      • journal = {Interspeech 2020},
      • year = 2020
      • }
    •  Sameer Khurana, Antoine Laurent and James Glass, "CSTNet: Contrastive speech translation network for self-supervised speech representation learning", Preprint, 2020.
      BibTeX
      • @Article{khurana2020cstnet,
      • author = {Khurana, Sameer and Laurent, Antoine and Glass, James},
      • title = {CSTNet: Contrastive speech translation network for self-supervised speech representation learning},
      • journal = {Preprint},
      • year = 2020
      • }
    •  Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans JGA Dolfing, Sameer Khurana, Tanel Alumäe and Antoine Laurent, "Robust training of vector quantized bottleneck models", IJCNN 2020, 2020.
      BibTeX
      • @Inproceedings{lancucki2020robust,
      • author = {{\L}a{\'n}cucki, Adrian and Chorowski, Jan and Sanchez, Guillaume and Marxer, Ricard and Chen, Nanxin and Dolfing, Hans JGA and Khurana, Sameer and Alum{\"a}e, Tanel and Laurent, Antoine},
      • title = {Robust training of vector quantized bottleneck models},
      • booktitle = {IJCNN 2020},
      • year = 2020,
      • organization = {IEEE}
      • }
    •  Sameer Khurana, Ahmed Ali and James Glass, "DARTS: Dialectal Arabic transcription system", Preprint, 2019.
      BibTeX
      • @Article{khurana2019darts,
      • author = {Khurana, Sameer and Ali, Ahmed and Glass, James},
      • title = {DARTS: Dialectal Arabic transcription system},
      • journal = {Preprint},
      • year = 2019
      • }
    •  Sameer Khurana, Shafiq Rayhan Joty, Ahmed Ali and James Glass, "A Factorial Deep Markov Model For Unsupervised Disentangled Representation Learning From Speech", ICASSP 2019, 2019.
      BibTeX
      • @Article{khurana2019factorial,
      • author = {Khurana, Sameer and Joty, Shafiq Rayhan and Ali, Ahmed and Glass, James},
      • title = {A Factorial Deep Markov Model For Unsupervised Disentangled Representation Learning From Speech},
      • journal = {ICASSP 2019},
      • year = 2019
      • }
    •  Sameer Khurana, Reda Rawi, Khalid Kunji, Gwo-Yu Chuang, Halima Bensmail, Raghvendra Mall and Alfonso Valencia, "DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction", Bioinformatics 2018, 2018.
      BibTeX
      • @Article{khurana2018deepsol,
      • author = {Khurana, Sameer and Rawi, Reda and Kunji, Khalid and Chuang, Gwo-Yu and Bensmail, Halima and Mall, Raghvendra and Valencia, Alfonso},
      • title = {DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction},
      • journal = {Bioinformatics 2018},
      • year = 2018
      • }
    •  Maryam Najafian, Sameer Khurana, Suwon Shon, Ahmed Ali and James Glass, "Exploiting convolutional neural networks for phonotactic based dialect identification", ICASSP 2018, 2018.
      BibTeX
      • @Inproceedings{najafian2018exploiting,
      • author = {Najafian, Maryam and Khurana, Sameer and Shon, Suwon and Ali, Ahmed and Glass, James},
      • title = {Exploiting convolutional neural networks for phonotactic based dialect identification},
      • booktitle = {ICASSP 2018},
      • year = 2018,
      • organization = {IEEE}
      • }