TR2022-112

Momentum Pseudo-Labeling: Semi-Supervised ASR with Continuously Improving Pseudo-Labels


    •  Higuchi, Y., Moritz, N., Le Roux, J., Hori, T., "Momentum Pseudo-Labeling: Semi-Supervised ASR with Continuously Improving Pseudo-Labels", IEEE Journal of Selected Topics in Signal Processing, DOI: 10.1109/JSTSP.2022.3195367, Vol. 16, No. 6, pp. 1424-1438, September 2022.
      BibTeX:
      @article{Higuchi2022sep,
        author = {Higuchi, Yosuke and Moritz, Niko and Le Roux, Jonathan and Hori, Takaaki},
        title = {Momentum Pseudo-Labeling: Semi-Supervised ASR with Continuously Improving Pseudo-Labels},
        journal = {IEEE Journal of Selected Topics in Signal Processing},
        year = 2022,
        volume = 16,
        number = 6,
        pages = {1424--1438},
        month = sep,
        doi = {10.1109/JSTSP.2022.3195367},
        issn = {1941-0484},
        url = {https://www.merl.com/publications/TR2022-112}
      }
  • Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

Safety and robustness are two desired properties for any reinforcement learning algorithm. Constrained Markov Decision Processes (CMDPs) can handle additional safety constraints, and Robust Markov Decision Processes (RMDPs) can perform well under model uncertainties. In this work, we propose to unify these two frameworks, resulting in Robust Constrained MDPs (RCMDPs). The motivation is to develop a framework that can satisfy safety constraints while simultaneously offering robustness to model uncertainties. We develop the RCMDP objective, derive a gradient update formula to optimize this objective, and then propose policy-gradient-based algorithms. We also independently propose Lyapunov-based reward shaping for RCMDPs, yielding better stability and convergence properties.
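
As an illustrative aid only, an objective of the kind referred to above can be written as a max-min problem over an uncertainty set of transition models, subject to a worst-case expected-cost constraint. The notation below (uncertainty set \(\mathcal{P}\), constraint cost \(c\), budget \(d\)) is an assumed standard formulation for illustration, not taken from the paper itself:

\[
\max_{\pi}\ \min_{P \in \mathcal{P}}\ \mathbb{E}_{\pi, P}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\max_{P \in \mathcal{P}}\ \mathbb{E}_{\pi, P}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d,
\]

where \(\mathcal{P}\) is the set of plausible transition kernels, \(r\) is the reward, \(c\) is the per-step constraint cost, and \(d\) is the safety budget.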