Date & Time:
Friday, June 3, 2016; 1:30PM - 3:00PM
Speech signals convey various kinds of information, which can be grouped into two categories: linguistic and extra-linguistic. Many speech applications, however, focus on only a single aspect of speech. For example, speech recognizers try to extract only word identity from speech, and speaker recognizers extract only speaker identity. In these applications, irrelevant features are often either treated as hidden or latent variables, by applying probability theory to a large number of samples, or normalized to have quasi-standard values. In speech analysis, by contrast, phase is usually removed rather than hidden or normalized, and so are pitch harmonics. The resulting speech spectrum still contains both linguistic and extra-linguistic information. Is there a good method for removing extra-linguistic information from the spectrum? This talk introduces our answer to that question, called speech structure. Extra-linguistic variation can be modeled as a transformation of the feature space, and speech structure is based on the transform-invariance of f-divergence. The proposal was inspired by findings in classical studies of structural phonology and recent studies of developmental psychology. Speech structure has been applied to accent clustering, speech recognition, and language identification; these applications are also explained in the talk.
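The invariance property mentioned in the abstract can be illustrated with a small numerical sketch. This is not the speakers' implementation; it rests on several assumptions made only for illustration: each speech event is modeled as a one-dimensional Gaussian, the Bhattacharyya distance is used as the divergence (it is a monotone function of the Bhattacharyya coefficient, and one minus that coefficient is the squared Hellinger distance, an f-divergence), and the extra-linguistic variation is a single affine map `x -> a*x + b`. The names `bhattacharyya_1d`, `structure`, and `affine` are hypothetical.

```python
import math

def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2); v1, v2 are variances."""
    v = 0.5 * (v1 + v2)
    return 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))

def structure(gaussians):
    """Matrix of pairwise distances between all event distributions."""
    return [[bhattacharyya_1d(*p, *q) for q in gaussians] for p in gaussians]

def affine(gaussians, a, b):
    """Apply x -> a*x + b to every Gaussian: mean -> a*m + b, variance -> a^2 * v."""
    return [(a * m + b, a * a * v) for m, v in gaussians]

# Three illustrative "speech events" as (mean, variance) pairs (made-up values).
events = [(0.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]

original = structure(events)
warped = structure(affine(events, a=1.7, b=-3.0))

# Largest entry-wise difference between the two distance matrices.
max_diff = max(
    abs(x - y)
    for row_o, row_w in zip(original, warped)
    for x, y in zip(row_o, row_w)
)
print(max_diff)
```

Under these assumptions the two distance matrices agree up to floating-point error, even though every individual mean and variance has changed. Only the relations between events are kept, which is the sense in which such a representation discards the transformation.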
Nobuaki Minematsu and Daisuke Saito
The University of Tokyo
Dr. Nobuaki Minematsu received his doctorate in Engineering in 1995 from the University of Tokyo. From 2002 to 2003, he was a visiting researcher at KTH, Stockholm. He is currently a full professor at the University of Tokyo. He has a wide range of interests in speech communication, ranging from science to engineering. He has published more than 400 papers, including conference papers, on speech analysis, speech perception, speech recognition, speech synthesis, dialogue systems, language learning systems, and related topics. He has received scientific and technical awards from RISP (2007, 2013), JSAI (2007), ICIST (2014), O-COCOSDA (2014), PSJ (2014), and IEICE (2016). He has given tutorial and keynote talks on CALL (Computer-Aided Language Learning) at APSIPA2011, INTERSPEECH2012, O-COCOSDA2014, PAAL2014, EJHIB2015, and ISAPh2016. Since 2015, he has been serving as a Distinguished Lecturer for APSIPA.
Dr. Daisuke Saito received the B.E., M.S., and Dr. Eng. degrees from the University of Tokyo, Tokyo, Japan, in 2006, 2008, and 2011, respectively. From 2010 to 2011, he was a Research Fellow (DC2) of the Japan Society for the Promotion of Science. He is currently an Assistant Professor in the Graduate School of Information Science and Technology, University of Tokyo. He is interested in various areas of speech engineering, including voice conversion, speech synthesis, acoustic analysis, speaker/language recognition, and speech recognition. Dr. Saito is a member of the International Speech Communication Association (ISCA), the Acoustical Society of Japan (ASJ), the Information Processing Society of Japan (IPSJ), the Institute of Electronics, Information and Communication Engineers (IEICE), and the Institute of Image Information and Television Engineers (ITE). He received the ISCA Award for the best student paper of INTERSPEECH 2011, the Awaya Award from the ASJ in 2012, and the Itakura Award from the ASJ in 2014.