This paper introduces a Japanese end-to-end ASR system based on a joint CTC/attention scheme, which extends attention-based ASR by using multi-task learning to incorporate the Connectionist Temporal Classification (CTC) objective. Unlike conventional Japanese ASR systems based on DNN/HMM hybrids or end-to-end systems restricted to Japanese syllable characters (i.e., hiragana or katakana), this method directly predicts a Japanese sentence over a standard Japanese character set including Kanji, hiragana, and katakana characters, Roman/Greek alphabets, Arabic numerals, and so on. Thus, the method does not use any pronunciation dictionary, which would require hand-crafted effort by humans. In addition, since it is based on character-level recognition, it does not require a morphological analyzer to chunk a character sequence into a word sequence. Finally, the attention mechanism itself provides a language-model-like function in the decoder network, unlike a Japanese end-to-end system based on CTC alone. Therefore, it does not require a separate language model module, which makes system construction and the decoding process very simple.
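To make the multi-task training concrete, the sketch below shows how a joint objective is typically formed as a linear interpolation of the CTC and attention losses. This is a minimal illustration, not the paper's implementation: the function name `joint_loss` and the weight `lam` are hypothetical, and the per-objective losses are assumed to be precomputed scalars.

```python
def joint_loss(ctc_loss: float, att_loss: float, lam: float = 0.3) -> float:
    """Interpolate the two training objectives.

    Multi-task objective (assumed form): L = lam * L_ctc + (1 - lam) * L_att,
    where lam in [0, 1] is the CTC weight, a tunable hyperparameter.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * att_loss


# Example: with lam = 0.5 the combined loss is the simple average.
combined = joint_loss(ctc_loss=2.0, att_loss=1.0, lam=0.5)
print(combined)
```

With `lam = 0` training reduces to a purely attention-based model, while `lam = 1` recovers plain CTC training; intermediate values let the monotonic alignment encouraged by CTC regularize the attention decoder.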