Mitsubishi Electric Research Laboratories

Deterministic Part-of-Speech Tagging with Finite State Transducers

Date: May 1994

Authors: Emmanuel Roche, Yves Schabes

Where Published: Computational Linguistics, March 1995

Abstract: Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved by inferring a rule-based part-of-speech tagger from a training corpus. However, current implementations of Brill's tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger inspired by Brill's work that operates in optimal time, in the sense that the time to assign tags to a sentence corresponds to the time required to deterministically follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in Brill's tagger as a non-deterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices.
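
To make the "single path" idea concrete, here is a minimal Python sketch of a hand-built deterministic transducer applying one Brill-style contextual rule. The rule (retag "vbn" as "vbd" when the previous tag is "np"), the state numbering, and the names TRANSITIONS and apply_transducer are assumptions chosen for illustration, not taken from the report. The sketch does not perform the determinization step the abstract describes; it only shows what running the resulting deterministic machine might look like.

```python
from typing import Dict, List, Tuple

# Hypothetical rule, for illustration only:
# retag "vbn" as "vbd" when the previous tag is "np".
#
# State 0: the previous tag was not "np" (also the start state).
# State 1: the previous tag was "np".
#
# (state, input tag) -> (next state, output tag).
# Any pair not listed copies the input tag and returns to state 0.
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, str]] = {
    (0, "np"): (1, "np"),
    (1, "np"): (1, "np"),
    (1, "vbn"): (0, "vbd"),
}


def apply_transducer(tags: List[str]) -> List[str]:
    """Follow a single path through the transducer, one step per input tag."""
    state = 0
    output: List[str] = []
    for tag in tags:
        if (state, tag) in TRANSITIONS:
            state, emitted = TRANSITIONS[(state, tag)]
        else:
            # Default transition: copy the tag unchanged, forget the context.
            state, emitted = 0, tag
        output.append(emitted)
    return output


if __name__ == "__main__":
    print(apply_transducer(["np", "vbn", "at", "nn"]))
```

Each input tag triggers exactly one table lookup, so the work is proportional to the sentence length and does not grow with the number of rules compiled into the table. That is the property the abstract calls optimal time.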


Read the full technical report (PDF: 376.7 kB)