TR96-07

Applying Winnow to Context-Sensitive Spelling Correction


    •  Andrew R. Golding, Dan Roth, "Applying Winnow to Context-Sensitive Spelling Correction", Tech. Rep. TR96-07, Mitsubishi Electric Research Laboratories, Cambridge, MA, April 1996.
      @techreport{MERL_TR96-07,
        author = {Andrew R. Golding and Dan Roth},
        title = {Applying Winnow to Context-Sensitive Spelling Correction},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address = {Cambridge, MA 02139},
        number = {TR96-07},
        month = apr,
        year = 1996,
        url = {https://www.merl.com/publications/TR96-07/}
      }
Abstract:

Multiplicative weight-updating algorithms such as Winnow have been studied extensively in the COLT literature, but only recently have people started to use them in applications. In this paper, we apply a Winnow-based algorithm to a task in natural language: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting to for too, casual for causal, and so on. Previous approaches to this problem have been statistics-based; we compare Winnow to one of the more successful such approaches, which uses Bayesian classifiers. We find that: (1) When the standard (heavily-pruned) set of features is used to describe problem instances, Winnow performs comparably to the Bayesian method; (2) When the full (unpruned) set of features is used, Winnow is able to exploit the new features and convincingly outperform Bayes; and (3) When a test set is encountered that is dissimilar to the training set, Winnow is better than Bayes at adapting to the unfamiliar test set, using a strategy we will present for combining learning on the training set with unsupervised learning on the (noisy) test set.
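Below is a minimal sketch (in Python) of the multiplicative weight-updating scheme the abstract refers to, for a single confusable word pair. The binary feature encoding, promotion factor, and threshold are illustrative assumptions, not the configuration used in the report.

# Winnow keeps one positive weight per binary context feature and predicts by
# comparing the summed weights of the active features to a threshold; on a
# mistake it multiplies (promotes) or divides (demotes) the active weights by
# a constant factor.

class Winnow:
    def __init__(self, n_features, alpha=2.0, threshold=None):
        self.w = [1.0] * n_features                  # weights start at 1
        self.alpha = alpha                           # promotion/demotion factor
        self.theta = threshold if threshold is not None else float(n_features)

    def predict(self, active):
        # active: indices of the binary features present in this instance
        return sum(self.w[i] for i in active) >= self.theta

    def update(self, active, label):
        # mistake-driven update: do nothing when the prediction is correct
        if self.predict(active) == label:
            return
        for i in active:
            if label:                                # false negative: promote
                self.w[i] *= self.alpha
            else:                                    # false positive: demote
                self.w[i] /= self.alpha


if __name__ == "__main__":
    # Toy usage: decide between a confusable pair (e.g. "to" vs. "too")
    # from four hypothetical binary context features.
    clf = Winnow(n_features=4)
    for active, label in [([0, 2], True), ([1, 3], False), ([0, 3], True)]:
        clf.update(active, label)
    print(clf.predict([0, 2]))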