Mitsubishi Electric Research Laboratories

Context-Sensitive Spelling Correction

Context-sensitive spelling correction is the task of fixing spelling mistakes that happen to result in a valid word, as in "a peace of cake", "not to difficult", or "computer flies". Such mistakes account for 25-50% of observed spelling errors, yet they go undetected by conventional spell checkers, which check a word merely by seeing whether it is in a dictionary. A completely different technology is needed here: one that analyzes the context to decide whether the observed word is plausible in that context, or whether some other word is more likely to have been intended. We have developed two main methods for this task -- one based on Bayesian classifiers, and the other on the Winnow algorithm. Both methods have achieved predictive accuracies in the 90-100% range (depending on the spelling error); the accuracies of the Winnow-based method are the highest reported in the literature.

Background & Objective:  Conventional spell checkers can only detect spelling errors that result in non-words, such as "teh"; however, 25-50% of observed spelling errors result in valid but unintended words, and are thus undetectable by conventional means. Our objective is to develop methods that detect and correct such errors, leading to a new generation of spelling correctors with fundamentally better coverage.

Technical Discussion:  We model the problem in terms of confusion sets, which represent sets of words that users tend to confuse with one another, such as {desert, dessert}. Context-sensitive spelling correction can then be cast as a classification task: given an occurrence of, say, dessert, use the context to decide which word in its confusion set, desert or dessert, was actually intended.

In a system comparison, the Bayesian and Winnow-based methods were both found to outperform other methods in the literature, with the Winnow-based method achieving the highest performance of any algorithm tested. In addition, a variant of the Bayesian method was found to substantially outperform Microsoft Word (version 7.0).
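
To make the classification framing concrete, the following is a minimal sketch of choosing among the words of a confusion set from nearby context words. It is a deliberately simplified naive Bayes classifier and should not be read as the Bayesian or Winnow-based method described here. The function names, the window size of 3, the add-one smoothing, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Example confusion set taken from the text above.
CONFUSION_SET = ("desert", "dessert")

# Toy training sentences (hypothetical; a real system would use a large corpus).
TRAINING = [
    "we crossed the hot dry desert by camel",
    "the desert sand stretched for miles",
    "she ordered chocolate cake for dessert",
    "we ate ice cream for dessert after dinner",
]


def context_words(tokens, index, window=3):
    """Return the set of words within `window` positions of tokens[index]."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return {tokens[i] for i in range(lo, hi) if i != index}


def train(sentences, confusion_set=CONFUSION_SET, window=3):
    """Count how often each context word occurs near each confusion-set word."""
    word_counts = Counter()
    feature_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in confusion_set:
                word_counts[tok] += 1
                for feat in context_words(tokens, i, window):
                    feature_counts[tok][feat] += 1
    return word_counts, feature_counts


def predict(tokens, index, model, confusion_set=CONFUSION_SET, window=3):
    """Pick the confusion-set word that best fits the context at tokens[index]."""
    word_counts, feature_counts = model
    total = sum(word_counts.values())
    feats = context_words(tokens, index, window)
    best_word, best_score = None, -math.inf
    for word in confusion_set:
        # Log prior of the candidate word, with add-one smoothing.
        score = math.log((word_counts[word] + 1) / (total + len(confusion_set)))
        for feat in feats:
            # Smoothed estimate of P(context word present | candidate word).
            score += math.log(
                (feature_counts[word][feat] + 1) / (word_counts[word] + 2)
            )
        if score > best_score:
            best_word, best_score = word, score
    return best_word


model = train(TRAINING)
typed = "they had cake for desert".split()
print(predict(typed, 4, model))
```

In this sketch the typed word at position 4 is itself excluded from the features, so the decision rests only on the surrounding words. If the predicted word differs from the one typed, a corrector could flag the occurrence as a likely error.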

Contact:  Joseph Katz

Technical Reports:
TR1998-007a A Winnow-Based Approach to Context-Sensitive Spelling Correction
TR1996-007 Applying Winnow to Context-Sensitive Spelling Correction
TR1996-003a Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction
TR1995-013 A Bayesian Hybrid Method for Context-Sensitive Spelling Correction

Technology Area:  Artificial Intelligence

Modification Date:  September 12, 2007