Mitsubishi Electric Research Laboratories

The Spam-Filtering Accuracy Plateau at 99.9 percent Accuracy and How to Get Past It

Citation: Yerazunis, W.S., "The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It", MIT Spam Conference, January 2004

Date: January 2004

MERL Contact: William Yerazunis

Abstract: Bayesian filters have become the standard for spam filtering; unfortunately, most Bayesian filters seem to reach a plateau at 99.9 percent accuracy. We experimentally compare the training methods TEFT, TOE, and TUNE, as well as pure Bayesian, token-bag, token-sequence, SBPH, and Markovian discriminators. The results demonstrate that TUNE is indeed best for training but computationally exorbitant, and that Markovian discrimination is considerably more accurate than Bayesian but still not sufficient to reach four-nines (99.99 percent) accuracy, so other techniques such as inoculation are needed.

