TR2004-091

The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It


    •  Yerazunis, W.S., "The Spam-Filtering Accuracy Plateau at 99.9% Accuracy and How to Get Past It", MIT Spam Conference, January 2004.
      @inproceedings{Yerazunis2004jan,
        author = {Yerazunis, W.S.},
        title = {The Spam-Filtering Accuracy Plateau at 99.9\% Accuracy and How to Get Past It},
        booktitle = {MIT Spam Conference},
        year = 2004,
        month = jan,
        url = {https://www.merl.com/publications/TR2004-091}
      }
Abstract:

Bayesian filters have become the standard for spam filtering; unfortunately, most Bayesian filters seem to reach an accuracy plateau at 99.9 percent. We experimentally compare the training methods TEFT, TOE, and TUNE, as well as pure Bayesian, token-bag, token-sequence, SBPH, and Markovian discriminators. The results demonstrate that TUNE is indeed the best training method but is computationally exorbitant, and that Markovian discrimination is considerably more accurate than Bayesian discrimination but not sufficient to reach four-nines (99.99 percent) accuracy; other techniques, such as inoculation, are needed.
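The abstract names the training regimens and discriminators but does not show how they differ. The Python sketch below illustrates them. It assumes the usual readings of the acronyms:

- TEFT: train on every message.
- TOE: train only on errors.
- TUNE: repeat train-on-errors passes until the training set is classified with no errors.

Several parts of the sketch are hypothetical illustrations, not the paper's or CRM114's actual formulas:

- the SBPH-style feature generator and its 5-token window;
- the `4 ** (n - 1)` superincreasing Markovian weights;
- the log-count-ratio scoring rule;
- the `max_passes` safety cap;
- all names (`sbph_features`, `markov_weight`, `MarkovFilter`, `toe_pass`, `train`, `CORPUS`).

```python
import math
from collections import defaultdict
from itertools import combinations

WINDOW = 5  # hypothetical SBPH window length (tokens)


def sbph_features(tokens, window=WINDOW):
    """Return (phrase, n_kept) pairs in the style of SBPH (illustrative sketch).

    For each token, take the window ending at it and emit every subphrase
    that keeps that newest token; skipped positions become "?".
    A full 5-token window yields 2**4 = 16 subphrases.
    """
    feats = []
    for i in range(len(tokens)):
        win = tokens[max(0, i - window + 1): i + 1]
        older = range(len(win) - 1)  # positions before the newest token
        for r in range(len(win)):
            for keep in combinations(older, r):
                phrase = tuple(win[j] if j in keep else "?" for j in older)
                feats.append((phrase + (win[-1],), r + 1))
    return feats


def markov_weight(n_kept):
    """Hypothetical superincreasing weight: longer matches dominate shorter ones."""
    return 4 ** (n_kept - 1)


class MarkovFilter:
    """Toy two-class filter: Markov-weighted sum of per-feature log count ratios.

    This scoring rule is an illustrative assumption, not CRM114's formula.
    """

    def __init__(self):
        self.counts = {"spam": defaultdict(int), "good": defaultdict(int)}
        self.n_learned = 0  # number of messages trained on

    def _features(self, text):
        return sbph_features(text.lower().split())

    def score(self, text):
        s = 0.0
        for phrase, n in self._features(text):
            spam = self.counts["spam"].get(phrase, 0)
            good = self.counts["good"].get(phrase, 0)
            s += markov_weight(n) * (math.log1p(spam) - math.log1p(good))
        return s

    def classify(self, text):
        return "spam" if self.score(text) > 0 else "good"

    def learn(self, text, label):
        for phrase, _ in self._features(text):
            self.counts[label][phrase] += 1
        self.n_learned += 1


def toe_pass(model, corpus):
    """One pass that trains only on misclassified messages; returns the error count."""
    errors = 0
    for text, label in corpus:
        if model.classify(text) != label:
            model.learn(text, label)
            errors += 1
    return errors


def train(model, corpus, regime, max_passes=10):
    if regime == "TEFT":  # train on every message, right or wrong
        for text, label in corpus:
            model.learn(text, label)
    elif regime == "TOE":  # a single train-on-errors pass
        toe_pass(model, corpus)
    elif regime == "TUNE":  # repeat TOE passes until one makes no errors (capped)
        for _ in range(max_passes):
            if toe_pass(model, corpus) == 0:
                break
    else:
        raise ValueError(f"unknown regime: {regime}")
    return model


CORPUS = [("buy cheap pills now", "spam"), ("meeting notes attached", "good"),
          ("cheap pills online", "spam"), ("lunch meeting today", "good")]

for regime in ("TEFT", "TOE", "TUNE"):
    m = train(MarkovFilter(), CORPUS, regime)
    ok = all(m.classify(t) == label for t, label in CORPUS)
    print(regime, m.n_learned, ok)  # TEFT 4 True / TOE 2 True / TUNE 2 True
```

On this toy corpus, TEFT learns all four messages, while TOE and TUNE learn only the two they misclassify. TUNE also needs a second full classification pass to confirm there are no remaining errors. That repeated re-scoring of the whole training set is where its extra cost comes from on realistic corpora.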
