TR2005-162

CRM114 versus Mr. X: CRM114 Notes for the TREC 2005 Spam Track


    •  Assis, F., Yerazunis, W., Siefkes, C., Chhabra, S., "CRM114 versus Mr. X: CRM114 Notes for the TREC 2005 Spam Track", NIST Text REtrieval Conference (TREC), November 2005.
      @inproceedings{Assis2005nov,
        author = {Assis, F. and Yerazunis, W. and Siefkes, C. and Chhabra, S.},
        title = {CRM114 versus Mr. X: CRM114 Notes for the TREC 2005 Spam Track},
        booktitle = {NIST Text REtrieval Conference (TREC)},
        year = 2005,
        month = nov,
        url = {https://www.merl.com/publications/TR2005-162}
      }
  • Research Areas: Artificial Intelligence, Data Analytics

This paper discusses the design decisions underlying the CRM114 Discriminator software, how it can be configured as a spam filter, and what we may glean from the preliminary TREC 2005 results. Unlike most other filters, CRM114 is not a fixed-purpose antispam filter; rather, it is a general-purpose language meant to expedite the creation of text filters. The pluggable CRM114 architecture allows rapid prototyping and easy support of multiple classifier engines; rather than comparing different cutoff parameters, the CRM114 TREC test set compared different classifier algorithms and learning protocols.

 

  • Related News & Events

    •  NEWS   TREC 2005: publication by William Yerazunis and others
      Date: November 15, 2005
      Where: NIST Text REtrieval Conference (TREC)
      MERL Contact: William Yerazunis
      Research Area: Data Analytics
      Brief
      • The paper "CRM114 versus Mr. X: CRM114 Notes for the TREC 2005 Spam Track" by Assis, F., Yerazunis, W., Siefkes, C. and Chhabra, S. was presented at the NIST Text REtrieval Conference (TREC).
    •  NEWS   MERL researcher's spam filter finds automobile safety defects at NHTSA
      Date: June 25, 2015
      MERL Contact: William Yerazunis
      Research Area: Data Analytics
      Brief
      • The CRM114 Discriminator, an open-source spam filter / text classifier created by William Yerazunis in MERL's Data Analytics group, continues to turn up in interesting places - and apparently one of them is in the US Department of Transportation's process for analysis of car safety defect reports.

        Although CRM114 is usually used as a spam filter, it has also been used to analyze resumes for jobseekers, scan outgoing emails for accidental leaks of confidential information, peruse blogs for relevance, scan syslog files for interesting events, and now, apparently, search complaints sent to NHTSA for safety-related vehicle malfunctions.