TR2010-065

An Automatic Training Data Collection Method for Confidential E-Mail Detection


In this paper, we propose an automatic method for operating a confidential e-mail detection system that uses machine learning and keyword search. The recent information explosion has increased the need for technology that detects confidential information in electronic data. Methods based on machine learning are one way to achieve high accuracy. However, manually preparing a large amount of correct training data is difficult, and this is often a problem in practice. We restrict our attention to e-mail and present a method that collects training data automatically using domain information. This allows the confidential e-mail detection system to operate automatically. We also show the effectiveness of our method through an implementation and evaluation on an e-mail archive system.

 

  • Related News & Events

    •  NEWS   DEIM 2010: publication by William S. Yerazunis and others
      Date: February 28, 2010
      Where: The Forum on Data Engineering and Information Management (DEIM)
      MERL Contact: William Yerazunis
      Research Area: Data Analytics
      Brief
      • The paper "An Automatic Training Data Collection Method for Confidential E-mail Detection" by Shibata, H., Kato, M., Kori, M. and Yerazunis, W. was presented at The Forum on Data Engineering and Information Management (DEIM).
    •  AWARD   DEIM 2010 Best Paper Award
      Date: February 1, 2010
      Awarded to: Hideya Shibata, Mamoru Kato, Mitsunori Kori and William Yerazunis
      Awarded for: "An Automatic Training Data Collection Method for Confidential E-mail Detection"
      Awarded by: The Forum on Data Engineering and Information Management (DEIM)
      MERL Contact: William Yerazunis
      Research Area: Data Analytics