Mining Features for Sequence Classification

    •  Neal Lesh, Mohammed J. Zaki, Mitsunori Ogihara, "Mining Features for Sequence Classification", Tech. Rep. TR98-22, Mitsubishi Electric Research Laboratories, Cambridge, MA, December 1998.
      @techreport{MERL_TR98-22,
        author = {Neal Lesh and Mohammed J. Zaki and Mitsunori Ogihara},
        title = {Mining Features for Sequence Classification},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address = {Cambridge, MA 02139},
        number = {TR98-22},
        month = dec,
        year = 1998,
        url = {}
      }
  • Research Areas: Data Analytics, Optimization

Classification algorithms are difficult to apply to sequential examples, such as plan executions or text, because there is a vast number of potentially useful features for describing each example. Past work on feature selection has focused on searching the space of all subsets of the available features, which is intractable for large feature sets. We adapt data mining techniques to act as a preprocessor that selects features for standard classification algorithms such as Naive Bayes and Winnow. We apply our algorithm to the task of predicting, during plan execution, whether a plan will succeed or fail. The features produced by our algorithm improve classification accuracy by 10-50 percent in our experiments.
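
To make the preprocessing idea concrete, here is a minimal Python sketch of mining subsequence features and turning sequences into boolean vectors for a standard classifier. The abstract does not state the selection criteria, so everything specific here is an assumption made for illustration: the function names (`mine_features`, `to_vector`), the brute-force enumeration of subsequences up to `max_len`, the `min_support` and `min_gap` thresholds, and the toy plan traces. It should not be read as the report's algorithm.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Distinct, not necessarily contiguous, subsequences of seq up to max_len items."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return found


def mine_features(examples, labels, max_len=2, min_support=0.5, min_gap=0.5):
    """Select subsequences that are frequent in some class and whose
    per-class frequencies differ by at least min_gap (assumed criteria)."""
    classes = sorted(set(labels))
    totals = Counter(labels)
    counts = {c: Counter() for c in classes}
    for seq, label in zip(examples, labels):
        # Each subsequence is counted once per example that contains it.
        counts[label].update(subsequences(seq, max_len))
    candidates = set().union(*counts.values())
    selected = []
    for feat in sorted(candidates):
        freqs = [counts[c][feat] / totals[c] for c in classes]
        if max(freqs) >= min_support and max(freqs) - min(freqs) >= min_gap:
            selected.append(feat)
    return selected


def contains(seq, feat):
    """True if feat occurs in seq as an ordered subsequence."""
    it = iter(seq)
    return all(item in it for item in feat)


def to_vector(seq, features):
    """Boolean feature vector: one entry per mined subsequence."""
    return [int(contains(seq, f)) for f in features]


# Hypothetical plan-execution traces, invented for this sketch.
examples = [
    ["load", "move", "unload"],
    ["load", "move", "move", "unload"],
    ["move", "load", "unload"],
    ["move", "unload", "load"],
]
labels = ["success", "success", "fail", "fail"]

features = mine_features(examples, labels, max_len=2, min_support=0.5, min_gap=0.75)
vectors = [to_vector(seq, features) for seq in examples]
```

The resulting `vectors` could then be passed to any classifier that accepts boolean inputs, such as Naive Bayes or Winnow. Brute-force enumeration is only practical for short sequences and a small `max_len`; a real miner would prune the search rather than enumerate every candidate.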