TR94-13

Tree Insertion Grammar: A Cubic-Time Parsable Formalism That Lexicalizes Context-Free Grammar without Changing the Trees Produced


    •  Schabes, Y.; Waters, R.C., "Tree Insertion Grammar: A Cubic-time Parsable Formalism That Lexicalizes Context-Free Grammar Without Changing the Trees Produced", Computational Linguistics, Vol. 21, No. 4, pp. 479-513, December 1995.
      BibTeX Download PDF
      • @article{Schabes1995dec,
      • author = {Schabes, Y. and Waters, R.C.},
      • title = {Tree Insertion Grammar: A Cubic-time Parsable Formalism That Lexicalizes Context-Free Grammar Without Changing the Trees Produced},
      • journal = {Computational Linguistics},
      • year = 1995,
      • volume = 21,
      • number = 4,
      • pages = {479--513},
      • month = dec,
      • url = {http://www.merl.com/publications/TR94-13}
      • }
  • MERL Contact:

Tree insertion grammar (TIG) is a tree-based formalism that makes use of tree substitution and tree adjunction. TIG is related to tree adjoining grammar. However, the adjunction permitted in TIG is sufficiently restricted that TIGs only derive context free languages and TIGs have the same cubic-time worst-case complexity bounds for recognition and parsing as context free grammars. An efficient Earley-style parser for TIGs is presented. Any context free grammar (CFG) can be converted into a lexicalized tree insertion grammar (LTIG) that generates the same trees. A constructive procedure is presented for converting a CFG into a left anchored (i.e., word initial) LTIG that preserves ambiguity and generates the same trees. The LTIG created can be represented very compactly by taking advantage of sharing between the elementary trees in it. Other methods for converting CFGs into a left anchored form, e.g., the methods of Greibach and Rosenkrantz, do not preserve the trees produced and result in very large output grammars. For the purpose of experimental evaluation, the LTIG lexicalization procedure was applied to eight different CFGs for subsets of English. The LTIGs created were smaller than the original CFGs. Using an implementation of the Earley-style TIG parser that was specialized for left anchored LTIGs, it was possible to parse more quickly with these LTIGs than with the original CFGs.