TR2012-050

Bayesian Networks for Matcher Composition in Automatic Schema Matching


    •  Nikovski, D.; Esenther, A.; Ye, X.; Shiba, M.; Takayama, S., "Bayesian Networks for Matcher Composition in Automatic Schema Matching", International Conference on Enterprise Information Systems (ICEIS), June 2012, vol. 1, pp. 48-55.
      @inproceedings{Nikovski2012jun,
        author = {Nikovski, D. and Esenther, A. and Ye, X. and Shiba, M. and Takayama, S.},
        title = {Bayesian Networks for Matcher Composition in Automatic Schema Matching},
        booktitle = {International Conference on Enterprise Information Systems (ICEIS)},
        year = 2012,
        volume = 1,
        pages = {48--55},
        month = jun,
        url = {http://www.merl.com/publications/TR2012-050}
      }
  • Research Area: Data Analytics


Figure 2: Pairwise correlations between the basic matchers, numbered as follows:

1: Edit Distance; 2: Substring Distance; 3: Bi-Gram Distance; 4: Tri-Gram Distance; 5: Quad-Gram Distance; 6: Cosine Similarity; 7: Hamming Distance; 8: Jaro Measure; 9: Affix Name; 10: Prefix Name; 11: Suffix Name; 12: Path Name; 13: Synonym.
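As an illustration of how a correlation matrix like the one in Figure 2 might be computed, the following is a minimal sketch, not the authors' implementation. It uses simplified stand-ins for three of the listed matchers (Edit Distance, Bi-Gram Distance, Tri-Gram Distance). The normalization of the edit distance, the use of the Dice coefficient for n-grams, and the toy element-name pairs are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def edit_similarity(a, b):
    """1 minus the Levenshtein distance normalized by the longer length."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    longest = max(len(a), len(b))
    return 1.0 - prev[-1] / longest if longest else 1.0


def ngram_similarity(a, b, n):
    """Dice coefficient over the sets of character n-grams."""
    ga = {a[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical matchers, standing in for entries 1, 3 and 4 of the caption.
MATCHERS = {
    "edit": edit_similarity,
    "bigram": lambda a, b: ngram_similarity(a, b, 2),
    "trigram": lambda a, b: ngram_similarity(a, b, 3),
}

# Invented schema element name pairs, for illustration only.
PAIRS = [
    ("customer_name", "cust_name"), ("order_id", "orderid"),
    ("address", "addr"), ("phone", "telephone"),
    ("price", "quantity"), ("zip", "postal_code"),
    ("first_name", "fname"), ("date", "city"),
]

# One similarity score per name pair, for each matcher.
scores = {name: [f(a, b) for a, b in PAIRS] for name, f in MATCHERS.items()}

# Pairwise correlations between matcher outputs.
correlations = {
    (m1, m2): pearson(scores[m1], scores[m2])
    for m1, m2 in combinations(MATCHERS, 2)
}

for (m1, m2), r in correlations.items():
    print(f"{m1:8s} vs {m2:8s}: r = {r:.2f}")
```

Matchers that consume the same information, such as n-gram matchers of neighboring orders, would be expected to show high correlation in such a matrix, which is the situation the composite matcher has to account for.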

We propose a method for accurately combining the evidence supplied by multiple individual matchers about whether two data schema elements match, that is, refer to the same object or concept, in automatic schema matching. The method uses a Bayesian network to model the statistical correlations between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve matching accuracy. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matcher can significantly exceed that of the individual component matchers.
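The overconfidence effect described above can be seen in a toy example. The sketch below is not the network from the paper: it assumes a hypothetical three-node chain (match M, matcher A, matcher B) in which B is a noisy copy of A, with all probabilities invented for illustration. It compares the posterior obtained by treating A and B as independent given M with the posterior obtained from the network that models their dependence.

```python
# All numbers below are made up for illustration.
P_MATCH = 0.2                             # prior P(M = match)
P_A_GIVEN_M = {True: 0.8, False: 0.3}     # P(A fires | M)
P_B_GIVEN_A = {True: 0.95, False: 0.05}   # P(B fires | A): B nearly copies A


def prior(m):
    return P_MATCH if m else 1 - P_MATCH


def p_a(a, m):
    return P_A_GIVEN_M[m] if a else 1 - P_A_GIVEN_M[m]


def joint(m, a, b):
    """Joint probability under the assumed chain M -> A -> B."""
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    return prior(m) * p_a(a, m) * pb


def network_posterior(a, b):
    """P(M = match | A = a, B = b), computed from the full joint."""
    num = joint(True, a, b)
    return num / (num + joint(False, a, b))


def p_b_given_m(b, m):
    """Marginal P(B = b | M = m), obtained by summing out A."""
    return sum(joint(m, a, b) for a in (True, False)) / prior(m)


def naive_posterior(a, b):
    """Posterior when A and B are wrongly treated as independent given M."""
    def score(m):
        return prior(m) * p_a(a, m) * p_b_given_m(b, m)
    return score(True) / (score(True) + score(False))


print("network:", round(network_posterior(True, True), 3))
print("naive:  ", round(naive_posterior(True, True), 3))
```

With these invented numbers, the network posterior when both matchers fire is 0.4, the same as if only A had fired, because B adds no information once A is known. The naive combination counts the shared evidence twice and reports about 0.62.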