A Large Scale Ontology Matching Tool Based On A Statistical Predictive Model
Abstract
Ontologies have become increasingly pervasive in computer science, and especially in the Semantic Web. They provide a consensual, formal vocabulary that can be shared between applications. Through ontologies, new generations of semantic applications have been developed, such as semantic search, semantic portals, intelligent advisory systems, semantic middleware, and semantic software engineering techniques. However, because ontology development is decentralized, ontologies within a given domain naturally become heterogeneous, which limits the semantic integration of different applications. In this thesis, we implement a statistically based ontology matching system that effectively aligns two large, heterogeneous ontologies. We integrate techniques into the matching tool so that the space and time complexity challenges associated with matching large ontologies are effectively minimized. By surveying existing ontology matching tools and approaches, we identified a number of research gaps that limit their effectiveness when matching large ontologies. Key among the challenges identified is the lack of adequate techniques to address the high space and time complexities associated with matching large ontologies. To reduce space complexity, this thesis implements an ontology partitioning technique based on spectral clustering. To address the challenge of time complexity, we use Single Instruction Multiple Data (SIMD) parallelization. Finally, we implement a tag-based alignment repair technique to ensure high-quality mappings. In summary, this thesis implements techniques that reduce the space and time complexity of matching two large ontologies while ensuring the quality of the resulting entity mappings.
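To illustrate the partitioning idea mentioned above, the following is a minimal sketch (not the thesis implementation) of splitting an ontology's entities into blocks with spectral clustering over a precomputed affinity matrix, so that each block can be matched independently and the pairwise comparison space shrinks. The lexical similarity measure, function names, and block count used here are illustrative assumptions, not choices taken from the thesis.

    # Illustrative sketch: partition ontology entities with spectral clustering.
    # Assumes a simple label-based affinity; the real system may use richer features.
    import numpy as np
    from difflib import SequenceMatcher
    from sklearn.cluster import SpectralClustering

    def label_similarity(a: str, b: str) -> float:
        """Crude lexical similarity between two entity labels, in [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def partition_entities(labels: list[str], n_blocks: int) -> dict[int, list[str]]:
        """Group entity labels into n_blocks partitions via spectral clustering
        on a precomputed affinity matrix."""
        n = len(labels)
        affinity = np.ones((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                s = label_similarity(labels[i], labels[j])
                affinity[i, j] = affinity[j, i] = s
        clustering = SpectralClustering(
            n_clusters=n_blocks, affinity="precomputed", random_state=0
        )
        assignments = clustering.fit_predict(affinity)
        blocks: dict[int, list[str]] = {}
        for label, block_id in zip(labels, assignments):
            blocks.setdefault(int(block_id), []).append(label)
        return blocks

    if __name__ == "__main__":
        entities = ["Person", "Human", "Author", "Writer", "Paper", "Article"]
        for block, members in partition_entities(entities, n_blocks=2).items():
            print(block, members)

Under this kind of scheme, only entities that fall into corresponding blocks of the two ontologies need to be compared, which is one way the quadratic memory footprint of large-scale matching can be reduced.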