3 research outputs found

    MPP-MLO: Multilevel Parallel Partitioning for Efficiently Matching Large Ontologies

    pp. 221-229. The growing use of the Semantic Web has increased the number, size and heterogeneity of ontologies on the web, so ontology matching techniques that can cope with these issues are needed. Because of the high computational requirements, scalability is a major concern in ontology matching systems. In this work, a partition-based ontology matching system is proposed that partitions the ontologies in parallel at multiple levels. At the first level, root-based ontology partitioning is proposed. An efficient linguistic matcher (IEI-Sub) is used to uncover anchors, and matchable sub-ontology pairs are then generated based on maximum similarity values. Because anchor discovery is highly time-consuming, a distributed and parallel MapReduce-based IEI-Sub process is proposed to handle it efficiently. At the second level, an efficient approach is proposed to form non-overlapping clusters. An extensive experimental evaluation compares existing approaches with the proposed approach, and the results show that MPP-MLO is an efficient and scalable ontology matching system, with a 58.7% reduction in overall execution time.
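The first-level pairing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe IEI-Sub, so `label_similarity` below is a stand-in built on Python's `difflib`, and the function names, the 0.8 threshold and the toy labels are all assumptions made for illustration. The map phase scores every pair of sub-ontologies by its anchors (label pairs whose similarity reaches the threshold), and the reduce phase keeps, for each source sub-ontology, the target with the maximum score.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product


def label_similarity(a: str, b: str) -> float:
    """Stand-in linguistic matcher (an assumption, NOT IEI-Sub)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_phase(source_parts, target_parts, threshold=0.8):
    """Emit (source_id, (target_id, score)) for every pair of sub-ontologies.

    The score is the summed similarity of all anchors, i.e. label pairs
    whose similarity is at or above the (hypothetical) threshold.
    """
    for (sid, s_labels), (tid, t_labels) in product(
        source_parts.items(), target_parts.items()
    ):
        score = 0.0
        for a, b in product(s_labels, t_labels):
            sim = label_similarity(a, b)
            if sim >= threshold:
                score += sim
        yield sid, (tid, score)


def reduce_phase(emitted):
    """For each source sub-ontology, keep the target with the maximum score."""
    grouped = defaultdict(list)
    for sid, value in emitted:
        grouped[sid].append(value)
    best = {}
    for sid, candidates in grouped.items():
        tid, score = max(candidates, key=lambda c: c[1])
        if score > 0:  # skip sub-ontologies with no anchors at all
            best[sid] = tid
    return best


# Toy sub-ontologies, represented only by their concept labels.
source = {"S1": ["Heart", "Artery"], "S2": ["Lung", "Bronchus"]}
target = {"T1": ["lung", "bronchi"], "T2": ["heart", "arteries"]}
pairs = reduce_phase(map_phase(source, target))
```

In a real MapReduce deployment the map calls would run on separate nodes and the framework would do the grouping; here both phases run in one process so the data flow stays visible.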

    MPP-MLO: Multilevel Parallel Partitioning for Efficiently Matching Large Ontologies

    The growing use of the Semantic Web has increased the number, size and heterogeneity of ontologies on the web, so ontology matching techniques that can cope with these issues are needed. Because of the high computational requirements, scalability is a major concern in ontology matching systems. In this work, a partition-based ontology matching system is proposed that partitions the ontologies in parallel at multiple levels. At the first level, root-based ontology partitioning is proposed. An efficient linguistic matcher (IEI-Sub) is used to uncover anchors, and matchable sub-ontology pairs are then generated based on the maximum similarity value. Because anchor discovery is highly time-consuming, a distributed and parallel MapReduce-based IEI-Sub process is proposed to handle it efficiently. At the second level, an efficient approach is proposed to form non-overlapping clusters. An extensive experimental evaluation compares existing approaches with the proposed approach, and the results show that MPP-MLO is an efficient and scalable ontology matching system.

    CoPart: a context-based partitioning technique for big data

    The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel to independent chunks of data. As a consequence, overall performance depends greatly on how the data are partitioned among the computation nodes. The default partitioning technique provided by systems like Hadoop or Spark essentially performs a random subdivision of the input records, without considering their nature or the correlations between them. Although such an approach can be appropriate in the simplest case, where all input records always have to be analysed, it becomes a limitation for sophisticated analyses in which correlations between records can be exploited to prune unnecessary computations in advance. In this paper we design a context-based multi-dimensional partitioning technique, called CoPart, which takes data correlation into account when determining how records are subdivided between splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data with respect to contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
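The pruning argument in this abstract can be made concrete with a small sketch. It is a deliberately simplified, single-attribute illustration and not the CoPart algorithm, which is multi-dimensional and is not specified here. The function names, the round-robin baseline standing in for "random" subdivision, and the greedy balancing by value frequency are all assumptions made for illustration.

```python
from collections import Counter


def round_robin_partition(records, num_splits):
    """Context-agnostic baseline: records are spread without regard to content."""
    splits = [[] for _ in range(num_splits)]
    for i, rec in enumerate(records):
        splits[i % num_splits].append(rec)
    return splits


def context_partition(records, key, num_splits):
    """Keep records sharing a contextual value in the same split.

    Values are assigned greedily, most frequent first, to the currently
    lightest split, so the value distribution also balances split sizes.
    """
    counts = Counter(rec[key] for rec in records)
    load = [0] * num_splits
    owner = {}
    for value, n in counts.most_common():
        lightest = load.index(min(load))
        owner[value] = lightest
        load[lightest] += n
    splits = [[] for _ in range(num_splits)]
    for rec in records:
        splits[owner[rec[key]]].append(rec)
    return splits


def splits_to_scan(splits, key, wanted):
    """Indices of splits holding at least one record with the wanted value."""
    return [i for i, s in enumerate(splits) if any(r[key] == wanted for r in s)]


# Toy dataset with one contextual attribute, "city".
cities = ["rome"] * 4 + ["milan"] * 3 + ["turin"] * 2 + ["bari"]
records = [{"city": c, "value": i} for i, c in enumerate(cities)]

baseline = round_robin_partition(records, 2)
contextual = context_partition(records, "city", 2)

baseline_scan = splits_to_scan(baseline, "city", "milan")
contextual_scan = splits_to_scan(contextual, "city", "milan")
```

With the baseline, a query restricted to one city has to touch every split. With the contextual layout it touches only the split that owns that city, which is the kind of preliminary pruning the abstract refers to.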