152 research outputs found

    05441 Abstracts Collection -- Managing and Mining Genome Information: Frontiers in Bioinformatics

    From 30.10.05 to 04.11.05, the Dagstuhl Seminar 05441 "Managing and Mining Genome Information: Frontiers in Bioinformatics" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    MPP-MLO: Multilevel Parallel Partitioning for Efficiently Matching Large Ontologies

    The growing use of the Semantic Web has resulted in an increasing number, size and heterogeneity of ontologies on the web. Ontology matching techniques that can address these issues are therefore highly needed. Because of the high computational requirements, scalability is always a major concern in ontology matching systems. In this work, a partition-based ontology matching system is proposed that partitions the ontologies in parallel at multiple levels. At the first level, root-based ontology partitioning is proposed. Matchable sub-ontology pairs are generated by using an efficient linguistic matcher (IEI-Sub) to uncover anchors and then pairing sub-ontologies based on maximum similarity values. Because the anchor discovery process is highly time-consuming, a distributed and parallel MapReduce-based IEI-Sub process is proposed to handle it efficiently. In the second-level partitioning, an efficient approach is proposed to form non-overlapping clusters. Extensive experimental evaluation compares existing approaches with the proposed approach, and the results show that MPP-MLO is an efficient and scalable ontology matching system with a 58.7% reduction in overall execution time.
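The pairing step described in this abstract (score candidate sub-ontology pairs with a linguistic matcher, then keep the pair with the maximum similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify IEI-Sub, so `difflib.SequenceMatcher` is used here purely as a stand-in string matcher, and the function names, the averaging rule, the `threshold` parameter and the toy labels are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Stand-in linguistic matcher (NOT IEI-Sub): normalized string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def partition_similarity(labels_a, labels_b) -> float:
    """Assumed aggregation: average best-match score of labels_a against labels_b."""
    if not labels_a or not labels_b:
        return 0.0
    best = [max(label_similarity(x, y) for y in labels_b) for x in labels_a]
    return sum(best) / len(best)


def match_partitions(parts_a, parts_b, threshold=0.5):
    """For each partition of ontology A, keep the partition of B with maximum similarity."""
    pairs = {}
    for name_a, labels_a in parts_a.items():
        best_name, best_score = None, 0.0
        for name_b, labels_b in parts_b.items():
            score = partition_similarity(labels_a, labels_b)
            if score > best_score:
                best_name, best_score = name_b, score
        # Hypothetical cut-off: drop pairs that are too dissimilar to be matchable.
        if best_name is not None and best_score >= threshold:
            pairs[name_a] = (best_name, best_score)
    return pairs


# Toy root-based partitions (hypothetical data, keyed by root concept).
parts_a = {
    "Person": ["Author", "Reviewer", "Student"],
    "Document": ["Paper", "Review", "Abstract"],
}
parts_b = {
    "Human": ["Author", "Referee", "Student"],
    "Publication": ["Paper", "Abstract", "Poster"],
}

pairs = match_partitions(parts_a, parts_b)
for name_a, (name_b, score) in pairs.items():
    print(f"{name_a} <-> {name_b} (similarity {score:.2f})")
```

Each outer-loop iteration is independent of the others, which is what makes this kind of scoring a natural candidate for the distributed MapReduce-style execution the abstract mentions.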

    MPP-MLO: Multilevel Parallel Partitioning for Efficiently Matching Large Ontologies

    The growing use of the Semantic Web has resulted in an increasing number, size and heterogeneity of ontologies on the web. Ontology matching techniques that can address these issues are therefore highly needed. Because of the high computational requirements, scalability is always a major concern in ontology matching systems. In this work, a partition-based ontology matching system is proposed that partitions the ontologies in parallel at multiple levels. At the first level, root-based ontology partitioning is proposed. Matchable sub-ontology pairs are generated by using an efficient linguistic matcher (IEI-Sub) to uncover anchors and then pairing sub-ontologies based on the maximum similarity value. Because the anchor discovery process is highly time-consuming, a distributed and parallel MapReduce-based IEI-Sub process is proposed to handle it efficiently. In the second-level partitioning, an efficient approach is proposed to form non-overlapping clusters. Extensive experimental evaluation compares existing approaches with the proposed approach, and the results show that MPP-MLO is an efficient and scalable ontology matching system.

    A Step Toward Improving Healthcare Information Integration & Decision Support: Ontology, Sustainability and Resilience

    The healthcare industry is a complex system with numerous stakeholders, including patients, providers, insurers, and government agencies. To improve healthcare quality and population well-being, there is a growing need to leverage data and IT (Information Technology) to support better decision-making. Healthcare information systems (HIS) are developed to store, process, and disseminate healthcare data. One of the main challenges with HIS is effectively managing the large amounts of data to support decision-making. This requires integrating data from disparate sources, such as electronic health records, clinical trials, and research databases. Ontology is one approach to address this challenge. However, understanding ontology in the healthcare domain is complex and difficult. Another challenge is to use HIS on scheduling and resource allocation in a sustainable and resilient way that meets multiple conflicting objectives. This is especially important in times of crisis when demand for resources may be high, and supply may be limited. This research thesis aims to explore ontology theory and develop a methodology for constructing HIS that can effectively support better decision-making in terms of scheduling and resource allocation while considering system resiliency and social sustainability. The objectives of the thesis are: (1) studying the theory of ontology in healthcare data and developing a deep model for constructing HIS; (2) advancing our understanding of healthcare system resiliency and social sustainability; (3) developing a methodology for scheduling with multi-objectives; and (4) developing a methodology for resource allocation with multi-objectives. 
The following conclusions can be drawn from the research results: (1) A data model for rich semantics and easy data integration can be created with a clearer definition of the scope and applicability of ontology; (2) A healthcare system's resilience and sustainability can be significantly increased by the suggested design principles; (3) Through careful consideration of both efficiency and patients' experiences and a novel optimization algorithm, a scheduling problem can be made more patient-accessible; (4) A systematic approach to evaluating efficiency, sustainability, and resilience enables the simultaneous optimization of all three criteria at the system design stage, leading to more efficient distributions of resources and locations for healthcare facilities. The contributions of the thesis can be summarized as follows. Scientifically, this thesis work has expanded our knowledge of ontology and data modelling, as well as our comprehension of the healthcare system's resilience and sustainability. Technologically or methodologically, the work has advanced the state of knowledge for system modelling and decision-making. Overall, this thesis examines the characteristics of healthcare systems from a system viewpoint. Three ideas in this thesis—the ontology-based data modelling approach, multi-objective optimization models, and the algorithms for solving the models—can be adapted and used to affect different aspects of disparate systems

    Semantically aware hierarchical Bayesian network model for knowledge discovery in data : an ontology-based framework

    Several mining algorithms have been invented over the course of recent decades. However, many of the invented algorithms are confined to generating frequent patterns and do not illustrate how to act upon them. Hence, many researchers have argued that existing mining algorithms have some limitations with respect to performance and workability. Quantity and quality are the main limitations of the existing mining algorithms. While quantity states that the generated patterns are abundant, quality indicates that they cannot be integrated into the business domain seamlessly. Consequently, recent research has suggested that the limitations of the existing mining algorithms are the result of treating the mining process as an isolated and autonomous data-driven trial-and-error process and ignoring the domain knowledge. Accordingly, the integration of domain knowledge into the mining process has become the goal of recent data mining algorithms. Domain knowledge can be represented using various techniques. However, recent research has stated that ontology is the natural way to represent knowledge for data mining use. The structural nature of ontology makes it a very strong candidate for integrating domain knowledge with data mining algorithms. It has been claimed that ontology can play the following roles in the data mining process:
    • Bridging the semantic gap.
    • Providing prior knowledge and constraints.
    • Formally representing the DM results.
    Despite the fact that a variety of research has used ontology to enrich different tasks in the data mining process, recent research has revealed that the process of developing a framework that systematically consolidates ontology and the mining algorithms in an intelligent mining environment has not been realised. Hence, this thesis proposes an automatic, systematic and flexible framework that integrates the Hierarchical Bayesian Network (HBN) and domain ontology.
    The ultimate aim of this thesis is to propose a data mining framework that implicitly caters for the underpinning domain knowledge and eventually leads to a more intelligent and accurate mining process. To a certain extent the proposed mining model will simulate the cognitive system in the human being. The similarity between ontology, the Bayesian Network (BN) and bioinformatics applications establishes a strong connection between these research disciplines. This similarity can be summarised in the following points:
    • Both ontology and BN have a graphical-based structure.
    • Biomedical applications are known for their uncertainty. Likewise, BN is a powerful tool for reasoning under uncertainty.
    • The medical data involved in biomedical applications is comprehensive and ontology is the right model for representing comprehensive data.
    Hence, the proposed ontology-based Semantically Aware Hierarchical Bayesian Network (SAHBN) is applied to eight biomedical data sets in the field of predicting the effect of the DNA repair gene in the human ageing process and the identification of hub protein. Consequently, the performance of SAHBN was compared with existing Bayesian-based classification algorithms. Overall, SAHBN demonstrated a very competitive performance. The contribution of this thesis can be summarised in the following points:
    • Proposed an automatic, systematic and flexible framework to integrate ontology and the HBN. Based on the literature review, and to the best of our knowledge, no such framework has been proposed previously.
    • The complexity of learning HBN structure from observed data is significant. Hence, the proposed SAHBN model utilized the domain knowledge in the form of ontology to overcome this challenge.
    • The proposed SAHBN model preserves the advantages of both ontology and Bayesian theory. It integrates the concept of Bayesian uncertainty with the deterministic nature of ontology without extending ontology structure and adding probability-specific properties that violate the ontology standard structure.
    • The proposed SAHBN utilized the domain knowledge in the form of ontology to define the semantic relationships between the attributes involved in the mining process, guide the HBN structure construction procedure, check the consistency of the training data set and facilitate the calculation of the associated conditional probability tables (CPTs).
    • The proposed SAHBN model lays out a solid foundation to integrate other semantic relations such as equivalent, disjoint, intersection and union.

    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available. Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other authors
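The decomposition mentioned in this abstract, where each multi-class variable encodes a label powerset, can be sketched in a few lines of Python. This is only an illustration of the encoding step: the grouping of labels (`groups`) is assumed to be given, whereas the paper derives the minimal label powersets from the learned network structure, which is not shown here. The function name and the toy data are hypothetical, and the authors' own code is in R.

```python
def encode_label_powersets(Y, groups):
    """Turn a multi-label target into one multi-class target per label group.

    Y      : list of label vectors (tuples of 0/1), one per example.
    groups : list of tuples of label indices; ASSUMED given here
             (the paper identifies them from the learned structure).
    Returns a list of (class_column, code_book) pairs, one per group.
    """
    targets = []
    for group in groups:
        code_book = {}   # maps each observed label combination to a class id
        column = []
        for row in Y:
            combo = tuple(row[i] for i in group)
            # Assign class ids in order of first appearance.
            column.append(code_book.setdefault(combo, len(code_book)))
        targets.append((column, code_book))
    return targets


# Toy data: 4 examples, 3 binary labels (hypothetical).
Y = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (1, 0, 1)]
groups = [(0, 1), (2,)]  # labels 0 and 1 modelled jointly, label 2 on its own

targets = encode_label_powersets(Y, groups)
for group, (column, code_book) in zip(groups, targets):
    print(group, column, code_book)
```

Each resulting column can then be handed to any ordinary multi-class classifier, and predictions are decoded back to label vectors by inverting the code book.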

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving the problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.