243,466 research outputs found

    Structural advances for pattern discovery in multi-relational databases

    Get PDF
    With ever-growing storage needs and drift towards very large relational storage settings, multi-relational data mining has become a prominent and pertinent field for discovering unique and interesting relational patterns. As a consequence, a whole suite of multi-relational data mining techniques is being developed. These techniques may either be extensions to the already existing single-table mining techniques or may be developed from scratch. For the traditionalists, single-table mining algorithms can be used to work on multi-relational settings by making inelegant and time consuming joins of all target relations. However, complex relational patterns cannot be expressed in a single-table format and thus, cannot be discovered. This work presents a new multi-relational frequent pattern mining algorithm termed Multi-Relational Frequent Pattern Growth (MRFP Growth). MRFP Growth is capable of mining multiple relations, linked with referential integrity, for frequent patterns that satisfy a user specified support threshold. Empirical results on MRFP Growth performance and its comparison with the state-of-the-art multirelational data mining algorithms like WARMR and Decentralized Apriori are discussed at length. MRFP Growth scores over the latter two techniques in number of patterns generated and speed. The realm of multi-relational clustering is also explored in this thesis. A multi-Relational Item Clustering approach based on Hypergraphs (RICH) is proposed. Experimentally RICH combined with MRFP Growth proves to be a competitive approach for clustering multi-relational data. The performance and iii quality of clusters generated by RICH are compared with other clustering algorithms. Finally, the thesis demonstrates the applied utility of the theoretical implications of the above mentioned algorithms in an application framework for auto-annotation of images in an image database. The system is called CoMMA which stands for Combining Multi-relational Multimedia for Associations

    Mining Structural Databases: An Evolutionary Multi-Objetive Conceptual Clustering Methodology

    Get PDF
    The increased availability of biological databases contain ing representations of complex objects permits access to vast amounts of data. In spite of the recent renewed interest in knowledge-discovery tech niques (or data mining), there is a dearth of data analysis methods in tended to facilitate understanding of the represented objects and related systems by their most representative features and those relationship de rived from these features (i.e., structural data). In this paper we propose a conceptual clustering methodology termed EMO-CC for Evolution ary Multi-Objective Conceptual Clustering that uses multi-objective and multi-modal optimization techniques based on Evolutionary Algorithms that uncover representative substructures from structural databases. Be sides, EMO-CC provides annotations of the uncovered substructures, and based on them, applies an unsupervised classification approach to retrieve new members of previously discovered substructures. We apply EMO-CC to the Gene Ontology database to recover interesting sub structures that describes problems from different points of view and use them to explain inmuno-inflammatory responses measured in terms of gene expression profiles derived from the analysis of longitudinal blood expression profiles of human volunteers treated with intravenous endo toxin compared to placebo

    Fast Search for Dynamic Multi-Relational Graphs

    Full text link
    Acting on time-critical events by processing ever growing social media or news streams is a major technical challenge. Many of these data sources can be modeled as multi-relational graphs. Continuous queries or techniques to search for rare events that typically arise in monitoring applications have been studied extensively for relational databases. This work is dedicated to answer the question that emerges naturally: how can we efficiently execute a continuous query on a dynamic graph? This paper presents an exact subgraph search algorithm that exploits the temporal characteristics of representative queries for online news or social media monitoring. The algorithm is based on a novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the structural and semantic characteristics of the underlying multi-relational graph. The paper concludes with extensive experimentation on several real-world datasets that demonstrates the validity of this approach.Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM), 201

    HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks

    Full text link
    Geographically locating an IP address is of interest for many purposes. There are two major ways to obtain the location of an IP address: querying commercial databases or conducting latency measurements. For structural Internet nodes, such as routers, commercial databases are limited by low accuracy, while current measurement-based approaches overwhelm users with setup overhead and scalability issues. In this work we present our system HLOC, aiming to combine the ease of database use with the accuracy of latency measurements. We evaluate HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers. HLOC first extracts location hints from rDNS names, and then conducts multi-tier latency measurements. Configuration complexity is minimized by using publicly available large-scale measurement frameworks such as RIPE Atlas. Using this measurement, we can confirm or disprove the location hints found in domain names. We publicly release HLOC's ready-to-use source code, enabling researchers to easily increase geolocation accuracy with minimum overhead.Comment: As published in TMA'17 conference: http://tma.ifip.org/main-conference

    Targeting the Hsp90 interactome using in silico polypharmacology approaches

    Get PDF
    In recent years, polypharmacology has gained popularity in drug discovery. [1] Especially for complex diseases such as cancer, the ability of a drug to bind to and interfere with multiple targets provides new opportunities for therapeutic intervention In this article, we focus on Hsp90 and its interactome, whose pivotal role in survival and proliferation of cancer cells renders this array of targets particularly attractive polypharmacological drug design strategies. The primary goal of our work is the identification and selection of suitable target proteins from the interactome that might be combined with Hsp90 to explore and exploit a multi-target inhibition approach. This task is accomplished by applying computational methods to mine the structural and biological information associated with potential ligands in public databases and assess the degree of structural similarity between known inhibitors of different targets. Therefore, we propose an integrated ligand- and structure-based approach to select small molecules from databases suitable for consideration as multi-target inhibitors
    • …
    corecore