Structural advances for pattern discovery in multi-relational databases

Author: Kanodia Juveria
Publication venue: RIT Scholar Works
Publication date: 01/01/2005

With ever-growing storage needs and a drift towards very large relational storage settings, multi-relational data mining has become a prominent and pertinent field for discovering unique and interesting relational patterns. As a consequence, a whole suite of multi-relational data mining techniques is being developed. These techniques may either extend existing single-table mining techniques or be developed from scratch. For the traditionalists, single-table mining algorithms can be applied to multi-relational settings by making inelegant and time-consuming joins of all target relations. However, complex relational patterns cannot be expressed in a single-table format and thus cannot be discovered this way. This work presents a new multi-relational frequent pattern mining algorithm termed Multi-Relational Frequent Pattern Growth (MRFP Growth). MRFP Growth is capable of mining multiple relations, linked with referential integrity, for frequent patterns that satisfy a user-specified support threshold. Empirical results on MRFP Growth performance and its comparison with state-of-the-art multi-relational data mining algorithms such as WARMR and Decentralized Apriori are discussed at length. MRFP Growth outperforms the latter two techniques in number of patterns generated and speed. The realm of multi-relational clustering is also explored in this thesis. A multi-Relational Item Clustering approach based on Hypergraphs (RICH) is proposed. Experimentally, RICH combined with MRFP Growth proves to be a competitive approach for clustering multi-relational data. The performance and quality of clusters generated by RICH are compared with other clustering algorithms. Finally, the thesis demonstrates the applied utility of the above-mentioned algorithms in an application framework for auto-annotation of images in an image database. The system is called CoMMA, which stands for Combining Multi-relational Multimedia for Associations.

RIT Scholar Works

Mining Structural Databases: An Evolutionary Multi-Objective Conceptual Clustering Methodology

Author: Cordón Óscar
Harari Óscar
Romero Zaliz Rocío
Rubio Escudero Cristina
Val Coral del
Zwir Igor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

The increased availability of biological databases containing representations of complex objects permits access to vast amounts of data. In spite of the recent renewed interest in knowledge-discovery techniques (or data mining), there is a dearth of data analysis methods intended to facilitate understanding of the represented objects and related systems by their most representative features and the relationships derived from these features (i.e., structural data). In this paper we propose a conceptual clustering methodology termed EMO-CC, for Evolutionary Multi-Objective Conceptual Clustering, that uses multi-objective and multi-modal optimization techniques based on Evolutionary Algorithms to uncover representative substructures from structural databases. In addition, EMO-CC provides annotations of the uncovered substructures and, based on them, applies an unsupervised classification approach to retrieve new members of previously discovered substructures. We apply EMO-CC to the Gene Ontology database to recover interesting substructures that describe problems from different points of view, and use them to explain immuno-inflammatory responses measured in terms of gene expression profiles derived from the analysis of longitudinal blood expression profiles of human volunteers treated with intravenous endotoxin compared to placebo.

idUS. Depósito de Investigación Universidad de Sevilla

Multi-class protein fold classification using a new ensemble machine learning approach.

Author: Deville Y
Gilbert D
Tan A
Publication venue: GIW
Publication date: 01/01/2003

Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The amount of structural data has made traditional methods, such as manual inspection of protein structures, impossible. Machine learning has been widely applied to bioinformatics and has had considerable success in this research area. This work proposes a novel ensemble machine learning method that improves the coverage of the classifiers under multi-class imbalanced sample sets by integrating knowledge induced from different base classifiers, and we illustrate this idea by classifying multi-class SCOP protein fold data. We have compared our approach with PART and show that our method improves the sensitivity of the classifier in protein fold classification. Furthermore, we have extended this method to learning over multiple data types, preserving the independence of their corresponding data sources, and show that our new approach performs at least as well as the traditional technique over a single joined data source. These experimental results are encouraging, and the method can be applied to other bioinformatics problems similarly characterised by multi-class imbalanced data sets held in multiple data sources.

Brunel University Research Archive

Fast Search for Dynamic Multi-Relational Graphs

Author: Chin George
Choudhury Sutanay
Feo John
Holder Lawrence
Publication date: 01/01/2013

Acting on time-critical events by processing ever-growing social media or news streams is a major technical challenge. Many of these data sources can be modeled as multi-relational graphs. Continuous queries, or techniques to search for rare events that typically arise in monitoring applications, have been studied extensively for relational databases. This work is dedicated to answering the question that emerges naturally: how can we efficiently execute a continuous query on a dynamic graph? This paper presents an exact subgraph search algorithm that exploits the temporal characteristics of representative queries for online news or social media monitoring. The algorithm is based on a novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the structural and semantic characteristics of the underlying multi-relational graph. The paper concludes with extensive experimentation on several real-world datasets that demonstrates the validity of this approach. Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM), 201

arXiv.org e-Print Archive

Crossref

HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks

Author: bottger
durumeric
edmundson
gasser
snyder
wang
wong
zhang
Publication date: 28/06/2017

Geographically locating an IP address is of interest for many purposes. There are two major ways to obtain the location of an IP address: querying commercial databases or conducting latency measurements. For structural Internet nodes, such as routers, commercial databases are limited by low accuracy, while current measurement-based approaches burden users with setup overhead and scalability issues. In this work we present our system HLOC, which aims to combine the ease of database use with the accuracy of latency measurements. We evaluate HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers. HLOC first extracts location hints from rDNS names, and then conducts multi-tier latency measurements. Configuration complexity is minimized by using publicly available large-scale measurement frameworks such as RIPE Atlas. Using these measurements, we can confirm or disprove the location hints found in domain names. We publicly release HLOC's ready-to-use source code, enabling researchers to easily increase geolocation accuracy with minimum overhead. Comment: As published in TMA'17 conference: http://tma.ifip.org/main-conference

arXiv.org e-Print Archive

Crossref

Targeting the Hsp90 interactome using in silico polypharmacology approaches

Author: Anighoro Andrew
Bajorath J.
Heikamp K.
Rastelli Giulio
Stumpfe D.
Publication date: 01/01/2013

In recent years, polypharmacology has gained popularity in drug discovery [1]. Especially for complex diseases such as cancer, the ability of a drug to bind to and interfere with multiple targets provides new opportunities for therapeutic intervention. In this article, we focus on Hsp90 and its interactome, whose pivotal role in the survival and proliferation of cancer cells renders this array of targets particularly attractive for polypharmacological drug design strategies. The primary goal of our work is the identification and selection of suitable target proteins from the interactome that might be combined with Hsp90 to explore and exploit a multi-target inhibition approach. This task is accomplished by applying computational methods to mine the structural and biological information associated with potential ligands in public databases and to assess the degree of structural similarity between known inhibitors of different targets. Therefore, we propose an integrated ligand- and structure-based approach to select small molecules from databases suitable for consideration as multi-target inhibitors.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia