Structural advances for pattern discovery in multi-relational databases
With ever-growing storage needs and a drift towards very large relational storage settings, multi-relational data mining has become a prominent and pertinent field for discovering unique and interesting relational patterns. As a consequence, a whole suite of multi-relational data mining techniques is being developed. These techniques may either be extensions of existing single-table mining techniques or may be developed from scratch. For the traditionalists, single-table mining algorithms can be applied to multi-relational settings by making inelegant and time-consuming joins of all target relations. However, complex relational patterns cannot be expressed in a single-table format and thus cannot be discovered this way. This work presents a new multi-relational frequent pattern mining algorithm termed Multi-Relational Frequent Pattern Growth (MRFP Growth). MRFP Growth is capable of mining multiple relations, linked with referential integrity, for frequent patterns that satisfy a user-specified support threshold. Empirical results on MRFP Growth performance and its comparison with state-of-the-art multi-relational data mining algorithms such as WARMR and Decentralized Apriori are discussed at length. MRFP Growth outperforms both of these techniques in the number of patterns generated and in speed. The realm of multi-relational clustering is also explored in this thesis. A multi-Relational Item Clustering approach based on Hypergraphs (RICH) is proposed. Experimentally, RICH combined with MRFP Growth proves to be a competitive approach for clustering multi-relational data. The performance and quality of clusters generated by RICH are compared with other clustering algorithms. Finally, the thesis demonstrates the applied utility of the above-mentioned algorithms in an application framework for auto-annotation of images in an image database.
The system is called CoMMA, which stands for Combining Multi-relational Multimedia for Associations.
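To make the notion of a user-specified support threshold over linked relations concrete, here is a minimal sketch. It is not MRFP Growth: it simply groups items from several relations by a shared key and counts co-occurring pairs. The function name `frequent_pairs` and the toy tables `purchases` and `profiles` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def frequent_pairs(relations, min_support):
    """Return item pairs whose support meets min_support.

    relations: list of relations, each a list of (key, item) rows that
    refer to the same key (an assumed stand-in for referential integrity).
    Support of a pair = fraction of keys whose combined items contain both.
    """
    # Collect every item attached to each key across all relations.
    items_by_key = defaultdict(set)
    for rows in relations:
        for key, item in rows:
            items_by_key[key].add(item)

    # Count how many keys contain each unordered pair of items.
    counts = defaultdict(int)
    for items in items_by_key.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1

    total = len(items_by_key)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }


# Hypothetical toy data: two relations keyed by a customer id.
purchases = [(1, "book"), (2, "book"), (3, "pen"), (4, "book")]
profiles = [(1, "student"), (2, "student"), (3, "student"), (4, "teacher")]

result = frequent_pairs([purchases, profiles], min_support=0.5)
print(result)  # {('book', 'student'): 0.5}
```

Grouping by key avoids materialising a full join of the relations, but this brute-force pair counting does not scale the way a pattern-growth method is meant to.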
Mining Structural Databases: An Evolutionary Multi-Objective Conceptual Clustering Methodology
The increased availability of biological databases containing representations of complex objects permits access to vast amounts of
data. In spite of the recent renewed interest in knowledge-discovery techniques (or data mining), there is a dearth of data analysis methods intended to facilitate understanding of the represented objects and related
systems by their most representative features and those relationships derived from these features (i.e., structural data). In this paper we propose
a conceptual clustering methodology termed EMO-CC, for Evolutionary Multi-Objective Conceptual Clustering, that uses multi-objective and
multi-modal optimization techniques based on Evolutionary Algorithms
that uncover representative substructures from structural databases. Besides, EMO-CC provides annotations of the uncovered substructures,
and based on them, applies an unsupervised classification approach to
retrieve new members of previously discovered substructures. We apply
EMO-CC to the Gene Ontology database to recover interesting substructures that describe problems from different points of view and use
them to explain immuno-inflammatory responses measured in terms of
gene expression profiles derived from the analysis of longitudinal blood
expression profiles of human volunteers treated with intravenous endotoxin compared to placebo.
Multi-class protein fold classification using a new ensemble machine learning approach.
Protein structure classification represents an important process in understanding the associations
between sequence and structure as well as possible functional and evolutionary relationships.
Recent structural genomics initiatives and other high-throughput experiments have populated the
biological databases at a rapid pace. The amount of structural data has made traditional methods
such as manual inspection of protein structures impossible. Machine learning has been
widely applied to bioinformatics and has had considerable success in this research area. This work
proposes a novel ensemble machine learning method that improves the coverage of the classifiers
under the multi-class imbalanced sample sets by integrating knowledge induced from different base
classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have
compared our approach with PART and show that our method improves the sensitivity of the
classifier in protein fold classification. Furthermore, we have extended this method to learning over
multiple data types, preserving the independence of their corresponding data sources, and show
that our new approach performs at least as well as the traditional technique over a single joined
data source. These experimental results are encouraging, and can be applied to other bioinformatics
problems similarly characterised by multi-class imbalanced data sets held in multiple data
sources.
Fast Search for Dynamic Multi-Relational Graphs
Acting on time-critical events by processing ever growing social media or
news streams is a major technical challenge. Many of these data sources can be
modeled as multi-relational graphs. Continuous queries or techniques to search
for rare events that typically arise in monitoring applications have been
studied extensively for relational databases. This work is dedicated to answering
the question that emerges naturally: how can we efficiently execute a
continuous query on a dynamic graph? This paper presents an exact subgraph
search algorithm that exploits the temporal characteristics of representative
queries for online news or social media monitoring. The algorithm is based on a
novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the
structural and semantic characteristics of the underlying multi-relational
graph. The paper concludes with extensive experimentation on several real-world
datasets that demonstrates the validity of this approach.
Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM),
201
HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks
Geographically locating an IP address is of interest for many purposes. There
are two major ways to obtain the location of an IP address: querying commercial
databases or conducting latency measurements. For structural Internet nodes,
such as routers, commercial databases are limited by low accuracy, while
current measurement-based approaches overwhelm users with setup overhead and
scalability issues. In this work we present our system HLOC, aiming to combine
the ease of database use with the accuracy of latency measurements. We evaluate
HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers.
HLOC first extracts location hints from rDNS names, and then conducts
multi-tier latency measurements. Configuration complexity is minimized by using
publicly available large-scale measurement frameworks such as RIPE Atlas. Using
this measurement, we can confirm or disprove the location hints found in domain
names. We publicly release HLOC's ready-to-use source code, enabling
researchers to easily increase geolocation accuracy with minimum overhead.
Comment: As published in TMA'17 conference:
http://tma.ifip.org/main-conference
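One way to see how a latency measurement can disprove a location hint is a propagation-speed bound: a round-trip time limits how far the target can be from the measuring probe. The sketch below illustrates only that bound and is not HLOC's released code. The constant `FIBER_KM_PER_MS`, the function names, and the coordinates are assumptions made for illustration.

```python
import math

# Assumed signal speed in fiber, roughly two thirds of the speed of light.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def hint_is_plausible(probe, hint, rtt_ms):
    """Return False if the hinted location is farther from the probe than
    the one-way share of the measured round-trip time could cover."""
    max_km = (rtt_ms / 2) * FIBER_KM_PER_MS
    return great_circle_km(*probe, *hint) <= max_km


# Hypothetical example: a probe in Munich measures a 5 ms RTT to a router.
munich = (48.137, 11.575)
frankfurt = (50.110, 8.682)
new_york = (40.713, -74.006)

print(hint_is_plausible(munich, frankfurt, 5.0))  # True
print(hint_is_plausible(munich, new_york, 5.0))   # False
```

A hint that passes this check is only not ruled out. Treating it as confirmed would additionally need a probe close to the hinted location that observes a low latency, with a threshold that is a design choice.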
Targeting the Hsp90 interactome using in silico polypharmacology approaches
In recent years, polypharmacology has gained popularity in drug discovery. [1] Especially for complex diseases such as cancer, the ability of a drug to bind to and interfere with multiple targets provides new opportunities for therapeutic intervention. In this article, we focus on Hsp90 and its interactome, whose pivotal role in the survival and proliferation of cancer cells renders this array of targets particularly attractive for polypharmacological drug design strategies.
The primary goal of our work is the identification and selection of suitable target proteins from the interactome that might be combined with Hsp90 to explore and exploit a multi-target inhibition approach. This task is accomplished by applying computational methods to mine the structural and biological information associated with potential ligands in public databases and to assess the degree of structural similarity between known inhibitors of different targets. Therefore, we propose an integrated ligand- and structure-based approach to select small molecules from databases suitable for consideration as multi-target inhibitors.
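The abstract does not say how structural similarity between inhibitors is measured. A common choice for ligand comparison is the Tanimoto coefficient over binary fingerprints, which is assumed in the minimal sketch below. The fingerprints are hypothetical sets of "on" bit positions, not the output of any real cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as
    collections of set-bit positions. Returns a value in [0, 1]."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Two empty fingerprints share no information; treat as dissimilar.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints for inhibitors of two different targets.
inhibitor_x = {1, 4, 7, 9}
inhibitor_y = {1, 4, 8, 9}

similarity = tanimoto(inhibitor_x, inhibitor_y)
print(similarity)  # 0.6
```

Pairs of inhibitors from different targets that score above a chosen cutoff would be candidates for a closer look as multi-target ligands. The cutoff itself is a judgment call that the abstract does not specify.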