155,475 research outputs found
Structural advances for pattern discovery in multi-relational databases
With ever-growing storage needs and drift towards very large relational storage settings, multi-relational data mining has become a prominent and pertinent field for discovering unique and interesting relational patterns. As a consequence, a whole suite of multi-relational data mining techniques is being developed. These techniques may either be extensions to the already existing single-table mining techniques or may be developed from scratch. For the traditionalists, single-table mining algorithms can be used to work on multi-relational settings by making inelegant and time consuming joins of all target relations. However, complex relational patterns cannot be expressed in a single-table format and thus, cannot be discovered. This work presents a new multi-relational frequent pattern mining algorithm termed Multi-Relational Frequent Pattern Growth (MRFP Growth). MRFP Growth is capable of mining multiple relations, linked with referential integrity, for frequent patterns that satisfy a user specified support threshold. Empirical results on MRFP Growth performance and its comparison with the state-of-the-art multirelational data mining algorithms like WARMR and Decentralized Apriori are discussed at length. MRFP Growth scores over the latter two techniques in number of patterns generated and speed. The realm of multi-relational clustering is also explored in this thesis. A multi-Relational Item Clustering approach based on Hypergraphs (RICH) is proposed. Experimentally RICH combined with MRFP Growth proves to be a competitive approach for clustering multi-relational data. The performance and iii quality of clusters generated by RICH are compared with other clustering algorithms. Finally, the thesis demonstrates the applied utility of the theoretical implications of the above mentioned algorithms in an application framework for auto-annotation of images in an image database. The system is called CoMMA which stands for Combining Multi-relational Multimedia for Associations
Direct mining of subjectively interesting relational patterns
Data is typically complex and relational. Therefore, the development of relational data mining methods is an increasingly active topic of research. Recent work has resulted in new formalisations of patterns in relational data and in a way to quantify their interestingness in a subjective manner, taking into account the data analyst's prior beliefs about the data. Yet, a scalable algorithm to find such most interesting patterns is lacking. We introduce a new algorithm based on two notions: (1) the use of Constraint Programming, which results in a notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search, and (2), the direct search for the most interesting patterns only, instead of exhaustive enumeration of patterns before ranking them. Through empirical evaluation, we find that our novel bounds yield speedups up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively-interesting relational patterns present in databases where this was previously impractical or impossible
Analysing the behaviour of robot teams through relational sequential pattern mining
This report outlines the use of a relational representation in a Multi-Agent
domain to model the behaviour of the whole system. A desired property in this
systems is the ability of the team members to work together to achieve a common
goal in a cooperative manner. The aim is to define a systematic method to
verify the effective collaboration among the members of a team and comparing
the different multi-agent behaviours. Using external observations of a
Multi-Agent System to analyse, model, recognize agent behaviour could be very
useful to direct team actions. In particular, this report focuses on the
challenge of autonomous unsupervised sequential learning of the team's
behaviour from observations. Our approach allows to learn a symbolic sequence
(a relational representation) to translate raw multi-agent, multi-variate
observations of a dynamic, complex environment, into a set of sequential
behaviours that are characteristic of the team in question, represented by a
set of sequences expressed in first-order logic atoms. We propose to use a
relational learning algorithm to mine meaningful frequent patterns among the
relational sequences to characterise team behaviours. We compared the
performance of two teams in the RoboCup four-legged league environment, that
have a very different approach to the game. One uses a Case Based Reasoning
approach, the other uses a pure reactive behaviour.Comment: 25 page
Probabilistic Latent Tensor Factorization Model for Link Pattern Prediction in Multi-relational Networks
This paper aims at the problem of link pattern prediction in collections of
objects connected by multiple relation types, where each type may play a
distinct role. While common link analysis models are limited to single-type
link prediction, we attempt here to capture the correlations among different
relation types and reveal the impact of various relation types on performance
quality. For that, we define the overall relations between object pairs as a
\textit{link pattern} which consists in interaction pattern and connection
structure in the network, and then use tensor formalization to jointly model
and predict the link patterns, which we refer to as \textit{Link Pattern
Prediction} (LPP) problem. To address the issue, we propose a Probabilistic
Latent Tensor Factorization (PLTF) model by introducing another latent factor
for multiple relation types and furnish the Hierarchical Bayesian treatment of
the proposed probabilistic model to avoid overfitting for solving the LPP
problem. To learn the proposed model we develop an efficient Markov Chain Monte
Carlo sampling method. Extensive experiments are conducted on several real
world datasets and demonstrate significant improvements over several existing
state-of-the-art methods.Comment: 19pages, 5 figure
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.Comment: 44 page
- …