Search CORE


Direct mining of subjectively interesting relational patterns

Author: Aknin Achille
De Bie Tijl
Guns Tias
Lijffijt Jefrey
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Data is typically complex and relational. Therefore, the development of relational data mining methods is an increasingly active topic of research. Recent work has resulted in new formalisations of patterns in relational data and in a way to quantify their interestingness subjectively, taking into account the data analyst's prior beliefs about the data. Yet a scalable algorithm for finding the most interesting of these patterns is lacking. We introduce a new algorithm based on two notions: (1) the use of Constraint Programming, which results in notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search; and (2) the direct search for only the most interesting patterns, instead of exhaustively enumerating patterns before ranking them. Through empirical evaluation, we find that our novel bounds for this branch-and-bound search yield speedups of up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively interesting relational patterns in databases where this was previously impractical or impossible.

Crossref

Ghent University Academic Bibliography
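The two ideas in this abstract, ranking patterns by subjective interestingness and searching directly for the best one with branch-and-bound instead of enumerating everything first, can be illustrated on a deliberately simplified case. The Python sketch below is not the authors' Constraint Programming implementation. It assumes a single relation (an undirected graph), patterns that are cliques, known per-edge background probabilities `prob` in place of a fitted prior-belief model, and a hypothetical description length `alpha + beta * |S|`. The function `best_clique_brute` enumerates and then ranks; `best_clique_bnb` prunes any branch whose optimistic bound cannot beat the best pattern found so far.

```python
import math
from itertools import combinations

# Toy background model (assumption): each observed edge {u, v} has a known prior
# probability prob[frozenset((u, v))]; learning that the pair is linked gives -log2(p) bits.

def information_content(nodes, prob):
    """Bits gained by learning that every pair in `nodes` is linked."""
    return sum(-math.log2(prob[frozenset(pair)]) for pair in combinations(nodes, 2))

def si(nodes, prob, alpha, beta):
    """Hypothetical subjective interestingness: IC / DL with DL = alpha + beta * |nodes|."""
    return information_content(nodes, prob) / (alpha + beta * len(nodes))

def is_clique(nodes, adj):
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def best_clique_brute(adj, prob, alpha=1.0, beta=1.0):
    """Exhaustively enumerate all cliques, then keep the one with the highest SI."""
    best = ((), 0.0)
    nodes = sorted(adj)
    for k in range(2, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_clique(subset, adj):
                score = si(subset, prob, alpha, beta)
                if score > best[1]:
                    best = (subset, score)
    return best

def best_clique_bnb(adj, prob, alpha=1.0, beta=1.0):
    """Search directly for the single best clique, pruning with an optimistic bound."""
    best = [(), 0.0]

    def optimistic_ic(nodes):
        # IC of every observed edge among `nodes`; no clique inside `nodes` can exceed it.
        return sum(-math.log2(prob[frozenset((u, v))])
                   for u, v in combinations(nodes, 2) if v in adj[u])

    def expand(current, candidates):
        score = si(current, prob, alpha, beta)
        if score > best[1]:
            best[0], best[1] = tuple(current), score
        # Any extension only adds nodes from `candidates`, so its IC is at most
        # optimistic_ic(current + candidates) and its DL is at least DL(current).
        bound = optimistic_ic(current + candidates) / (alpha + beta * len(current))
        if bound <= best[1]:
            return  # prune: nothing below this node can beat the incumbent
        for i, v in enumerate(candidates):
            # Keep only later candidates adjacent to v: extensions stay cliques, no duplicates.
            expand(current + [v], [w for w in candidates[i + 1:] if w in adj[v]])

    expand([], sorted(adj))
    return tuple(best)

adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"a", "c"}}
prob = {frozenset(("a", "b")): 0.5, frozenset(("a", "c")): 0.25, frozenset(("a", "d")): 0.5,
        frozenset(("b", "c")): 0.125, frozenset(("c", "d")): 0.5}
print(best_clique_bnb(adj, prob))    # (('a', 'b', 'c'), 1.5)
print(best_clique_brute(adj, prob))  # (('a', 'b', 'c'), 1.5)
```

Both searches return the same pattern. The bound used here is simple and loose; it only shows where a tighter, problem-specific bound would plug into the search.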

Mining subjectively interesting attributed subgraphs

Author: Bendimerad Anes
De Bie Tijl
Lijffijt Jefrey
Mel Ahmad
Plantevit Marc
Robardet Céline
Publication date: 01/01/2018

Community detection in graphs, data clustering, and local pattern mining are three mature fields of data mining and machine learning. In recent years, attributed subgraph mining has emerged as a powerful new data mining task at the intersection of these areas. Given a graph and a set of attributes for each vertex, attributed subgraph mining aims to find cohesive subgraphs for which (a subset of) the attributes take exceptional values in some sense. While research on this task can borrow from the three above-mentioned fields, the principled integration of graph and attribute data poses two challenges: the definition of a pattern language that is intuitive and lends itself to efficient search strategies, and the formalization of the interestingness of such patterns. We propose an integrated solution to both of these challenges. The proposed pattern language improves upon prior work in being both highly flexible and intuitive. We show how an effective and principled algorithm can enumerate patterns of this language. The proposed approach for quantifying the interestingness of patterns of this language is rooted in information theory and is able to account for prior knowledge about the data. Prior work typically quantifies interestingness based on the cohesion of the subgraph and the exceptionality of its attributes separately, combining these in a parameterized trade-off. Instead, in our proposal this trade-off is handled implicitly, in a principled, parameter-free manner. Extensive empirical results confirm that the proposed pattern syntax is intuitive and that the interestingness measure aligns well with actual subjective interestingness.

arXiv.org e-Print Archive

Ghent University Academic Bibliography
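To illustrate what it means for the cohesion/exceptionality trade-off to be handled without a weighting parameter, the sketch below measures both in bits under one toy background model and divides their sum by a description length. Everything here is an assumption made for illustration: independent edges with density `p`, independent binary attributes with background frequencies `q`, and a counting-based description length. It is not the pattern syntax or the background model used in the paper.

```python
import math
from itertools import combinations

# Toy background model (assumption): every vertex pair is linked independently with
# probability p, and any vertex carries attribute a independently with probability q[a].

def subgraph_bits(S, edges, p):
    """-log2 P(induced subgraph on S): bits gained by revealing S's internal edges."""
    bits = 0.0
    for u, v in combinations(sorted(S), 2):
        linked = frozenset((u, v)) in edges
        bits += -math.log2(p if linked else 1 - p)
    return bits

def attribute_bits(S, A, q, attrs):
    """-log2 P(every vertex in S carries every attribute in A)."""
    if not all(A <= attrs[v] for v in S):
        raise ValueError("pattern claims attribute values the data does not have")
    return sum(-math.log2(q[a]) for v in S for a in A)

def description_length(S, A, n, m):
    """Bits to state |S| and S among n vertices, and |A| and A among m attributes."""
    return (math.log2(n) + math.log2(math.comb(n, len(S)))
            + math.log2(m) + math.log2(math.comb(m, len(A))))

def interestingness(S, A, edges, p, q, attrs):
    """Cohesion and exceptionality are both in bits, so they simply add: no trade-off weight."""
    n, m = len(attrs), len(q)
    ic = subgraph_bits(S, edges, p) + attribute_bits(S, A, q, attrs)
    return ic / description_length(S, A, n, m)

edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (3, 4)]}
attrs = {0: {"x", "y"}, 1: {"x", "y"}, 2: {"x", "y"}, 3: {"y"}, 4: {"y"}, 5: {"y"}}
q = {"x": 0.1, "y": 0.9}  # "x" is rare in the background, "y" is common
rare = interestingness({0, 1, 2}, {"x"}, edges, 0.2, q, attrs)
common = interestingness({0, 1, 2}, {"y"}, edges, 0.2, q, attrs)
print(rare > common)  # True: the same triangle is more surprising with a rare attribute
```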

Community Structure Characterization

Author: A Clauset
A Lancichinetti
C Bothorel
F Radicchi
G Palla
GK Orman
Hongyun Cai
J Creusefond
J Shi
J Yang
L da Fontoura Costa
M Girvan
M Rosvall
M Tumminello
MEJ Newman
N Dugué
N Kashtan
NR Mabroukeh
P Bródka
R Guimera
S Asur
S Fortunato
T Aynaud
T-C Fu
V Labatut
Vincent Labatut
X Han
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This entry discusses the problem of describing communities identified in a complex network of interest in a way that allows them to be interpreted. We assume the community structure has already been detected using one of the many methods proposed in the literature. The question is then how to extract valuable information from this first result to enable human interpretation. This requires subsequent processing, which we describe in the rest of this entry.

arXiv.org e-Print Archive

Crossref
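This abstract does not specify the subsequent processing. As one hedged example of what a description step can compute, the hypothetical helper `describe_communities` below summarises each detected community by its size, its internal edge density, the fraction of its members' edge endpoints that leave the community, and its most frequent node attribute. These measures are common choices picked for illustration, not necessarily the ones discussed in the entry.

```python
from collections import Counter

def describe_communities(edges, membership, attributes=None):
    """Summarise each community given an edge list and a node -> community mapping."""
    size = Counter(membership.values())
    internal, cut = Counter(), Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:  # a boundary edge counts once for each community it touches
            cut[cu] += 1
            cut[cv] += 1
    report = {}
    for c, k in size.items():
        pairs = k * (k - 1) / 2
        volume = 2 * internal[c] + cut[c]  # sum of the members' degrees
        entry = {
            "size": k,
            "density": internal[c] / pairs if pairs else 0.0,
            "external_fraction": cut[c] / volume if volume else 0.0,
        }
        if attributes is not None:
            labels = Counter(attributes[v] for v, cc in membership.items() if cc == c)
            entry["top_attribute"] = labels.most_common(1)[0]  # (label, count)
        report[c] = entry
    return report

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
attributes = {0: "x", 1: "x", 2: "y", 3: "y", 4: "y", 5: "y"}
report = describe_communities(edges, membership, attributes)
print(report["A"])  # size 3, density 1.0, 1 of 7 edge endpoints leaves, top attribute ('x', 2)
```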

Learning subjectively interesting data representations

Author: Kang Bo
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2019

Ghent University Academic Bibliography

Mining and modeling graphs using patterns and priors

Author: Adriaens Florian
Publication venue: Universiteit Gent. Faculteit Ingenieurswetenschappen en Architectuur
Publication date: 01/01/2020

Ghent University Academic Bibliography

Conditional network embeddings

Author: De Bie Tijl
Kang Bo
Lijffijt Jefrey
Publication date: 01/01/2019

Network Embeddings (NEs) map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that 'similar' nodes are mapped onto nearby points, so that the NE can be used for purposes such as link prediction (if 'similar' means 'more likely to be connected') or classification (if 'similar' means 'more likely to have the same label'). In recent years various NE methods have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. To overcome this, we introduce a conceptual innovation to the NE literature and propose Conditional Network Embeddings (CNEs): embeddings that maximally add information with respect to given structural properties (e.g. node degrees, block densities). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. We demonstrate that CNEs are superior to state-of-the-art methods for link prediction and multi-label classification, without adding significant mathematical or computational complexity. Finally, we illustrate the potential of CNEs for network visualization.

Ghent University Academic Bibliography
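The Bayesian idea can be sketched as a link probability that combines a prior $P_{ij} = P(a_{ij}=1)$, derived from the structural properties being conditioned on, with a likelihood of the embedding distance $d_{ij}$. The sketch below assumes, in the spirit of the CNE model, half-normal distance distributions $\mathcal{N}_+$ with a smaller scale for linked pairs ($\sigma_1$) than for unlinked pairs ($\sigma_2$):

$$P(a_{ij}=1 \mid d_{ij}) = \frac{P_{ij}\,\mathcal{N}_+(d_{ij} \mid \sigma_1)}{P_{ij}\,\mathcal{N}_+(d_{ij} \mid \sigma_1) + (1-P_{ij})\,\mathcal{N}_+(d_{ij} \mid \sigma_2)}$$

The scale values and priors used here are illustrative; fitting the prior and the embedding itself (e.g. by block stochastic gradient descent) is not shown.

```python
import math

def half_normal_pdf(d, sigma):
    """Density of a half-normal distribution with scale sigma at distance d >= 0."""
    return math.sqrt(2.0 / (math.pi * sigma ** 2)) * math.exp(-d ** 2 / (2.0 * sigma ** 2))

def link_posterior(distance, prior, sigma1=1.0, sigma2=2.0):
    """P(a_ij = 1 | d_ij) by Bayes' rule.

    prior  -- P(a_ij = 1) from a model of the conditioned-on structure (assumed given)
    sigma1 -- scale of embedding distances between linked nodes (illustrative value)
    sigma2 -- scale of embedding distances between unlinked nodes, sigma2 > sigma1
    """
    linked = prior * half_normal_pdf(distance, sigma1)
    unlinked = (1.0 - prior) * half_normal_pdf(distance, sigma2)
    return linked / (linked + unlinked)

# Same embedding distances, different priors (e.g. a low-degree pair vs. a pair of hubs).
for prior in (0.05, 0.5):
    print(prior, [round(link_posterior(d, prior), 3) for d in (0.0, 1.0, 3.0)])
```

Because a pair with a high prior (for instance two high-degree nodes) is already likely to be linked, the embedding does not need to place it especially close, so distances are free to encode information beyond the conditioned-on properties.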

Subjectively interesting connecting trees and forests

Author: Adriaens Florian
De Bie Tijl
Lijffijt Jefrey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Consider a large graph or network, and a user-provided set of query vertices between which the user wishes to explore relations. For example, a researcher may want to connect research papers in a citation network, an analyst may wish to connect organized crime suspects in a communication network, or an internet user may want to organize their bookmarks based on their location in the World Wide Web. A natural way to do this is to connect the vertices in the form of a tree structure that is present in the graph. However, in sufficiently dense graphs, most such trees will be large or somehow trivial (e.g. involving high-degree vertices) and thus not insightful. Extending previous research, we define and investigate the new problem of mining subjectively interesting trees connecting a set of query vertices in a graph, i.e., trees that are highly surprising to the specific user at hand. Using information-theoretic principles, we formalize the notion of interestingness of such trees mathematically, taking into account certain prior beliefs the user has specified about the graph. A remaining problem is efficiently fitting a prior belief model. We show how this can be done for a large class of prior beliefs. Given a specified prior belief model, we then propose heuristic algorithms to find the best trees efficiently. An empirical validation of our methods on large real graphs evaluates the different heuristics and validates the interestingness of the given trees.

Ghent University Academic Bibliography
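The abstract's point that trees through high-degree vertices are uninformative can be made concrete with a toy score. The sketch below assumes a degree-based prior $p_{uv} = \min(1, d_u d_v / 2m)$ in place of a fitted prior belief model, scores a tree by the information content of its edges ($\sum -\log_2 p_{uv}$) divided by a hypothetical description length of $|V_T| \log_2 n$ bits (naming each of its vertices), and does not implement the paper's search heuristics.

```python
import math

def build_adj(edge_list):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_prior(adj):
    """Assumed prior p_uv = min(1, d_u * d_v / 2m), a stand-in for a fitted belief model."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return lambda u, v: min(1.0, len(adj[u]) * len(adj[v]) / two_m)

def tree_interestingness(tree_edges, queries, adj, prior):
    """Hypothetical SI of a connecting tree: edge information content / vertex-naming bits.
    Note: this checks query coverage and edge existence, but not that the edges form a tree."""
    vertices = {x for edge in tree_edges for x in edge}
    if not set(queries) <= vertices:
        raise ValueError("tree must contain every query vertex")
    if any(v not in adj[u] for u, v in tree_edges):
        raise ValueError("tree must use edges present in the graph")
    ic = sum(-math.log2(prior(u, v)) for u, v in tree_edges)
    dl = len(vertices) * math.log2(len(adj))
    return ic / dl

# A hub h linked to both query vertices (and six leaves), plus a quiet route q1 - x - q2.
graph = [("h", "q1"), ("h", "q2")] + [("h", f"leaf{i}") for i in range(6)]
graph += [("q1", "x"), ("x", "q2")]
adj = build_adj(graph)
prior = degree_prior(adj)
via_hub = tree_interestingness([("h", "q1"), ("h", "q2")], ["q1", "q2"], adj, prior)
via_x = tree_interestingness([("q1", "x"), ("x", "q2")], ["q1", "q2"], adj, prior)
print(via_x > via_hub)  # True
```

Both trees connect `q1` and `q2` with two edges and three vertices, so their description lengths are equal; but edges to the hub `h` are expected under the degree prior and carry little information, so the route through the low-degree vertex `x` scores higher.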

Survey of data mining approaches to user modeling for adaptive hypermedia

Author: Chen SY
Frias-Martinez E
Liu X
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for the automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to capture user behavior efficiently and accurately. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.

CiteSeerX

Crossref

Brunel University Research Archive