23,755 research outputs found
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms
Relational Data Mining Through Extraction of Representative Exemplars
With the growing interest on Network Analysis, Relational Data Mining is
becoming an emphasized domain of Data Mining. This paper addresses the problem
of extracting representative elements from a relational dataset. After defining
the notion of degree of representativeness, computed using the Borda
aggregation procedure, we present the extraction of exemplars which are the
representative elements of the dataset. We use these concepts to build a
network on the dataset. We expose the main properties of these notions and we
propose two typical applications of our framework. The first application
consists in resuming and structuring a set of binary images and the second in
mining co-authoring relation in a research team
A framework for community detection in heterogeneous multi-relational networks
There has been a surge of interest in community detection in homogeneous
single-relational networks which contain only one type of nodes and edges.
However, many real-world systems are naturally described as heterogeneous
multi-relational networks which contain multiple types of nodes and edges. In
this paper, we propose a new method for detecting communities in such networks.
Our method is based on optimizing the composite modularity, which is a new
modularity proposed for evaluating partitions of a heterogeneous
multi-relational network into communities. Our method is parameter-free,
scalable, and suitable for various networks with general structure. We
demonstrate that it outperforms the state-of-the-art techniques in detecting
pre-planted communities in synthetic networks. Applied to a real-world Digg
network, it successfully detects meaningful communities.Comment: 27 pages, 10 figure
Survey of data mining approaches to user modeling for adaptive hypermedia
The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the applicatio
- …