Inferring offline hierarchical ties from online social networks
Social networks can represent many different types of relationships between actors, some explicit and some implicit. For example, email communications between users may be represented explicitly in a network, while managerial relationships may not. In this paper we focus on analyzing explicit interactions among actors in order to detect hierarchical social relationships that may be implicit. We start by employing three well-known ranking-based methods, PageRank, Degree Centrality, and Rooted-PageRank (RPR) to infer such implicit relationships from interactions between actors. Then we propose two novel
approaches that take into account the time dimension of interactions when detecting hierarchical ties. We experiment on two datasets: the Enron email dataset, to infer manager-subordinate relationships from email exchanges, and a scientific publication co-authorship dataset, to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments show that time-based methods perform considerably better than ranking-based methods. In the Enron dataset, they detect 48% of manager-subordinate ties versus 32% found by Rooted-PageRank. Similarly, in the co-authorship dataset, they detect 62% of advisor-advisee ties compared to only 39% by Rooted-PageRank.
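As a rough illustration of the ranking-based baseline named in this abstract, the sketch below computes Rooted-PageRank scores (a random walk that always restarts at one chosen root actor) on a toy directed interaction graph and ranks the other actors as candidate superiors of the root. It is a minimal sketch, not the authors' implementation: the function names `rooted_pagerank` and `rank_candidates`, the damping factor of 0.85, the handling of actors with no outgoing edges, and the toy edge list are all assumptions made for illustration.

```python
from collections import defaultdict


def rooted_pagerank(edges, root, damping=0.85, iterations=100):
    """Random walk with restart: every restart jumps back to `root`.

    `edges` is a list of (sender, receiver) pairs. All names and the
    damping value are illustrative assumptions, not taken from the paper.
    """
    nodes = sorted({n for edge in edges for n in edge} | {root})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    # Start with all probability mass on the root.
    score = {n: 0.0 for n in nodes}
    score[root] = 1.0

    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                # Follow an outgoing edge chosen uniformly at random.
                share = damping * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Assumption: a node without outgoing edges returns to the root.
                nxt[root] += damping * score[n]
        # Restart step: total mass is 1, so (1 - damping) goes to the root.
        nxt[root] += 1.0 - damping
        score = nxt
    return score


def rank_candidates(edges, root):
    """Other actors ordered by decreasing Rooted-PageRank score for `root`."""
    score = rooted_pagerank(edges, root)
    others = [n for n in score if n != root]
    return sorted(others, key=lambda n: (-score[n], n))


# Hypothetical toy email graph: (sender, receiver) pairs.
toy_edges = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]
candidates = rank_candidates(toy_edges, "alice")
```

In this toy graph, `carol` never receives a message, so the walk rooted at `alice` never reaches her and she ranks below `bob`. The time-based methods described in the abstract are not reproduced here because the abstract does not define them.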
Detecting hierarchical relationships and roles from online interaction networks
In social networks, analysing the explicit interactions among users can help in
inferring hierarchical relationships and roles that may be implicit. In this thesis,
we focus on two objectives: detecting hierarchical relationships between users and
inferring the hierarchical roles of users interacting via the same online communication
medium. In both cases, we show that considering the temporal dimension of
interaction substantially improves the detection of relationships and roles.
The first focus of this thesis is on the problem of inferring implicit relationships
from interactions between users. Based on promising results obtained by standard
link-analysis methods such as PageRank and Rooted-PageRank (RPR), we introduce
three novel time-based approaches, "Time-F" based on a defined time function,
Filter and Refine (FiRe) which is a hybrid approach based on RPR and Time-F,
and Time-sensitive Rooted-PageRank (T-RPR) which applies RPR in a way that
takes into account the time-dimension of interactions in the process of detecting
hierarchical ties.
We experiment on two datasets, the Enron email dataset to infer manager-subordinate
relationships from email exchanges, and a scientific publication co-authorship
dataset to detect PhD advisor-advisee relationships from paper co-authorships.
Our experiments demonstrate that time-based methods perform better in terms of
recall. In particular, T-RPR turns out to be superior to most recent competitor
methods as well as all other approaches we propose.
The second focus of this thesis is examining the online communication behaviour
of users working on the same activity in order to identify the different hierarchical
roles played by the users. We propose two approaches. In the first approach, supervised
learning is used to train different classification algorithms. In the second
approach, we address the problem as a sequence classification problem. A novel
sequence classification framework is defined that generates time-dependent features based on frequent patterns at multiple levels of time granularity. Our framework is
a flexible technique for sequence classification to be applied in different domains.
We experiment on an educational dataset collected from an asynchronous communication
tool used by students to accomplish an underlying group project. Our
experimental findings show that the first supervised approach achieves the best mapping
of students to their roles when the individual attributes of the students, information
about the reply relationships among them as well as quantitative time-based
features are considered. Similarly, our multi-granularity pattern-based framework
shows competitive performance in detecting the students' roles. Both approaches
are significantly better than the baselines considered.
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
Markov Chain Ontology Analysis (MCOA)
Background: Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems, including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data.
Results: In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through an MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach performs best among comparable state-of-the-art methods.
Conclusion: A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.
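The eigenvector calculation mentioned in this abstract can be illustrated with the stationary distribution of a small ergodic Markov chain, which is the left eigenvector of the transition matrix for eigenvalue 1. The sketch below finds it by power iteration. It is only a sketch under stated assumptions: the abstract does not say how MCOA adjusts its transition matrix, so the matrix here is assumed to be already row-stochastic and ergodic, and the function name `stationary_distribution`, the tolerance, and the three-state toy chain (two ontology classes and one gene instance) are hypothetical.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi satisfying pi = pi * P.

    `P` is a row-stochastic matrix given as a list of rows, assumed to
    describe an ergodic chain so that the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical toy chain over three states: class_A, class_B, gene_1.
# Each row lists the transition probabilities out of one state.
toy_P = [
    [0.2, 0.5, 0.3],  # from class_A
    [0.4, 0.2, 0.4],  # from class_B
    [0.5, 0.4, 0.1],  # from gene_1
]
importance = stationary_distribution(toy_P)
```

Each entry of `importance` is the long-run fraction of time the chain spends in the corresponding state, which is the sense in which such a value can be read as an importance score.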
Recommender Systems
The ongoing rapid expansion of the Internet greatly increases the necessity
of effective recommender systems for filtering the abundant information.
Extensive research for recommender systems is conducted by a broad range of
communities including social and computer scientists, physicists, and
interdisciplinary researchers. Despite substantial theoretical and practical
achievements, unification and comparison of different approaches are lacking,
which impedes further advances. In this article, we review recent developments
in recommender systems and discuss the major challenges. We compare and
evaluate available algorithms and examine their roles in the future
developments. In addition to algorithms, physical aspects are described to
illustrate macroscopic behavior of recommender systems. Potential impacts and
future directions are discussed. We emphasize that recommendation has a great
scientific depth and combines diverse research fields, which makes it of
interest to physicists as well as interdisciplinary researchers.
Comment: 97 pages, 20 figures (to appear in Physics Reports)