Exploiting Social Network Structure for Person-to-Person Sentiment Analysis
Person-to-person evaluations are prevalent in all kinds of discourse and
important for establishing reputations, building social bonds, and shaping
public opinion. Such evaluations can be analyzed separately using signed social
networks and textual sentiment analysis, but this misses the rich interactions
between language and social context. To capture such interactions, we develop a
model that predicts individual A's opinion of individual B by synthesizing
information from the signed social network in which A and B are embedded with
sentiment analysis of the evaluative texts relating A to B. We prove that this
problem is NP-hard but can be relaxed to an efficiently solvable hinge-loss
Markov random field, and we show that this implementation outperforms text-only
and network-only versions in two very different datasets involving
community-level decision-making: the Wikipedia Requests for Adminship corpus
and the Convote U.S. Congressional speech corpus.
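The relaxation described above can be sketched in miniature: binary opinion labels are relaxed to soft values in [0, 1], with hinge-shaped penalties tying each label to a text-sentiment score and to a structural-balance rule ("if A likes B and B likes C, A should like C"). Everything here -- the sentiment scores, the slack of 0.1, and the brute-force grid scan standing in for convex optimization -- is an illustrative assumption, not the authors' hinge-loss Markov random field implementation.

```python
from itertools import product

def hinge(x):
    return max(0.0, x)

# Assumed text-sentiment scores in [0, 1] for each evaluation (toy data).
text = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.2}

def objective(y):
    # Text potentials: each soft label should track its sentiment score,
    # with a slack of 0.1 before any penalty is paid.
    obj = sum(hinge(abs(y[e] - text[e]) - 0.1) for e in y)
    # Balance potential: if A->B and B->C are positive, A->C should be too.
    obj += hinge(y[("A", "B")] + y[("B", "C")] - 1.0 - y[("A", "C")])
    return obj

# A coarse grid scan stands in for the convex solver used in practice.
grid = [k / 10 for k in range(11)]
best = min(
    (dict(zip(text, vals)) for vals in product(grid, repeat=3)),
    key=objective,
)
```

The text evidence pulls A's opinion of C down while the balance term pulls it up; the optimum trades the two off, which is exactly the interaction between language and social context the model synthesizes.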
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
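As a concrete instance of one cell of this taxonomy -- predicting link existence -- here is a minimal sketch using a common-neighbour (Jaccard) score on a toy undirected graph; the graph and the scoring heuristic are illustrative assumptions, not drawn from the article:

```python
# Toy undirected graph as an adjacency map (hypothetical data).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

def jaccard(u, v):
    """Score a candidate link (u, v) by neighbourhood overlap:
    |N(u) & N(v)| / |N(u) | N(v)|."""
    return len(adj[u] & adj[v]) / len(adj[u] | adj[v])

# "a" and "d" are not linked but share both of their neighbours, so a
# link-existence transformation would propose adding the edge ("a", "d").
score = jaccard("a", "d")
```

Link-label prediction, weight estimation, and feature construction would each replace the scoring function while keeping the same pattern of deriving new relational structure from the existing graph.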
On the Troll-Trust Model for Edge Sign Prediction in Social Networks
In the problem of edge sign prediction, we are given a directed graph
(representing a social network), and our task is to predict the binary labels
of the edges (i.e., the positive or negative nature of the social
relationships). Many successful heuristics for this problem are based on the
troll-trust features, estimating at each node the fraction of outgoing and
incoming positive/negative edges. We show that these heuristics can be
understood, and rigorously analyzed, as approximators to the Bayes optimal
classifier for a simple probabilistic model of the edge labels. We then show
that the maximum likelihood estimator for this model approximately corresponds
to the predictions of a Label Propagation algorithm run on a transformed
version of the original social graph. Extensive experiments on a number of
real-world datasets show that this algorithm is competitive against
state-of-the-art classifiers in terms of both accuracy and scalability.
Finally, we show that troll-trust features can also be used to derive online
learning algorithms which have theoretical guarantees even when edges are
adversarially labeled.
Comment: v5: accepted to AISTATS 201
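A minimal sketch of the troll-trust features the abstract refers to -- the fraction of positive outgoing edges at u (how troll-like u is) and of positive incoming edges at v (how trusted v is) -- combined into a naive sign predictor. The toy edge list and the simple 0.5-averaging rule are illustrative assumptions, not the paper's Bayes-optimal derivation or its Label Propagation algorithm.

```python
from collections import defaultdict

# Toy signed, directed edge list (u, v, sign) -- hypothetical data.
edges = [("u", "v", +1), ("u", "w", +1), ("t", "v", -1),
         ("t", "w", -1), ("v", "w", +1)]

out_signs = defaultdict(list)   # signs of edges leaving each node
in_signs = defaultdict(list)    # signs of edges entering each node
for u, v, s in edges:
    out_signs[u].append(s)
    in_signs[v].append(s)

def frac_pos(signs):
    """Fraction of positive signs; 0.5 (uninformative) when none seen."""
    return sum(s > 0 for s in signs) / len(signs) if signs else 0.5

def predict(u, v):
    # Average u's outgoing positivity ("non-trollness") with
    # v's incoming positivity ("trustworthiness").
    p = 0.5 * (frac_pos(out_signs[u]) + frac_pos(in_signs[v]))
    return +1 if p >= 0.5 else -1
```

Node "t" sends only negative edges, so any edge it originates is predicted negative regardless of the target's standing.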
Modeling Rare Interactions in Time Series Data Through Qualitative Change: Application to Outcome Prediction in Intensive Care Units
Many areas of research are characterised by a deluge of large-scale,
high-dimensional time-series data. However, using the available data for
prediction and decision making is hampered by the current lag in our ability
to uncover and quantify the true interactions that explain the outcomes. We
are interested in areas such as intensive care medicine, which are
characterised by: i) continuous monitoring of multivariate variables and
non-uniform sampling of data streams; ii) outcomes that are generally
governed by interactions between a small set of rare events; iii)
interactions that are not necessarily definable by specific values (or value
ranges) of a given group of variables, but rather by the deviations of these
values from the normal state recorded over time; and iv) the need to explain
the predictions made by the model. While numerous data mining models have
been formulated for outcome prediction, they are unable to explain their
predictions.
We present a model for uncovering the interactions most likely to have
generated the outcomes observed in high-dimensional time-series data.
Interactions among variables are represented by a relational graph structure,
which relies on qualitative abstractions to overcome non-uniform sampling and
to capture the semantics of the interactions corresponding to the changes and
deviations from normality of variables of interest over time. Using the
assumption that similar templates of small interactions are responsible for the
outcomes (as prevalent in the medical domains), we reformulate the discovery
task to retrieve the most likely templates from the data.
Comment: 8 pages, 3 figures. Accepted for publication in the European
Conference on Artificial Intelligence (ECAI 2020).
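The qualitative abstractions mentioned above can be illustrated with a toy sketch: unevenly spaced (time, value) samples are rewritten as symbols describing the level relative to a normal range and the direction of change, which is what makes non-uniform sampling tolerable. The normal range, slope threshold, and sample values are assumptions for illustration, not the paper's clinical definitions.

```python
# Hypothetical normal range for the monitored variable (e.g. heart rate).
NORMAL = (60, 100)

def abstract(samples, slope_eps=0.5):
    """Map unevenly spaced (time, value) samples to qualitative states:
    a level relative to the normal range and a direction of change."""
    states = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        slope = (v1 - v0) / (t1 - t0)        # per-unit-time change
        trend = ("rising" if slope > slope_eps
                 else "falling" if slope < -slope_eps else "steady")
        level = ("low" if v1 < NORMAL[0]
                 else "high" if v1 > NORMAL[1] else "normal")
        states.append((t1, level, trend))
    return states

# Non-uniform sampling (gaps of 2, 1, and 4 time units) is handled naturally.
states = abstract([(0, 80), (2, 82), (3, 120), (7, 55)])
```

Downstream, a relational graph over such symbolic states (rather than raw values) is what lets the model represent and explain interactions as deviations from normality over time.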
Graph Representation Learning in Biomedicine
Biomedical networks are universal descriptors of systems of interacting
elements, from protein interactions to disease networks, all the way to
healthcare systems and scientific knowledge. With the remarkable success of
representation learning in providing powerful predictions and insights, we have
witnessed a rapid expansion of representation learning techniques into
modeling, analyzing, and learning with such networks. In this review, we put
forward an observation that long-standing principles of networks in biology and
medicine -- while often unspoken in machine learning research -- can provide
the conceptual grounding for representation learning, explain its current
successes and limitations, and inform future advances. We synthesize a spectrum
of algorithmic approaches that, at their core, leverage graph topology to embed
networks into compact vector spaces, and capture the breadth of ways in which
representation learning is proving useful. Areas of profound impact include
identifying variants underlying complex traits, disentangling behaviors of
single cells and their effects on health, assisting in diagnosis and treatment
of patients, and developing safe and effective medicines.
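At the core of most of these methods is the use of graph topology to build compact node vectors. A minimal, hypothetical sketch -- one round of mean aggregation over neighbours, the basic message-passing step underlying graph neural networks (toy graph and features, not from the review):

```python
# Toy graph (e.g. a protein-interaction neighbourhood) and 2-d node features.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feats = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 1.0]}

def aggregate(graph, feats):
    """One message-passing round: average each node's feature vector
    with those of its neighbours."""
    new = {}
    for n, nbrs in graph.items():
        vecs = [feats[n]] + [feats[m] for m in nbrs]
        dim = len(feats[n])
        new[n] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return new

emb = aggregate(graph, feats)
```

Stacking such rounds (with learned weights and nonlinearities) yields the embeddings that downstream tasks -- variant prioritisation, cell-type analysis, diagnosis support, drug discovery -- consume.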
Finding the best not the most: Regularized loss minimization subgraph selection for graph classification
Classification on structured data, such as graphs, has drawn wide interest in recent years. Because graphs lack explicit features for training classification models, extensive studies have focused on extracting the most discriminative subgraph features from the training graph dataset to transform graphs into vector data. However, such filter-based methods suffer from two major disadvantages: (1) subgraph feature selection is separated from the model learning process, so the selected most discriminative subgraphs may not best fit the subsequent learning model, resulting in deteriorated classification results; and (2) all these methods rely on users to specify the number of subgraph features K, and suboptimally specified K values often result in significantly reduced classification accuracy. In this paper, we propose a new graph classification paradigm that overcomes the above disadvantages by formulating subgraph feature selection as learning a K-dimensional feature space from an implicit and large subgraph space, with the optimal K value determined automatically. To achieve this goal, we propose a regularized loss minimization-driven (RLMD) feature selection method for graph classification. RLMD integrates subgraph selection and model learning into a unified framework to find discriminative subgraphs with guaranteed minimum loss w.r.t. the objective function. To automatically determine the optimal number of subgraphs K from the exponentially large subgraph space, an elastic net and a subgradient method are used to derive the stopping criterion, so that K is obtained automatically once RLMD converges. The proposed RLMD method enjoys desirable properties, including proven convergence and applicability to various loss functions. Experimental results on real-life graph datasets demonstrate significant performance gains.
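The elastic-net mechanism RLMD relies on can be sketched in miniature: minimise a squared loss plus l1 and l2 penalties by cyclic coordinate descent with soft-thresholding; features whose weights are driven exactly to zero are dropped, so the number of selected features K emerges from the optimisation rather than being preset. The toy dense features below stand in for subgraph indicator features, and the penalty weights are illustrative assumptions, not the paper's RLMD algorithm:

```python
# Toy design matrix: rows are graphs, columns are (stand-in) subgraph
# features; the target is y ~ x0 + x1, so feature 2 is irrelevant.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1.0, 1.0, 2.0, 0.0]
l1, l2 = 0.3, 0.1                 # elastic-net penalty weights (assumed)

def soft(z, t):
    """Soft-thresholding operator: the proximal map of the l1 penalty."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

w = [0.0, 0.0, 0.0]
for _ in range(200):              # cyclic coordinate descent
    for j in range(len(w)):
        # Residual with feature j's contribution removed.
        r = [yi - sum(w[k] * xi[k] for k in range(len(w)) if k != j)
             for xi, yi in zip(X, y)]
        rho = sum(xi[j] * ri for xi, ri in zip(X, r))
        norm = sum(xi[j] ** 2 for xi in X)
        w[j] = soft(rho, l1) / (norm + l2)

# K falls out automatically: features with exactly-zero weight are dropped.
selected = [j for j, wj in enumerate(w) if abs(wj) > 1e-8]
```

Here the irrelevant third feature's weight is thresholded to exactly zero, so K = 2 is discovered rather than specified in advance.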