10,052 research outputs found
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
,
measures how well other pairs A:B fit in with the set . Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in ? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS321 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Multi-Instance Multi-Label Learning
In this paper, we propose the MIML (Multi-Instance Multi-Label learning)
framework where an example is described by multiple instances and associated
with multiple class labels. Compared to traditional learning frameworks, the
MIML framework is more convenient and natural for representing complicated
objects which have multiple semantic meanings. To learn from MIML examples, we
propose the MimlBoost and MimlSvm algorithms based on a simple degeneration
strategy, and experiments show that solving problems involving complicated
objects with multiple semantic meanings in the MIML framework can lead to good
performance. Considering that the degeneration process may lose information, we
propose the D-MimlSvm algorithm which tackles MIML problems directly in a
regularization framework. Moreover, we show that even when we do not have
access to the real objects and thus cannot capture more information from real
objects by using the MIML representation, MIML is still useful. We propose the
InsDif and SubCod algorithms. InsDif works by transforming single-instances
into the MIML representation for learning, while SubCod works by transforming
single-label examples into the MIML representation for learning. Experiments
show that in some tasks they are able to achieve better performance than
learning the single-instances or single-label examples directly.Comment: 64 pages, 10 figures; Artificial Intelligence, 201
Clustering based on Random Graph Model embedding Vertex Features
Large datasets with interactions between objects are common to numerous
scientific fields (i.e. social science, internet, biology...). The interactions
naturally define a graph and a common way to explore or summarize such dataset
is graph clustering. Most techniques for clustering graph vertices just use the
topology of connections ignoring informations in the vertices features. In this
paper, we provide a clustering algorithm exploiting both types of data based on
a statistical model with latent structure characterizing each vertex both by a
vector of features as well as by its connectivity. We perform simulations to
compare our algorithm with existing approaches, and also evaluate our method
with real datasets based on hyper-textual documents. We find that our algorithm
successfully exploits whatever information is found both in the connectivity
pattern and in the features
Deriving Verb Predicates By Clustering Verbs with Arguments
Hand-built verb clusters such as the widely used Levin classes (Levin, 1993)
have proved useful, but have limited coverage. Verb classes automatically
induced from corpus data such as those from VerbKB (Wijaya, 2016), on the other
hand, can give clusters with much larger coverage, and can be adapted to
specific corpora such as Twitter. We present a method for clustering the
outputs of VerbKB: verbs with their multiple argument types, e.g.
"marry(person, person)", "feel(person, emotion)." We make use of a novel
low-dimensional embedding of verbs and their arguments to produce high quality
clusters in which the same verb can be in different clusters depending on its
argument type. The resulting verb clusters do a better job than hand-built
clusters of predicting sarcasm, sentiment, and locus of control in tweets
The Evolution of Wikipedia's Norm Network
Social norms have traditionally been difficult to quantify. In any particular
society, their sheer number and complex interdependencies often limit a
system-level analysis. One exception is that of the network of norms that
sustain the online Wikipedia community. We study the fifteen-year evolution of
this network using the interconnected set of pages that establish, describe,
and interpret the community's norms. Despite Wikipedia's reputation for
\textit{ad hoc} governance, we find that its normative evolution is highly
conservative. The earliest users create norms that both dominate the network
and persist over time. These core norms govern both content and interpersonal
interactions using abstract principles such as neutrality, verifiability, and
assume good faith. As the network grows, norm neighborhoods decouple
topologically from each other, while increasing in semantic coherence. Taken
together, these results suggest that the evolution of Wikipedia's norm network
is akin to bureaucratic systems that predate the information age.Comment: 22 pages, 9 figures. Matches published version. Data available at
http://bit.ly/wiki_nor
- …