Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
$\mathbf{S} = \{A^{(1)}:B^{(1)}, A^{(2)}:B^{(2)}, \ldots, A^{(N)}:B^{(N)}\}$,
measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in $\mathbf{S}$? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A review of associative classification mining
Associative classification mining is a promising approach in data mining that uses
association rule discovery techniques to construct classification systems, also known as
associative classifiers. In the last few years, a number of associative classification algorithms
have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms
employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule
evaluation methods. This paper surveys and compares state-of-the-art associative
classification techniques with regard to the above criteria. Finally, future directions in associative
classification, such as incremental learning and mining low-quality data sets, are also
highlighted.
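The rule discovery, rule ranking and rule prediction steps named in this abstract can be illustrated with a minimal sketch. It is a generic class-association-rule classifier, not CPAR, CMAR, MCAR or MMAC; the function names, the thresholds and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(rows, min_support=0.3, min_confidence=0.6, max_len=2):
    """Mine rules of the form itemset -> class label (illustrative sketch).

    rows is a list of (set_of_items, class_label) pairs.
    """
    n = len(rows)
    body_counts = Counter()   # how often each itemset occurs
    rule_counts = Counter()   # how often each itemset occurs with each label
    for items, label in rows:
        for size in range(1, max_len + 1):
            for body in combinations(sorted(items), size):
                body_counts[body] += 1
                rule_counts[(body, label)] += 1

    rules = []
    for (body, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / body_counts[body]
        if support >= min_support and confidence >= min_confidence:
            rules.append((body, label, support, confidence))

    # Rule ranking: higher confidence first, then higher support,
    # then shorter bodies; the body itself breaks remaining ties.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), r[0]))
    return rules


def predict(rules, items, default=None):
    """Rule prediction: return the label of the first ranked rule that matches."""
    for body, label, _, _ in rules:
        if set(body) <= items:
            return label
    return default


# Hypothetical toy data set.
rows = [
    ({"sunny", "hot"}, "no"),
    ({"sunny", "mild"}, "no"),
    ({"rainy", "mild"}, "yes"),
    ({"rainy", "cool"}, "yes"),
    ({"overcast", "hot"}, "yes"),
]
rules = mine_class_rules(rows)
print(predict(rules, {"sunny", "cool"}))
```

Predicting from the single best matching rule is only one option; using several matching rules per prediction would change `predict` but not the mining step.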
Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data
In domains like bioinformatics, information retrieval and social network
analysis, one can find learning tasks where the goal consists of inferring a
ranking of objects, conditioned on a particular target object. We present a
general kernel framework for learning conditional rankings from various types
of relational data, where rankings can be conditioned on unseen data objects.
We propose efficient algorithms for conditional ranking by optimizing squared
regression and ranking loss functions. We show theoretically that learning
with the ranking loss is likely to generalize better than with the regression
loss. Further, we prove that symmetry or reciprocity properties of relations
can be efficiently enforced in the learned models. Experiments on synthetic and
real-world data illustrate that the proposed methods deliver state-of-the-art
performance in terms of predictive power and computational efficiency.
Moreover, we also show empirically that incorporating symmetry or reciprocity
properties can improve the generalization performance.
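One way to see how symmetry or reciprocity can be built into a kernel model over pairs of objects is the sketch below. The abstract does not state which kernels the authors use, so the Kronecker-product construction, the RBF base kernel and all function names here are assumptions chosen for illustration.

```python
import math


def rbf(x, y, gamma=0.5):
    """Base kernel on single objects (an assumed choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def pair_kernel(p, q, base=rbf):
    """Kronecker-product kernel on ordered pairs: k(a, c) * k(b, d)."""
    (a, b), (c, d) = p, q
    return base(a, c) * base(b, d)


def symmetric_pair_kernel(p, q, base=rbf):
    """Unchanged when the two objects inside a pair are swapped."""
    (a, b), (c, d) = p, q
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


def reciprocal_pair_kernel(p, q, base=rbf):
    """Changes sign when the two objects inside a pair are swapped."""
    (a, b), (c, d) = p, q
    return base(a, c) * base(b, d) - base(a, d) * base(b, c)


# Hypothetical feature vectors.
a, b, c, d = (0.0, 1.0), (2.0, 0.5), (1.0, 1.0), (0.0, 0.0)
print(symmetric_pair_kernel((a, b), (c, d)), symmetric_pair_kernel((b, a), (c, d)))
print(reciprocal_pair_kernel((a, b), (c, d)), reciprocal_pair_kernel((b, a), (c, d)))
```

Because the property is encoded in the kernel, any kernel method trained with it (for example regularized least squares) inherits the symmetry or sign flip without extra constraints.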
A kernel-based framework for learning graded relations from data
Driven by a large number of potential applications in areas like
bioinformatics, information retrieval and social network analysis, the problem
setting of inferring relations between pairs of data objects has recently been
investigated quite intensively in the machine learning community. To this end,
current approaches typically consider datasets containing crisp relations, so
that standard classification methods can be adopted. However, relations between
objects like similarities and preferences are often expressed in a graded
manner in real-world applications. A general kernel-based framework for
learning relations from data is introduced here. It extends existing approaches
because both crisp and graded relations are considered, and it unifies existing
approaches because different types of graded relations can be modeled,
including symmetric and reciprocal relations. This framework establishes
important links between recent developments in fuzzy set theory and machine
learning. Its usefulness is demonstrated through various experiments on
synthetic and real-world data.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
Ultra accurate collaborative information filtering via directed user similarity
A key challenge in collaborative filtering (CF) is how to obtain reliable and
accurate results with the help of peers' recommendations. Since the similarity
from a small-degree user to a large-degree user is larger than the similarity
in the opposite direction, traditional second-order CF algorithms recommend
the selections of large-degree users extensively. By considering the direction
of user similarity and the second-order correlations to suppress the influence
of mainstream preferences, we present the directed second-order CF (HDCF)
algorithm, which specifically addresses the challenge of accuracy and
diversity in CF. Numerical results on two benchmark data sets, MovieLens and
Netflix, show that the new algorithm is more accurate than state-of-the-art
CF algorithms. Compared with the random-walk-based CF algorithm proposed in
Ref. 7, the average ranking score reaches 0.0767 and 0.0402, an improvement of
27.3% and 19.1% for MovieLens and Netflix respectively. In addition, the
diversity, precision and recall are also greatly improved. Without relying on
any context-specific information, tuning the similarity direction of CF
algorithms yields accurate and diverse recommendations. This work suggests
that the direction of user similarity is an important factor in improving
personalized recommendation performance.
Comment: 6 pages, 4 figures
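The asymmetry between small-degree and large-degree users can be seen with a very simple directed similarity. The abstract does not give the similarity definition used by HDCF, so the normalisation below (shared items divided by the source user's degree), the function name and the toy data are assumptions for illustration.

```python
def directed_similarity(items_by_user, source, target):
    """Fraction of the source user's items that the target user also selected.

    Illustrative assumption: normalising by the source user's degree makes
    the measure asymmetric in its two arguments.
    """
    source_items = items_by_user[source]
    target_items = items_by_user[target]
    if not source_items:
        return 0.0
    return len(source_items & target_items) / len(source_items)


# Hypothetical users: one with few selections, one with many.
items_by_user = {
    "casual": {"m1", "m2"},
    "heavy": {"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"},
}

print(directed_similarity(items_by_user, "casual", "heavy"))
print(directed_similarity(items_by_user, "heavy", "casual"))
```

Under this measure the small-degree user looks far more similar to the large-degree user than the reverse, which is the effect the abstract describes.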
Probabilistic Models over Ordered Partitions with Application in Learning to Rank
This paper addresses the general problem of modelling and learning rank data
with ties. We propose a probabilistic generative model that models the process
as permutations over partitions. This results in a super-exponential
combinatorial state space with unknown numbers of partitions and unknown
ordering among them. We approach the problem from the perspective of discrete
choice theory, where subsets are chosen in a stagewise manner, reducing the
state space at each stage significantly. Further, we show that with suitable parameterisation,
we can still learn the models in linear time. We evaluate the proposed models
on the problem of learning to rank with the data from the recently held Yahoo!
challenge, and demonstrate that the models are competitive against well-known
rivals.
Comment: 19 pages, 2 figures
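The size of the state space mentioned in this abstract can be made concrete by counting rankings with ties, i.e. ordered partitions of n items. The sketch below uses the standard recurrence for these counts (the ordered Bell, or Fubini, numbers); it only counts states and is not the authors' model. The function name is an assumption.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def ordered_partitions(n):
    """Number of ways to rank n items when ties are allowed.

    Stagewise view: choose the k items tied in first place, then rank
    the remaining n - k items in the same way.
    """
    if n == 0:
        return 1
    return sum(comb(n, k) * ordered_partitions(n - k) for k in range(1, n + 1))


for n in range(1, 8):
    # Compare with n!, the number of rankings without ties.
    print(n, factorial(n), ordered_partitions(n))
```

For every n above 1 the count already exceeds n!, the number of tie-free rankings.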