2,741 research outputs found

    Algorithms for item categorization based on ordinal ranking data

    Full text link
    We present a new method for identifying the latent categorization of items based on their rankings. Complimenting a recent work that uses a Dirichlet prior on preference vectors and variational inference, we show that this problem can be effectively dealt with using existing community detection algorithms, with the communities corresponding to item categories. In particular we convert the bipartite ranking data to a unipartite graph of item affinities, and apply community detection algorithms. In this context we modify an existing algorithm - namely the label propagation algorithm to a variant that uses the distance between the nodes for weighting the label propagation - to identify the categories. We propose and analyze a synthetic ordinal ranking model and show its relation to the recently much studied stochastic block model. We test our algorithms on synthetic data and compare performance with several popular community detection algorithms. We also test the method on real data sets of movie categorization from the Movie Lens database. In all of the cases our algorithm is able to identify the categories for a suitable choice of tuning parameter.Comment: To appear in IEEE Allerton conference on computing, communications and control, 201

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Full text link
    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research

    Ranking relations using analogies in biological and information networks

    Get PDF
    Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects S={A(1):B(1),A(2):B(2),,A(N):B(N)}\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B ^{(N)}\}, measures how well other pairs A:B fit in with the set S\mathbf{S}. Our work addresses the following question: is the relation between objects A and B analogous to those relations found in S\mathbf{S}? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application on discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even if a small set of protein pairs is provided.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS321 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Extracting information from informal communication

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (leaves 89-93).This thesis focuses on the problem of extracting information from informal communication. Textual informal communication, such as e-mail, bulletin boards and blogs, has become a vast information resource. However, such information is poorly organized and difficult for a computer to understand due to lack of editing and structure. Thus, techniques which work well for formal text, such as newspaper articles, may be considered insufficient on informal text. One focus of ours is to attempt to advance the state-of-the-art for sub-problems of the information extraction task. We make contributions to the problems of named entity extraction, co-reference resolution and context tracking. We channel our efforts toward methods which are particularly applicable to informal communication. We also consider a type of information which is somewhat unique to informal communication: preferences and opinions. Individuals often expression their opinions on products and services in such communication. Others' may read these "reviews" to try to predict their own experiences. However, humans do a poor job of aggregating and generalizing large sets of data. We develop techniques that can perform the job of predicting unobserved opinions.(cont.) We address both the single-user case where information about the items is known, and the multi-user case where we can generalize opinions without external information. Experiments on large-scale rating data sets validate our approach.by Jason D.M. Rennie.Ph.D

    Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs

    Full text link
    Crowdsourcing platforms are now extensively used for conducting subjective pairwise comparison studies. In this setting, a pairwise comparison dataset is typically gathered via random sampling, either \emph{with} or \emph{without} replacement. In this paper, we use tools from random graph theory to analyze these two random sampling methods for the HodgeRank estimator. Using the Fiedler value of the graph as a measurement for estimator stability (informativeness), we provide a new estimate of the Fiedler value for these two random graph models. In the asymptotic limit as the number of vertices tends to infinity, we prove the validity of the estimate. Based on our findings, for a small number of items to be compared, we recommend a two-stage sampling strategy where a greedy sampling method is used initially and random sampling \emph{without} replacement is used in the second stage. When a large number of items is to be compared, we recommend random sampling with replacement as this is computationally inexpensive and trivially parallelizable. Experiments on synthetic and real-world datasets support our analysis
    corecore