12 research outputs found

    Learning Mixtures of Plackett-Luce Models with Features from Top-l Orders

    Full text link
    The Plackett-Luce model (PL) is one of the most popular models for preference learning. In this paper, we consider PL with features and its mixture models, where each alternative has a vector of features, possibly different across agents. Such models significantly generalize the standard PL, but are not as well investigated in the literature. We extend mixtures of PLs with features to models that generate top-l orders and characterize their identifiability. We further prove that when PL with features is identifiable, its MLE is consistent with a strictly concave objective function under mild assumptions. Our experiments on synthetic data demonstrate the effectiveness of MLE on PL with features, with tradeoffs between statistical efficiency and computational efficiency as l takes different values. For mixtures of PL with features, we show that an EM algorithm outperforms MLE in MSE and runtime. Comment: 16 pages, 2 figures.
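    As context for the model class described above, the following is a minimal sketch (not code from the paper) of the top-l Plackett-Luce log-likelihood with linear feature-based utilities; the function and variable names are hypothetical.

        import numpy as np

        def top_l_pl_log_likelihood(theta, X, order, l):
            """Log-likelihood of the observed top-l prefix of an order under a
            Plackett-Luce model with linear feature utilities u_i = theta . x_i
            (illustrative sketch; names are hypothetical)."""
            utilities = X @ theta                   # one utility per alternative
            remaining = list(range(X.shape[0]))     # alternatives not yet ranked
            ll = 0.0
            for pos in range(l):
                chosen = order[pos]
                logits = utilities[remaining]
                m = logits.max()                    # log-sum-exp for numerical stability
                ll += utilities[chosen] - (m + np.log(np.exp(logits - m).sum()))
                remaining.remove(chosen)
            return ll

        # Toy usage: 5 alternatives with 3 features each, a top-2 order observed
        rng = np.random.default_rng(0)
        X = rng.normal(size=(5, 3))
        theta = rng.normal(size=3)
        print(top_l_pl_log_likelihood(theta, X, order=[2, 0, 4], l=2))

    Summing this quantity over agents gives the concave objective that the MLE maximizes in the single-component case.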

    Supervised Preference Models: Data and Storage, Methods, and Tools for Application

    Get PDF
    In this thesis, we present a variety of models commonly known as pairwise comparisons, discrete choice, and learning to rank under one paradigm that we call preference models. We discuss these approaches together to show that they belong to the same family and present a unified notation for expressing them. We focus on supervised machine learning approaches to predict preferences, present existing approaches, and identify gaps in the literature. We discuss reduction and aggregation, a key technique in this field, and observe that there are no existing guidelines for how to create probabilistic aggregations, a topic we begin exploring. We also identify that there is no machine learning interface in Python that can host a variety of types of preference models and give a seamless user experience for the commonly recurring concepts in preference models, specifically reduction, aggregation, and compositions of sequential decision making. Therefore, we present our idea of what such software should look like in Python and show the current state of development of this package, which we call skpref.
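    To illustrate the reduction technique mentioned above (pairwise comparisons reduced to binary classification), here is a minimal scikit-learn-style sketch. It is not the skpref API; the class and method names are hypothetical.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        class PairwiseReductionModel:
            """Hypothetical scikit-learn-style wrapper (not the skpref API):
            reduces pairwise comparisons to binary classification on feature
            differences, one common reduction used in preference learning."""

            def __init__(self):
                self.clf = LogisticRegression()

            def fit(self, X_left, X_right, left_wins):
                # The comparison (a, b) with outcome "a preferred" becomes the
                # labelled example (x_a - x_b, 1); ties are ignored in this sketch.
                self.clf.fit(X_left - X_right, np.asarray(left_wins).astype(int))
                return self

            def predict_proba_left_wins(self, X_left, X_right):
                return self.clf.predict_proba(X_left - X_right)[:, 1]

        # Toy usage on synthetic comparisons
        rng = np.random.default_rng(0)
        X_left, X_right = rng.normal(size=(200, 4)), rng.normal(size=(200, 4))
        true_w = np.array([1.0, -2.0, 0.5, 0.0])
        left_wins = ((X_left - X_right) @ true_w + rng.normal(size=200)) > 0
        model = PairwiseReductionModel().fit(X_left, X_right, left_wins)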

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Full text link
    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we discuss a few challenges for future research.
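    The unifying view sketched in this abstract can be made concrete with a toy example (illustrative only, not the paper's framework): many MTP settings can be cast as scoring (instance, target) dyads, optionally with side information on the targets. All names below are hypothetical.

        import numpy as np
        from sklearn.linear_model import Ridge

        def make_dyads(X_instances, X_targets, Y):
            """Flatten an (n x t) target matrix Y into one labelled row per
            (instance, target) dyad by concatenating their feature vectors."""
            rows, labels = [], []
            for i, x in enumerate(X_instances):
                for j, z in enumerate(X_targets):
                    rows.append(np.concatenate([x, z]))
                    labels.append(Y[i, j])
            return np.asarray(rows), np.asarray(labels)

        rng = np.random.default_rng(0)
        X_instances = rng.normal(size=(50, 4))   # hypothetical instance features
        X_targets = rng.normal(size=(3, 2))      # hypothetical target side information
        Y = rng.normal(size=(50, 3))             # one column per target variable

        X_dyads, y = make_dyads(X_instances, X_targets, Y)
        model = Ridge().fit(X_dyads, y)          # a single model serves all targets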

    Statistical models and inference for dynamic networks

    Get PDF
    Dyadic data are ubiquitous and arise in the fields of biology, epidemiology, sociology, and many more. Such dyadic data are often best understood within the framework of networks. Network data can vary in many ways. For example, one might have binary or weighted networks, directed or undirected networks, and static or longitudinal networks. This last type of network, also called a dynamic network, is the focus of this work, with the goal of developing important tools and methodology for the analysis of dynamic networks. A general framework is developed for modeling dynamic networks via a latent space approach. Using a latent space approach to model such networks allows the researcher to model both the local and global structure of the network, inherently accounts for transitivity, and yields rich and meaningful visualizations which can easily be interpreted for qualitative inference on the network. A Markov chain Monte Carlo (MCMC) estimation method within a Bayesian setting is presented. Several useful tools for the researcher arise from this estimation method. First, a method of predicting future relations, or edges, is given. Second, missing data can easily be incorporated into the model, yielding a posterior probability for each missing edge. Third, a novel concept called nodal influence is introduced, which describes how one actor can influence the edges of another actor. Detection of such nodal influence is carried out via computationally efficient posterior estimation. This model is shown to outperform the existing method and to handle richer and more complex data. The MCMC algorithm is made scalable by utilizing a log-likelihood approximation proposed in the literature, slightly adapted to allow for missing data. Many of the dynamic networks that arise inherently have weighted edges. The latent space model is extended to handle a variety of types of weighted edges. In particular, the model is extended to account for relational data that can be viewed, conditioning on the latent actor positions, as having come from an exponential family of distributions. An example is also given which demonstrates how, through data augmentation, a similar strategy can be employed when this is not the case. The log-likelihood approximation method is then extended to make the MCMC algorithms scalable for weighted networks. Of particular interest is Newcomb's fraternity data, a network which captures the evolution and formation of a network beginning in its most nascent form and ending in a stabilized form. The previous model is modified in two non-trivial ways: the first allows for the modeling of rank-order data, which does not fall into the broad categories of weighted network data given previously, and the second allows for the estimation of the evolution of the stability of the network. Next, it is shown how to use the uncertainties associated with the posterior estimation for subgroup detection and for determining the time at which these subgroups formed. Finally, the model parameters are used to find the association between individual stability and popularity.
    A longitudinal mixture model is described which can be used to make hard or soft clustering assignments for p-dimensional real-valued data. This model accounts for temporal dependence of both the clustering assignment and the object to be clustered. Additionally, the model allows for covariates which may aid in explaining the clustering assignments. Solutions for implementing the generalized EM algorithm are presented. Recursive relationships are derived which allow the computational cost to grow linearly with time rather than exponentially. The latent space framework and the longitudinal clustering model are combined to perform community detection within dynamic network data, where the communities' characteristics are fixed but the membership of each community can evolve over time. This method can handle directed or undirected weighted dynamic network data. For community detection within directed or undirected binary networks, a novel model is given along with an efficient variational Bayes estimation algorithm. Both methods are shown to have better performance than community detection methodology which does not borrow information across time.
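    As a rough illustration of the distance-based latent space formulation that underpins this work (a standard specification, not necessarily the thesis' exact model), the following sketch evaluates the log-likelihood of a binary dynamic network given latent positions; all names are hypothetical.

        import numpy as np

        def latent_space_log_likelihood(Y, Z, beta):
            """Log-likelihood of a binary dynamic network under a distance-based
            latent space model: the log-odds of an edge i-j at time t is
            beta - ||z_it - z_jt||.  Illustrative sketch only; names hypothetical."""
            T, n, _ = Y.shape
            ll = 0.0
            for t in range(T):
                diff = Z[t][:, None, :] - Z[t][None, :, :]   # pairwise differences
                dist = np.sqrt((diff ** 2).sum(-1))          # Euclidean distances
                p = 1.0 / (1.0 + np.exp(-(beta - dist)))     # edge probabilities
                mask = ~np.eye(n, dtype=bool)                # ignore self-loops
                ll += np.sum(Y[t][mask] * np.log(p[mask]) +
                             (1 - Y[t][mask]) * np.log(1 - p[mask]))
            return ll

        # Toy usage: 3 time points, 10 actors, 2-dimensional latent space
        rng = np.random.default_rng(0)
        Z = rng.normal(size=(3, 10, 2))
        Y = (rng.random(size=(3, 10, 10)) < 0.2).astype(float)
        print(latent_space_log_likelihood(Y, Z, beta=1.0))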

    Generalized vec trick for fast learning of pairwise kernel models

    Get PDF
    Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects. Specifically, we consider the standard, symmetric, and anti-symmetric Kronecker product kernels, metric-learning, Cartesian, and ranking kernels, as well as linear, polynomial, and Gaussian kernels. Recently, an O(nm + nq) time generalized vec trick algorithm, where n, m, and q denote the number of pairs, drugs, and targets, was introduced for training kernel methods with the Kronecker product kernel. This was a significant improvement over previous O(n^2) training methods, since in most real-world applications m, q << n. In this work we show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of the generalized vec trick for speeding up their computation. In the experiments, we demonstrate how the introduced approach allows scaling pairwise kernels to much larger data sets than previously feasible, and provide an extensive comparison of the kernels on a number of biological interaction prediction tasks.
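    The speed-up described above rests on the classical vec trick identity (B kron A) vec(X) = vec(A X B^T), which avoids ever forming the Kronecker product; the generalized version in the paper additionally handles arbitrary subsets of pairs. A minimal sketch of the core identity (names hypothetical):

        import numpy as np

        def kron_matvec(A, B, x):
            """Compute (B kron A) @ x without forming the Kronecker product,
            via the identity (B kron A) vec(X) = vec(A X B^T).
            A is (p, m), B is (q, n), and x is the column-stacked (m, n) matrix X."""
            p, m = A.shape
            q, n = B.shape
            X = x.reshape(m, n, order="F")        # un-vectorise: vec(X) == x
            Y = A @ X @ B.T                       # (p, q) result, never (p*q) x (m*n)
            return Y.reshape(p * q, order="F")    # re-vectorise

        # Check against the explicit Kronecker product on a small example
        rng = np.random.default_rng(0)
        A, B = rng.normal(size=(3, 4)), rng.normal(size=(5, 2))
        x = rng.normal(size=4 * 2)
        assert np.allclose(kron_matvec(A, B, x), np.kron(B, A) @ x)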

    Proceedings - 29. Workshop Computational Intelligence, Dortmund, 28. - 29. November 2019

    Get PDF
    These proceedings contain the contributions to the 29th Workshop Computational Intelligence. The focus areas are methods, applications, and tools for fuzzy systems, artificial neural networks, evolutionary algorithms, and data mining techniques, as well as the comparison of methods on industrial and benchmark problems.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    Get PDF
    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
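    For concreteness, here is a minimal sketch of the proper CAR specification that the comparison above starts from; this is standard material rather than code from the paper, and the names are hypothetical. The DAGAR model instead builds its precision matrix from a directed acyclic ordering of the sites.

        import numpy as np

        def car_precision(W, rho, tau=1.0):
            """Precision matrix of a proper CAR prior, Q = tau * (D - rho * W),
            where W is a symmetric binary adjacency matrix of the sites and D is
            diagonal with each site's number of neighbours (illustrative sketch)."""
            D = np.diag(W.sum(axis=1))
            return tau * (D - rho * W)

        # Three sites on a line: 1 - 2 - 3
        W = np.array([[0., 1., 0.],
                      [1., 0., 1.],
                      [0., 1., 0.]])
        Q = car_precision(W, rho=0.5)
        Sigma = np.linalg.inv(Q)    # implied spatial covariance of the latent effect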

    A Statistical Approach to the Alignment of fMRI Data

    Get PDF
    Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Since anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., combinations of spatially distant voxels are penalized. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.
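    The orthogonal-transformation step at the heart of functional alignment can be illustrated with the classical Procrustes solution below; this is only the unpenalized building block, not the paper's Bayesian estimator with the matrix von Mises-Fisher prior, and the names are hypothetical.

        import numpy as np

        def orthogonal_procrustes_align(X, Y):
            """Orthogonal matrix R minimising ||X R - Y||_F (classical Procrustes
            solution via SVD).  This is only the unpenalised building block of
            functional alignment; the paper additionally places a matrix
            von Mises-Fisher prior on R to incorporate anatomical information."""
            U, _, Vt = np.linalg.svd(X.T @ Y)
            return U @ Vt

        # Toy example: recover an orthogonal map applied to one subject's data
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 10))                 # time points x voxels (hypothetical)
        Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
        Y = X @ Q + 0.01 * rng.normal(size=X.shape)
        R = orthogonal_procrustes_align(X, Y)
        assert np.allclose(R, Q, atol=0.05)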