14 research outputs found
Learning Mixtures of Plackett-Luce Models with Features from Top-l Orders
The Plackett-Luce model (PL) is one of the most popular models for preference
learning. In this paper, we consider PL with features and its mixture models,
where each alternative has a vector of features, possibly different across
agents. Such models significantly generalize the standard PL, but are not as
well investigated in the literature. We extend mixtures of PLs with features to
models that generate top-l orders and characterize their identifiability. We further
prove that when PL with features is identifiable, its MLE is consistent with a
strictly concave objective function under mild assumptions. Our experiments on
synthetic data demonstrate the effectiveness of MLE on PL with features with
tradeoffs between statistical efficiency and computational efficiency as l
takes different values. For mixtures of PL with features, we show that an EM
algorithm outperforms MLE in both MSE and runtime.
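The abstract does not give the exact parameterisation; the sketch below is one plausible reading of a PL model with features, in which each alternative's utility is linear in its features and a top-l order is generated by repeatedly picking among the remaining alternatives with softmax probabilities. The function name and the shared-parameter assumption are illustrative, not the paper's definitions.

```python
import numpy as np

def top_l_log_likelihood(theta, X, rankings, l):
    """Log-likelihood of top-l orders under a Plackett-Luce model with features.

    theta    : (d,) parameter vector (assumed shared across agents)
    X        : (m, d) feature matrix, one row per alternative
    rankings : list of index arrays, each a top-l order over the m alternatives
    l        : number of ranked positions observed
    """
    utilities = X @ theta                 # linear utility per alternative
    ll = 0.0
    for order in rankings:
        remaining = list(range(X.shape[0]))
        for k in range(l):
            chosen = order[k]
            # log P(chosen | remaining) under the softmax choice rule
            ll += utilities[chosen] - np.logaddexp.reduce(utilities[remaining])
            remaining.remove(chosen)
    return ll
```

Maximising this (e.g. with a generic optimizer) would give the MLE the abstract discusses; a higher l uses more of each order, trading computation for statistical efficiency.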
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
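The paper's formal framework is not reproduced in this abstract; as a rough, assumption-laden illustration, several of the listed subfields share one data shape and differ only in what is observed about the target matrix and whether target-side information exists:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, t = 100, 5, 4                  # instances, instance features, targets
X = rng.normal(size=(n, d))          # instance features (common to all settings)

# Multivariate regression: target matrix fully observed, real-valued.
Y_reg = rng.normal(size=(n, t))

# Multi-label classification: target matrix fully observed, binary.
Y_mlc = rng.integers(0, 2, size=(n, t))

# Dyadic / zero-shot prediction and matrix completion: only some
# (instance, target) entries observed; target-side features T (hypothetical
# dimension) enable generalisation to targets unseen during training.
T = rng.normal(size=(t, 3))
observed = rng.random((n, t)) < 0.3  # mask of observed entries
```

This is only a shape-level caricature of the unifying view; the paper's framework characterises these settings formally.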
Supervised Preference Models: Data and Storage, Methods, and Tools for Application
In this thesis, we present a variety of models commonly known as pairwise comparison, discrete choice, and learning-to-rank models under one paradigm that we call preference models. We discuss these approaches together to show that they belong to the same family, and we introduce a unified notation to express them. We focus on supervised machine learning approaches to predicting preferences, survey existing approaches, and identify gaps in the literature. We discuss reduction and aggregation, a key technique in this field, and observe that no guidelines exist for creating probabilistic aggregations, a topic we begin to explore. We also find that no machine learning interface in Python accommodates the variety of preference model types while offering a seamless user experience for the concepts that recur in preference modelling, specifically reduction, aggregation, and compositions of sequential decision making. We therefore present our view of what such software should look like in Python and describe the current state of development of this package, which we call skpref.
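The abstract does not describe skpref's API, so the sketch below only illustrates the reduction technique it mentions: each pairwise comparison "a beats b" becomes a positive binary example with feature vector x_a - x_b, and a Bradley-Terry-style model is fit by logistic-regression gradient ascent. The helper name and hyperparameters are hypothetical, not part of skpref.

```python
import numpy as np

def fit_pairwise_logistic(X_winner, X_loser, lr=0.1, n_iter=500):
    """Reduce pairwise comparisons to binary classification.

    Each comparison contributes the feature difference x_winner - x_loser,
    and the model P(a beats b) = sigmoid(w @ (x_a - x_b)) is fit by
    gradient ascent on the log-likelihood.
    """
    D = X_winner - X_loser                  # one row per comparison
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ w))    # predicted win probabilities
        w += lr * D.T @ (1.0 - p) / len(D)  # log-likelihood gradient step
    return w
```

The learned weights induce a score x @ w per alternative, so the same fitted model can rank arbitrary sets of alternatives, which is the sense in which pairwise comparison, choice, and ranking models form one family.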
A recommender system for scientific datasets and analysis pipelines
Scientific datasets and analysis pipelines are increasingly being shared
publicly in the interest of open science.
However, mechanisms are lacking to reliably identify which pipelines
and datasets can appropriately be used together. Given the increasing number of high-quality public datasets and
pipelines, this lack of clear compatibility threatens the
findability and reusability of these resources. We investigate
the feasibility of a collaborative filtering system to recommend pipelines
and datasets based on provenance records from previous executions.
We evaluate our system using datasets and pipelines extracted from the
Canadian Open Neuroscience Platform, a national initiative for open
neuroscience. The recommendations provided by our system (AUC) are
significantly better than chance and outperform recommendations made by
domain experts using their previous knowledge as well as pipeline and dataset descriptions (AUC). In particular, domain experts often neglect
low-level technical aspects of a pipeline-dataset interaction, such as the level of pre-processing, which are
captured by a provenance-based system. We conclude that provenance-based
pipeline and dataset recommenders are feasible and beneficial to
the sharing and usage of open-science resources. Future
work will focus on the collection of more
comprehensive provenance traces, and on deploying the system in production.
Generalized vec trick for fast learning of pairwise kernel models
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects. Specifically, we consider the standard, symmetric and anti-symmetric Kronecker product kernels, metric-learning, Cartesian, ranking, as well as linear, polynomial and Gaussian kernels. Recently, an O(nm + nq) time generalized vec trick algorithm, where n, m, and q denote the number of pairs, drugs and targets, was introduced for training kernel methods with the Kronecker product kernel. This was a significant improvement over previous O(n^2) training methods, since in most real-world applications m, q << n. In this work we show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of the generalized vec trick for speeding up their computation. In the experiments, we demonstrate how the introduced approach allows scaling pairwise kernels to much larger data sets than previously feasible, and provide an extensive comparison of the kernels on a number of biological interaction prediction tasks.
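The speed-up rests on the classical Kronecker identity (B &#8855; A) vec(X) = vec(A X B^T), which multiplies by a Kronecker product without ever materialising it. A minimal sketch of that core identity (the paper's generalized algorithm additionally handles arbitrary subsets of pairs, which is not shown here):

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (B kron A) @ x without forming the Kronecker product.

    Uses (B kron A) vec(X) = vec(A X B^T), with vec = column-major stacking.
    A: (p, m), B: (r, q), x: vector of length m * q.
    """
    m, q = A.shape[1], B.shape[1]
    X = x.reshape(m, q, order='F')        # undo column-major vec
    return (A @ X @ B.T).ravel(order='F') # re-apply column-major vec
```

Forming B &#8855; A costs O(pr * mq) memory and a dense matvec to match, whereas the identity needs only two small matrix products, which is what makes Kronecker-structured kernel matrices tractable at scale.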
Proceedings - 29th Workshop Computational Intelligence, Dortmund, 28-29 November 2019
These proceedings contain the contributions to the 29th Workshop Computational Intelligence. The focus areas are methods, applications, and tools for fuzzy systems, artificial neural networks, evolutionary algorithms, and data mining, as well as the comparison of methods on industrial and benchmark problems.
A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium
When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its &#961; parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
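For the classical CAR model mentioned above, the latent spatial effect is commonly given the precision matrix Q = &#964;(D - &#961;W), where W is the site adjacency matrix and D the diagonal matrix of neighbor counts; a small sketch follows (the DAGAR construction, built from a directed acyclic ordering of the sites, is different and not shown):

```python
import numpy as np

def car_precision(W, rho, tau=1.0):
    """Precision matrix of a proper CAR prior: Q = tau * (D - rho * W).

    W   : symmetric 0/1 adjacency matrix over sites
    rho : spatial dependence parameter; for |rho| < 1 and every site having
          at least one neighbor, Q is positive definite
    tau : overall precision scale
    """
    D = np.diag(W.sum(axis=1))   # diagonal of neighbor counts
    return tau * (D - rho * W)
```

Note that in the CAR parameterisation rho lacks the direct "average neighbor pair correlation" reading that the abstract highlights as the advantage of DAGAR.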