12 research outputs found

    Discovering a taste for the unusual: exceptional models for preference mining

    Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge. This research has received funding from the ECSEL Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020), under Grant Agreement Number 662189-MANTIS-2014-1.
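
    To make the idea concrete, here is a toy sketch (not the paper's actual quality measures) of one flavor of exceptionality: checking whether a single label's average rank inside a subgroup deviates from its average rank over the whole dataset. The function name and the rank encoding are illustrative assumptions.

```python
# Illustrative sketch only: a toy "does one label stand out?" quality
# measure in the spirit of EPM, not the paper's actual formulation.
# Rankings are arrays where ranks[n, i] is the rank of label i (0 = best).
import numpy as np

def label_deviation_quality(all_ranks: np.ndarray, subgroup_mask: np.ndarray) -> float:
    """Largest absolute gap between a label's mean rank inside the
    subgroup and its mean rank over the whole dataset."""
    overall_mean = all_ranks.mean(axis=0)            # mean rank per label, globally
    subgroup_mean = all_ranks[subgroup_mask].mean(axis=0)
    return float(np.max(np.abs(subgroup_mean - overall_mean)))

# Tiny example: 6 observations, 3 labels; the last two observations
# (the "subgroup") rank label 2 much higher than the norm.
ranks = np.array([[0, 1, 2], [0, 1, 2], [1, 0, 2], [0, 2, 1],
                  [2, 1, 0], [1, 2, 0]])
mask = np.array([False, False, False, False, True, True])
print(label_deviation_quality(ranks, mask))          # ~1.17: label 2 stands out
```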

    Exceptional Preferences Mining

    Exceptional Preferences Mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where the preference relations between subsets of the labels significantly deviate from the norm; a variant of Subgroup Discovery, with rankings as the (complex) target concept. We employ three quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: the first gauges exceptional overall ranking behavior, the second indicates whether a particular label stands out from the rest, and the third highlights subgroups featuring unusual pairwise label ranking behavior. As proof of concept, we explore five datasets. The results confirm that the new task EPM can deliver interesting knowledge. The results also illustrate how the visualization of the preferences in a Preference Matrix can aid in interpreting exceptional preference subgroups.
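
    The Preference Matrix mentioned above can be sketched in a few lines. The following hypothetical snippet (not the authors' tooling) builds the matrix whose entry P[i, j] is the fraction of rankings that place label i before label j, which is the quantity one would then visualize.

```python
# Hypothetical sketch of a Preference Matrix: P[i, j] is the fraction of
# rankings in which label i is preferred to (ranked before) label j.
import numpy as np

def preference_matrix(ranks: np.ndarray) -> np.ndarray:
    n_labels = ranks.shape[1]
    P = np.zeros((n_labels, n_labels))
    for i in range(n_labels):
        for j in range(n_labels):
            if i != j:
                P[i, j] = np.mean(ranks[:, i] < ranks[:, j])
    return P

ranks = np.array([[0, 1, 2], [0, 2, 1], [1, 0, 2]])
print(preference_matrix(ranks))   # e.g. P[0, 1] = 2/3: label 0 beats label 1 in 2 of 3 rankings
```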

    Pattern Mining for Label Ranking

    Preferences have always been present in many tasks in our daily lives. Buying the right car, choosing a suitable house or even deciding on the food to eat are trivial examples of decisions that reveal information, explicitly or implicitly, about our preferences. The recent trend of collecting increasing amounts of data also holds for preference data. Extracting and modeling preferences can provide us with invaluable information about the choices of groups or individuals. In areas like e-commerce, which typically deal with decisions from thousands of users, the acquisition of preferences can be a difficult task. For these reasons, artificial intelligence (in particular, machine learning) methods have been increasingly important to the discovery and automatic learning of models about preferences. In this Ph.D. project, several approaches were analyzed and proposed to deal with the label ranking (LR) problem, most of which focused on pattern mining methods.

    Preference Learning

    This report documents the program and the outcomes of Dagstuhl Seminar 14101 “Preference Learning”. Preferences have recently received considerable attention in disciplines such as machine learning, knowledge discovery, information retrieval, statistics, social choice theory, multiple criteria decision making, decision under risk and uncertainty, operations research, and others. The motivation for this seminar was to showcase recent progress in these different areas with the goal of working towards a common basis of understanding, which should help to facilitate future synergies.

    Mixture-Based Probabilistic Graphical Models for the Label Ranking Problem

    The goal of the Label Ranking (LR) problem is to learn preference models that predict the preferred ranking of class labels for a given unlabeled instance. Different well-known machine learning algorithms have been adapted to deal with the LR problem. In particular, fine-tuned instance-based algorithms (e.g., k-nearest neighbors) and model-based algorithms (e.g., decision trees) have performed remarkably well in tackling the LR problem. Probabilistic Graphical Models (PGMs, e.g., Bayesian networks) have not been considered to deal with this problem because of the difficulty of modeling permutations in that framework. In this paper, we propose a Hidden Naive Bayes classifier (HNB) to cope with the LR problem. By introducing a hidden variable, we can design a hybrid Bayesian network in which several types of distributions can be combined: multinomial for discrete variables, Gaussian for numerical variables, and Mallows for permutations. We consider two kinds of probabilistic models: one based on a Naive Bayes graphical structure (where only univariate probability distributions are estimated for each state of the hidden variable) and another where we allow interactions among the predictive attributes (using a multivariate Gaussian distribution for the parameter estimation). The experimental evaluation shows that our proposals are competitive with the state-of-the-art algorithms in both accuracy and CPU time requirements.
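
    For readers unfamiliar with the Mallows model, the following minimal sketch shows the distribution over permutations that the paper plugs into its hybrid network; the HNB structure and parameter estimation themselves are not reproduced here, and the helper names are our own.

```python
# Minimal sketch of the Mallows model over permutations:
# P(pi | center, theta) = exp(-theta * d(pi, center)) / Z(theta),
# where d is the Kendall tau distance and theta >= 0 is a spread parameter.
import math
from itertools import combinations

def kendall_tau_distance(pi, sigma):
    """Number of label pairs ordered differently by the two rankings
    (pi[i] is the rank of label i, likewise sigma)."""
    return sum(1 for a, b in combinations(range(len(pi)), 2)
               if (pi[a] < pi[b]) != (sigma[a] < sigma[b]))

def mallows_pmf(pi, center, theta):
    n = len(pi)
    # Z(theta) factorizes because the Kendall distance decomposes into
    # independent terms V_j taking values 0..j.
    Z = math.prod(sum(math.exp(-theta * r) for r in range(j + 1))
                  for j in range(1, n))
    return math.exp(-theta * kendall_tau_distance(pi, center)) / Z

print(mallows_pmf([0, 1, 2], center=[0, 1, 2], theta=1.0))  # modal ranking, highest mass
```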

    Review of Extreme Multilabel Classification

    Extreme multilabel classification, or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, the number of labels is extremely large; hence the name. Classical one-versus-all classification won't scale in this case due to the large number of labels, and the same is true for most other classifiers. Embedding the labels, as well as the features, into a smaller space is an essential first step. Other issues include the existence of head and tail labels, where tail labels are labels that occur in relatively few samples; their existence creates difficulties during embedding. This area has invited a wide range of approaches: bit compression motivated by compressed sensing, tree-based embeddings, deep-learning-based latent space embeddings (including the use of attention weights), linear-algebra-based embeddings such as SVD, clustering, and hashing, to name a few. The community has also developed a useful set of metrics to correctly evaluate predictions for head and tail labels.
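
    As a hedged illustration of the linear-algebra family mentioned above, the sketch below compresses a sparse binary label matrix with a truncated SVD and decodes it back; a real XML system would additionally learn a mapping from features to the embedded space. All names and sizes are made up for the example.

```python
# Sketch of SVD-based label embedding for XML: compress a binary label
# matrix Y (n samples x L labels) into k dimensions and reconstruct.
import numpy as np

rng = np.random.default_rng(0)
Y = (rng.random((1000, 500)) < 0.01).astype(float)  # sparse binary labels

k = 50
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
Z = Y @ Vt[:k].T            # n x k embedded labels (the regression targets)
Y_hat = Z @ Vt[:k]          # decode back to the full label space
print(np.abs(Y - Y_hat).mean())   # reconstruction error of the embedding
```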

    Tackling the supervised label ranking problem by bagging weak learners

    Preference learning is the branch of machine learning in charge of inducing preference models from data. In this paper we focus on the task known as the label ranking problem, whose goal is to predict a ranking among the different labels the class variable can take. Our contribution is twofold: (i) taking as a basis the tree-based algorithm LRT described in [1], we design weaker tree-based models which can be learnt more efficiently; and (ii) we show that bagging these weak learners improves not only the LRT algorithm, but also the state-of-the-art one (IBLR [1]). Furthermore, the bagging algorithm which takes the weak LRT-based models as base classifiers is competitive in runtime with the LRT and IBLR methods. To check the goodness of our proposal, we conduct a broad experimental study over the standard benchmark used in the label ranking literature.
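
    The bagging scheme can be illustrated with a deliberately weak base learner. The sketch below uses a hypothetical 1-nearest-neighbor ranker in place of the paper's weakened LRT trees, and combines predictions by average rank (Borda-style), which is one common aggregation choice rather than necessarily the one used in the paper.

```python
# Illustrative bagging of weak label rankers: each bootstrap replica
# trains a ranker; predictions are combined by average rank (Borda).
import numpy as np

class OneNNRanker:
    def fit(self, X, ranks):
        self.X, self.ranks = X, ranks
        return self
    def predict(self, x):
        nearest = np.argmin(np.linalg.norm(self.X - x, axis=1))
        return self.ranks[nearest]

def bagged_ranking(X, ranks, x, n_estimators=25, seed=0):
    rng = np.random.default_rng(seed)
    rank_sum = np.zeros(ranks.shape[1])
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
        rank_sum += OneNNRanker().fit(X[idx], ranks[idx]).predict(x)
    return np.argsort(np.argsort(rank_sum))          # average rank -> final ranking

X = np.array([[0.0], [1.0], [2.0], [3.0]])
ranks = np.array([[0, 1, 2], [0, 2, 1], [1, 2, 0], [2, 1, 0]])
print(bagged_ranking(X, ranks, np.array([2.5])))
```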

    A Survey on Extreme Multi-label Learning

    Multi-label learning has attracted significant attention from both academia and industry in recent decades. Although existing multi-label learning algorithms have achieved good performance in various tasks, they implicitly assume that the size of the target label space is not huge, which can be restrictive for real-world scenarios. Moreover, it is infeasible to directly adapt them to an extremely large label space because of the compute and memory overhead. Therefore, eXtreme Multi-label Learning (XML) is becoming an important task, and many effective approaches have been proposed. To fully understand XML, we conduct a survey study in this paper. We first give a formal definition of XML from the perspective of supervised learning. Then, based on different model architectures and challenges of the problem, we provide a thorough discussion of the advantages and disadvantages of each category of methods. For the benefit of empirical studies, we collect abundant resources regarding XML, including code implementations and useful tools. Lastly, we propose possible research directions in XML, such as new evaluation metrics, the tail label problem, and weakly supervised XML.
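
    Both XML surveys above discuss evaluation metrics. As a small grounding example, here is the standard precision@k metric (our own minimal implementation): the fraction of the k highest-scored labels that are actually relevant. Plain P@k tends to be dominated by head labels, which is one motivation for the tail-aware metrics the survey calls for.

```python
# Standard XML evaluation metric precision@k: how many of the k
# highest-scored labels are actually relevant for the instance.
import numpy as np

def precision_at_k(scores: np.ndarray, relevant: np.ndarray, k: int) -> float:
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k best-scored labels
    return float(relevant[top_k].mean())

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])
relevant = np.array([1, 0, 0, 1, 1])       # ground-truth label indicators
print(precision_at_k(scores, relevant, k=3))   # 2 of top 3 are relevant -> ~0.667
```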