Dissimilarity-based Ensembles for Multiple Instance Learning
In multiple instance learning, objects are sets (bags) of feature vectors
(instances) rather than individual feature vectors. In this paper we address
the problem of how these bags can best be represented. Two standard approaches
are to use (dis)similarities between bags and prototype bags, or between bags
and prototype instances. The first approach results in a relatively
low-dimensional representation determined by the number of training bags, while
the second approach results in a relatively high-dimensional representation,
determined by the total number of instances in the training set. In this paper
a third, intermediate approach is proposed, which links the two approaches and
combines their strengths. Our classifier is inspired by a random subspace
ensemble, and considers subspaces of the dissimilarity space, defined by
subsets of instances, as prototypes. We provide guidelines for using such an
ensemble, and show state-of-the-art performances on a range of multiple
instance learning problems.
Comment: Submitted to IEEE Transactions on Neural Networks and Learning Systems, Special Issue on Learning in Non-(geo)metric Spaces
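The dissimilarity-space idea described above can be illustrated with a small sketch (this is an illustration of the general representation, not the authors' implementation; all function names are hypothetical). A bag is mapped to a vector of its distances to a set of prototype instances, and subsets of those prototypes define the subspaces used by the ensemble:

```python
import numpy as np

def bag_to_dissimilarity(bag, prototypes):
    """Represent a bag (n_instances x n_features) by its minimum
    Euclidean distance to each prototype instance."""
    # pairwise distances: shape (n_instances, n_prototypes)
    d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)
    # one value per prototype: the bag's distance to that prototype
    return d.min(axis=0)

rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 3))         # a bag of 5 three-dimensional instances
prototypes = rng.normal(size=(8, 3))  # 8 prototype instances
rep = bag_to_dissimilarity(bag, prototypes)
print(rep.shape)  # (8,) -- one dimension per prototype instance
```

A random-subspace ensemble in this spirit would train each base classifier on the columns of this representation given by a random subset of the prototypes.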
Supervised Classification: Quite a Brief Overview
The original problem of supervised classification considers the task of
automatically assigning objects to their respective classes on the basis of
numerical measurements derived from these objects. Classifiers are the tools
that implement the actual functional mapping from these measurements---also
called features or inputs---to the so-called class label---or output. The
fields of pattern recognition and machine learning study ways of constructing
such classifiers. The main idea behind supervised methods is that of learning
from examples: given a number of example input-output relations, to what extent
can the general mapping be learned that takes any new and unseen feature vector
to its correct class? This chapter provides a basic introduction to the
underlying ideas of how to come to a supervised classification problem. In
addition, it provides an overview of some specific classification techniques,
delves into the issues of object representation and classifier evaluation, and
(very) briefly covers some variations on the basic supervised classification
task that may also be of interest to the practitioner.
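The input-output mapping described in this abstract can be made concrete with a toy classifier (a minimal sketch for illustration only, not a technique from the chapter): the nearest-mean classifier learns one prototype per class from labeled examples and assigns new feature vectors to the class with the closest mean.

```python
import numpy as np

class NearestMeanClassifier:
    """Toy supervised classifier: assign each feature vector to the
    class whose training mean is closest in Euclidean distance."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # distances from each sample to each class mean
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# example input-output pairs (features -> class labels)
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y = np.array([0, 0, 1, 1])
clf = NearestMeanClassifier().fit(X, y)
print(clf.predict(np.array([[0., 0.5], [5., 5.5]])))  # [0 1]
```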
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight into how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking, and promising avenues for research.
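The bag-labeling setup described in this survey can be sketched with the classical "standard MIL assumption" (one common MIL formulation, shown here only as a toy illustration; the scorer and names are hypothetical): a bag is positive if at least one of its instances is positive.

```python
import numpy as np

def bag_predict(bags, instance_score, threshold=0.5):
    """Label each bag positive (1) if any instance scores above the
    threshold -- the classical 'standard MIL assumption'."""
    return np.array([int(max(instance_score(x) for x in bag) > threshold)
                     for bag in bags])

def instance_score(x):
    # toy scorer: use the first feature as the instance's score
    return x[0]

bags = [np.array([[0.1], [0.2]]),   # all instances low  -> negative bag
        np.array([[0.1], [0.9]])]   # one instance high -> positive bag
print(bag_predict(bags, instance_score))  # [0 1]
```

Many of the problem characteristics the survey catalogs (bag composition, instance-label ambiguity) are precisely the ways real problems deviate from this simple assumption.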
Distribution-Dissimilarities in Machine Learning
Any binary classifier (or score-function) can be used to define a dissimilarity
between two distributions. Many well-known distribution-dissimilarities are
actually classifier-based: total variation, KL- or JS-divergence, Hellinger
distance, etc. And many recent popular generative modeling algorithms compute
or approximate these distribution-dissimilarities by explicitly training a
classifier: e.g. generative adversarial networks (GAN) and their variants.
This thesis introduces and studies such classifier-based
distribution-dissimilarities. After a general introduction, the first part
analyzes the influence of the classifiers' capacity on the dissimilarity's
strength for the special case of maximum mean discrepancies (MMD) and provides
applications. The second part studies applications of classifier-based
distribution-dissimilarities in the context of generative modeling and presents
two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and
final part focuses on adversarial examples, i.e. targeted but imperceptible
input-perturbations that lead to drastically different predictions of an
artificial classifier. It shows that the adversarial vulnerability of
neural-network-based classifiers typically increases with the input dimension,
independently of the network topology.
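The core idea that any binary classifier induces a distribution-dissimilarity can be illustrated with a crude sketch (an illustration of the principle, not a method from the thesis): if the best classifier separating samples of P from samples of Q achieves balanced accuracy a, then the total variation distance satisfies TV(P, Q) >= 2a - 1. Here the "classifier family" is just 1-D threshold rules.

```python
import numpy as np

def classifier_tv(p_samples, q_samples):
    """Crude classifier-based lower bound on total variation between two
    1-D sample sets: best threshold-classifier balanced accuracy a gives
    TV >= 2a - 1."""
    best = 0.5
    for t in np.concatenate([p_samples, q_samples]):
        # rule: classify x > t as 'drawn from q' (also try the reverse rule)
        acc = 0.5 * ((p_samples <= t).mean() + (q_samples > t).mean())
        best = max(best, acc, 1 - acc)
    return 2 * best - 1

rng = np.random.default_rng(1)
same = classifier_tv(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
far = classifier_tv(rng.normal(0, 1, 500), rng.normal(4, 1, 500))
print(same < far)  # True: well-separated distributions look more dissimilar
```

Richer classifier families (e.g. the discriminators in GANs) play the same role for stronger dissimilarities; restricting the family's capacity weakens the dissimilarity, which is the theme of the first part of the thesis.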
A CLUE for CLUster Ensembles
Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and "secondary" clusterings.
Machine learning in dam water research: an overview of applications and approaches
Dams play a crucial role in water security. A sustainable dam must balance the range of resources involved in its operation. Among the factors in maintaining sustainability is the maintenance and management of the water assets in dams. Water asset management in dams includes processes to ensure that planned maintenance can be conducted and that assets such as pipes, pumps and motors can be mended, substituted, or upgraded when needed within the allocated budget. Nowadays, most water asset management systems collect and process data for analysis and decision-making. Machine learning (ML) is an emerging concept applied to fulfill the requirements of engineering applications such as dam water research. ML can analyze vast volumes of data and, through a model built from algorithms, can learn, recognize and produce accurate results and analyses. These results bring meaningful insights for water asset management, specifically for strategizing the optimal solution based on a forecast or prediction; for example, preventive maintenance that replaces water assets according to the prediction from an ML model. In this paper, we discuss the approaches of machine learning in recent dam water research and review the emerging issues in managing water assets in dams.
Machine Learning (ML) module
Lecture notes for the machine learning content of the course TOML (Topics on Optimization and Machine Learning) at the Master in Innovation and Research in Informatics (MIRI) at FIB, UPC, 2023/202
Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings if the time series are multivariate (MTS), due to dependencies
between attributes, or the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust \emph{time series cluster kernel} (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMM to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data, and outstanding results for missing data.
Comment: 23 pages, 6 figures