Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms the others, particularly as the amount of missing data increases.
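The pipeline described in this abstract (introduce MCAR gaps, impute several times, train one classifier per imputed copy, combine them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, imputation is a simple random draw from the observed values of the same column, the base learner is a nearest-centroid classifier, and a plain majority vote stands in for the bagging and stacking ensembles the paper compares.

```python
import random
from collections import Counter

def add_mcar(rows, rate, rng):
    """Blank out each value independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def impute_once(rows, rng):
    """One stochastic imputation: fill each gap with a randomly chosen
    observed value from the same column (0.0 if the column is empty)."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    filled = []
    for r in rows:
        filled.append([
            v if v is not None else (rng.choice(observed[j]) if observed[j] else 0.0)
            for j, v in enumerate(r)
        ])
    return filled

def fit_centroids(rows, labels):
    """Nearest-centroid base classifier: mean feature vector per class."""
    sums, counts = {}, Counter(labels)
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))

def mie_fit(rows_missing, labels, n_imputations, rng):
    """Train one base classifier per independently imputed copy of the data."""
    return [fit_centroids(impute_once(rows_missing, rng), labels)
            for _ in range(n_imputations)]

def mie_predict(models, row):
    """Combine the base classifiers by majority vote."""
    votes = Counter(predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (made up): two well-separated classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

rng = random.Random(0)
X_missing = add_mcar(X, 0.2, rng)      # 20% of values removed at random
models = mie_fit(X_missing, y, 5, rng)  # five imputations, five classifiers
print(mie_predict(models, [0.1, 0.1]), mie_predict(models, [5.0, 5.0]))
```

Raising `rate` in `add_mcar` mimics the increasing amounts of missing data used in the experimental set-up; a fuller experiment would compare accuracy against a model trained on the complete data.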
Choosing Attribute Weights for Item Dissimilarity using Clickstream Data with an Application to a Product Catalog Map
In content- and knowledge-based recommender systems, a measure of (dis)similarity between items is often used. Frequently, this measure is based on the attributes of the items. However, which attributes are important for the users of the system remains an important question to answer. In this paper, we present an approach to determine attribute weights in a dissimilarity measure using clickstream data of an e-commerce website. We count how many times each product is sold and, based on these counts, estimate a Poisson regression model. Estimates of this model are then used to determine the attribute weights in the dissimilarity measure. We show an application of this approach on a product catalog of MP3 players provided by Compare Group, owner of the Dutch price comparison site http://www.vergelijk.nl, and show how the dissimilarity measure can be used to improve 2D product catalog visualizations.
Keywords: dissimilarity measure; attribute weights; clickstream data; comparison
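As a rough illustration of the idea in this abstract, the sketch below fits a Poisson regression of sales counts on item attributes and turns the estimates into weights for a dissimilarity measure. The abstract does not say how the estimates are mapped to weights, so the normalised absolute coefficients used here are an assumption, as are the function names, the toy data, and the weighted absolute-difference dissimilarity.

```python
import math

def fit_poisson(X, counts, lr=0.01, steps=5000):
    """Fit log E[count] = b0 + sum_j b_j * x_j by gradient ascent
    on the (averaged) Poisson log-likelihood."""
    n_feat = len(X[0])
    beta = [0.0] * (n_feat + 1)
    for _ in range(steps):
        grad = [0.0] * (n_feat + 1)
        for row, c in zip(X, counts):
            mu = math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            resid = c - mu                 # derivative w.r.t. the linear predictor
            grad[0] += resid
            for j, v in enumerate(row):
                grad[j + 1] += resid * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def attribute_weights(beta):
    """ASSUMPTION: weight each attribute by its absolute coefficient,
    normalised to sum to one (the intercept is ignored)."""
    magnitudes = [abs(b) for b in beta[1:]]
    total = sum(magnitudes)
    return [m / total for m in magnitudes]

def dissimilarity(a, b, weights):
    """Weighted sum of absolute attribute differences between two items."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

# Toy catalog (made up): two binary attributes per product, plus sales counts.
items = [[0, 0], [0, 1], [1, 0], [1, 1]]
sales = [2, 8, 3, 12]

beta = fit_poisson(items, sales)
weights = attribute_weights(beta)
print(weights, dissimilarity(items[0], items[3], weights))
```

In this toy data the second attribute multiplies expected sales by a larger factor than the first, so it receives the larger weight and contributes more to the distance between two products.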
Imagination Based Sample Construction for Zero-Shot Learning
Zero-shot learning (ZSL), which aims to recognize unseen classes with no
labeled training samples, efficiently tackles the problem of missing labeled
data in image retrieval. Nowadays there are mainly two types of popular methods
for ZSL to recognize images of unseen classes: probabilistic reasoning and
feature projection. Different from these existing types of methods, we propose
a new method: sample construction to deal with the problem of ZSL. Our proposed
method, called Imagination Based Sample Construction (IBSC), innovatively
constructs image samples of target classes in feature space by mimicking human
associative cognition process. Based on an association between attribute and
feature, target samples are constructed from different parts of various
samples. Furthermore, dissimilarity representation is employed to select
high-quality constructed samples which are used as labeled data to train a
specific classifier for those unseen classes. In this way, zero-shot learning
is turned into a supervised learning problem. As far as we know, it is the
first work to construct samples for ZSL; thus, our work can be viewed as a baseline
for future sample construction methods. Experiments on four benchmark datasets
show the superiority of our proposed method.
Comment: Accepted as a short paper in ACM SIGIR 201
Learning Task Relatedness in Multi-Task Learning for Images in Context
Multimedia applications often require concurrent solutions to multiple tasks.
These tasks hold clues to each other's solutions; however, as these relations can
be complex, this remains a rarely utilized property. When task relations are
explicitly defined based on domain knowledge, multi-task learning (MTL) offers
such concurrent solutions, while exploiting relatedness between multiple tasks
performed over the same dataset. In most cases, however, this relatedness is not
explicitly defined and the domain expert knowledge that defines it is not
available. To address this issue, we introduce Selective Sharing, a method that
learns the inter-task relatedness from secondary latent features while the
model trains. Using this insight, we can automatically group tasks and allow
them to share knowledge in a mutually beneficial way. We support our method
with experiments on 5 datasets in classification, regression, and ranking tasks
and compare to strong baselines and state-of-the-art approaches showing a
consistent improvement in terms of accuracy and parameter counts. In addition,
we perform an activation region analysis showing how Selective Sharing affects
the learned representation.
Comment: To appear in ICMR 2019 (Oral + Lightning Talk + Poster
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems
An examination of object recognition challenge leaderboards (ILSVRC,
PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small
differences amongst themselves in terms of error rate/mAP. To better
differentiate the top performers, additional criteria are required. Moreover,
the (test) images, on which the performance scores are based, predominantly
contain fully visible objects. Therefore, `harder' test images, mimicking the
challenging conditions (e.g. occlusion) in which humans routinely recognize
objects, need to be utilized for benchmarking. To address the concerns
mentioned above, we make two contributions. First, we systematically vary the
level of local object-part content, global detail and spatial context in images
from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12.
Second, we propose an object-part based benchmarking procedure which quantifies
classifiers' robustness to a range of visibility and contextual settings. The
benchmarking procedure relies on a semantic similarity measure that naturally
addresses potential semantic granularity differences between the category
labels in training and test datasets, thus eliminating manual mapping. We use
our procedure on the PPSS-12 dataset to benchmark top-performing classifiers
trained on the ILSVRC-2012 dataset. Our results show that the proposed
benchmarking procedure enables additional differentiation among
state-of-the-art object classifiers in terms of their ability to handle missing
content and insufficient object detail. Given this capability for additional
differentiation, our approach can potentially supplement existing benchmarking
procedures used in object recognition challenge leaderboards.
Comment: Extended version of our ACCV-2016 paper. Author formatting modifie
Mixed Tree and Spatial Representation of Dissimilarity Judgments
Whereas previous research has shown that either tree or spatial representations of dissimilarity judgments may be appropriate, focussing on the comparative fit at the aggregate level, we investigate whether there is heterogeneity among subjects in the extent to which their dissimilarity judgments are better represented by ultrametric tree or spatial multidimensional scaling models. We develop a mixture model for the analysis of dissimilarity data that is formulated in a stochastic context and entails a representation component and a measurement model component. The latter involves distributional assumptions on the measurement error and enables estimation by maximum likelihood. The representation component allows dissimilarity judgments to be represented either by a tree structure or by a spatial configuration, or a mixture of both. In order to investigate the appropriateness of tree versus spatial representations, the model is applied to twenty empirical data sets. We compare the fit of our model with that of aggregate tree and spatial models, as well as with mixtures of pure trees and mixtures of pure spaces, respectively. We formulate some empirical generalizations on the relative importance of tree versus spatial structures in representing dissimilarity judgments at the individual level.
Keywords: multidimensional scaling; tree models; mixture models; dissimilarity judgments
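For readers unfamiliar with the distinction, an ultrametric tree representation requires every triple of items to satisfy d(i, j) <= max(d(i, k), d(j, k)), which is stronger than the triangle inequality that spatial distances obey. The sketch below is only a diagnostic for that property on a single dissimilarity matrix; it is not the mixture model described in the abstract, and the function name and example matrices are hypothetical.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-9):
    """Return True if the symmetric dissimilarity matrix `d` satisfies
    d[i][j] <= max(d[i][k], d[j][k]) for every triple of distinct items."""
    n = len(d)
    return all(d[i][j] <= max(d[i][k], d[j][k]) + tol
               for i, j, k in permutations(range(n), 3))

# Tree-like judgments: items 0 and 1 are close, item 2 is equally far from both.
tree_like = [[0, 2, 4],
             [2, 0, 4],
             [4, 4, 0]]

# Points on a line at positions 0, 1 and 3: valid spatial distances,
# but d(0, 2) = 3 exceeds max(d(0, 1), d(1, 2)) = 2.
line_like = [[0, 1, 3],
             [1, 0, 2],
             [3, 2, 0]]

print(is_ultrametric(tree_like), is_ultrametric(line_like))
```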