Learning Task Relatedness in Multi-Task Learning for Images in Context
Multimedia applications often require concurrent solutions to multiple tasks.
These tasks hold clues to each other's solutions; however, because these
relations can be complex, this property remains rarely exploited. When task
relations are explicitly defined based on domain knowledge, multi-task
learning (MTL) offers
such concurrent solutions, while exploiting relatedness between multiple tasks
performed over the same dataset. In most cases however, this relatedness is not
explicitly defined and the domain expert knowledge that defines it is not
available. To address this issue, we introduce Selective Sharing, a method that
learns the inter-task relatedness from secondary latent features while the
model trains. Using this insight, we can automatically group tasks and allow
them to share knowledge in a mutually beneficial way. We support our method
with experiments on 5 datasets in classification, regression, and ranking tasks
and compare to strong baselines and state-of-the-art approaches showing a
consistent improvement in terms of accuracy and parameter counts. In addition,
we perform an activation region analysis showing how Selective Sharing affects
the learned representation.
Comment: To appear in ICMR 2019 (Oral + Lightning Talk + Poster)
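The task-grouping idea can be illustrated with a toy sketch. Everything below (the descriptor vectors, the cosine threshold, the greedy merging) is an illustrative assumption — a generic stand-in for learned task relatedness, not the paper's Selective Sharing mechanism:

```python
# Illustrative only: group tasks by similarity of per-task descriptor
# vectors, as a stand-in for relatedness learned during training.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_tasks(descriptors, threshold=0.8):
    """Greedily merge tasks whose descriptors are similar enough;
    tasks in one group would then share parameters."""
    groups = []
    for t, d in enumerate(descriptors):
        for g in groups:
            # join an existing group if similar to its first member
            if cosine(descriptors[g[0]], d) >= threshold:
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

# Hypothetical descriptors for four tasks: 0/1 and 2/3 are related.
descriptors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.9],
    [0.1, 0.9, 1.0],
]
print(group_tasks(descriptors))  # -> [[0, 1], [2, 3]]
```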
A Framework to Adjust Dependency Measure Estimates for Chance
Estimating the strength of dependency between two variables is fundamental
for exploratory analysis and many other applications in data mining. For
example: non-linear dependencies between two continuous variables can be
explored with the Maximal Information Coefficient (MIC); and categorical
variables that are dependent on the target class are selected using Gini gain
in random forests. Nonetheless, because dependency measures are estimated on
finite samples, the interpretability of their quantification and the accuracy
when ranking dependencies become challenging. Dependency estimates are not
equal to 0 when variables are independent, cannot be compared if computed on
different sample sizes, and are inflated by chance for variables with more
categories. In this paper, we propose a framework to adjust dependency measure
estimates on finite samples. Our adjustments, which are simple and applicable
to any dependency measure, are helpful in improving interpretability when
quantifying dependency and in improving accuracy on the task of ranking
dependencies. In particular, we demonstrate that our approach enhances the
interpretability of MIC when used as a proxy for the amount of noise between
variables, and improves accuracy when ranking variables during the splitting
procedure in random forests.
Comment: In Proceedings of the 2016 SIAM International Conference on Data
Mining
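The flavor of a chance adjustment can be sketched with a permutation null model. The choice of measure here (Gini gain) and the simple mean-subtraction adjustment are illustrative, not necessarily the exact adjustments the paper proposes:

```python
# Illustrative sketch: adjust a dependency estimate for chance by
# subtracting its expected value under a permutation null model.
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(x, y):
    """Reduction in Gini impurity of y after splitting on categories of x."""
    n = len(y)
    by_cat = {}
    for xi, yi in zip(x, y):
        by_cat.setdefault(xi, []).append(yi)
    weighted = sum(len(ys) / n * gini(ys) for ys in by_cat.values())
    return gini(y) - weighted

def adjusted_gini_gain(x, y, n_perm=200, seed=0):
    """Subtract the mean gain over random permutations of y, which
    destroy any real dependency but preserve both marginals."""
    rng = random.Random(seed)
    raw = gini_gain(x, y)
    yp = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(yp)
        null.append(gini_gain(x, yp))
    return raw - sum(null) / n_perm

# x is independent of y, yet the raw estimate sits above zero simply
# because x has many categories; the adjustment pulls it back toward zero.
data_rng = random.Random(1)
x = [data_rng.randrange(10) for _ in range(100)]
y = [data_rng.randrange(2) for _ in range(100)]
print(gini_gain(x, y), adjusted_gini_gain(x, y))
```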
Learning from the machine: interpreting machine learning algorithms for point- and extended-source classification
We investigate star-galaxy classification for astronomical surveys in the
context of four methods enabling the interpretation of black-box machine
learning systems. The first is outputting and exploring the decision boundaries
as given by decision tree based methods, which enables the visualization of the
classification categories. Secondly, we investigate how the Mutual Information
based Transductive Feature Selection (MINT) algorithm can be used to perform
feature pre-selection. If one would like to provide only a small number of
input features to a machine learning classification algorithm, feature
pre-selection provides a method to determine which of the many possible input
properties should be selected. Third is the use of the tree-interpreter package
to enable popular decision tree based ensemble methods to be opened,
visualized, and understood. This is done by additional analysis of the tree
based model, determining not only which features are important to the model,
but how important a feature is for a particular classification given its value.
Lastly, we use decision boundaries from the model to revise an already existing
method of classification, essentially asking the tree based method where
decision boundaries are best placed and defining a new classification method.
We showcase these techniques by applying them to the problem of star-galaxy
separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We
use the output of MINT and the ensemble methods to demonstrate how more complex
decision boundaries improve star-galaxy classification accuracy over the
standard SDSS frames approach (reducing misclassifications by up to
). We then show how tree-interpreter can be used to explore how
relevant each photometric feature is when making a classification on an
object-by-object basis.
Comment: 12 pages, 8 figures, 8 tables
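The decomposition that tree-interpreter performs can be illustrated on a tiny hand-built tree: a prediction equals the root's mean value (the "bias") plus one contribution per feature split along the decision path. The tree, thresholds, feature names, and node values below are made up for illustration:

```python
# Illustrative sketch of the tree-interpreter decomposition:
# prediction = bias (root mean) + per-feature contributions along the path.

def make_node(feature, thresh, left, right, value):
    """Internal node (split on feature/thresh) or, with feature=None, a leaf.
    `value` is the node's mean predicted P(star) -- made-up numbers here."""
    return {"feature": feature, "thresh": thresh,
            "left": left, "right": right, "value": value}

def leaf(value):
    return make_node(None, None, None, None, value)

# Hypothetical tree over two photometric features:
# feature 0 = "psf minus model magnitude", feature 1 = "concentration".
tree = make_node(0, 0.1,
                 make_node(1, 2.5, leaf(0.95), leaf(0.70), 0.85),
                 leaf(0.10),
                 0.40)

def predict_with_contributions(node, x):
    """Walk the tree for sample x, attributing each change in node value
    to the feature split on; returns (prediction, bias, contributions)."""
    bias = node["value"]
    contribs = {}
    while node["feature"] is not None:
        child = node["left"] if x[node["feature"]] <= node["thresh"] else node["right"]
        contribs[node["feature"]] = (contribs.get(node["feature"], 0.0)
                                     + child["value"] - node["value"])
        node = child
    return node["value"], bias, contribs

pred, bias, contribs = predict_with_contributions(tree, [0.05, 2.0])
print(pred, bias, contribs)  # prediction == bias + sum of contributions
```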
The equivalence of information-theoretic and likelihood-based methods for neural dimensionality reduction
Stimulus dimensionality-reduction methods in neuroscience seek to identify a
low-dimensional space of stimulus features that affect a neuron's probability
of spiking. One popular method, known as maximally informative dimensions
(MID), uses an information-theoretic quantity known as "single-spike
information" to identify this space. Here we examine MID from a model-based
perspective. We show that MID is a maximum-likelihood estimator for the
parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical
single-spike information corresponds to the normalized log-likelihood under a
Poisson model. This equivalence implies that MID does not necessarily find
maximally informative stimulus dimensions when spiking is not well described as
Poisson. We provide several examples to illustrate this shortcoming, and derive
a lower bound on the information lost when spiking is Bernoulli in discrete
time bins. To overcome this limitation, we introduce model-based dimensionality
reduction methods for neurons with non-Poisson firing statistics, and show that
they can be framed equivalently in likelihood-based or information-theoretic
terms. Finally, we show how to overcome practical limitations on the number of
stimulus dimensions that MID can estimate by constraining the form of the
non-parametric nonlinearity in an LNP model. We illustrate these methods with
simulations and data from primate visual cortex.
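One way to write the claimed equivalence, in our own notation (filter $k$, nonlinearity $f$, spike counts $n_t$ in bins of width $\Delta$; a paraphrase of the abstract, not the paper's exact statement):

```latex
% LNP log-likelihood for filter k and nonlinearity f:
\log L(k, f) \;=\; \sum_t n_t \log f(k^\top x_t)
               \;-\; \Delta \sum_t f(k^\top x_t) \;+\; \mathrm{const}.
% MID instead maximizes the empirical single-spike information of the
% projection z = k^\top x:
\hat{I}_{\mathrm{ss}}(k) \;=\; \int \hat{p}(z \mid \mathrm{spike})
    \,\log \frac{\hat{p}(z \mid \mathrm{spike})}{\hat{p}(z)} \, dz .
% The abstract's claim: with f estimated nonparametrically, the two
% objectives agree up to terms that do not depend on k,
\hat{I}_{\mathrm{ss}}(k) \;=\; \frac{1}{n_{\mathrm{sp}}}
    \max_f \log L(k, f) \;+\; \mathrm{const},
% where n_sp = sum_t n_t is the total spike count, so maximizing the
% single-spike information is maximum likelihood in the LNP model.
```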
A Factor-Adjusted Multiple Testing Procedure with Application to Mutual Fund Selection
In this article, we propose a factor-adjusted multiple testing (FAT)
procedure based on factor-adjusted p-values in a linear factor model involving
some observable and unobservable factors, for the purpose of selecting skilled
funds in empirical finance. The factor-adjusted p-values are obtained after
extracting the latent common factors by the principal component method. Under
some mild conditions, the false discovery proportion can be consistently
estimated even if the idiosyncratic errors are allowed to be weakly correlated
across units. Furthermore, by appropriately setting a sequence of threshold
values approaching zero, the proposed FAT procedure enjoys model selection
consistency. Extensive simulation studies and a real data analysis for
selecting skilled funds in the U.S. financial market are presented to
illustrate the practical utility of the proposed method. Supplementary
materials for this article are available online.
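The factor-adjustment idea can be sketched in a deliberately simplified form: one latent factor, proxied by the cross-sectional mean return (a crude stand-in for the paper's principal-component extraction), a normal approximation for p-values, and synthetic data. This is not the full FAT procedure or its false discovery proportion estimator:

```python
# Toy one-factor sketch: regress out a common-factor proxy per unit,
# then test each unit's intercept ("alpha") for being zero.
import math
import random

def normal_p_two_sided(t):
    """Two-sided p-value for a t-statistic, normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

def factor_adjusted_pvalues(returns):
    """returns[i][t]: return of unit i in period t.  Proxy the latent
    common factor by the cross-sectional mean each period, regress it
    out unit by unit, and test whether each intercept is zero."""
    n, T = len(returns), len(returns[0])
    factor = [sum(r[t] for r in returns) / n for t in range(T)]
    fbar = sum(factor) / T
    sxx = sum((f - fbar) ** 2 for f in factor)
    pvals = []
    for r in returns:
        rbar = sum(r) / T
        beta = sum((f - fbar) * (x - rbar) for f, x in zip(factor, r)) / sxx
        alpha = rbar - beta * fbar
        resid = [x - alpha - beta * f for x, f in zip(r, factor)]
        s2 = sum(e * e for e in resid) / (T - 2)
        se = math.sqrt(s2 * (1.0 / T + fbar ** 2 / sxx))
        pvals.append(normal_p_two_sided(alpha / se))
    return pvals

# Synthetic data: 20 "funds", one (index 0) with genuine skill
# (alpha = 0.5), all loading on one strong common factor.
rng = random.Random(0)
T, n = 100, 20
common = [rng.gauss(0, 1) for _ in range(T)]
returns = [[(0.5 if i == 0 else 0.0) + common[t] + rng.gauss(0, 0.1)
            for t in range(T)] for i in range(n)]
pvals = factor_adjusted_pvalues(returns)
print(min(range(n), key=lambda i: pvals[i]))  # index of smallest p-value
```

After the factor is removed, the idiosyncratic noise is small, so the skilled fund stands out; without the adjustment, the strong common factor would dominate each fund's return series.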