On Estimation of Conditional Modes Using Multiple Quantile Regressions
We propose an estimation method for the conditional mode when the
conditioning variable is high-dimensional. In the proposed method, we first
estimate the conditional density by solving quantile regressions multiple
times. We then estimate the conditional mode by finding the maximum of the
estimated conditional density. The proposed method has two advantages: it is
computationally stable, since it does not depend on initial parameter values,
and it is statistically efficient, with a fast convergence rate. Experiments on
synthetic and real-world data demonstrate that the proposed method outperforms
existing alternatives.
Comment: 26 pages, 3 figures
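To make the estimation recipe concrete, here is a minimal sketch in Python of the general idea only, not the authors' exact estimator: fit linear quantile regressions on a grid of levels, convert adjacent quantile spacings into a conditional density estimate, and return the location of the densest spacing. The grid, the zero penalty, and the midpoint rule are illustrative assumptions.

```python
# Sketch: conditional mode via multiple quantile regressions (illustrative).
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=3, size=500)

taus = np.linspace(0.05, 0.95, 19)                    # quantile-level grid
models = [QuantileRegressor(quantile=t, alpha=0.0).fit(X, y) for t in taus]

def conditional_mode(x):
    # Predicted conditional quantiles at a single point x.
    q = np.array([m.predict(x[None, :])[0] for m in models])
    # Density between adjacent quantiles: f(q) ~ d(tau) / d(quantile);
    # the clamp guards against quantile crossing from separate fits.
    dens = np.diff(taus) / np.maximum(np.diff(q), 1e-12)
    k = np.argmax(dens)                               # densest spacing
    return 0.5 * (q[k] + q[k + 1])                    # midpoint as mode estimate

print(conditional_mode(X[0]))
```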
Modal Regression based Atomic Representation for Robust Face Recognition
Representation based classification (RC) methods such as sparse RC (SRC) have
shown great potential in face recognition in recent years. Most previous RC
methods are based on the conventional regression models, such as lasso
regression, ridge regression or group lasso regression. These regression models
essentially impose a predefined assumption on the distribution of the noise
variable in the query sample, such as the Gaussian or Laplacian distribution.
However, the complicated noises in practice may violate the assumptions and
impede the performance of these RC methods. In this paper, we propose a modal
regression based atomic representation and classification (MRARC) framework to
alleviate this limitation. Unlike previous RC methods, the MRARC framework does
not require the noise variable to follow any specific predefined distributions.
This gives rise to the capability of MRARC in handling various complex noises
in reality. Using MRARC as a general platform, we also develop four novel RC
methods for unimodal and multimodal face recognition. In
addition, we devise a general optimization algorithm for the unified MRARC
framework based on the alternating direction method of multipliers (ADMM) and
half-quadratic theory. Experiments on real-world data validate the efficacy of
MRARC for robust face recognition.
Comment: 10 pages, 9 figures
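The paper's actual solver combines ADMM with half-quadratic theory; as a hedged illustration of the half-quadratic idea alone, the sketch below codes a query y over a dictionary D under a correntropy-style (Welsch) residual loss by iteratively reweighted ridge regression. D, sigma, lam, and the iteration count are hypothetical placeholders, not the MRARC algorithm.

```python
# Sketch: robust coding via half-quadratic reweighting (not the MRARC solver).
import numpy as np

def robust_code(D, y, sigma=1.0, lam=1e-2, iters=20):
    """Approximately minimize sum_i phi(y_i - (D a)_i) + lam ||a||^2,
    where phi is a Welsch/correntropy-induced loss."""
    n, m = D.shape
    a = np.zeros(m)
    for _ in range(iters):
        r = y - D @ a
        w = np.exp(-(r / sigma) ** 2)   # half-quadratic weights: small for outliers
        DW = D * w[:, None]
        a = np.linalg.solve(D.T @ DW + lam * np.eye(m), DW.T @ y)
    return a
```

Large residuals receive exponentially small weights, which is what lets the representation ignore complicated noise rather than fitting it.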
Quantile regression approach to conditional mode estimation
In this paper, we consider estimation of the conditional mode of an outcome
variable given regressors. To this end, we propose and analyze a
computationally scalable estimator derived from a linear quantile regression
model and develop asymptotic distributional theory for the estimator.
Specifically, we find that the pointwise limiting distribution is a scale
transformation of Chernoff's distribution despite the presence of regressors.
In addition, we consider analytical and subsampling-based confidence intervals
for the proposed estimator. We also conduct Monte Carlo simulations to assess
the finite sample performance of the proposed estimator together with the
analytical and subsampling confidence intervals. Finally, we apply the proposed
estimator to predicting the net hourly electrical energy output using Combined
Cycle Power Plant Data.
Comment: This paper supersedes "On estimation of conditional modes using
multiple quantile regressions" (Hirofumi Ohta and Satoshi Hara,
arXiv:1712.08754).
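As a hedged illustration of the subsampling-based intervals mentioned above, the sketch below builds a generic subsampling confidence interval for an estimator assumed to converge at the cube-root rate suggested by a Chernoff-type limit; the subsample-size rule, the number of subsamples, and the `estimator` callable are ad hoc assumptions rather than the paper's prescriptions.

```python
# Sketch: subsampling CI for a cube-root-rate estimator (illustrative).
import numpy as np

def subsampling_ci(theta_hat, X, y, estimator, b=None, n_sub=200,
                   alpha=0.05, rate=1 / 3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    b = b or int(n ** 0.7)                 # subsample size: a common heuristic
    roots = []
    for _ in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)
        theta_b = estimator(X[idx], y[idx])
        roots.append(b ** rate * (theta_b - theta_hat))
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert the subsampling distribution of the scaled root.
    return theta_hat - hi / n ** rate, theta_hat - lo / n ** rate
```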
Kernel Selection for Modal Linear Regression: Optimal Kernel and IRLS Algorithm
Modal linear regression (MLR) is a method for obtaining a conditional mode
predictor as a linear model. We study kernel selection for MLR from two
perspectives: "which kernel achieves smaller error?" and "which kernel is
computationally efficient?". First, we show that a Biweight kernel is optimal
in the sense of minimizing an asymptotic mean squared error of a resulting MLR
parameter. This result is derived from our refined analysis of the asymptotic
statistical behavior of MLR. Second, we provide a kernel class for which the
iteratively reweighted least-squares (IRLS) algorithm is guaranteed to
converge, and in particular prove that IRLS with an Epanechnikov kernel
terminates in a finite number of iterations. Simulation studies verify that a
Biweight kernel provides good estimation accuracy and that an Epanechnikov
kernel is computationally efficient. Our results improve on existing studies of
MLR, which often restrict themselves to a Gaussian kernel and the modal EM
algorithm specialized for it, by providing guidelines for kernel selection.
Comment: 7 pages, 4 figures, published in the proceedings of the 18th IEEE
International Conference on Machine Learning and Applications - ICMLA 2019
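A minimal sketch of the IRLS update for MLR with a Biweight kernel, under illustrative choices of bandwidth, initialization, and stopping rule: because the Biweight kernel satisfies K'(u) proportional to -u(1 - u^2) on |u| <= 1, each stationarity step reduces to weighted least squares with weights (1 - u^2)_+.

```python
# Sketch: IRLS for modal linear regression with a Biweight kernel.
import numpy as np

def mlr_irls_biweight(X, y, h=1.0, iters=100, tol=1e-8):
    """Ascend sum_i K((y_i - x_i'b) / h) with K the Biweight kernel."""
    Xd = np.hstack([np.ones((len(y), 1)), X])     # add an intercept column
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]     # OLS initialization
    for _ in range(iters):
        u = (y - Xd @ b) / h
        w = np.maximum(1.0 - u ** 2, 0.0)         # -K'(u)/u up to a constant
        if w.sum() == 0:
            break                                 # all points outside the window
        XW = Xd * w[:, None]
        b_new = np.linalg.solve(Xd.T @ XW + 1e-10 * np.eye(Xd.shape[1]),
                                XW.T @ y)
        if np.linalg.norm(b_new - b) < tol:
            return b_new
        b = b_new
    return b
```

With an Epanechnikov kernel the weights become indicators, so each step is a trimmed least-squares fit, which is the intuition behind the finite-termination result.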
Neural-Kernelized Conditional Density Estimation
Conditional density estimation is a general framework for solving various
problems in machine learning. Among existing methods, non-parametric and/or
kernel-based methods are often difficult to use on large datasets, while
methods based on neural networks usually make restrictive parametric
assumptions on the probability densities. Here, we propose a novel method for
estimating the conditional density based on score matching. In contrast to
existing methods, we employ scalable neural networks, but do not make explicit
parametric assumptions on densities. The key challenge in applying score
matching to neural networks is computation of the first- and second-order
derivatives of a model for the log-density. We tackle this challenge by
developing a new neural-kernelized approach, which can be applied on large
datasets with stochastic gradient descent, while the reproducing kernels allow
for easy computation of the derivatives needed in score matching. We show that
the neural-kernelized function approximator has universal approximation
capability and that our method is consistent in conditional density estimation.
We numerically demonstrate that our method is useful in high-dimensional
conditional density estimation, and compares favourably with existing methods.
Finally, we prove that the proposed method has interesting connections to two
probabilistically principled frameworks of representation learning: Nonlinear
sufficient dimension reduction and nonlinear independent component analysis.
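To see why score matching avoids the normalizing constant, consider a deliberately simple conditional model, log p(y|x) = -(y - w.x)^2 / (2 s2) + const(x): the score-matching objective J = E[(1/2)(d/dy log p)^2 + d^2/dy^2 log p] involves only y-derivatives, so the partition function drops out entirely. This toy, with analytic derivatives, is only a stand-in for the paper's neural-kernelized models.

```python
# Toy: score matching for a Gaussian conditional model (no partition function).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 * rng.normal(size=1000)

# d/dy log p = -(y - X w) / s2 and d^2/dy^2 log p = -1 / s2, so
# J(w, s2) = mean(r^2 / (2 s2^2) - 1 / s2) with r = y - X w.
w = np.zeros(2)
for _ in range(300):
    r = y - X @ w
    w += 0.1 * (X.T @ r) / len(y)      # gradient step on J in w (s2 held at 1)
s2 = np.mean((y - X @ w) ** 2)         # setting dJ/ds2 = 0 gives s2 = mean(r^2)
print(w, s2)                           # roughly [2, -1] and 0.25
```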
Learning with Correntropy-induced Losses for Regression with Mixture of Symmetric Stable Noise
In recent years, correntropy and its applications in machine learning have
been drawing continuous attention owing to its merits in dealing with
non-Gaussian noise and outliers. However, theoretical understanding of
correntropy, especially in the statistical learning context, is still limited.
In this study, within the statistical learning framework, we investigate
correntropy based regression in the presence of non-Gaussian noise or outliers.
Motivated by how non-Gaussian noise and outliers typically arise in practice,
we introduce the mixture of symmetric stable noise, which includes Gaussian noise,
Cauchy noise, and their mixture as special cases, to model non-Gaussian noise
or outliers. We demonstrate that under the mixture of symmetric stable noise
assumption, correntropy based regression can learn the conditional mean
function or the conditional median function well without resorting to the
finite-variance or even the finite first-order moment condition on the noise.
In particular, for the above two cases, we establish asymptotically optimal
learning rates of type $\mathcal{O}(n^{-1})$ for correntropy based regression
estimators. These results justify the
effectiveness of the correntropy based regression estimators in dealing with
outliers as well as non-Gaussian noise. We believe that the present study
completes our understanding of correntropy based regression from a statistical
learning viewpoint, and may also shed some light on robust statistical learning
for regression.
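As a hedged sketch of the setting above, the snippet below fits a linear model under Cauchy noise, a symmetric stable law with no finite first moment, by maximizing a correntropy objective via half-quadratic reweighting; the bandwidth and iteration count are arbitrary choices.

```python
# Sketch: correntropy-based regression under Cauchy (stable) noise.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = X @ np.array([1.0, 3.0]) + rng.standard_cauchy(400)  # infinite variance

w = np.zeros(2)
for _ in range(50):
    r = y - X @ w
    u = np.exp(-r ** 2 / 2.0)            # Gaussian-kernel weights (sigma = 1)
    XW = X * u[:, None]
    w = np.linalg.solve(X.T @ XW + 1e-8 * np.eye(2), XW.T @ y)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(w, w_ols)      # the correntropy fit lands near [1, 3]; OLS is erratic
```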
An implicit function learning approach for parametric modal regression
For multi-valued functions---such as when the conditional distribution on
targets given the inputs is multi-modal---standard regression approaches are
not always desirable because they provide the conditional mean. Modal
regression algorithms address this issue by instead finding the conditional
mode(s). Most, however, are nonparametric approaches and so can be difficult to
scale. Further, parametric approximators, like neural networks, facilitate
learning complex relationships between inputs and targets. In this work, we
propose a parametric modal regression algorithm. We use the implicit function
theorem to develop an objective for learning a joint function over inputs and
targets. We empirically demonstrate on several synthetic problems that our
method (i) can learn multi-valued functions and produce the conditional modes,
(ii) scales well to high-dimensional inputs, and (iii) can even be more
effective for certain uni-modal problems, particularly for high-frequency
functions. We demonstrate that our method is competitive on a real-world modal
regression problem and on two regular regression datasets.
Comment: Accepted to NeurIPS 2020
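The sketch below is a toy rendition of the implicit-function idea under strong simplifications that are not the paper's method: a polynomial basis stands in for a neural network, the y-slope of the joint function g is pinned to sign(y) on the training pairs (a toy-specific target), and modes at a query x are read off as zero crossings of g(x, .).

```python
# Toy: learn g(x, y) with g ~ 0 on data, then find modes as zeros of g(x, .).
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.2, 1.0, 600)
y = np.sign(rng.normal(size=600)) * np.sqrt(x) + 0.03 * rng.normal(size=600)

def feats(x, y):
    # Polynomial basis in (x, y); the y-derivative of each feature is analytic.
    F  = np.stack([np.ones_like(x), x, y, y ** 2, y ** 3, x * y], axis=1)
    Fy = np.stack([np.zeros_like(x), np.zeros_like(x), np.ones_like(x),
                   2 * y, 3 * y ** 2, x], axis=1)
    return F, Fy

F, Fy = feats(x, y)
lam, s = 1.0, np.sign(y)
# minimize ||F t||^2 + lam ||Fy t - s||^2: g vanishes on data, slope stays nonzero
t = np.linalg.solve(F.T @ F + lam * Fy.T @ Fy, lam * Fy.T @ s)

x0, grid = 0.64, np.linspace(-1.5, 1.5, 601)
G, _ = feats(np.full_like(grid, x0), grid)
g = G @ t
modes = grid[1:][np.diff(np.sign(g)) != 0]   # zero crossings of g(x0, .)
print(modes)                                  # expected near +/- sqrt(0.64)
```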
Robust modal regression with direct log-density derivative estimation
Modal regression is aimed at estimating the global mode (i.e., global
maximum) of the conditional density function of the output variable given input
variables, and has led to regression methods robust against heavy-tailed or
skewed noises. The conditional mode is often estimated through maximization of
the modal regression risk (MRR). In order to apply a gradient method for the
maximization, the fundamental challenge is accurate approximation of the
gradient of MRR, not MRR itself. To overcome this challenge, in this paper, we
take a novel approach of directly approximating the gradient of MRR. To
approximate the gradient, we develop kernelized and neural-network-based
versions of the least-squares log-density derivative estimator, which directly
approximates the derivative of the log-density without density estimation. With
direct approximation of the MRR gradient, we first propose a modal regression
method with kernels, and derive a new parameter update rule based on a
fixed-point method. Then, the derived update rule is theoretically proved to
have a monotonic hill-climbing property towards the conditional mode.
Furthermore, we indicate that our approach of directly approximating the
gradient is compatible with recent sophisticated stochastic gradient methods
(e.g., Adam), and then propose another modal regression method based on neural
networks. Finally, the superior performance of the proposed methods is
demonstrated on various artificial and benchmark datasets.
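The direct estimator at the heart of this approach admits a compact sketch: least-squares log-density derivative estimation fits g(x) ~ (log p)'(x) by minimizing E[g^2]/2 + E[g'], which integration by parts shows equals E[(g - (log p)')^2]/2 up to a constant, so no density estimate is ever formed. The Gaussian basis, bandwidth, and ridge amount below are ad hoc choices standing in for the paper's kernelized and neural versions.

```python
# Sketch: direct log-density derivative estimation (LSLDG-style).
import numpy as np

def lsldg_fit(x, centers, sigma=0.5, lam=1e-3):
    """Estimate g(x) ~ d/dx log p(x) without estimating p itself."""
    d = x[:, None] - centers[None, :]
    Phi = np.exp(-d ** 2 / (2 * sigma ** 2))
    dPhi = -d / sigma ** 2 * Phi
    G = Phi.T @ Phi / len(x)               # empirical E[phi phi^T]
    h = dPhi.mean(axis=0)                  # empirical E[phi'] (by parts)
    alpha = -np.linalg.solve(G + lam * np.eye(len(centers)), h)
    return lambda t: np.exp(-(t[:, None] - centers[None, :]) ** 2
                            / (2 * sigma ** 2)) @ alpha

rng = np.random.default_rng(4)
r = rng.normal(1.0, 0.7, 2000)             # toy residual sample
g = lsldg_fit(r, centers=np.linspace(-2.0, 4.0, 15))
print(g(np.array([0.0, 1.0, 2.0])))        # positive, ~0, negative around the mode
```

An estimated gradient like this can then drive a fixed-point or stochastic-gradient update toward the conditional mode, which is the role it plays in the paper.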
Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning
Low-rank Multi-view Subspace Learning (LMvSL) has shown great potential in
cross-view classification in recent years. Despite their empirical success,
existing LMvSL based methods cannot handle view discrepancy and discriminancy
simultaneously, and their performance therefore degrades when the discrepancy
among multi-view data is large. To
circumvent this drawback, motivated by the block-diagonal representation
learning, we propose Structured Low-rank Matrix Recovery (SLMR), a unique
method of effectively removing view discrepancy and improving discriminancy
through the recovery of a structured low-rank matrix. Furthermore, recent
low-rank modeling offers satisfactory solutions for contaminated data only
under predefined assumptions on the noise distribution, such as a Gaussian or
Laplacian distribution. However, these models are impractical because the
complicated noise encountered in practice may violate those assumptions, and
the noise distribution is generally unknown in advance. To alleviate this
limitation, modal regression is incorporated into the SLMR framework (termed
MR-SLMR). Different from previous LMvSL based methods, our MR-SLMR can handle
any noise variable whose mode is zero, which covers a wide range of noises,
such as Gaussian noise, random
noise and outliers. The alternating direction method of multipliers (ADMM)
framework and half-quadratic theory are used to efficiently optimize MR-SLMR.
Experimental results on four public databases demonstrate the superiority of
MR-SLMR and its robustness to complicated noise.
Comment: This article has been accepted by IEEE Transactions on Neural
Networks and Learning Systems
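For background on the optimization machinery only: the sketch below is a generic ADMM routine for a low-rank plus sparse decomposition (RPCA-style), with ad hoc mu and lam. MR-SLMR's actual objective further imposes block-diagonal structure and a modal-regression noise term, which this sketch does not implement.

```python
# Sketch: generic low-rank + sparse recovery by ADMM (RPCA-style background).
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal map of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Soft thresholding: the proximal map of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(X, lam=None, mu=1.0, iters=200):
    lam = lam or 1.0 / np.sqrt(max(X.shape))
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(iters):
        L = svt(X - S + Y / mu, 1.0 / mu)        # low-rank block update
        S = shrink(X - L + Y / mu, lam / mu)     # sparse block update
        Y = Y + mu * (X - L - S)                 # dual ascent on X = L + S
    return L, S
```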
A Framework of Learning Through Empirical Gain Maximization
We develop in this paper a framework of empirical gain maximization (EGM) to
address the robust regression problem where heavy-tailed noise or outliers may
be present in the response variable. The idea of EGM is to approximate the density
function of the noise distribution instead of approximating the truth function
directly, as is usual. Unlike classical maximum likelihood estimation, which
gives equal importance to all observations and can be problematic in the
presence of abnormal observations, EGM schemes can be interpreted from a
minimum distance estimation viewpoint and allow such observations to be
ignored. Furthermore, it is shown that several well-known robust nonconvex
regression paradigms, such as Tukey regression and truncated least square
regression, can be reformulated into this new framework. We then develop a
learning theory for EGM, by means of which a unified analysis can be conducted
for these well-established but not fully-understood regression approaches.
The new framework also yields a novel interpretation of existing bounded
nonconvex loss functions. Within this framework, two seemingly unrelated
notions, the well-known Tukey's biweight loss for robust regression and the
triweight kernel for nonparametric smoothing, turn out to be closely related.
More precisely, it is shown that Tukey's biweight loss can be derived from the
triweight kernel. Similarly, other frequently employed
bounded nonconvex loss functions in machine learning such as the truncated
square loss, the Geman-McClure loss, and the exponential squared loss can also
be reformulated from certain smoothing kernels in statistics. In addition, the
new framework enables us to devise new bounded nonconvex loss functions for
robust learning.
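The biweight-triweight link can be checked numerically: integrating Tukey's score function psi(t) = t(1 - (t/c)^2)^2 from 0 recovers (c^2/6)(1 - (1 - (t/c)^2)^3), a constant minus a rescaled triweight kernel shape (1 - u^2)^3. The cutoff c below is arbitrary.

```python
# Numeric check: Tukey's biweight loss = const - rescaled triweight shape.
import numpy as np

c = 2.0
t = np.linspace(0.0, c, 2001)
psi = t * (1.0 - (t / c) ** 2) ** 2                      # Tukey's score function
# Cumulative trapezoidal integration of psi gives the loss rho(t).
rho = np.concatenate([[0.0], np.cumsum((psi[1:] + psi[:-1]) / 2 * np.diff(t))])
closed = (c ** 2 / 6) * (1.0 - (1.0 - (t / c) ** 2) ** 3)  # const - triweight
print(np.max(np.abs(rho - closed)))                      # tiny: the forms agree
```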