    On Estimation of Conditional Modes Using Multiple Quantile Regressions

    We propose an estimation method for the conditional mode when the conditioning variable is high-dimensional. In the proposed method, we first estimate the conditional density by solving quantile regressions multiple times. We then estimate the conditional mode by finding the maximum of the estimated conditional density. The proposed method has two advantages: it is computationally stable because it does not depend on initial parameter values, and it is statistically efficient, with a fast convergence rate. Experiments on synthetic and real-world data demonstrate that the proposed method outperforms existing ones. Comment: 26 pages, 3 figures
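
The two-step idea in this abstract (quantiles first, then the density maximum) can be illustrated with a minimal sketch. It assumes the quantile values at one fixed covariate value are already available on a grid of levels, and it uses the standard fact that the density at a quantile is the reciprocal of the slope of the quantile function. The function name `mode_from_quantiles` and the finite-difference scheme are illustrative assumptions, not the authors' estimator; exact normal quantiles stand in for fitted quantile regressions.

```python
import numpy as np
from statistics import NormalDist


def mode_from_quantiles(taus, quantiles):
    """Return (mode estimate, density estimate) from quantile values on a tau grid."""
    taus = np.asarray(taus, dtype=float)
    # Sorting undoes quantile crossing, which fitted quantile regressions can produce.
    q = np.sort(np.asarray(quantiles, dtype=float))
    d_tau = np.diff(taus)
    # Floor the gaps to avoid division by zero. A tie would still dominate the
    # argmax, so real fits may need smoothing before this step.
    d_q = np.maximum(np.diff(q), 1e-12)
    # Density on each interval is roughly (change in tau) / (change in quantile).
    density = d_tau / d_q
    j = int(np.argmax(density))
    # Report the midpoint of the densest quantile interval as the mode.
    return 0.5 * (q[j] + q[j + 1]), float(density[j])


# Stand-in for fitted conditional quantiles: exact quantiles of N(2, 1).
taus = np.linspace(0.05, 0.95, 91)
dist = NormalDist(mu=2.0, sigma=1.0)
quantiles = [dist.inv_cdf(float(t)) for t in taus]
mode, peak_density = mode_from_quantiles(taus, quantiles)
print(mode, peak_density)
```

In a conditional setting, `quantiles` would be replaced by the predictions of one quantile regression per level in `taus`, all evaluated at the same covariate value.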

    Modal Regression based Atomic Representation for Robust Face Recognition

    Representation based classification (RC) methods such as sparse RC (SRC) have shown great potential in face recognition in recent years. Most previous RC methods are based on conventional regression models, such as lasso regression, ridge regression or group lasso regression. These regression models essentially impose a predefined assumption on the distribution of the noise variable in the query sample, such as the Gaussian or Laplacian distribution. However, complicated noise in practice may violate these assumptions and degrade the performance of these RC methods. In this paper, we propose a modal regression based atomic representation and classification (MRARC) framework to alleviate this limitation. Unlike previous RC methods, the MRARC framework does not require the noise variable to follow any specific predefined distribution. This enables MRARC to handle the various complex noises encountered in practice. Using MRARC as a general platform, we also develop four novel RC methods for unimodal and multimodal face recognition, respectively. In addition, we devise a general optimization algorithm for the unified MRARC framework based on the alternating direction method of multipliers (ADMM) and half-quadratic theory. Experiments on real-world data validate the efficacy of MRARC for robust face recognition. Comment: 10 pages, 9 figures

    Quantile regression approach to conditional mode estimation

    In this paper, we consider estimation of the conditional mode of an outcome variable given regressors. To this end, we propose and analyze a computationally scalable estimator derived from a linear quantile regression model and develop asymptotic distributional theory for the estimator. Specifically, we find that the pointwise limiting distribution is a scale transformation of Chernoff's distribution despite the presence of regressors. In addition, we consider analytical and subsampling-based confidence intervals for the proposed estimator. We also conduct Monte Carlo simulations to assess the finite sample performance of the proposed estimator together with the analytical and subsampling confidence intervals. Finally, we apply the proposed estimator to predict the net hourly electrical energy output using Combined Cycle Power Plant Data. Comment: This paper supersedes "On estimation of conditional modes using multiple quantile regressions" (Hirofumi Ohta and Satoshi Hara, arXiv:1712.08754)

    Kernel Selection for Modal Linear Regression: Optimal Kernel and IRLS Algorithm

    Modal linear regression (MLR) is a method for obtaining a conditional mode predictor as a linear model. We study kernel selection for MLR from two perspectives: "which kernel achieves smaller error?" and "which kernel is computationally efficient?". First, we show that a Biweight kernel is optimal in the sense of minimizing an asymptotic mean squared error of the resulting MLR parameter. This result is derived from our refined analysis of the asymptotic statistical behavior of MLR. Second, we provide a kernel class for which the iteratively reweighted least-squares (IRLS) algorithm is guaranteed to converge, and in particular prove that IRLS with an Epanechnikov kernel terminates in a finite number of iterations. Simulation studies empirically verify that using a Biweight kernel provides good estimation accuracy and that using an Epanechnikov kernel is computationally efficient. Our results improve on existing MLR studies, which often stick to a Gaussian kernel and the modal EM algorithm specialized for it, by providing guidelines for kernel selection. Comment: 7 pages, 4 figures, published in the proceedings of the 18th IEEE International Conference on Machine Learning and Applications - ICMLA 201
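
The finite-termination claim for the Epanechnikov kernel can be made concrete with a small sketch. It assumes the usual half-quadratic form of IRLS, in which each weight is proportional to -K'(u)/u for the scaled residual u. For the Epanechnikov kernel this weight is constant inside the bandwidth window and zero outside, so each step is an ordinary least-squares fit on the points currently inside the window, and the loop can stop once that set no longer changes. The function name, the OLS initialization, and the bandwidth `h` are assumptions for illustration; this is not the paper's implementation.

```python
import numpy as np


def mlr_irls_epanechnikov(X, y, h, max_iter=100):
    """IRLS sketch for modal linear regression with an Epanechnikov kernel."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    active = None
    for _ in range(max_iter):
        residuals = y - X @ beta
        # Epanechnikov weights: 1 inside the window |r| <= h, 0 outside.
        new_active = np.abs(residuals) <= h
        if active is not None and np.array_equal(new_active, active):
            break  # active set unchanged, so further steps would not move beta
        if new_active.sum() < X.shape[1]:
            break  # too few points in the window to refit
        active = new_active
        beta = np.linalg.lstsq(X[active], y[active], rcond=None)[0]
    return beta


# Synthetic data with skewed noise: the conditional mode is 1 + 2x, the mean is higher.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
is_outlier = rng.uniform(size=n) < 0.2
noise = np.where(is_outlier, rng.normal(3.0, 0.5, n), rng.normal(0.0, 0.1, n))
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + noise
print(mlr_irls_epanechnikov(X, y, h=1.0))
```

Because there are only finitely many possible active sets, a monotone iteration of this kind cannot run forever, which is the intuition behind the finite-termination result stated in the abstract.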

    Neural-Kernelized Conditional Density Estimation

    Conditional density estimation is a general framework for solving various problems in machine learning. Among existing methods, non-parametric and/or kernel-based methods are often difficult to use on large datasets, while methods based on neural networks usually make restrictive parametric assumptions on the probability densities. Here, we propose a novel method for estimating the conditional density based on score matching. In contrast to existing methods, we employ scalable neural networks, but do not make explicit parametric assumptions on densities. The key challenge in applying score matching to neural networks is computation of the first- and second-order derivatives of a model for the log-density. We tackle this challenge by developing a new neural-kernelized approach, which can be applied on large datasets with stochastic gradient descent, while the reproducing kernels allow for easy computation of the derivatives needed in score matching. We show that the neural-kernelized function approximator has universal approximation capability and that our method is consistent in conditional density estimation. We numerically demonstrate that our method is useful in high-dimensional conditional density estimation, and compares favourably with existing methods. Finally, we prove that the proposed method has interesting connections to two probabilistically principled frameworks of representation learning: nonlinear sufficient dimension reduction and nonlinear independent component analysis.

    Learning with Correntropy-induced Losses for Regression with Mixture of Symmetric Stable Noise

    In recent years, correntropy and its applications in machine learning have been drawing continuous attention owing to its merits in dealing with non-Gaussian noise and outliers. However, theoretical understanding of correntropy, especially in the statistical learning context, is still limited. In this study, within the statistical learning framework, we investigate correntropy based regression in the presence of non-Gaussian noise or outliers. Motivated by the practical way of generating non-Gaussian noise or outliers, we introduce mixture of symmetric stable noise, which includes Gaussian noise, Cauchy noise, and their mixture as special cases, to model non-Gaussian noise or outliers. We demonstrate that under the mixture of symmetric stable noise assumption, correntropy based regression can learn the conditional mean function or the conditional median function well without resorting to the finite-variance or even the finite first-order moment condition on the noise. In particular, for the above two cases, we establish asymptotically optimal learning rates for correntropy based regression estimators that are asymptotically of type $\mathcal{O}(n^{-1})$. These results justify the effectiveness of the correntropy based regression estimators in dealing with outliers as well as non-Gaussian noise. We believe that the present study completes our understanding of correntropy based regression from a statistical learning viewpoint, and may also shed some light on robust statistical learning for regression.
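
A short sketch may help show why a correntropy-induced loss tolerates outliers. It assumes one common scaling of the loss, sigma^2 * (1 - exp(-r^2 / sigma^2)), and reduces regression to the simplest case of estimating a single location parameter with a half-quadratic fixed-point iteration. The function names, the scaling, and the median initialization are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def correntropy_loss(r, sigma):
    """Correntropy-induced loss (one common scaling); bounded above by sigma**2."""
    r = np.asarray(r, dtype=float)
    return sigma**2 * (1.0 - np.exp(-(r**2) / sigma**2))


def correntropy_location(y, sigma, n_iter=50):
    """Minimize the summed correntropy loss over a single location parameter."""
    y = np.asarray(y, dtype=float)
    mu = float(np.median(y))  # robust starting point
    for _ in range(n_iter):
        # Half-quadratic step: points far from mu receive weights near zero.
        w = np.exp(-((y - mu) ** 2) / sigma**2)
        mu = float(np.sum(w * y) / np.sum(w))
    return mu


y = np.array([0.0, 0.1, -0.1, 0.05, -0.05, 100.0])  # one gross outlier
print("sample mean:", y.mean())
print("correntropy estimate:", correntropy_location(y, sigma=1.0))
```

The loss is bounded, so a single extreme residual contributes at most sigma^2 to the objective, whereas under the squared loss it would dominate the fit.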

    An implicit function learning approach for parametric modal regression

    For multi-valued functions---such as when the conditional distribution on targets given the inputs is multi-modal---standard regression approaches are not always desirable because they provide the conditional mean. Modal regression algorithms address this issue by instead finding the conditional mode(s). Most, however, are nonparametric approaches and so can be difficult to scale. Further, parametric approximators, like neural networks, facilitate learning complex relationships between inputs and targets. In this work, we propose a parametric modal regression algorithm. We use the implicit function theorem to develop an objective, for learning a joint function over inputs and targets. We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions. We demonstrate that our method is competitive in a real-world modal regression problem and two regular regression datasets. Comment: Accepted to NeurIPS 202

    Robust modal regression with direct log-density derivative estimation

    Modal regression is aimed at estimating the global mode (i.e., global maximum) of the conditional density function of the output variable given input variables, and has led to regression methods robust against heavy-tailed or skewed noises. The conditional mode is often estimated through maximization of the modal regression risk (MRR). In order to apply a gradient method for the maximization, the fundamental challenge is accurate approximation of the gradient of MRR, not MRR itself. To overcome this challenge, in this paper, we take a novel approach of directly approximating the gradient of MRR. To approximate the gradient, we develop kernelized and neural-network-based versions of the least-squares log-density derivative estimator, which directly approximates the derivative of the log-density without density estimation. With direct approximation of the MRR gradient, we first propose a modal regression method with kernels, and derive a new parameter update rule based on a fixed-point method. Then, the derived update rule is theoretically proved to have a monotonic hill-climbing property towards the conditional mode. Furthermore, we show that our approach of directly approximating the gradient is compatible with recent sophisticated stochastic gradient methods (e.g., Adam), and then propose another modal regression method based on neural networks. Finally, the superior performance of the proposed methods is demonstrated on various artificial and benchmark datasets.

    Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning

    Low-rank Multi-view Subspace Learning (LMvSL) has shown great potential in cross-view classification in recent years. Despite their empirical success, existing LMvSL based methods cannot handle view discrepancy and discriminancy well simultaneously, which leads to performance degradation when there is a large discrepancy among multi-view data. To circumvent this drawback, motivated by block-diagonal representation learning, we propose Structured Low-rank Matrix Recovery (SLMR), a unique method of effectively removing view discrepancy and improving discriminancy through the recovery of a structured low-rank matrix. Furthermore, recent low-rank modeling provides a satisfactory solution for data contaminated by noise that follows a predefined distribution, such as the Gaussian or Laplacian distribution. However, these models are not practical since complicated noise in practice may violate those assumptions and the distribution is generally unknown in advance. To alleviate this limitation, modal regression is incorporated into the framework of SLMR (termed MR-SLMR). Different from previous LMvSL based methods, our MR-SLMR can handle any zero-mode noise variable, which covers a wide range of noise, such as Gaussian noise, random noise and outliers. The alternating direction method of multipliers (ADMM) framework and half-quadratic theory are used to efficiently optimize MR-SLMR. Experimental results on four public databases demonstrate the superiority of MR-SLMR and its robustness to complicated noise. Comment: This article has been accepted by IEEE Transactions on Neural Networks and Learning Systems

    A Framework of Learning Through Empirical Gain Maximization

    We develop in this paper a framework of empirical gain maximization (EGM) to address the robust regression problem where heavy-tailed noise or outliers may be present in the response variable. The idea of EGM is to approximate the density function of the noise distribution instead of approximating the truth function directly as usual. Unlike classical maximum likelihood estimation, which gives equal importance to all observations and could be problematic in the presence of abnormal observations, EGM schemes can be interpreted from a minimum distance estimation viewpoint and allow such observations to be ignored. Furthermore, it is shown that several well-known robust nonconvex regression paradigms, such as Tukey regression and truncated least square regression, can be reformulated into this new framework. We then develop a learning theory for EGM, by means of which a unified analysis can be conducted for these well-established but not fully understood regression approaches. The new framework also yields a novel interpretation of existing bounded nonconvex loss functions. Within this new framework, two seemingly unrelated notions, the well-known Tukey's biweight loss for robust regression and the triweight kernel for nonparametric smoothing, are closely related. More precisely, it is shown that Tukey's biweight loss can be derived from the triweight kernel. Similarly, other frequently employed bounded nonconvex loss functions in machine learning such as the truncated square loss, the Geman-McClure loss, and the exponential squared loss can also be reformulated from certain smoothing kernels in statistics. In addition, the new framework enables us to devise new bounded nonconvex loss functions for robust learning.
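
The stated link between Tukey's biweight loss and the triweight kernel can be checked numerically. The sketch below uses the textbook forms of both functions: the triweight kernel K(u) = (35/32)(1 - u^2)^3 on |u| <= 1, and the biweight loss (c^2/6)(1 - (1 - (r/c)^2)^3) for |r| <= c with the constant c^2/6 beyond. With these forms the loss equals (c^2/6)(1 - (32/35) K(r/c)). The function names and the value of `c` are assumptions for illustration, and the paper's own construction may be phrased differently.

```python
import numpy as np


def triweight_kernel(u):
    """Triweight smoothing kernel, supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)


def tukey_biweight_loss(r, c):
    """Tukey's biweight loss in its usual piecewise form."""
    r = np.asarray(r, dtype=float)
    inside = (c**2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return np.where(np.abs(r) <= c, inside, c**2 / 6.0)


def tukey_from_triweight(r, c):
    """The same loss written as an affine function of the triweight kernel."""
    r = np.asarray(r, dtype=float)
    return (c**2 / 6.0) * (1.0 - (32.0 / 35.0) * triweight_kernel(r / c))


r = np.linspace(-10.0, 10.0, 401)
c = 4.685  # a commonly used tuning constant, chosen here only for illustration
print(np.allclose(tukey_biweight_loss(r, c), tukey_from_triweight(r, c)))
```

Read this way, minimizing the biweight loss amounts to maximizing a sum of triweight kernel values at the scaled residuals, which matches the gain-maximization view described in the abstract.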