17,220 research outputs found
Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery
We propose a calibrated multivariate regression method named CMR for fitting
high dimensional multivariate regression models. Compared with existing
methods, CMR calibrates regularization for each regression task with respect to
its noise level so that it simultaneously attains improved finite-sample
performance and tuning insensitiveness. Theoretically, we provide sufficient
conditions under which CMR achieves the optimal rate of convergence in
parameter estimation. Computationally, we propose an efficient smoothed
proximal gradient algorithm with a worst-case numerical rate of convergence
\cO(1/\epsilon), where is a pre-specified accuracy of the
objective function value. We conduct thorough numerical simulations to
illustrate that CMR consistently outperforms other high dimensional
multivariate regression methods. We also apply CMR to solve a brain activity
prediction problem and find that it is as competitive as a handcrafted model
created by human experts. The R package \texttt{camel} implementing the
proposed method is available on the Comprehensive R Archive Network
\url{http://cran.r-project.org/web/packages/camel/}.Comment: Journal of Machine Learning Research, 201
Recommended from our members
Self-organizing linear output map (SOLO): An artificial neural network suitable for hydrologic modeling and analysis
Artificial neural networks (ANNs) can be useful in the prediction of hydrologic variables, such as streamflow, particularly when the underlying processes have complex nonlinear interrelationships. However, conventional ANN structures suffer from network training issues that significantly limit their widespread application. This paper presents a multivariate ANN procedure entitled self-organizing linear output map (SOLO), whose structure has been designed for rapid, precise, and inexpensive estimation of network structure/parameters and system outputs. More important, SOLO provides features that facilitate insight into the underlying processes, thereby extending its usefulness beyond forecast applications as a tool for scientific investigations. These characteristics are demonstrated using a classic rainfall-runoff forecasting problem. Various aspects of model performance are evaluated in comparison with other commonly used modeling approaches, including multilayer feedforward ANNs, linear time series modeling, and conceptual rainfall-runoff modeling
Data-driven Soft Sensors in the Process Industry
In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work
Integration of survey data and big observational data for finite population inference using mass imputation
Multiple data sources are becoming increasingly available for statistical
analyses in the era of big data. As an important example in finite-population
inference, we consider an imputation approach to combining a probability sample
with big observational data. Unlike the usual imputation for missing data
analysis, we create imputed values for the whole elements in the probability
sample. Such mass imputation is attractive in the context of survey data
integration (Kim and Rao, 2012). We extend mass imputation as a tool for data
integration of survey data and big non-survey data. The mass imputation methods
and their statistical properties are presented. The matching estimator of
Rivers (2007) is also covered as a special case. Variance estimation with
mass-imputed data is discussed. The simulation results demonstrate the proposed
estimators outperform existing competitors in terms of robustness and
efficiency
A comparative investigation of the combined effects of pre-processing, wavelength selection and regression methods on near infrared calibration model performance
Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression . Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration
- …