Machine Learning, Quantum Mechanics, and Chemical Compound Space

We review recent studies dealing with the generation of machine learning models of molecular and solid properties. The models are trained and validated using standard quantum chemistry results obtained for organic molecules and materials selected from chemical space at random

arXiv.org e-Print Archive

Crossref

edoc

Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery

Authors: Liu Han, Wang Lie, Zhao Tuo
Publication date: 01/08/2015

We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence $\mathcal{O}(1/\epsilon)$, where $\epsilon$ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network: http://cran.r-project.org/web/packages/camel/.

Comment: Journal of Machine Learning Research, 201
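The calibration idea can be illustrated with a small sketch. The abstract does not spell out the CMR objective, so the code below assumes one simple way to obtain noise-level calibration: a non-squared $\ell_2$ loss per task, whose gradient is normalized by the residual norm. The function names `calibrated_grad` and `squared_grad` are hypothetical and are not taken from the camel package.

```python
import numpy as np

def calibrated_grad(X, r):
    """Gradient w.r.t. b of ||y - X b||_2 / sqrt(n), evaluated at residual r.

    Assumption: a non-squared l2 loss stands in for the calibrated loss.
    """
    n = X.shape[0]
    return -X.T @ r / (np.sqrt(n) * np.linalg.norm(r))

def squared_grad(X, r):
    """Gradient w.r.t. b of ||y - X b||_2^2 / (2 n), evaluated at residual r."""
    return -X.T @ r / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
r_quiet = rng.normal(size=100)   # residual of a low-noise task
r_noisy = 100.0 * r_quiet        # same direction, 100x the noise level

g_cal_quiet = calibrated_grad(X, r_quiet)
g_cal_noisy = calibrated_grad(X, r_noisy)
g_sq_quiet = squared_grad(X, r_quiet)
g_sq_noisy = squared_grad(X, r_noisy)

# Ratio of gradient magnitudes between the noisy and the quiet task.
print(np.linalg.norm(g_cal_noisy) / np.linalg.norm(g_cal_quiet))
print(np.linalg.norm(g_sq_noisy) / np.linalg.norm(g_sq_quiet))
```

Under this assumption the non-squared loss gives both tasks the same gradient magnitude, while the squared loss lets the noisy task dominate by the noise ratio. That is why a single regularization parameter can serve tasks with very different noise levels.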

arXiv.org e-Print Archive

Princeton University Open Access Repository

A fast semi-direct least squares algorithm for hierarchically block separable matrices

Authors: Greengard Leslie, Ho Kenneth L.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014

We present a fast algorithm for linear least squares problems governed by hierarchically block separable (HBS) matrices. Such matrices are generally dense but data-sparse and can describe many important operators including those derived from asymptotically smooth radial kernels that are not too oscillatory. The algorithm is based on a recursive skeletonization procedure that exposes this sparsity and solves the dense least squares problem as a larger, equality-constrained, sparse one. It relies on a sparse QR factorization coupled with iterative weighted least squares methods. In essence, our scheme consists of a direct component, comprised of matrix compression and factorization, followed by an iterative component to enforce certain equality constraints. At most two iterations are typically required for problems that are not too ill-conditioned. For an $M \times N$ HBS matrix with $M \geq N$ having bounded off-diagonal block rank, the algorithm has optimal $\mathcal{O}(M + N)$ complexity. If the rank increases with the spatial dimension as is common for operators that are singular at the origin, then this becomes $\mathcal{O}(M + N)$ in 1D, $\mathcal{O}(M + N^{3/2})$ in 2D, and $\mathcal{O}(M + N^{2})$ in 3D. We illustrate the performance of the method on both over- and underdetermined systems in a variety of settings, with an emphasis on radial basis function approximation and efficient updating and downdating.

Comment: 24 pages, 8 figures, 6 tables; to appear in SIAM J. Matrix Anal. App
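The "dense but data-sparse" property can be checked numerically. The sketch below is not the paper's algorithm: it only builds a small 1D kernel matrix and compares the numerical rank of an off-diagonal block with that of the full matrix. The kernel $1/(1+|x-y|)$, the tolerance, and the helper `numerical_rank` are illustrative assumptions.

```python
import numpy as np

def numerical_rank(block, tol=1e-10):
    """Count singular values above tol relative to the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

n = 400
x = np.linspace(0.0, 1.0, n)  # sorted points on a 1D interval

# Dense matrix from a smooth radial kernel (illustrative choice).
K = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

# Interactions between the left half and the right half of the points.
off_diag = K[: n // 2, n // 2 :]

rank_off = numerical_rank(off_diag)
rank_full = numerical_rank(K)
print("off-diagonal block:", off_diag.shape, "numerical rank", rank_off)
print("full matrix:", K.shape, "numerical rank", rank_full)
```

The off-diagonal block has a numerical rank far below its dimension even though every entry is nonzero. This is the structure that a compression step such as skeletonization is designed to exploit.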

arXiv.org e-Print Archive

CiteSeerX

Elastic net prefiltering for two class classification

Authors: Chen Sheng, Harris Chris J., Hong Xia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/07/2012

A two-stage linear-in-the-parameter model construction algorithm is proposed for noisy two-class classification problems. The first stage produces a prefiltered signal that is used as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing a model’s generalization capability: at the lower level, a new elastic-net model identification algorithm using singular value decomposition is employed, and at the upper level, two regularization parameters are optimized using a particle-swarm-optimization algorithm by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be analytically computed without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of this approach.
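The claim that the LOO misclassification rate can be computed without splitting the data can be illustrated on a simpler model. The sketch below swaps in plain ridge regression (an $\ell_2$ penalty only, no intercept) for the paper's elastic-net algorithm, and uses the standard identity that the LOO residual equals the ordinary residual divided by $1 - h_{ii}$, where $h_{ii}$ is the diagonal of the hat matrix obtained from the SVD. All function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def loo_ridge_analytic(X, y, lam):
    """LOO predictions for ridge regression from a single SVD, no refitting."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    shrink = s**2 / (s**2 + lam)                 # ridge shrinkage factors
    y_hat = U @ (shrink * (U.T @ y))             # in-sample fit
    h = np.einsum("ij,j,ij->i", U, shrink, U)    # hat-matrix diagonal
    loo_pred = y - (y - y_hat) / (1.0 - h)       # LOO identity
    loo_error_rate = np.mean(np.sign(loo_pred) != y)
    return loo_error_rate, loo_pred

def loo_ridge_brute_force(X, y, lam):
    """Reference: refit ridge n times, each time leaving one sample out."""
    n, p = X.shape
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        w = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        preds[i] = X[i] @ w
    return preds

# Synthetic noisy two-class data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.sign(X @ rng.normal(size=5) + 0.5 * rng.normal(size=60))

rate, fast = loo_ridge_analytic(X, y, lam=1.0)
slow = loo_ridge_brute_force(X, y, lam=1.0)
print("LOO misclassification rate:", rate)
print("max deviation from brute force:", np.max(np.abs(fast - slow)))
```

The analytic route costs one SVD instead of one refit per sample, which makes it cheap enough to sit inside an outer search over regularization parameters.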

Central Archive at the University of Reading

CiteSeerX

Southampton (e-Prints Soton)