180 research outputs found

    Model selection and error estimation

    We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
    Keywords: complexity regularization, model selection, error estimation, concentration of measure
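The label-flipping equivalence mentioned in this abstract can be checked on a toy problem. The sketch below is illustrative only and is not taken from the paper: the threshold hypothesis class, the synthetic data, and all function names are assumptions. It uses the identity that, for two halves of equal size, the difference between the first-half and second-half errors of a classifier equals 1 - 2 × (its error on the full sample with the first-half labels flipped), so maximizing the discrepancy amounts to minimizing empirical risk on the flipped sample.

```python
import random

def threshold_classifiers(xs):
    """Hypothetical finite class: predict s if x >= t else -s, for t in the data, s in {+1, -1}."""
    return [(t, s) for t in sorted(set(xs)) for s in (1, -1)]

def predict(h, x):
    t, s = h
    return s if x >= t else -s

def error(h, xs, ys):
    """Fraction of points misclassified by h."""
    return sum(predict(h, x) != y for x, y in zip(xs, ys)) / len(xs)

def max_discrepancy_direct(xs, ys):
    """Max over the class of (error on first half) - (error on second half)."""
    half = len(xs) // 2
    return max(
        error(h, xs[:half], ys[:half]) - error(h, xs[half:], ys[half:])
        for h in threshold_classifiers(xs)
    )

def max_discrepancy_via_flipping(xs, ys):
    """Same quantity via empirical risk minimization with first-half labels flipped."""
    half = len(xs) // 2
    flipped = [-y for y in ys[:half]] + list(ys[half:])
    best = min(error(h, xs, flipped) for h in threshold_classifiers(xs))
    return 1 - 2 * best

# Synthetic data (an assumption for illustration); the sample size must be even.
rng = random.Random(0)
xs = [rng.random() for _ in range(40)]
ys = [1 if x > 0.5 else -1 for x in xs]

print(max_discrepancy_direct(xs, ys), max_discrepancy_via_flipping(xs, ys))
```

The two printed values agree up to floating-point rounding. In practice the minimization would be carried out by whatever empirical risk minimizer is available for the model class rather than by enumeration.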

    Structured Random Matrices

    Random matrix theory is a well-developed area of probability theory that has numerous connections with other areas of mathematics and its applications. Much of the literature in this area is concerned with matrices that possess many exact or approximate symmetries, such as matrices with i.i.d. entries, for which precise analytic results and limit theorems are available. Much less well understood are matrices that are endowed with an arbitrary structure, such as sparse Wigner matrices or matrices whose entries possess a given variance pattern. The challenge in investigating such structured random matrices is to understand how the given structure of the matrix is reflected in its spectral properties. This chapter reviews a number of recent results, methods, and open problems in this direction, with a particular emphasis on sharp spectral norm inequalities for Gaussian random matrices.
    Comment: 46 pages; to appear in IMA Volume "Discrete Structures: Analysis and Applications" (Springer)

    PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

    The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn in a small neighborhood of classical estimators, whose study leads to control of the risk of the latter. These results make it possible to bound the risk of very general estimation procedures, as well as to perform model selection.

    Mirror Descent and Convex Optimization Problems With Non-Smooth Inequality Constraints

    We consider the problem of minimizing a convex function over a simple set subject to a convex non-smooth inequality constraint, and describe first-order methods to solve such problems in different situations: smooth or non-smooth objective function; convex or strongly convex objective and constraint; deterministic or randomized information about the objective and constraint. We hope that it is convenient for the reader to have all the methods for different settings in one place. The described methods are based on the Mirror Descent algorithm and the switching subgradient scheme. One of our focuses is to propose, for the listed settings, a Mirror Descent with adaptive stepsizes and an adaptive stopping rule. This means that neither the stepsize nor the stopping rule requires knowledge of the Lipschitz constant of the objective or constraint. We also construct Mirror Descent for problems whose objective function is not Lipschitz continuous, e.g. a quadratic function. Besides that, we address the problem of recovering the solution of the dual problem.

    Utility of multispectral imaging for nuclear classification of routine clinical histopathology imagery

    Background: We present an analysis of the utility of multispectral versus standard RGB imagery for routine H&E-stained histopathology images, in particular for pixel-level classification of nuclei. Our multispectral imagery has 29 spectral bands, spaced 10 nm apart within the visual range of 420–700 nm. It has been hypothesized that the additional spectral bands contain further information useful for classification as compared to the 3 standard bands of RGB imagery. We present analyses of our data designed to test this hypothesis.
    Results: For classification using all available image bands, we find the best performance (equal tradeoff between detection rate and false alarm rate) is obtained from either the multispectral or our "ccd" RGB imagery, with an overall increase in performance of 0.79% compared to the next best performing image type. For classification using single image bands, the single best multispectral band (in the red portion of the spectrum) gave a performance increase of 0.57% compared to the single best RGB band (red). Additionally, red bands had the highest coefficients/preference in our classifiers. Principal components analysis of the multispectral imagery indicates only two significant image bands, which is not surprising given the presence of two stains.
    Conclusion: Our results indicate that multispectral imagery for routine H&E-stained histopathology provides minimal additional spectral information for a pixel-level nuclear classification task compared with standard RGB imagery.

    Maximum-Reward Motion in a Stochastic Environment: The Nonequilibrium Statistical Mechanics Perspective

    We consider the problem of computing the maximum-reward motion in a reward field in an online setting. We assume that the robot has a limited perception range, and it discovers the reward field on the fly. We analyze the performance of a simple, practical lattice-based algorithm with respect to the perception range. Our main result is that, with very little perception range, the robot can collect as much reward as if it could see the whole reward field, under certain assumptions. Along the way, we establish novel connections between this class of problems and certain fundamental problems of nonequilibrium statistical mechanics. We demonstrate our results in simulation examples.

    Performance of LoRa-WAN Sensors for Precision Livestock Tracking and Biosensing Applications

    This study investigated the integration of Long Range Wide Area Network (LoRa WAN) communication technology and sensors for use as an Internet of Things (IoT) platform for Precision Livestock Farming (PLF) applications. The research was conducted at New Mexico State University's Clayton Livestock Research Centre. The functionality of LoRa WAN communication technology and the performance of LoRa WAN motion and GPS sensors were tested using static sensors that were either (a) placed outdoors at incremental distances from the LoRa WAN gateway antenna (Field, n=6), or (b) housed indoors, close to the same LoRa WAN gateway antenna (Indoor, n=5). Accelerometer data, reported as a motion intensity index, and GPS location were acquired, transmitted and logged at 1 and 15 minute intervals, respectively. We evaluated the tracker's GPS accuracy (GPSBias, the Euclidean distance between the actual and projected tracker location) and variables associated with the tracker's data transmission capabilities. The results indicate that field trackers had greater accuracy for remote sensing of GPS locations than indoor trackers, which faced increasing communication interference when acquiring satellite signals (GPSBias: 5.20 vs. 17.76 m; P < 0.01). Overall, the trackers and deployments appeared to have GPS accuracy comparable to other tracking devices and systems available on the market. The total number of data packets successfully transmitted was similar between the indoor and field trackers, but the number of data packets processed varied between the two deployments (P=0.02). Due to the static deployment of indoor and field trackers, activity data was almost non-existent for most devices. However, the same trackers embedded in collars mounted on mature cattle showed clear diurnal patterns consistent with the time budgets exhibited by grazing cattle. The pilot testing of GPS and accelerometer sensors using LoRa WAN technology revealed reasonable sensor sensitivity and reliability for integration in PLF platforms.
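As a rough illustration of the GPSBias metric described in this abstract, the sketch below computes the Euclidean distance between a known tracker position and each reported fix, then averages the distances. It assumes coordinates have already been converted to a planar projection in metres (for example easting and northing). The function names, the averaging step, and the sample coordinates are illustrative assumptions, not details from the study.

```python
import math

def gps_bias(actual, reported):
    """Euclidean distance in metres between two (easting, northing) points."""
    return math.hypot(reported[0] - actual[0], reported[1] - actual[1])

def mean_gps_bias(actual, fixes):
    """Average GPSBias over a list of reported fixes for one static tracker."""
    return sum(gps_bias(actual, fix) for fix in fixes) / len(fixes)

# Made-up coordinates for illustration only.
actual = (0.0, 0.0)
fixes = [(3.0, 4.0), (-6.0, 8.0)]

print(mean_gps_bias(actual, fixes))  # 7.5
```

Raw latitude and longitude in degrees would need to be projected first, since a Euclidean distance on degrees is not a distance in metres.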

    Estimation in high dimensions: a geometric perspective

    Full text link
    This tutorial provides an exposition of a flexible geometric framework for high dimensional estimation problems with constraints. The tutorial develops geometric intuition about high dimensional sets, justifies it with some results of asymptotic convex geometry, and demonstrates connections between geometric results and estimation problems. The theory is illustrated with applications to sparse recovery, matrix completion, quantization, linear and logistic regression and generalized linear models.
    Comment: 56 pages, 9 figures. Multiple minor change