256 research outputs found

    Efficient error and variance estimation for randomized matrix computations

    Randomized matrix algorithms have become workhorse tools in scientific computing and machine learning. To use these algorithms safely in applications, they should be coupled with posterior error estimates to assess the quality of the output. To meet this need, this paper proposes two diagnostics: a leave-one-out error estimator for randomized low-rank approximations and a jackknife resampling method to estimate the variance of the output of a randomized matrix computation. Both of these diagnostics are rapid to compute for randomized low-rank approximation algorithms such as the randomized SVD and Nyström, and they provide useful information that can be used to assess the quality of the computed output and guide algorithmic parameter choices. Comment: 33 pages, 10 figures. v3: substantial rewrite of the paper with a new title and new leave-one-out error estimato
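
For readers unfamiliar with jackknife resampling, the following is a minimal sketch of the classical (Tukey) jackknife variance estimate for a scalar statistic. It is not the paper's matrix-valued estimator, which the abstract does not spell out; the function name `jackknife_variance` and the toy data are illustrative assumptions.

```python
import random
import statistics


def jackknife_variance(samples, estimator):
    """Classical jackknife estimate of the variance of estimator(samples).

    Hypothetical helper for illustration; `samples` must be a list.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Recompute the estimator with each sample left out in turn.
    replicates = [estimator(samples[:i] + samples[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    # The (n - 1) / n factor accounts for the heavy overlap between replicates.
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(50)]
jk_var = jackknife_variance(data, statistics.fmean)
```

For the sample mean, this estimate coincides with the familiar sample variance divided by n, which makes a convenient sanity check.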

    A Framework for Statistical Inference via Randomized Algorithms

    Randomized algorithms, such as randomized sketching or projections, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs, leading to the problem of evaluating their accuracy. In this paper, we develop a statistical inference framework for quantifying the uncertainty of the outputs of randomized algorithms. We develop appropriate statistical methods -- sub-randomization, multi-run plug-in and multi-run aggregation inference -- by using multiple runs of the same randomized algorithm, or by estimating the unknown parameters of the limiting distribution. As an example, we develop methods for statistical inference for least squares parameters via random sketching using matrices with i.i.d. entries, or uniform partial orthogonal matrices. For this, we characterize the limiting distribution of estimators obtained via sketch-and-solve as well as partial sketching methods. The analysis of i.i.d. sketches uses a trigonometric interpolation argument to establish a differential equation for the limiting expected characteristic function and find the dependence on the kurtosis of the entries of the sketching matrix. The results are supported via a broad range of simulations.
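
As a rough illustration of sketch-and-solve least squares with an i.i.d. Gaussian sketching matrix, the sketch below compresses a two-parameter regression from 200 rows to 20 and solves the smaller problem. It does not reproduce the paper's inference procedures; the function name `sketch_and_solve`, the problem sizes, and the noise-free data are assumptions made for the example.

```python
import random


def sketch_and_solve(A, b, m, rng):
    """Two-unknown least squares solved on an m-row Gaussian sketch (S A, S b).

    Hypothetical helper for illustration only.
    """
    n = len(A)
    # Accumulate S @ A and S @ b; S has i.i.d. N(0, 1/m) entries.
    SA = [[0.0, 0.0] for _ in range(m)]
    Sb = [0.0] * m
    for i in range(m):
        for j in range(n):
            s = rng.gauss(0.0, 1.0) / m ** 0.5
            SA[i][0] += s * A[j][0]
            SA[i][1] += s * A[j][1]
            Sb[i] += s * b[j]
    # Solve the 2x2 normal equations of the sketched problem by Cramer's rule.
    g00 = sum(r[0] * r[0] for r in SA)
    g01 = sum(r[0] * r[1] for r in SA)
    g11 = sum(r[1] * r[1] for r in SA)
    h0 = sum(r[0] * y for r, y in zip(SA, Sb))
    h1 = sum(r[1] * y for r, y in zip(SA, Sb))
    det = g00 * g11 - g01 * g01
    return ((g11 * h0 - g01 * h1) / det, (g00 * h1 - g01 * h0) / det)


rng = random.Random(0)
A = [[1.0, j / 200.0] for j in range(200)]
b = [2.0 + 0.5 * row[1] for row in A]  # noise-free, so the sketched solution is exact
x_hat = sketch_and_solve(A, b, m=20, rng=rng)
```

With noisy data the sketched estimate varies from run to run, and that run-to-run variability is the uncertainty the abstract's framework aims to quantify.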

    DART-ID increases single-cell proteome coverage.

    Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional data points provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net

    Automated neural network-based instrument validation system

    In a complex control process, instrument calibration is periodically performed to maintain the instruments within the calibration range, which assures proper control and minimizes downtime. Instruments are usually calibrated under out-of-service conditions using manual calibration methods, which may cause incorrect calibration or equipment damage. Continuous in-service calibration monitoring of sensors and instruments will reduce unnecessary instrument calibrations, give operators more confidence in instrument measurements, increase plant efficiency or product quality, and minimize the possibility of equipment damage during unnecessary manual calibrations. In this dissertation, an artificial neural network (ANN)-based instrument calibration verification system is designed to achieve the on-line monitoring and verification goal for scheduling maintenance. Since an ANN is a data-driven model, it can learn the relationships among signals without prior knowledge of the physical model or process, which is usually difficult to establish for complex nonlinear systems. Furthermore, ANNs provide a noise-reduced estimate of the signal measurement. More importantly, since a neural network learns the relationships among signals, it can give an unfaulted estimate of a faulty signal based on information provided by other unfaulted signals; that is, it provides a correct estimate of a faulty signal. This ANN-based instrument verification system is capable of detecting small degradations or drifts occurring in instrumentation, and can preclude false control actions or system damage caused by instrument degradation. In this dissertation, an automated scheme of neural network construction is developed. Previously, the neural network structure design required extensive knowledge of neural networks. An automated design methodology was developed so that a network structure can be created without expert interaction.
    This validation system was designed to monitor process sensors plant-wide. Due to the large number of sensors to be monitored and the limited computational capability of an artificial neural network model, a variable grouping process was developed for dividing the sensor variables into small correlated groups which the neural networks can handle. A modification of a statistical method, called the Beta method, as well as a principal component analysis (PCA)-based method of estimating the number of neural network hidden nodes were developed. Another development in this dissertation is the sensor fault detection method. The commonly used Sequential Probability Ratio Test (SPRT) continuously measures the likelihood ratio to statistically determine if there is any significant calibration change. This method requires normally distributed signals for correct operation. In practice, the signals deviate from the normal distribution, causing problems for the SPRT. A modified SPRT (MSPRT) was developed to suppress the possible intermittent alarms initiated by spurious spikes in network prediction errors. These methods were applied to data from the Tennessee Valley Authority (TVA) fossil power plant Unit 9 for testing. The results show that the average detectable drift level is about 2.5% for instruments in the boiler system and about 1% in the turbine system of the Unit 9 system. Approximately 74% of the process instruments can be monitored using the methodologies developed in this dissertation.
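
The standard SPRT mentioned above can be sketched as follows for a mean shift in Gaussian prediction residuals. This is Wald's textbook test, not the dissertation's modified SPRT (MSPRT), whose details the abstract does not give; the function name `sprt_gaussian_mean`, the drift size `mu1`, and the error rates are assumptions chosen for illustration.

```python
import math


def sprt_gaussian_mean(residuals, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT of H0: mean 0 against H1: mean mu1, with known sigma.

    Hypothetical helper; alpha and beta are the target false-alarm and
    missed-alarm probabilities.
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts H1 (drift)
    lower = math.log(beta / (1 - alpha))   # crossing this accepts H0 (normal)
    llr = 0.0
    for k, x in enumerate(residuals, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 / sigma ** 2) * (x - mu1 / 2)
        if llr >= upper:
            return "drift", k
        if llr <= lower:
            return "normal", k
    return "undecided", len(residuals)


drifted = sprt_gaussian_mean([1.0] * 50, mu1=1.0, sigma=1.0)
healthy = sprt_gaussian_mean([0.0] * 50, mu1=1.0, sigma=1.0)
```

Because each decision accumulates evidence over many samples, a single spurious spike in the residuals can still push the ratio across a threshold when the Gaussian assumption fails, which is the failure mode the abstract says the MSPRT targets.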

    A Multivariate Approach to Functional Neuro Modeling

    This Ph.D. thesis, A Multivariate Approach to Functional Neuro Modeling, deals with the analysis and modeling of data from functional neuroimaging experiments. A multivariate dataset description is provided which facilitates efficient representation of typical datasets and, more importantly, provides the basis for a generalization theoretical framework relating model performance to model complexity and dataset size. Briefly summarized, the major topics discussed in the thesis include:
    • An introduction of the representation of functional datasets by pairs of neuronal activity patterns and overall conditions governing the functional experiment, via associated micro- and macroscopic variables. The description facilitates an efficient microscopic re-representation, as well as a handle on the link between brain and behavior; the latter is obtained by hypothesizing variations in the micro- and macroscopic variables to be manifestations of an underlying system.
    • A review of two micros..