
    Multiple Testing and Variable Selection along Least Angle Regression's path

    In this article, we investigate multiple testing and variable selection using the Least Angle Regression (LARS) algorithm in high dimensions under the Gaussian noise assumption. LARS is known to produce a piecewise affine solution path whose change points are referred to as the knots of the LARS path. The cornerstone of the present work is a closed-form expression for the exact joint law of K-tuples of knots conditional on the variables selected by LARS, namely the so-called post-selection joint law of the LARS knots. Numerical experiments demonstrate the excellent fit of our finding. Our main contributions are threefold. First, we build testing procedures on variables entering the model along the LARS path in the general design case when the noise level may be unknown. These testing procedures are referred to as Generalized t-Spacing tests (GtSt), and we prove that they have exact non-asymptotic level (i.e., the Type I error is exactly controlled). In this way, we extend the work of (Taylor et al., 2014), where the spacing test applies to consecutive knots and known variance. Second, we introduce a new exact multiple false negatives test after model selection in the general design case when the noise level may be unknown. We prove that this testing procedure has exact non-asymptotic level for general designs and unknown noise level. Last, we give an exact control of the false discovery rate (FDR) under the orthogonal design assumption. Monte-Carlo simulations and a real data experiment are provided to illustrate our results in this case. Of independent interest, we introduce an equivalent formulation of the LARS algorithm based on a recursive function. Comment: 62 pages; new: FDR control and power comparison between Knockoff, FCD, SLOPE, and our proposed method; new: the introduction has been revised and now presents a synthetic overview of the main results, which we believe brings new insights compared to the previous version.
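    The LARS path and its knots, central to the results above, can be inspected with scikit-learn's lars_path. A minimal sketch on synthetic data (data and dimensions are illustrative; this is not the paper's implementation of the GtSt tests):

```python
# Extract the knots of the LARS path -- the change points of the piecewise
# affine solution path -- with scikit-learn. All data here are synthetic.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [4.0, 3.0, 2.0]          # three true nonzero coefficients
y = X @ beta + rng.standard_normal(n)

# alphas holds the knots lambda_1 >= lambda_2 >= ... of the path;
# active lists the variable indices in the order they enter the model.
alphas, active, coefs = lars_path(X, y, method="lar")
knots = alphas

print("first variables entering the model:", list(active[:3]))
print("first knots:", np.round(knots[:3], 3))
```

    The spacings between consecutive knots are the raw ingredients of the spacing-type statistics the abstract refers to.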

    Theoretical Model Testing with Latent Variables

    Topics include improving Average Variance Extracted (AVE), estimating a latent variable with only two indicators, and speeding up the weeding (itemizing) of a measure so that it is consistent (fits the data), valid, and reliable, yet has more than 3 indicators.
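    For reference, AVE is computed from standardized factor loadings as the mean of their squares. A small sketch with made-up loading values (not from the source):

```python
# Average Variance Extracted: AVE = (sum of squared standardized loadings)
# divided by the number of indicators. Loadings below are illustrative.
loadings = [0.82, 0.75, 0.68, 0.71]
ave = sum(l ** 2 for l in loadings) / len(loadings)
print(f"AVE = {ave:.3f}")  # conventional cutoff: AVE >= 0.50
```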

    Simultaneous mass estimation and class classification of scrap metals using deep learning

    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. While deep learning has helped improve the performance of classification, object detection, and segmentation in recycling, its potential for mass prediction has not yet been explored. Therefore, this study proposes a system for mass prediction with and without feature extraction and selection, including principal component analysis (PCA). These feature extraction methods are evaluated on a combined Cast (C), Wrought (W), and Stainless Steel (SS) image dataset using state-of-the-art machine learning and deep learning algorithms for mass prediction. The best mass prediction framework is then combined with a DenseNet classifier, resulting in multiple outputs that perform both object classification and object mass prediction. The proposed architecture consists of a DenseNet neural network for classification and a backpropagation neural network (BPNN) for mass prediction, which uses up to 24 features extracted from depth images. The proposed method obtained 0.82 R², 0.2 RMSE, and 0.28 MAE for the mass-prediction regression, with a classification performance of 95% on the C&W test dataset, using the DenseNet+BPNN+PCA model. The DenseNet+BPNN+None model without feature selection (None), evaluated on the CW&SS test data, had lower performance for both classification (80%) and regression (0.71 R², 0.31 RMSE, and 0.32 MAE).
    The presented method has the potential to improve the monitoring of the mass composition of waste streams and to optimize robotic and pneumatic sorting systems by providing a better understanding of the physical properties of the objects being sorted. Peer reviewed. Postprint (author's final draft).
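    The mass-prediction stage described above (hand-crafted depth features, PCA reduction, backpropagation regression) can be sketched with scikit-learn; MLPRegressor stands in for the paper's BPNN, and all data and dimensions are illustrative:

```python
# Sketch of the mass-prediction branch only: up-to-24 depth-image features
# (random stand-ins here) reduced with PCA, then regressed with a small
# backpropagation network. Not the paper's dataset or exact architecture.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
features = rng.standard_normal((200, 24))              # 24 features per object
mass = features[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(200)

model = make_pipeline(
    PCA(n_components=8),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
)
model.fit(features, mass)
r2 = model.score(features, mass)   # in-sample R^2, for illustration only
print(f"R^2 = {r2:.2f}")
```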

    Proceedings, MSVSCC 2019

    Old Dominion University's Department of Modeling, Simulation & Visualization Engineering (MSVE) and the Virginia Modeling, Analysis and Simulation Center (VMASC) held the 13th annual Modeling, Simulation & Visualization (MSV) Student Capstone Conference on April 18, 2019. The Conference featured student research and student projects that are central to MSV. Also participating in the conference were faculty members who volunteered their time to provide direct support to their students' research, facilitated the various conference tracks, served as judges for each of the tracks, and provided overall assistance to the conference. Appreciating the purpose of the conference and working in a cohesive, collaborative effort resulted in a successful symposium for everyone involved. These proceedings feature the works that were presented at the conference. Capstone Conference Chair: Dr. Yuzhong Shen. Capstone Conference Student Chair: Daniel Pere

    Deep Learning Based Malware Classification Using Deep Residual Network

    Traditional malware detection approaches rely heavily on feature extraction procedures. In this paper we propose a deep learning-based malware classification model using an 18-layer deep residual network. Our model uses the raw bytecode data of malware samples, converting the bytecodes to 3-channel RGB images and then applying deep learning techniques to classify the malware. Our experimental results show that the deep residual network model achieved an average accuracy of 86.54% under 5-fold cross-validation. Compared to traditional methods for malware classification, our deep residual network model greatly simplifies the malware detection and classification procedure while still achieving good classification accuracy. The dataset used in this paper for training and testing is the Malimg dataset, one of the biggest malware datasets, released by the Vision Research Lab of UCSB.
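    The bytecodes-to-RGB preprocessing step described above can be sketched as follows; the image side length, padding policy, and byte payload are illustrative assumptions, not the paper's exact settings:

```python
# Reshape a raw malware byte payload into a 3-channel RGB image that an
# 18-layer residual network could consume. Payload here is random stand-in
# data; real use would read the bytes of an actual sample.
import numpy as np

def bytes_to_rgb(payload: bytes, side: int = 64) -> np.ndarray:
    """Truncate/zero-pad a byte string and reshape it to (side, side, 3)."""
    need = side * side * 3
    buf = np.frombuffer(payload[:need], dtype=np.uint8)
    buf = np.pad(buf, (0, need - buf.size))   # zero-pad short samples
    return buf.reshape(side, side, 3)

sample = np.random.default_rng(0).integers(0, 256, 5000, dtype=np.uint8).tobytes()
img = bytes_to_rgb(sample)
print(img.shape, img.dtype)  # (64, 64, 3) uint8
```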

    A Comparison of Methods to Measure Crop Water Use in South Carolina

    The objective of this thesis was to compare cost-effective methods of measuring crop water use, known as evapotranspiration (ET), in South Carolina's humid climate. The methods analyzed were the surface renewal (SR) method, the eddy covariance (EC) method, large in-field weighing lysimeters, a newly developed pressure differential device (PDD), a Class A evaporation pan, and the Penman-Monteith equation. In the first chapter, ET measurements obtained by SR were compared to ET measured by EC and weighing lysimeters. For reference, EC and SR track the energy budget to estimate ET, while the weighing lysimeters used in this study are box-like containers weighed continuously for mass changes attributed to water gained or lost. Strong agreement was observed between the SR and EC methods (R² ≥ 0.89), while agreement was weak or inconsistent between the SR method and the lysimeters. In the second chapter, a PDD was designed, fabricated, and tested for its ability to measure ET. Although the PDD and its neighboring weighing lysimeter agreed on profile moisture changes, inferred PDD ET measurements showed little agreement with the lysimeter (R² < 0.2). The PDD appeared to be affected by a delay in measuring rainfall, among other factors, in comparison to the lysimeter. The study suggests that the PDD may not suit ET measurement but could be useful for subsoil measurements in other fields of study. In the third chapter, the Penman-Monteith equation and a Class A evaporation pan were analyzed. The two methods measured reference evapotranspiration (ETo) and showed good agreement with each other (R² = 0.95). The results of the ETo comparison were further used to develop pan coefficient (Kp) values and compare them to Kp values estimated from equations recommended by the Food and Agriculture Organization (FAO). No significant difference was found in the Kp comparison.
    The Penman-Monteith ETo measurements were then combined with weighing lysimeter data from the cotton field to develop a crop coefficient (Kc) curve. The obtained Kc values were compared to FAO-recommended Kc values, showing no significant difference. The study suggests that FAO recommendations for ETo measurement, Kp estimation, and Kc values do apply to South Carolina.
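    The coefficient calculations discussed above reduce to simple ratios: the pan coefficient relates reference ET to pan evaporation (ETo = Kp · Epan), and the crop coefficient relates crop ET to reference ET (ETc = Kc · ETo). A sketch with made-up daily values, not the thesis's measurements:

```python
# Pan coefficient Kp = ETo / Epan and crop coefficient Kc = ETc / ETo,
# computed from illustrative daily values in mm/day.
epan = [6.1, 5.8, 6.4]    # Class A pan evaporation (illustrative)
eto = [4.6, 4.3, 4.9]     # Penman-Monteith reference ET (illustrative)
etc = [5.1, 4.8, 5.3]     # lysimeter-measured crop ET (illustrative)

kp = [o / p for o, p in zip(eto, epan)]
kc = [c / o for c, o in zip(etc, eto)]
print("Kp:", [round(v, 2) for v in kp])
print("Kc:", [round(v, 2) for v in kc])
```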

    Multiple Testing and Variable Selection along Least Angle Regression's path

    39 pages, 7 figures. In this article we investigate the outcomes of the standard Least Angle Regression (LAR) algorithm in high dimensions under the Gaussian noise assumption. We give the exact law of the sequence of knots conditional on the sequence of variables entering the model, i.e., the post-selection law of the knots of the LAR. Based on this result, we prove an exact control of the False Discovery Rate (FDR) in the orthogonal design case and an exact control of the existence of false negatives in the general design case. First, we build a sequence of testing procedures on the variables entering the model and we give an exact control of the FDR in the orthogonal design case when the noise level may be unknown. Second, we introduce a new exact testing procedure on the existence of false negatives when the noise level may be unknown. This testing procedure can be deployed after any support selection procedure that produces an estimation of the support (i.e., the indexes of the nonzero coefficients), for any design. The Type II error of the test can be exactly controlled as long as the selection procedure satisfies some elementary hypotheses; such procedures are referred to as admissible selection procedures. These support selection procedures are such that the estimation of the support is given by the k first variables entering the model, where the random variable k is a stopping time. Monte-Carlo simulations and a real data experiment are provided to illustrate our results.
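    The kind of support selection described above (the first k variables entering the model, with k a stopping time) can be sketched with scikit-learn's lars_path; the stopping rule here (stop when a knot falls below a fixed threshold) and all data are illustrative assumptions, not the paper's admissible procedures:

```python
# Illustrative support selection: run the LAR path, then keep the first k
# variables where k is the first step whose knot drops below a threshold.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
X = rng.standard_normal((80, 15))
beta = np.zeros(15)
beta[:2] = [5.0, 4.0]                 # two true nonzero coefficients
y = X @ beta + rng.standard_normal(80)

alphas, active, _ = lars_path(X, y, method="lar")

threshold = 1.0                        # arbitrary illustrative threshold
below = np.flatnonzero(alphas < threshold)
k = int(below[0]) if below.size else len(active)
support = list(active[:k])
print("estimated support:", support)
```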

    Predictive decoding of neural data

    In the last five decades the number of techniques available for non-invasive functional imaging has increased dramatically. Researchers today can choose from a variety of imaging modalities that include EEG, MEG, PET, SPECT, MRI, and fMRI. This doctoral dissertation offers a methodology for the reliable analysis of neural data at different levels of investigation. By using statistical learning algorithms, the proposed approach allows single-trial analysis of various neural data by decoding them into variables of interest. Unbiased testing of the decoder on new samples of the data provides an assessment of the reliability of decoding performance generalization. Through subsequent analysis of the constructed decoder's sensitivity, it is possible to identify neural signal components relevant to the task of interest. The proposed methodology accounts for covariance and causality structures present in the signal. This feature makes it more powerful than the conventional univariate methods which currently dominate the neuroscience field. Chapter 2 describes the generic approach toward the analysis of neural data using statistical learning algorithms. Chapter 3 presents an analysis of results from four neural data modalities: extracellular recordings, EEG, MEG, and fMRI. These examples demonstrate the ability of the approach to reveal neural data components which cannot be uncovered with conventional methods. Chapter 4 presents a further extension of the methodology, used to analyze data from multiple neural data modalities jointly: EEG and fMRI. The reliable mapping of data from one modality into the other provides a better understanding of the underlying neural processes. By allowing the spatial-temporal exploration of neural signals under loose modeling assumptions, it removes potential bias in the analysis of neural data due to otherwise possible forward model misspecification.
    The proposed methodology has been formalized into a free and open source Python framework for statistical learning based data analysis. This framework, PyMVPA, is described in Chapter 5.
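    The decoding workflow described above (train a statistical learning model on single trials, then assess generalization on held-out data) can be sketched with scikit-learn rather than PyMVPA's own API; the data are synthetic stand-ins for trial-by-feature neural recordings:

```python
# Single-trial decoding with an unbiased generalization estimate via
# cross-validation. Synthetic data: 120 trials, 300 features, with signal
# planted in the first 10 features of one condition.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_features = 120, 300
labels = rng.integers(0, 2, n_trials)          # two experimental conditions
data = rng.standard_normal((n_trials, n_features))
data[labels == 1, :10] += 1.0                  # signal in 10 features

# 5-fold cross-validation: test folds are never seen during training.
scores = cross_val_score(LinearSVC(dual=False), data, labels, cv=5)
print(f"decoding accuracy: {scores.mean():.2f}")
```

    A sensitivity analysis, as described in the abstract, would then inspect the trained model's weights to locate the informative signal components.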