
    The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial

    In this tutorial paper, we first define the mean squared error, variance, covariance, and bias of both random variables and classification/predictor models. Then, we formulate the true and generalization errors of the model for both training and validation/test instances, making use of Stein's Unbiased Risk Estimator (SURE). We define overfitting, underfitting, and generalization using the obtained true and generalization errors. We introduce cross validation and two well-known examples, $K$-fold and leave-one-out cross validation. We briefly introduce generalized cross validation and then move on to regularization, where we use SURE again. We work on both $\ell_2$ and $\ell_1$ norm regularization. Then, we show that bootstrap aggregating (bagging) reduces the variance of estimation. Boosting, specifically AdaBoost, is introduced and explained as both an additive model and a maximum margin model, i.e., a Support Vector Machine (SVM). The upper bound on the generalization error of boosting is also provided to show why boosting prevents overfitting. As examples of regularization, the theory of ridge and lasso regression, weight decay, noise injection to input/weights, and early stopping are explained. Random forest, dropout, histogram of oriented gradients, and single shot multi-box detector are explained as examples of bagging in machine learning and computer vision. Finally, boosting tree and SVM models are mentioned as examples of boosting. Comment: 23 pages, 9 figures
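As a rough illustration of two ideas named in this abstract, the sketch below uses $K$-fold cross validation to compare $\ell_2$ (ridge) penalties on a toy one-dimensional regression problem. It is not taken from the paper: the function names, the synthetic data, and the candidate penalty values are all assumptions chosen for the example, and the model is deliberately reduced to a single weight with no intercept so that the ridge solution has a one-line closed form.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Minimise sum((y - w*x)^2) + lam * w^2 over a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_mse(xs, ys, lam, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        err = sum((ys[i] - w * xs[i]) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(err)
    return sum(fold_errors) / k


# Synthetic data (an assumption for this sketch): y = 2x + small Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

# Hypothetical grid of ridge penalties to compare.
scores = {lam: cv_mse(xs, ys, lam) for lam in (0.0, 0.1, 1.0, 10.0, 1e6)}
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

Setting `k` equal to the number of samples turns the same routine into leave-one-out cross validation. A very large penalty shrinks the weight towards zero and underfits, which shows up as a much larger held-out error than the unpenalised fit.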

    Impact of impurities on thermo-physical properties of CO2-rich systems : experimental and modelling

    Numerous industrial and academic communities have directed their efforts towards developing technologies for reducing CO2 emissions to the atmosphere. Carbon dioxide capture and storage (CCS) is one of the most promising technologies that can reduce or eliminate global warming, helping the world to move towards a low-carbon society. The process comprises the separation of CO2 from industrial sources, transport to a storage location and then long-term isolation from the atmosphere. CO2-rich pipelines are a key part of any carbon capture and storage project. Modelling of these types of pipelines is challenging because thermo-physical property data for CO2 in the presence of impurities are lacking. Because these properties, particularly density and viscosity, have a significant impact on the sizing of equipment, it is crucial to investigate the impact of different impurities on the thermo-physical properties of CO2-rich systems. Densities and viscosities of pure CO2, two CO2-H2 binary systems (with 5 and with 10 mol% H2), and six multi-component mixtures (MIX 1 with 5 mol% impurity, MIX 2 with 10 mol% impurity, MIX 3 with 30 mol% impurity, MIX 4 with 50 mol% impurity, MIX 5 with 4 mol% impurity and MIX 6 with 30 mol% impurity) were measured at pressures ranging from 10 to 1,400 bar (1 to 140 MPa) and six different temperatures, 0, 10, 25, 50, 100 and 150 °C (273.15, 283.15, 298.15, 323.15, 373.15 and 423.15 K), in the gas, liquid, and supercritical regions, using an Anton Paar densitometer for density and the capillary tube technique for viscosity. The experimental density data were then used to evaluate models using the CO2 correction volume, the Peneloux shift parameter and the original equations of state (PR and SRK). The viscosity data obtained were also employed to tune the correlative Lohrenz-Bray-Clark (LBC) and CO2-LBC models and to evaluate the predictive models.
The predictive models in this work are based on corresponding states (CS) theory. The “one reference fluid” corresponding states model is based on the approach developed by Pedersen et al. and modified for CO2-rich fluids; the “two reference fluids” corresponding states models are the model proposed by Aasberg-Petersen (CS2) and the CO2-CS2 model. Two models based on the extended corresponding states (ECS) theory, SUPERTRAP and CO2-SUPERTRAP, were also tested. The densities of the 95%CO2-5%H2S and 95%CO2-5%SO2 systems were measured continuously using a high-temperature, high-pressure Vibrating Tube Densitometer (VTD), an Anton Paar DMA 512, at pressures up to 400 bar (40 MPa) and at five different temperatures, 0, 10, 25, 50 and 80 °C (273.15, 283.15, 298.15, 323.15 and 353.15 K), in the gas, liquid and supercritical regions at MINES ParisTech, France. The experimental data were then used to evaluate the new CO2 volume correction model by comparison with the original PR and PR-Peneloux equations of state. A good understanding of the vapour-solid, vapour-liquid-solid and liquid-solid equilibria of CO2 and CO2 mixtures at low temperature is important for the safety assessment of CO2 pipelines and for the possibility of solid or ‘dry ice’ discharge during an accidental release or rapid decompression. The frost points of some of the above systems were measured using the SETARAM BT 2.15 calorimeter at various pressures
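
To make the density-modelling terms in this abstract concrete, here is a minimal sketch of a pure-CO2 density calculation with the standard Peng-Robinson (PR) equation of state and an optional Peneloux-type constant volume shift `c`. It is a generic textbook formulation, not the CO2 volume correction model developed in the thesis. The critical constants are approximate, commonly tabulated values for CO2; the function names, the default `c = 0`, and the shift value in the usage note are assumptions for illustration only. The root selection simply takes the largest or smallest physical root and does not test phase stability.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)

# Approximate, commonly tabulated constants for pure CO2 (assumed here).
CO2 = {"Tc": 304.13, "Pc": 7.3773e6, "omega": 0.224, "M": 0.04401}  # K, Pa, -, kg/mol


def real_cubic_roots(p, q, r):
    """Real roots of z^3 + p z^2 + q z + r = 0 (Cardano / trigonometric form)."""
    a = q - p * p / 3.0
    b = 2.0 * p ** 3 / 27.0 - p * q / 3.0 + r
    shift = -p / 3.0
    disc = (b / 2.0) ** 2 + (a / 3.0) ** 3
    if disc > 0.0:  # one real root
        s = math.sqrt(disc)
        cbrt = lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x)
        return [cbrt(-b / 2.0 + s) + cbrt(-b / 2.0 - s) + shift]
    if a == 0.0:  # triple root
        return [shift]
    m = 2.0 * math.sqrt(-a / 3.0)  # three real roots
    theta = math.acos(max(-1.0, min(1.0, 3.0 * b / (a * m)))) / 3.0
    return [m * math.cos(theta - 2.0 * math.pi * k / 3.0) + shift for k in range(3)]


def pr_molar_volume(T, P, fluid=CO2, phase="vapour", c=0.0):
    """Molar volume (m^3/mol) from the PR EOS, minus a constant volume shift c."""
    Tc, Pc, omega = fluid["Tc"], fluid["Pc"], fluid["omega"]
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega ** 2
    alpha = (1.0 + kappa * (1.0 - math.sqrt(T / Tc))) ** 2
    a = 0.45724 * R ** 2 * Tc ** 2 / Pc * alpha
    b = 0.07780 * R * Tc / Pc
    A = a * P / (R * T) ** 2
    B = b * P / (R * T)
    # PR EOS written as a cubic in the compressibility factor Z.
    roots = real_cubic_roots(-(1.0 - B), A - 3.0 * B ** 2 - 2.0 * B,
                             -(A * B - B ** 2 - B ** 3))
    roots = [z for z in roots if z > B]  # keep physically meaningful roots
    z = max(roots) if phase == "vapour" else min(roots)
    return z * R * T / P - c  # Peneloux-type translation: v = v_EOS - c


def pr_density(T, P, fluid=CO2, phase="vapour", c=0.0):
    """Mass density in kg/m^3."""
    return fluid["M"] / pr_molar_volume(T, P, fluid, phase, c)


# Example state from the measured range: 50 degC (323.15 K) and 100 bar (10 MPa).
print(pr_density(323.15, 10e6))
```

Calling `pr_density(323.15, 10e6, c=2e-6)` applies a purely illustrative shift of 2e-6 m^3/mol; in practice `c` would be fitted or correlated for the fluid in question. Extending the sketch to the impurity-containing mixtures discussed above would require mixing rules and binary interaction parameters, which are not shown here.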