7 research outputs found

    Improving penalized logistic regression model with missing values in high-dimensional data

    Analysis without adequate handling of missing values may lead to inconsistent and biased estimates. Although multiple imputation has become a widely used approach to handling missing data, researchers still generally encounter missing data in their studies. In high-dimensional data, penalized regression is a popular technique for performing feature selection and coefficient estimation simultaneously. However, one of the most important issues with high-dimensional data is that it often contains large quantities of missing data, for which common multiple imputation approaches may not work correctly. Therefore, this study uses imputation penalized regression models as an extension of the penalized methods to improve performance and impute missing values in high-dimensional data. The method was applied to real-life high-dimensional datasets with different numbers of features, sample sizes, and missing-data rates to evaluate its efficiency. The method was also compared with other existing imputation penalized methods for high-dimensional data. The comparative experimental results indicate that the proposed method outperforms its competitors by achieving higher sensitivity, specificity, and classification accuracy values.

    Improving the diagnosis of breast cancer using regularized logistic regression with adaptive elastic net

    Early diagnosis of breast cancer helps improve the patient's chance of survival. Therefore, cancer classification and feature selection are important research topics in medicine and biology. Recently, the adaptive elastic net has been used effectively for feature-based cancer classification, allowing simultaneous feature selection and feature coefficient estimation. The adaptive elastic net employs elastic net estimates as its initial weights. Nevertheless, the elastic net estimator is inconsistent and biased in selecting features. Therefore, the regularized logistic regression with the adaptive elastic net (RLRAEN) was used to handle the inconsistency problem by employing the adjusted variances of features as weights within the L1-regularization of the elastic net model. The proposed method was applied to the Wisconsin Breast Cancer dataset of the UCI repository and compared with other existing penalized methods that were applied to the same dataset. Based on the experimental study, the RLRAEN was more efficient in terms of feature selection and classification accuracy than the competing methods. Therefore, it can be concluded that RLRAEN is a better method for breast cancer classification.
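
    For orientation, the general form of an adaptive elastic net objective for logistic regression can be written as below. This is a sketch of the textbook formulation, not necessarily the exact objective used in the paper: the symbols $\lambda_1$, $\lambda_2$, $w_j$, and $\eta_i$ are notation assumed here, and the scaling of the loss and of the ridge term varies between authors. In RLRAEN, according to the abstract, the weights $w_j$ would be derived from the adjusted variances of the features.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,\eta_i - \log\big(1 + e^{\eta_i}\big) \Big]
  + \lambda_1 \sum_{j=1}^{p} w_j\,\lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}
\right\},
\qquad \eta_i = \beta_0 + x_i^{\top}\beta .
```

    Setting all $w_j = 1$ recovers the ordinary elastic net, and setting $\lambda_2 = 0$ gives a weighted (adaptive) Lasso.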

    Weighted L1-norm logistic regression for gene selection of microarray gene expression classification

    The classification of cancer is a significant application of DNA microarray data. Gene selection methods are ordinarily used to handle the high dimensionality of microarray data, enabling experts to diagnose and classify cancer with high accuracy. The penalized logistic regression (PLR) technique is commonly used for dimensionality reduction of high-dimensional gene expression data sets, removing irrelevant and redundant predictors from the binary logistic regression model. One of the regularization techniques used to achieve this goal is the least absolute shrinkage and selection operator (Lasso). However, this technique has been criticized for being biased in the selection of genes. The adaptive Lasso was proposed to address this selection bias by assigning an initial weight to each gene. This paper is concerned with adapting PLR to improve its classification and gene selection accuracy by introducing the one-dimensional weighted Mahalanobis distance (1-DWM) for each gene as an initial weight inside the L1-norm. In experiments, the proposed method, denoted adaptive penalized logistic regression (APLR), gives more accurate results than other well-known methods. The proposed method is applied to several real high-dimensional gene expression data sets to demonstrate its efficiency in terms of classification accuracy and gene selection. The proposed method could therefore be used in other studies that implement gene selection for the classification of high-dimensional cancer data sets.
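
    The idea of placing a per-gene weight inside the L1-norm can be illustrated with a minimal sketch. The code below fits a weighted-L1 logistic regression by proximal gradient descent in plain Python. Everything in it is an assumption made for illustration: the function names, the toy data, the step size, and in particular `mean_difference_weights`, which is a hypothetical stand-in for an initial weight and is not the 1-DWM weight described in the abstract.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_weighted_l1_logistic(X, y, weights, lam=0.05, step=0.1, n_iter=300):
    """Minimise mean logistic loss + lam * sum_j weights[j] * |beta_j|
    by proximal gradient descent. The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals p_i - y_i drive the gradient of the logistic loss.
        resid = [
            sigmoid(b0 + sum(x * b for x, b in zip(row, beta))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * grad0
        # Gradient step, then per-feature shrinkage scaled by its weight.
        beta = [
            soft_threshold(b - step * g, step * lam * w)
            for b, g, w in zip(beta, grad, weights)
        ]
    return b0, beta


def mean_difference_weights(X, y, eps=1e-3):
    """Hypothetical stand-in for an initial weight (NOT the paper's 1-DWM):
    features whose class means differ more receive a smaller penalty."""
    p = len(X[0])
    n1 = sum(y)
    n0 = len(y) - n1
    weights = []
    for j in range(p):
        m1 = sum(row[j] for row, yi in zip(X, y) if yi == 1) / n1
        m0 = sum(row[j] for row, yi in zip(X, y) if yi == 0) / n0
        weights.append(1.0 / (abs(m1 - m0) + eps))
    return weights


# Toy data: only feature 0 carries signal; the other three are noise.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]
y = [1 if row[0] > 0 else 0 for row in X]

w = mean_difference_weights(X, y)
b0, beta = fit_weighted_l1_logistic(X, y, w)
print("weights:", [round(v, 2) for v in w])
print("coefficients:", [round(v, 3) for v in beta])
```

    Because each coefficient is shrunk by `lam * weights[j]`, a feature with a small weight is penalised lightly and a feature with a large weight is pushed toward exactly zero, which is how the weighting steers gene selection. A practical analysis would tune `lam` by cross-validation and use an established solver.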

    Gene selection and classification of microarray gene expression data based on a new adaptive L1-norm elastic net penalty

    The removal of irrelevant and insignificant genes has always been a major step in microarray data analysis. The application of gene selection methods to biological datasets has greatly increased, supporting expert systems in cancer diagnosis with high classification accuracy. Penalized logistic regression (PLR) using the elastic net (EN) has been widely used in high-dimensional cancer classification in recent years to estimate the gene coefficients and perform gene selection simultaneously. However, the EN estimator does not satisfy the oracle properties. This paper proposes PLR using the adaptive elastic net (AEN), abbreviated as PLRAEN, to address the inconsistency. Our method employs the ratio (BWR) as an initial weight inside the L1-norm of the EN model. Several experiments were performed in a simulation study with different numbers of predictor variables, sample sizes, and correlation coefficients, and also on three public gene expression datasets, to evaluate the method's effectiveness. Experimental results demonstrate that the proposed method consistently outperforms two other contemporary penalized methods in classification accuracy and the number of selected genes. Therefore, we conclude that PLRAEN is a better method for gene selection in high-dimensional cancer classification.

    Thermally radiative bioconvective nanofluid flow on a wavy cylinder with Buongiorno model: A sensitivity analysis using response surface methodology

    The significant impact of nanotechnology has turned the ordinary into the excellent in the ever-changing environment of science and technology. Recent times have seen a dramatic transformation as advances in science and technology continue to push us into the domain of nanoscale innovation. Fluid flow over a range of geometries is involved in the physical processes of heat and mass transfer subject to the constraints. These phase change systems are also being converted into the latest technology by improving the thermal conductivity of fluids through the mixing of nanoparticles into an ordinary base fluid. This approach makes things finer and quicker from the perspective of efficiency and structure. It has many applications in various domains, especially nano-medicine, chemotherapy, microprocessors, refrigeration, and biotechnology. In this work, nanofluid flow through a wavy cylinder is considered in the presence of thermal radiation, activation energy, and motile microorganisms. The physical model is formulated as a system of partial differential equations (PDEs), which is then transformed into ordinary differential equations (ODEs) by means of similarity variables. The resulting ODEs are solved numerically with MATLAB's built-in bvp4c solver. The results are discussed and visualized graphically in the analysis section. To validate the results, a statistical approach is applied to them, yielding the fitted model, contour plots, surface plots, residual plots, and streamlines of the parameters involved and their impacts on the model. The impact of the physical quantities involved on the flow velocity, thermal, concentration, and microorganism density profiles is also discussed. The results show that the velocity profile increases with the mixed convection parameter. The thermal distribution is enhanced by larger values of the thermophoresis and Brownian motion parameters. The concentration of nanoparticles increases with the magnetic field strength. Larger values of the Péclet number reduce the density of microorganisms. The skin friction coefficient increases by around 28% and mass transport by 36% owing to the presence of microorganisms. The analysis of variance shows that the model is significant, and the fit summary also indicates that the model fits well.