
    A modified memetic algorithm with an application to gene selection in a sheep body weight study

    Get PDF
    Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily leads to over-fitting, missing an important gene can have a far more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the weak exploitation capability of the traditional genetic algorithm and to reduce the dimension of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement also rests on two further novel aspects: (a) updating subsets of genes iteratively by splicing until the loss function can be reduced no further, which increases the probability of selecting the true subset of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of the gene subsets. In addition, the mutation operator is replaced by the improved splicing method to enhance exploitation capability, and the initial individuals are refined by it to make the search more efficient. A dataset on the body weight of Hu sheep was used to evaluate the modified MA against the genetic algorithm. According to our experimental results, the proposed optimizer obtains a better minimal subset of genes within a few iterations than all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
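    The exchange-based local search the abstract describes can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the loss, the single-swap "splicing" operator, and the random restarts are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples, 20 candidate genes, 3 truly active (illustrative)
n, p, k = 100, 20, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

def loss(subset):
    """Residual sum of squares after least squares on the chosen columns."""
    if not subset:
        return float(y @ y)
    Xs = X[:, sorted(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ coef
    return float(r @ r)

def splice(subset, pool):
    """Local search: repeatedly exchange one selected gene for one
    unselected gene (a 'del'/'add' pair) while the loss decreases."""
    subset = set(subset)
    improved = True
    while improved:
        improved = False
        base = loss(subset)
        for out in list(subset):
            for add in pool - subset:
                cand = (subset - {out}) | {add}
                if loss(cand) < base - 1e-10:
                    subset, improved = cand, True
                    break
            if improved:
                break
    return subset

pool = set(range(p))
# Memetic step: each random initial individual is refined by splicing
population = [splice(set(rng.choice(p, k, replace=False)), pool)
              for _ in range(5)]
best = min(population, key=loss)
print(sorted(best))  # on this synthetic data, the active genes are recovered
```

    In a full memetic algorithm this local refinement would be interleaved with crossover across generations; the sketch keeps only the splicing step, which is the part the abstract emphasizes.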

    A working likelihood approach for robust regression

    Get PDF
    A robust approach is often desirable in the presence of outliers for more efficient parameter estimation. However, the choice of the regularization parameter value affects the efficiency of the parameter estimators. To maximize estimation efficiency, we construct a likelihood function for simultaneously estimating the regression parameters and the tuning parameter. This “working” likelihood function is merely a vehicle for efficient regression parameter estimation; we do not assume the data are generated from it. The proposed method effectively selects a value of the regularization parameter based on the extent of contamination in the data. We carry out extensive simulation studies in a variety of settings to investigate the performance of the proposed method. The simulation results show that, compared with the traditional Huber method with a fixed regularization parameter, efficiency can be enhanced by as much as 40% when the data follow a heavy-tailed distribution, and by as much as 468% in heteroscedastic-variance cases. For illustration, we also analyze two datasets: one from a diabetes study and the other from a mortality study.
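    The core idea, treating the Huber tuning constant as a parameter of a working likelihood, can be sketched as below. The Huber density and its normalizing constant K(c) are standard, but the simulated data, the parameterization, and the optimizer choice are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Heavy-tailed data: linear model with t(3)-distributed errors (illustrative)
n = 200
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(3, n)

def rho(u, c):
    """Huber loss with tuning constant c."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)

def neg_working_loglik(theta):
    """Negative working log-likelihood based on the Huber density
    f(r) = exp(-rho_c(r/sigma)) / (K(c) * sigma), where
    K(c) = sqrt(2*pi) * (2*Phi(c) - 1) + 2 * exp(-c^2/2) / c."""
    beta, sigma, c = theta[:2], np.exp(theta[2]), np.exp(theta[3])
    r = (y - X @ beta) / sigma
    K = np.sqrt(2 * np.pi) * (2 * norm.cdf(c) - 1) + 2 * np.exp(-c**2 / 2) / c
    return np.sum(rho(r, c)) + n * (np.log(sigma) + np.log(K))

# Estimate the regression parameters and the tuning constant c jointly
res = minimize(neg_working_loglik, x0=[0.0, 0.0, 0.0, np.log(1.345)],
               method="Nelder-Mead", options={"maxiter": 5000})
beta_hat, c_hat = res.x[:2], np.exp(res.x[3])
print(beta_hat, c_hat)
```

    As c grows, K(c) tends to sqrt(2*pi) and the working likelihood recovers the Gaussian one, so heavier contamination pushes the estimated c down and lighter contamination pushes it up.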

    Parameter estimation for univariate Skew-Normal distribution based on the modified empirical characteristic function

    No full text
    Parameter estimation for the skew-normal distribution is challenging, since the profile likelihood function of the shape parameter has a stationary point at zero, which hampers traditional methods such as maximum likelihood. We present a modified empirical characteristic function method for parameter estimation in the skew-normal distribution. The proposed approach is flexible and easy to implement. We show that the estimators converge in probability to the true values. A simulation study and data analysis suggest that the proposed method performs well, even for small sample sizes.
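    A plain (unmodified) characteristic-function fit conveys the flavour of the approach: match the empirical CF to the known skew-normal CF on a grid of points. The grid, the least-squares criterion, and the optimizer are illustrative choices, not the authors' modified method.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import erfi
from scipy.stats import skewnorm

rng = np.random.default_rng(2)

# Sample from SN(location 0, scale 1, shape 5) for illustration
data = skewnorm.rvs(a=5, loc=0, scale=1, size=500, random_state=rng)

t_grid = np.linspace(0.1, 2.0, 20)                      # CF evaluation points
ecf = np.exp(1j * np.outer(t_grid, data)).mean(axis=1)  # empirical CF

def theo_cf(t, xi, omega, alpha):
    """Skew-normal CF: exp(i*xi*t - (omega*t)^2/2) *
    (1 + i * erfi(delta*omega*t / sqrt(2))), delta = alpha/sqrt(1+alpha^2)."""
    delta = alpha / np.sqrt(1 + alpha**2)
    return (np.exp(1j * xi * t - 0.5 * (omega * t) ** 2)
            * (1 + 1j * erfi(delta * omega * t / np.sqrt(2))))

def objective(theta):
    xi, log_omega, alpha = theta
    diff = ecf - theo_cf(t_grid, xi, np.exp(log_omega), alpha)
    return np.sum(np.abs(diff) ** 2)

res = minimize(objective, x0=[0.0, 0.0, 1.0], method="Nelder-Mead",
               options={"maxiter": 5000})
xi_hat, omega_hat, alpha_hat = res.x[0], np.exp(res.x[1]), res.x[2]
print(xi_hat, omega_hat, alpha_hat)
```

    Unlike the profile likelihood, this criterion has no stationary point at zero shape, which is the motivation the abstract gives for working with the characteristic function.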

    Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis

    No full text
    New technologies have produced increasingly complex and massive datasets, such as next-generation sequencing and microarray data in biology, dynamic treatment regimes in clinical trials, and long-term, wide-scale studies in the social sciences. Each study exhibits a unique data structure within individuals and clusters, and possibly across time and space. To draw valid and efficient inferences from such high-dimensional data, we must account for intracluster correlations, varying cluster sizes, and outliers in the response and/or covariate domains. A weighted rank-based method is proposed for selecting variables and estimating parameters simultaneously. The contribution of the proposed method is fourfold: (1) variable selection using the adaptive lasso is extended to robust rank regression, providing protection against outliers in both the response and the predictor variables; (2) within-subject correlations are incorporated, improving the efficiency of parameter estimation; (3) computation is convenient via existing functions in the statistical software R; and (4) the proposed method is proved to have desirable asymptotic properties for a fixed number of covariates (p). Simulation studies evaluate the proposed method in a number of scenarios, including the case where p equals the number of subjects. The results indicate that the proposed method is efficient and robust. A hormone dataset is analyzed for illustration, and by adding redundant variables as covariates, the penalty approach and weighting schemes are shown to be effective.
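    A stripped-down sketch of rank-based estimation with an adaptive lasso penalty is given below, using Jaeckel's dispersion with Wilcoxon scores. It assumes independent observations (no within-subject weighting) and a generic optimizer, so it is only a caricature of the proposed weighted method, which the abstract says is implemented via existing R functions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

rng = np.random.default_rng(3)

# Six covariates, only the first two active; very heavy-tailed t(2) errors
n, p = 150, 6
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -1.5, 0, 0, 0, 0]) + rng.standard_t(2, n)

def rank_dispersion(beta):
    """Jaeckel's dispersion sum a(R(e_i)) * e_i with Wilcoxon scores
    a(k) = sqrt(12) * (k/(n+1) - 1/2); robust to outliers in the response."""
    e = y - X @ beta
    a = np.sqrt(12) * (rankdata(e) / (n + 1) - 0.5)
    return np.sum(a * e)

# Stage 1: unpenalized rank estimate, used to build adaptive-lasso weights
beta0 = minimize(rank_dispersion, np.zeros(p), method="Nelder-Mead",
                 options={"maxiter": 10000}).x
w = 1.0 / (np.abs(beta0) + 1e-8)   # large weight => strong shrinkage

# Stage 2: rank dispersion plus the weighted L1 penalty
lam = 1.0
pen = minimize(lambda b: rank_dispersion(b) + lam * np.sum(w * np.abs(b)),
               beta0, method="Nelder-Mead", options={"maxiter": 20000}).x
print(np.round(pen, 2))
```

    Because the ranks of the residuals, not the residuals themselves, drive the loss, gross outliers in y move the fit far less than they would under least squares.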

    Bias reduction in the two-stage method for degradation data analysis

    Get PDF
    Degradation data are usually collected to assess product reliability. We propose a new two-stage method for analyzing degradation data. In the first stage, the degradation path is fitted by a nonlinear mixed-effects model; in the second stage, the parameters of the lifetime distribution are estimated by maximizing the asymptotic marginal distribution of the pseudo lifetimes. The new method has several advantages: (i) it does not require distributional assumptions on the random effects; (ii) historical information about the product's lifetime distribution can be incorporated easily, so the estimated lifetime distribution has a closed form; and (iii) a bias-correction term is automatically embedded in the asymptotic marginal distribution of the pseudo lifetimes. Finally, simulation studies and a real data analysis are presented for illustration.
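    The two-stage logic can be illustrated with a deliberately simplified example: linear degradation paths instead of a nonlinear mixed-effects model, and a plain lognormal fit instead of the asymptotic marginal distribution with its bias-correction term. Thresholds, sample sizes, and the slope distribution are made-up simulation settings.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated degradation: unit i degrades as y = b_i * t + noise;
# a unit "fails" when its path crosses the threshold D
D, n_units = 10.0, 30
t = np.linspace(1, 8, 8)
b = np.exp(rng.normal(np.log(1.5), 0.3, n_units))   # unit-specific slopes
paths = b[:, None] * t + 0.2 * rng.standard_normal((n_units, len(t)))

# Stage 1: fit each unit's path and extrapolate a pseudo lifetime D / b_hat
b_hat = (paths * t).sum(axis=1) / (t * t).sum()     # per-unit least squares
pseudo_life = D / b_hat

# Stage 2: estimate the lifetime distribution (here lognormal) from the
# pseudo lifetimes
mu_hat = np.log(pseudo_life).mean()
sigma_hat = np.log(pseudo_life).std(ddof=1)
print(np.exp(mu_hat), sigma_hat)   # median lifetime and log-scale spread
```

    The paper's contribution sits in stage 2: replacing this naive fit with the asymptotic marginal distribution of the pseudo lifetimes, which builds in the bias correction the naive version lacks.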

    A statistical learning framework for spatial-temporal feature selection and application to air quality index forecasting

    Get PDF
    Accurate air quality index (AQI) forecasting matters for public health, local economic development, and the ecological environment. As a typical geographical quantity, the AQI exhibits spatial autocorrelation (SAC), which is often ignored and may violate the assumptions of some models, such as machine learning methods that assume independent and identically distributed observations. Given the strong SAC of the AQI, this study proposes a novel statistical learning framework integrating SAC variables, feature selection, and support vector regression (SVR) for AQI prediction, in which correlation analysis and time series analysis are used to extract spatial-temporal features. In addition, the historical AQI series of the target site is adjusted by trigonometric regression to remove non-stationarity. To further improve prediction accuracy, a feature selection method combining reinforcement learning with a heuristic algorithm is adopted. To demonstrate the effectiveness of the proposed framework, we select AQI data for 34 cities in the Yangtze River Delta, one of the most polluted areas in eastern China, and focus on its three largest cities: Nanjing, Hangzhou, and Shanghai. We compare the proposed framework with several baselines; the experiments show that its forecasting accuracy is significantly better at all selected key sites, providing accurate predictions of air quality.
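    The basic shape of such a framework, lagged values of the target site (temporal features) plus a lagged neighbour series (a spatial feature) fed into SVR, can be sketched on simulated data. The series, lag choices, and SVR hyperparameters are illustrative assumptions; the paper's trigonometric detrending and reinforcement-learning feature selection are omitted.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)

# Simulated daily AQI at a target city and a correlated neighbouring city
T = 400
neigh = 80 + 20 * np.sin(np.arange(T) / 15) + 5 * rng.standard_normal(T)
target = 0.6 * np.roll(neigh, 1) + 30 + 5 * rng.standard_normal(T)

# Spatial-temporal features: the target's own lags plus the neighbour's lag
lags = 3
rows = [np.r_[target[i - lags:i], neigh[i - 1]] for i in range(lags, T)]
X, y = np.array(rows), target[lags:]

split = 300
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.5))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
print(round(rmse, 2))
```

    In the simulation the neighbour's lag carries most of the signal, which mirrors the paper's point: ignoring spatial autocorrelation discards exactly the feature that makes the forecast accurate.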

    A hybrid deep learning framework for air quality prediction with spatial autocorrelation during the COVID-19 pandemic

    Get PDF
    China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study investigates the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimates the lockdown effects as −25.88 in Wuhan and −20.47 in Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and 67% for Shanghai, enabling much more reliable AQI forecasts for both cities.
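    The counterfactual logic behind a lockdown-effect estimate can be shown with a toy example: fit a model on pre-lockdown data, roll it forward as if no lockdown happened, and average the gap between actual and predicted AQI. An AR(1) stands in here for the deep learning model, and all numbers are simulated, not the Wuhan or Shanghai estimates.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated AQI: AR(1) around a mean of 90, with the lockdown lowering
# the level by 25 units from day 200 onward
T, lock = 300, 200
aqi = np.empty(T)
aqi[0] = 90.0
for i in range(1, T):
    mean = 90.0 - 25.0 * (i >= lock)
    aqi[i] = mean + 0.7 * (aqi[i - 1] - mean) + 5 * rng.standard_normal()

# Fit an AR(1) on pre-lockdown data (a linear stand-in for the deep model)
phi, c = np.polyfit(aqi[:lock - 1], aqi[1:lock], 1)

# Counterfactual: roll the fitted model forward with no lockdown
cf = np.empty(T - lock)
prev = aqi[lock - 1]
for i in range(T - lock):
    prev = c + phi * prev
    cf[i] = prev

effect = float(np.mean(aqi[lock:] - cf))
print(round(effect, 2))   # close to the simulated -25 shift
```

    The framework in the paper follows the same actual-minus-counterfactual template, but produces the counterfactual with a deep network driven by pollutant, meteorological, social, and SAC features.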
