
    A modified memetic algorithm with an application to gene selection in a sheep body weight study

    Get PDF
    Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily leads to over-fitting, missing an important gene can have a far more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the weak exploitation capability of the traditional genetic algorithm and to reduce the dimension of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement also rests on two further novel aspects: (a) updating subsets of genes iteratively by splicing until the loss function can be reduced no further, which increases the probability of selecting the true subset of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of the gene subsets. In addition, the mutation operator is replaced by the improved splicing method to enhance exploitation capability, and the initial individuals are refined by it to make the search more efficient. A dataset on the body weight of Hu sheep was used to evaluate the modified MA against the genetic algorithm. According to our experimental results, the proposed optimizer obtains a better minimal subset of genes within a few iterations than all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
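    The exchange-based local search the abstract describes can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the loss, the single-swap "splicing" operator, and the random restarts are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples, 20 candidate genes, 3 truly active (illustrative)
n, p, k = 100, 20, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

def loss(subset):
    """Residual sum of squares after least squares on the chosen columns."""
    if not subset:
        return float(y @ y)
    Xs = X[:, sorted(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ coef
    return float(r @ r)

def splice(subset, pool):
    """Local search: repeatedly exchange one selected gene for one
    unselected gene (a 'del'/'add' pair) while the loss decreases."""
    subset = set(subset)
    improved = True
    while improved:
        improved = False
        base = loss(subset)
        for out in list(subset):
            for add in pool - subset:
                cand = (subset - {out}) | {add}
                if loss(cand) < base - 1e-10:
                    subset, improved = cand, True
                    break
            if improved:
                break
    return subset

pool = set(range(p))
# Memetic step: each random initial individual is refined by splicing
population = [splice(set(rng.choice(p, k, replace=False)), pool)
              for _ in range(5)]
best = min(population, key=loss)
print(sorted(best))  # on this synthetic data, the active genes are recovered
```

    In a full memetic algorithm this local refinement would be interleaved with crossover across generations; the sketch keeps only the splicing step, which is the part the abstract emphasizes.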

    A working likelihood approach for robust regression

    Get PDF
    A robust approach is often desirable in the presence of outliers for more efficient parameter estimation. However, the choice of the regularization parameter value affects the efficiency of the parameter estimators. To maximize estimation efficiency, we construct a likelihood function for simultaneously estimating the regression parameters and the tuning parameter. This “working” likelihood function is merely a vehicle for efficient regression parameter estimation; we do not assume the data are generated from it. The proposed method effectively selects a value of the regularization parameter based on the extent of contamination in the data. We carry out extensive simulation studies in a variety of settings to investigate the performance of the proposed method. The simulation results show that, compared with the traditional Huber method with a fixed regularization parameter, efficiency can be enhanced by as much as 40% when the data follow a heavy-tailed distribution, and by as much as 468% in heteroscedastic-variance cases. For illustration, we also analyze two datasets: one from a diabetes study and the other from a mortality study.
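    The core idea, treating the Huber tuning constant as a parameter of a working likelihood, can be sketched as below. The Huber density and its normalizing constant K(c) are standard, but the simulated data, the parameterization, and the optimizer choice are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Heavy-tailed data: linear model with t(3)-distributed errors (illustrative)
n = 200
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(3, n)

def rho(u, c):
    """Huber loss with tuning constant c."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)

def neg_working_loglik(theta):
    """Negative working log-likelihood based on the Huber density
    f(r) = exp(-rho_c(r/sigma)) / (K(c) * sigma), where
    K(c) = sqrt(2*pi) * (2*Phi(c) - 1) + 2 * exp(-c^2/2) / c."""
    beta, sigma, c = theta[:2], np.exp(theta[2]), np.exp(theta[3])
    r = (y - X @ beta) / sigma
    K = np.sqrt(2 * np.pi) * (2 * norm.cdf(c) - 1) + 2 * np.exp(-c**2 / 2) / c
    return np.sum(rho(r, c)) + n * (np.log(sigma) + np.log(K))

# Estimate the regression parameters and the tuning constant c jointly
res = minimize(neg_working_loglik, x0=[0.0, 0.0, 0.0, np.log(1.345)],
               method="Nelder-Mead", options={"maxiter": 5000})
beta_hat, c_hat = res.x[:2], np.exp(res.x[3])
print(beta_hat, c_hat)
```

    As c grows, K(c) tends to sqrt(2*pi) and the working likelihood recovers the Gaussian one, so heavier contamination pushes the estimated c down and lighter contamination pushes it up.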

    Parameter estimation for univariate Skew-Normal distribution based on the modified empirical characteristic function

    No full text
    Parameter estimation for the skew-normal distribution is challenging, since the profile likelihood function of the shape parameter has a stationary point at zero, which hampers traditional methods such as maximum likelihood. We present a modified empirical characteristic function method for parameter estimation in the skew-normal distribution. The proposed approach is flexible and easy to implement. We show that the estimators converge in probability to the true values. A simulation study and data analysis suggest that the proposed method performs well, even for small sample sizes.
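    A plain (unmodified) characteristic-function fit conveys the flavour of the approach: match the empirical CF to the known skew-normal CF on a grid of points. The grid, the least-squares criterion, and the optimizer are illustrative choices, not the authors' modified method.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import erfi
from scipy.stats import skewnorm

rng = np.random.default_rng(2)

# Sample from SN(location 0, scale 1, shape 5) for illustration
data = skewnorm.rvs(a=5, loc=0, scale=1, size=500, random_state=rng)

t_grid = np.linspace(0.1, 2.0, 20)                      # CF evaluation points
ecf = np.exp(1j * np.outer(t_grid, data)).mean(axis=1)  # empirical CF

def theo_cf(t, xi, omega, alpha):
    """Skew-normal CF: exp(i*xi*t - (omega*t)^2/2) *
    (1 + i * erfi(delta*omega*t / sqrt(2))), delta = alpha/sqrt(1+alpha^2)."""
    delta = alpha / np.sqrt(1 + alpha**2)
    return (np.exp(1j * xi * t - 0.5 * (omega * t) ** 2)
            * (1 + 1j * erfi(delta * omega * t / np.sqrt(2))))

def objective(theta):
    xi, log_omega, alpha = theta
    diff = ecf - theo_cf(t_grid, xi, np.exp(log_omega), alpha)
    return np.sum(np.abs(diff) ** 2)

res = minimize(objective, x0=[0.0, 0.0, 1.0], method="Nelder-Mead",
               options={"maxiter": 5000})
xi_hat, omega_hat, alpha_hat = res.x[0], np.exp(res.x[1]), res.x[2]
print(xi_hat, omega_hat, alpha_hat)
```

    Unlike the profile likelihood, this criterion has no stationary point at zero shape, which is the motivation the abstract gives for working with the characteristic function.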

    Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis

    No full text
    New technologies have produced increasingly complex and massive datasets, such as next-generation sequencing and microarray data in biology, dynamic treatment regimes in clinical trials, and long-term, wide-scale studies in the social sciences. Each study exhibits a unique data structure within individuals and clusters, and possibly across time and space. To draw valid and efficient inferences from such high-dimensional data, we must account for intracluster correlations, varying cluster sizes, and outliers in the response and/or covariate domains. A weighted rank-based method is proposed for selecting variables and estimating parameters simultaneously. The contribution of the proposed method is fourfold: (1) variable selection using the adaptive lasso is extended to robust rank regression, providing protection against outliers in both the response and the predictor variables; (2) within-subject correlations are incorporated, improving the efficiency of parameter estimation; (3) computation is convenient via existing functions in the statistical software R; and (4) the proposed method is proved to have desirable asymptotic properties for a fixed number of covariates (p). Simulation studies evaluate the proposed method in a number of scenarios, including the case where p equals the number of subjects. The results indicate that the proposed method is efficient and robust. A hormone dataset is analyzed for illustration, and by adding redundant variables as covariates, the penalty approach and weighting schemes are shown to be effective.
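    A stripped-down sketch of rank-based estimation with an adaptive lasso penalty is given below, using Jaeckel's dispersion with Wilcoxon scores. It assumes independent observations (no within-subject weighting) and a generic optimizer, so it is only a caricature of the proposed weighted method, which the abstract says is implemented via existing R functions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

rng = np.random.default_rng(3)

# Six covariates, only the first two active; very heavy-tailed t(2) errors
n, p = 150, 6
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -1.5, 0, 0, 0, 0]) + rng.standard_t(2, n)

def rank_dispersion(beta):
    """Jaeckel's dispersion sum a(R(e_i)) * e_i with Wilcoxon scores
    a(k) = sqrt(12) * (k/(n+1) - 1/2); robust to outliers in the response."""
    e = y - X @ beta
    a = np.sqrt(12) * (rankdata(e) / (n + 1) - 0.5)
    return np.sum(a * e)

# Stage 1: unpenalized rank estimate, used to build adaptive-lasso weights
beta0 = minimize(rank_dispersion, np.zeros(p), method="Nelder-Mead",
                 options={"maxiter": 10000}).x
w = 1.0 / (np.abs(beta0) + 1e-8)   # large weight => strong shrinkage

# Stage 2: rank dispersion plus the weighted L1 penalty
lam = 1.0
pen = minimize(lambda b: rank_dispersion(b) + lam * np.sum(w * np.abs(b)),
               beta0, method="Nelder-Mead", options={"maxiter": 20000}).x
print(np.round(pen, 2))
```

    Because the ranks of the residuals, not the residuals themselves, drive the loss, gross outliers in y move the fit far less than they would under least squares.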

    Bias reduction in the two-stage method for degradation data analysis

    Get PDF
    Degradation data are usually collected to assess product reliability. We propose a new two-stage method for analyzing degradation data. In the first stage, the degradation path is fitted by a nonlinear mixed-effects model; in the second stage, the parameters of the lifetime distribution are estimated by maximizing the asymptotic marginal distribution of the pseudo lifetimes. The new method has several advantages: (i) it does not require distributional assumptions on the random effects; (ii) historical information about the product's lifetime distribution can be incorporated easily, so the estimated lifetime distribution has a closed form; and (iii) a bias-correction term is automatically embedded in the asymptotic marginal distribution of the pseudo lifetimes. Finally, simulation studies and a real data analysis are presented for illustration.
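    The two-stage logic can be illustrated with a deliberately simplified example: linear degradation paths instead of a nonlinear mixed-effects model, and a plain lognormal fit instead of the asymptotic marginal distribution with its bias-correction term. Thresholds, sample sizes, and the slope distribution are made-up simulation settings.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated degradation: unit i degrades as y = b_i * t + noise;
# a unit "fails" when its path crosses the threshold D
D, n_units = 10.0, 30
t = np.linspace(1, 8, 8)
b = np.exp(rng.normal(np.log(1.5), 0.3, n_units))   # unit-specific slopes
paths = b[:, None] * t + 0.2 * rng.standard_normal((n_units, len(t)))

# Stage 1: fit each unit's path and extrapolate a pseudo lifetime D / b_hat
b_hat = (paths * t).sum(axis=1) / (t * t).sum()     # per-unit least squares
pseudo_life = D / b_hat

# Stage 2: estimate the lifetime distribution (here lognormal) from the
# pseudo lifetimes
mu_hat = np.log(pseudo_life).mean()
sigma_hat = np.log(pseudo_life).std(ddof=1)
print(np.exp(mu_hat), sigma_hat)   # median lifetime and log-scale spread
```

    The paper's contribution sits in stage 2: replacing this naive fit with the asymptotic marginal distribution of the pseudo lifetimes, which builds in the bias correction the naive version lacks.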

    A statistical learning framework for spatial-temporal feature selection and application to air quality index forecasting

    Get PDF
    Accurate air quality index (AQI) forecasting matters for public health, local economic development, and the ecological environment. As a typical geographical quantity, the AQI exhibits spatial autocorrelation (SAC), which is often ignored and may violate the assumptions of some models, such as machine learning methods that assume independent and identically distributed observations. Given the strong SAC of the AQI, this study proposes a novel statistical learning framework integrating SAC variables, feature selection, and support vector regression (SVR) for AQI prediction, in which correlation analysis and time series analysis are used to extract spatial-temporal features. In addition, the historical AQI series of the target site is adjusted by trigonometric regression to remove non-stationarity. To further improve prediction accuracy, a feature selection method combining reinforcement learning with a heuristic algorithm is adopted. To demonstrate the effectiveness of the proposed framework, we select AQI data for 34 cities in the Yangtze River Delta, one of the most polluted areas in eastern China, and focus on its three largest cities: Nanjing, Hangzhou, and Shanghai. We compare the proposed framework with several baselines; the experiments show that its forecasting accuracy is significantly better at all selected key sites, providing accurate predictions of air quality.
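    The basic shape of such a framework, lagged values of the target site (temporal features) plus a lagged neighbour series (a spatial feature) fed into SVR, can be sketched on simulated data. The series, lag choices, and SVR hyperparameters are illustrative assumptions; the paper's trigonometric detrending and reinforcement-learning feature selection are omitted.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)

# Simulated daily AQI at a target city and a correlated neighbouring city
T = 400
neigh = 80 + 20 * np.sin(np.arange(T) / 15) + 5 * rng.standard_normal(T)
target = 0.6 * np.roll(neigh, 1) + 30 + 5 * rng.standard_normal(T)

# Spatial-temporal features: the target's own lags plus the neighbour's lag
lags = 3
rows = [np.r_[target[i - lags:i], neigh[i - 1]] for i in range(lags, T)]
X, y = np.array(rows), target[lags:]

split = 300
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.5))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
print(round(rmse, 2))
```

    In the simulation the neighbour's lag carries most of the signal, which mirrors the paper's point: ignoring spatial autocorrelation discards exactly the feature that makes the forecast accurate.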

    A hybrid deep learning framework for air quality prediction with spatial autocorrelation during the COVID-19 pandemic

    Get PDF
    China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study investigates the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimates the lockdown effects as −25.88 in Wuhan and −20.47 in Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and 67% for Shanghai, enabling much more reliable AQI forecasts for both cities.
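    The counterfactual logic behind a lockdown-effect estimate can be shown with a toy example: fit a model on pre-lockdown data, roll it forward as if no lockdown happened, and average the gap between actual and predicted AQI. An AR(1) stands in here for the deep learning model, and all numbers are simulated, not the Wuhan or Shanghai estimates.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated AQI: AR(1) around a mean of 90, with the lockdown lowering
# the level by 25 units from day 200 onward
T, lock = 300, 200
aqi = np.empty(T)
aqi[0] = 90.0
for i in range(1, T):
    mean = 90.0 - 25.0 * (i >= lock)
    aqi[i] = mean + 0.7 * (aqi[i - 1] - mean) + 5 * rng.standard_normal()

# Fit an AR(1) on pre-lockdown data (a linear stand-in for the deep model)
phi, c = np.polyfit(aqi[:lock - 1], aqi[1:lock], 1)

# Counterfactual: roll the fitted model forward with no lockdown
cf = np.empty(T - lock)
prev = aqi[lock - 1]
for i in range(T - lock):
    prev = c + phi * prev
    cf[i] = prev

effect = float(np.mean(aqi[lock:] - cf))
print(round(effect, 2))   # close to the simulated -25 shift
```

    The framework in the paper follows the same actual-minus-counterfactual template, but produces the counterfactual with a deep network driven by pollutant, meteorological, social, and SAC features.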
