48,098 research outputs found

    Bias and variance reduction procedures in non-parametric regression

    The purpose of this study is to determine the effect of three improvement methods on nonparametric kernel regression estimators. The improvement methods are applied to the Nadaraya-Watson estimator with cross-validation bandwidth selection, the Nadaraya-Watson estimator with plug-in bandwidth selection, the local linear estimator with plug-in bandwidth selection, and a bias-corrected nonparametric estimator proposed by Yao (2012) based on cross-validation bandwidth selection. The performance of the resulting estimators is evaluated by empirically calculating their mean integrated squared error (MISE), a global discrepancy measure. The first two improvement methods proposed in this study are based on bootstrap bagging and bootstrap bragging procedures, which were originally introduced and studied by Swanepoel (1988, 1990) and thereafter applied, e.g., by Breiman (1996) in machine learning. Bagging and bragging are primarily variance reduction tools. The third improvement method, referred to as boosting, aims to reduce the bias of an estimator and is based on a procedure originally proposed by Tukey (1977). The classical Nadaraya-Watson estimator with plug-in bandwidth selection turns out to be a newly recommendable nonparametric regression estimator: it is not only as precise and accurate as any of the other estimators, but also computationally much faster than any other nonparametric regression estimator considered in this study.
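    The estimators being compared can be sketched in a few lines. Below is a minimal illustration (not the authors' implementation) of a Gaussian-kernel Nadaraya-Watson estimator together with bagging (bootstrap-mean) and bragging (bootstrap-median) variants; the function names and the fixed bandwidth `h` are illustrative assumptions, and the bandwidth-selection step (cross-validation or plug-in) is omitted.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression with a Gaussian kernel and bandwidth h."""
    # Pairwise Gaussian kernel weights between evaluation and training points.
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def bagged_nw(x_train, y_train, x_eval, h, n_boot=100, use_median=False, rng=None):
    """Bagging (mean) or bragging (median) of NW fits over bootstrap resamples."""
    rng = np.random.default_rng(rng)
    n = len(x_train)
    fits = np.empty((n_boot, len(x_eval)))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample the data with replacement
        fits[b] = nadaraya_watson(x_train[idx], y_train[idx], x_eval, h)
    return np.median(fits, axis=0) if use_median else fits.mean(axis=0)
```

    The empirical MISE in the study would then be approximated by averaging the integrated squared error of such fits over many simulated samples.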

    Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    Web-delivered trials are an important component of eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Building upon our multiple imputation (MI) based fuzzy clustering method, MIfuzzy, we propose a new multiple-imputation-based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic-based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap- or cross-validation-based) and emerging (modularity-based) validation indices for general clustering methods, as well as a specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and in simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms can be easily adapted to other types of clustering that process big incomplete longitudinal trial data in eHealth services.
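    As a rough sketch of the MIV idea (not the MIfuzzy implementation), the fragment below runs a plain fuzzy c-means on several imputed completions of an incomplete dataset, averages the Xie-Beni index across imputations, and returns the cluster count that minimizes that average. All function names are hypothetical, and the hot-deck draw is a crude stand-in for proper multiple imputation.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, rng=None):
    """Plain fuzzy c-means; returns centers V and the membership matrix U."""
    rng = np.random.default_rng(rng)
    U = rng.dirichlet(np.ones(c), size=len(X))   # random soft memberships
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]  # membership-weighted centers
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # standard membership update
    return V, U

def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index: within-cluster compactness over minimum center separation."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    compactness = ((U ** m) * d2).sum()
    sep = min(((V[i] - V[j]) ** 2).sum()
              for i in range(len(V)) for j in range(len(V)) if i != j)
    return compactness / (len(X) * sep)

def miv_select_c(X_missing, c_grid=(2, 3, 4), n_imp=5, rng=0):
    """Average the Xie-Beni index over several imputed completions of the data
    and return the number of clusters minimizing that average."""
    rng = np.random.default_rng(rng)
    scores = {}
    for c in c_grid:
        vals = []
        for _ in range(n_imp):
            X = X_missing.copy()
            for j in range(X.shape[1]):       # crude hot-deck draw per column,
                miss = np.isnan(X[:, j])      # standing in for proper MI
                if miss.any():
                    X[miss, j] = rng.choice(X[~miss, j], miss.sum())
            V, U = fuzzy_cmeans(X, c, rng=rng)
            vals.append(xie_beni(X, V, U))
        scores[c] = float(np.mean(vals))
    return min(scores, key=scores.get), scores
```

    The full framework would replace the hot-deck step with a principled imputation model and synthesize several validation indices rather than Xie-Beni alone.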

    Deep Learning and Radiomics Based PET/CT Image Feature Extraction from Auto Segmented Tumor Volumes for Recurrence-Free Survival Prediction in Oropharyngeal Cancer Patients

    Aim: The development and evaluation of deep learning (DL) and radiomics-based models for recurrence-free survival (RFS) prediction in oropharyngeal squamous cell carcinoma (OPSCC) patients, based on clinical features, positron emission tomography (PET) and computed tomography (CT) scans, and gross tumor volume (GTV) contours of primary tumors and pathological lymph nodes. Methods: A DL auto-segmentation algorithm generated the GTV contours (task 1), which were used for imaging biomarker (IBM) extraction and as input for the DL model. Multivariable Cox regression analysis was used to develop radiomics models based on clinical and IBM features. Clinical features with a significant correlation with the endpoint in a univariable analysis were selected. The most promising IBMs were selected by forward selection over 1000 bootstrap resamples within five-fold cross-validation. To optimize the DL models, different combinations of clinical features, PET/CT imaging, GTV contours, the selected radiomics features and the radiomics model predictions were used as input. The combination with the best average performance in five-fold cross-validation was taken as the final input for the DL model. The final prediction in the test set was an ensemble average of the predictions from the five models for the different folds. Results: The average C-index in the five-fold cross-validation of the radiomics model and the DL model was 0.7069 and 0.7575, respectively. The radiomics and final DL models showed C-indices of 0.6683 and 0.6455, respectively, in the test set. Conclusion: The radiomics model for recurrence-free survival prediction based on clinical, GTV and CT image features showed the best predictive performance in the test set, with a C-index of 0.6683.
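    The evaluation metric reported above is the concordance index (C-index) for survival prediction. A minimal sketch of Harrell's C-index, assuming right-censored data with an event indicator (the function name and pairwise loop are ours, not the paper's):

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: among comparable pairs (where the earlier subject's
    event was observed), the fraction in which the higher predicted risk
    failed first. Ties in predicted risk count as 0.5."""
    num = den = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                den += 1.0
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den
```

    A C-index of 0.5 corresponds to random risk ordering and 1.0 to perfect concordance, which puts the reported test-set values of 0.6683 and 0.6455 in context.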

    Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination

    Systematic trading strategies are algorithmic procedures that allocate assets with the aim of optimizing a certain performance criterion. To obtain an edge in a highly competitive environment, the analyst needs to properly fine-tune their strategy, or discover how to combine weak signals in novel alpha-creating ways. Both aspects, namely fine-tuning and combination, have been extensively researched using several methods, but emerging techniques such as Generative Adversarial Networks can have an impact on them. Our work therefore proposes the use of Conditional Generative Adversarial Networks (cGANs) for trading strategy calibration and aggregation. To this end, we provide a full methodology on: (i) the training and selection of a cGAN for time series data; (ii) how each sample is used for strategy calibration; and (iii) how all generated samples can be used for ensemble modelling. To provide evidence that our approach is well grounded, we designed an experiment with multiple trading strategies encompassing 579 assets. We compared cGAN with an ensemble scheme and model validation methods, both suited to time series. Our results suggest that cGANs are a suitable alternative for strategy calibration and combination, providing outperformance when the traditional techniques fail to generate any alpha.
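    Point (ii), using generated samples for strategy calibration, can be illustrated without any GAN machinery: given synthetic price paths (here a simple stub standing in for cGAN output), score each candidate hyperparameter on every path and keep the one that is robust across samples. The toy moving-average strategy and all names below are illustrative assumptions, not the paper's methodology.

```python
import numpy as np

def toy_generator(n_paths, n_steps, rng=0):
    """Stub standing in for cGAN samples: drifted log-price paths."""
    rng = np.random.default_rng(rng)
    return [np.exp(np.cumsum(0.01 + 0.005 * rng.standard_normal(n_steps)))
            for _ in range(n_paths)]

def sma_signal(prices, lookback):
    """+1 when price is above its simple moving average, else -1 (toy strategy)."""
    sma = np.convolve(prices, np.ones(lookback) / lookback, mode="valid")
    return np.where(prices[lookback - 1:] > sma, 1.0, -1.0)

def strategy_sharpe(prices, lookback):
    """Per-period Sharpe ratio of trading the next period on today's signal."""
    rets = np.diff(np.log(prices))
    pnl = sma_signal(prices, lookback)[:-1] * rets[lookback - 1:]
    return pnl.mean() / (pnl.std() + 1e-12)

def calibrate_on_samples(sample_paths, lookbacks=(5, 10, 20)):
    """Pick the lookback with the best median Sharpe across generated paths."""
    med = {L: np.median([strategy_sharpe(p, L) for p in sample_paths])
           for L in lookbacks}
    return max(med, key=med.get), med
```

    Using the median across samples, rather than the fit on a single realized history, is what makes the calibration lean on the generator's whole distribution.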