An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++
Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates, and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject's set of replicate samples, which reduces the dataset size and incurs a loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), a forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux, as well as a user manual (Supplementary File S2), are available for download at http://sourceforge.org/projects/rfpp/ under the GNU public license.
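The abstract above describes subject-level bootstrapping and a post-processing step for subject-level classification, but it does not give implementation details. The following Python sketch is one plausible reading, not the RF++ implementation: it assumes that a bootstrap draw selects whole subjects with replacement and carries all of a subject's replicates along, and that subject-level labels are obtained by a majority vote over replicate-level predictions. The function names `subject_level_bootstrap` and `subject_level_vote` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def subject_level_bootstrap(subject_ids, rng):
    """Draw one bootstrap sample by resampling subjects, not replicates.

    subject_ids[i] is the subject that row i belongs to.
    Returns (in_bag_rows, out_of_bag_rows) as lists of row indices.
    Assumption: every replicate of a drawn subject enters the sample.
    """
    rows_by_subject = defaultdict(list)
    for row, subject in enumerate(subject_ids):
        rows_by_subject[subject].append(row)

    subjects = list(rows_by_subject)
    # Sample as many subjects as exist, with replacement.
    drawn = [rng.choice(subjects) for _ in subjects]
    in_bag = [row for s in drawn for row in rows_by_subject[s]]

    # A subject is out-of-bag only if none of its replicates were drawn,
    # so correlated replicates never straddle the in-bag/out-of-bag split.
    drawn_set = set(drawn)
    out_of_bag = [
        row for s in subjects if s not in drawn_set for row in rows_by_subject[s]
    ]
    return in_bag, out_of_bag


def subject_level_vote(subject_ids, replicate_predictions):
    """Collapse replicate-level predictions to one label per subject.

    Assumption: simple majority vote; ties go to the label seen first.
    """
    votes = defaultdict(Counter)
    for subject, label in zip(subject_ids, replicate_predictions):
        votes[subject][label] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in votes.items()}


if __name__ == "__main__":
    ids = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3"]
    in_bag, oob = subject_level_bootstrap(ids, random.Random(0))
    print("in-bag rows:", in_bag)
    print("out-of-bag rows:", oob)

    preds = ["case", "case", "control", "control", "control"]
    print(subject_level_vote(["s1", "s1", "s1", "s2", "s2"], preds))
    # prints {'s1': 'case', 's2': 'control'}
```

Keeping all replicates of a subject on the same side of the in-bag/out-of-bag split is what would prevent the out-of-bag error estimate from being optimistically biased, which is the failure the abstract reports for URF.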
Aspergillus Infections and Progression of Structural Lung Disease in Children with Cystic Fibrosis.
Rationale: Recent data show that Aspergillus species are prevalent respiratory infections in children with cystic fibrosis (CF). The biological significance of these infections is unknown. Objectives: We aimed to evaluate longitudinal associations between Aspergillus infections and lung disease in young children with CF. Methods: Longitudinal data on 330 children participating in the Australian Respiratory Early Surveillance Team for Cystic Fibrosis surveillance program between 2000 and 2018 who underwent annual chest computed tomography (CT) imaging and BAL were used to determine the association between Aspergillus infections and the progression of structural lung disease. Results were adjusted for the effects of other common infections, associated variables, and repeated visits. Secondary outcomes included inflammatory markers in BAL, respiratory symptoms, and admissions for exacerbations. Measurements and Main Results: Haemophilus influenzae, Staphylococcus aureus, Pseudomonas aeruginosa, and Aspergillus infections were all associated with worse CT scores in the same year (P_overall < 0.05). Only P. aeruginosa and Aspergillus were associated with progression in CT scores in the year after an infection and worse CT scores at the end of the observation period. P. aeruginosa was most significantly associated with development of bronchiectasis (difference, 0.9; 95% confidence interval, 0.3-1.6; P = 0.003) and Aspergillus with trapped air (difference, 3.2; 95% confidence interval, 1.0-5.4; P = 0.004). Aspergillus infections were also associated with markers of neutrophilic inflammation (P < 0.001) and respiratory admissions risk (P = 0.008). Conclusions: Lower respiratory Aspergillus infections are associated with the progression of structural lung disease in young children with CF. This study highlights the need to further evaluate early Aspergillus species infections and the feasibility, risk, and benefit of eradication regimens.