
    An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

    Many mass spectrometry-based studies, as well as other biological experiments produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size which led to poorer classification and variable selection accuracy. Perhaps most importantly our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data. 
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux as well as a user manual (Supplementary File S2) are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license
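The subject-level bootstrapping idea in the abstract above can be illustrated with a minimal sketch. This is not the RF++ implementation: the abstract states only that resampling happens at the subject level and that subject-level classification needs a post-processing step. The function names, the choice to draw as many subjects as exist, and the majority-vote rule are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def subject_level_bootstrap(subject_ids, rng):
    """Resample whole subjects with replacement (hypothetical helper).

    subject_ids[i] is the subject that replicate (row) i belongs to.
    Returns (in_bag_rows, out_of_bag_rows). All replicates of a subject
    stay together, so no subject is split across the two sets.
    """
    rows_by_subject = defaultdict(list)
    for row, subject in enumerate(subject_ids):
        rows_by_subject[subject].append(row)

    subjects = sorted(rows_by_subject)
    # Assumption: draw as many subjects as there are distinct subjects.
    drawn = [rng.choice(subjects) for _ in subjects]

    in_bag = [row for s in drawn for row in rows_by_subject[s]]
    drawn_set = set(drawn)
    out_of_bag = [
        row for s in subjects if s not in drawn_set
        for row in rows_by_subject[s]
    ]
    return in_bag, out_of_bag


def subject_level_vote(subject_ids, replicate_predictions):
    """Collapse per-replicate predictions to one label per subject.

    Assumption: a simple majority vote over each subject's replicates.
    """
    votes = defaultdict(Counter)
    for subject, label in zip(subject_ids, replicate_predictions):
        votes[subject][label] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in votes.items()}
```

Calling `subject_level_bootstrap(ids, random.Random(0))` once per tree gives each tree a training set and an out-of-bag set that share no subject. That separation is why an out-of-bag error estimate built this way is not inflated by replicates of the same subject appearing on both sides.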

    Orbital effects of a monochromatic plane gravitational wave with ultra-low frequency incident on a gravitationally bound two-body system

    We analytically compute the long-term orbital variations of a test particle orbiting a central body acted upon by an incident monochromatic plane gravitational wave. We assume that the characteristic size of the perturbed two-body system is much smaller than the wavelength of the wave. Moreover, we also suppose that the wave's frequency is much smaller than the particle's orbital one. We make no a priori assumptions about either the direction of the wavevector or the orbital geometry of the planet. We find that, while the semi-major axis is left unaffected, the eccentricity, the inclination, the longitude of the ascending node, the longitude of pericenter and the mean anomaly undergo non-vanishing long-term changes. They are not secular trends because of the slow modulation introduced by the tidal matrix coefficients and by the orbital elements themselves. They could be useful to independently constrain the ultra-low frequency waves which may have been indirectly detected in the BICEP2 experiment. Our calculation holds, in general, for any gravitationally bound two-body system whose characteristic frequency is much larger than the frequency of the external wave. It is also valid for a generic perturbation of tidal type with constant coefficients over timescales of the order of the orbital period of the perturbed particle. Comment: LaTeX2e, 24 pages, no figures, no tables. Changes suggested by the referees include

    Tuning of Electrical and Optical Properties of Highly Conducting and Transparent Ta-Doped TiO2 Polycrystalline Films

    We present a detailed study on polycrystalline transparent conducting Ta-doped TiO2 films, obtained by room temperature pulsed laser deposition followed by an annealing treatment at 550°C in vacuum. The effect of Ta as a dopant element and of different synthesis conditions are explored in order to assess the relationship between material structure and functional properties, i.e. electrical conductivity and optical transparency. We show that for the doped samples it is possible to achieve low resistivity (of the order of 5×10⁻⁴ Ω cm) coupled with transmittance values exceeding 80% in the visible range, showing the potential of polycrystalline Ta:TiO2 for application as a transparent electrode in novel photovoltaic devices. The presence of trends in the structural (crystalline domain size, anatase cell parameters), electrical (resistivity, charge carrier density and mobility) and optical (transmittance, optical band gap, effective mass) properties as a function of the oxygen background pressures and laser fluence used during the deposition process and of the annealing atmosphere is discussed, and points towards a complex defect chemistry ruling the material behavior. The large mobility values obtained in this work for Ta:TiO2 polycrystalline films (up to 13 cm² V⁻¹ s⁻¹) could represent a definitive advantage with respect to the more studied Nb-doped TiO2

    Heavy reliance on plants for Romanian cave bears evidenced by amino acid nitrogen isotope analysis

    Heavy reliance on plants is rare in Carnivora and mostly limited to relatively small species in subtropical settings. The feeding behaviors of extinct cave bears living during Pleistocene cold periods at middle latitudes have been intensely studied using various approaches including isotopic analyses of fossil collagen. In contrast to cave bears from all other regions in Europe, some individuals from Romania show exceptionally high δ¹⁵N values that might be indicative of meat consumption. Herbivory on plants with high δ¹⁵N values cannot be ruled out based on this method, however. Here we apply an approach using the δ¹⁵N values of individual amino acids from collagen that offsets the baseline δ¹⁵N variation among environments. The analysis yielded strong signals of reliance on plants for Romanian cave bears based on the δ¹⁵N values of glutamate and phenylalanine. These results could suggest that the high variability in bulk collagen δ¹⁵N values observed among cave bears in Romania reflects niche partitioning but in a general trophic context of herbivory

    Metabolomics-Based Discovery of Diagnostic Biomarkers for Onchocerciasis

    Onchocerciasis, caused by the filarial parasite Onchocerca volvulus, afflicts millions of people, causing such debilitating symptoms as blindness and acute dermatitis. There are no accurate, sensitive means of diagnosing O. volvulus infection. Clinical diagnostics are desperately needed in order to achieve the goals of controlling and eliminating onchocerciasis and neglected tropical diseases in general. In this study, a metabolomics approach is introduced for the discovery of small molecule biomarkers that can be used to diagnose O. volvulus infection. Blood samples from O. volvulus infected and uninfected individuals from different geographic regions were compared using liquid chromatography separation and mass spectrometry identification. Thousands of chromatographic mass features were statistically compared to discover 14 mass features that were significantly different between infected and uninfected individuals. Multivariate statistical analysis and machine learning algorithms demonstrated how these biomarkers could be used to differentiate between infected and uninfected individuals and indicate that the diagnostic may even be sensitive enough to assess the viability of worms. This study suggests a future potential of these biomarkers for use in a field-based onchocerciasis diagnostic and how such an approach could be expanded for the development of diagnostics for other neglected tropical diseases

    Individualized markers optimize class prediction of microarray data

    BACKGROUND: Identification of molecular markers for the classification of microarray data is a challenging task. Despite the evident dissimilarity in various characteristics of biological samples belonging to the same category, most of the marker-selection and classification methods do not consider this variability. In general, feature selection methods aim at identifying a common set of genes whose combined expression profiles can accurately predict the category of all samples. Here, we argue that this simplified approach is often unable to capture the complexity of a disease phenotype and we propose an alternative method that takes into account the individuality of each patient-sample. RESULTS: Instead of using the same features for the classification of all samples, the proposed technique starts by creating a pool of informative gene-features. For each sample, the method selects a subset of these features whose expression profiles are most likely to accurately predict the sample's category. Different subsets are utilized for different samples and the outcomes are combined in a hierarchical framework for the classification of all samples. Moreover, this approach can innately identify subgroups of samples within a given class which share common feature sets thus highlighting the effect of individuality on gene expression. CONCLUSION: In addition to high classification accuracy, the proposed method offers a more individualized approach for the identification of biological markers, which may help in better understanding the molecular background of a disease and emphasize the need for more flexible medical interventions

    A Multi-Cancer Mesenchymal Transition Gene Expression Signature Is Associated with Prolonged Time to Recurrence in Glioblastoma

    A stage-associated gene expression signature of coordinately expressed genes, including the transcription factor Slug (SNAI2) and other epithelial-mesenchymal transition (EMT) markers has been found present in samples from publicly available gene expression datasets in multiple cancer types, including nonepithelial cancers. The expression levels of the co-expressed genes vary in a continuous and coordinate manner across the samples, ranging from absence of expression to strong co-expression of all genes. These data suggest that tumor cells may pass through an EMT-like process of mesenchymal transition to varying degrees. Here we show that, in glioblastoma multiforme (GBM), this signature is associated with time to recurrence following initial treatment. By analyzing data from The Cancer Genome Atlas (TCGA), we found that GBM patients who responded to therapy and had long time to recurrence had low levels of the signature in their tumor samples (P = 3×10⁻⁷). We also found that the signature is strongly correlated in gliomas with the putative stem cell marker CD44, and is highly enriched among the differentially expressed genes in glioblastomas vs. lower grade gliomas. Our results suggest that long delay before tumor recurrence is associated with absence of the mesenchymal transition signature, raising the possibility that inhibiting this transition might improve the durability of therapy in glioma patients

    Heritability in the Efficiency of Nonsense-Mediated mRNA Decay in Humans

    BACKGROUND: In eukaryotes mRNA transcripts of protein-coding genes in which an intron has been retained in the coding region normally result in premature stop codons and are therefore degraded through the nonsense-mediated mRNA decay (NMD) pathway. There is evidence in the form of selective pressure for in-frame stop codons in introns and a depletion of length three introns that this is an important and conserved quality-control mechanism. Yet recent reports have revealed that the efficiency of NMD varies across tissues and between individuals, with important clinical consequences. PRINCIPAL FINDINGS: Using previously published Affymetrix exon microarray data from cell lines genotyped as part of the International HapMap project, we investigated whether there are heritable, inter-individual differences in the abundance of intron-containing transcripts, potentially reflecting differences in the efficiency of NMD. We identified intronic probesets using EST data and report evidence of heritability in the extent of intron expression in 56 HapMap trios. We also used a genome-wide association approach to identify genetic markers associated with intron expression. Among the top candidates was a SNP in the DCP1A gene, which forms part of the decapping complex, involved in NMD. CONCLUSIONS: While we caution that some of the apparent inter-individual difference in intron expression may be attributable to different handling or treatments of cell lines, we hypothesize that there is significant polymorphism in the process of NMD, resulting in heritable differences in the abundance of intronic mRNA. Part of this phenotype is likely to be due to a polymorphism in a decapping enzyme on human chromosome 3