146 research outputs found

    Bias in random forest variable importance measures: Illustrations, sources and a solution

    BACKGROUND: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. RESULTS: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand. CONCLUSION: We propose to employ an alternative implementation of random forests, that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. 
The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.
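
As a rough illustration of the kind of importance measure discussed in this abstract, the following minimal sketch computes a permutation importance: each predictor column is shuffled in turn and the resulting drop in accuracy is recorded. It is not the authors' R implementation. The fixed threshold rule standing in for a fitted forest, the toy data, and the names `accuracy` and `permutation_importance` are all assumptions made for illustration.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each predictor column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: the label depends only on predictor 0.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def model(row):
    """Stand-in for a fitted classifier (an assumption, not a random forest)."""
    return int(row[0] > 0.5)


importances = permutation_importance(model, X, y)
print(importances)
```

In this toy setting the irrelevant predictor receives an importance of exactly zero because the stand-in model never reads it. The bias described in the abstract arises when trees are actually grown on predictors of different types, which this sketch does not reproduce.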

    COX inhibitors and breast cancer

    There is considerable evidence to suggest that prostaglandins play an important role in the development and growth of cancer. The enzyme cyclo-oxygenase (COX) catalyses the conversion of arachidonic acid to prostaglandins. In recent years, there has been interest in a possible role for COX inhibitors in the prevention and treatment of malignancy. Cyclo-oxygenase-2 (COX-2) is overexpressed in several epithelial tumours, including breast cancer. Preclinical evidence favours an antitumour role for COX inhibitors in breast cancer. However, the epidemiological evidence for an association is conflicting. Trials are being conducted to study the use of COX inhibitors alone and in combination with other agents in the chemoprevention of breast cancer, and in the neo-adjuvant, adjuvant, and metastatic treatment settings. In evaluating the potential use of these agents, particularly in cancer chemoprophylaxis, the safety profile is as important as their efficacy. Concern over the cardiovascular safety of both selective and nonselective COX inhibitors has recently been highlighted.

    Functional Brain Network Modularity Captures Inter- and Intra-Individual Variation in Working Memory Capacity

    Cognitive abilities, such as working memory, differ among people; however, individuals also vary in their own day-to-day cognitive performance. One potential source of cognitive variability may be fluctuations in the functional organization of neural systems. The degree to which the organization of these functional networks is optimized may relate to the effective cognitive functioning of the individual. Here we specifically examine how changes in the organization of large-scale networks measured via resting state functional connectivity MRI and graph theory track changes in working memory capacity. Twenty-two participants performed a test of working memory capacity and then underwent resting-state fMRI. Seventeen subjects repeated the protocol three weeks later. We applied graph theoretic techniques to measure network organization on 34 brain regions of interest (ROI). Network modularity, which measures the level of integration and segregation across sub-networks, and small-worldness, which measures global network connection efficiency, both predicted individual differences in memory capacity; however, only modularity predicted intra-individual variation across the two sessions. Partial correlations controlling for the component of working memory that was stable across sessions revealed that modularity was almost entirely associated with the variability of working memory at each session. Analyses of specific sub-networks and individual circuits were unable to consistently account for working memory capacity variability. The results suggest that the intrinsic functional organization of an a priori defined cognitive control network measured at rest provides substantial information about actual cognitive performance.
The association of network modularity with the variability in an individual's working memory capacity suggests that the organization of this network into high connectivity within modules and sparse connections between modules may reflect effective signaling across brain regions, perhaps through the modulation of signal or the suppression of the propagation of noise.
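
The abstract does not say which modularity statistic was used. As one common choice, the sketch below computes Newman's modularity Q for an unweighted, undirected graph and a given partition: for each module, the fraction of edges inside it minus the fraction expected from node degrees alone. The toy graph (two triangles joined by one edge) and the function name `modularity` are assumptions for illustration and do not come from the study's data.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman's Q for an unweighted, undirected graph and a node partition."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the module
    degree_sum = defaultdict(int)  # summed degree of the module's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy network: two densely connected triangles, one bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}

q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(q_two, q_one)
```

Dense connections within modules and sparse connections between them raise Q, while placing every node in a single module gives Q = 0.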

    Implications from a Network-Based Topological Analysis of Ubiquitin Unfolding Simulations

    BACKGROUND: The architectural organization of protein structures has been the focus of intense research since it can hopefully lead to an understanding of how proteins fold. In earlier works we had attempted to identify the inherent structural organization in proteins through a study of protein topology. We obtained a modular partitioning of protein structures with the modules correlating well with experimental evidence of early folding units or "foldons". Residues that connect different modules were shown to be those that were protected during the transition phase of folding. METHODOLOGY/PRINCIPAL FINDINGS: In this work, we follow the topological path of ubiquitin through molecular dynamics unfolding simulations. We observed that the use of recurrence quantification analysis (RQA) could lead to the identification of the transition state during unfolding. Additionally, our earlier contention that the modules uncovered through our graph partitioning approach correlated well with early folding units was vindicated through our simulations. Moreover, residues identified from native structure as connector hubs and which had been shown to be those that were protected during the transition phase of folding were indeed more stable (less flexible) well beyond the transition state. Further analysis of the topological pathway suggests that the all pairs shortest path in a protein is minimized during folding. CONCLUSIONS: We observed that treating a protein native structure as a network by having amino acid residues as nodes and the non-covalent interactions among them as links allows for the rationalization of many aspects of the folding process. 
The possibility of deriving this information directly from 3D structure opens the way to the prediction of important residues in proteins, while the confirmation of the minimization of the all-pairs shortest path (APSP) during folding allows for the establishment of a potentially useful proxy for kinetic optimality in the validation of sequence-structure predictions.
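
To make the all-pairs shortest path quantity concrete, here is a minimal sketch that treats residues as nodes and contacts as unweighted links, runs a breadth-first search from every node, and averages the resulting path lengths. The four-residue toy networks and the function name `mean_shortest_path` are hypothetical. The sketch also assumes a connected network, and the authors' actual contact definition and weighting are not given in the abstract.

```python
from collections import deque
from itertools import combinations


def mean_shortest_path(n_nodes, contacts):
    """Average shortest-path length over all node pairs (connected graph)."""
    neighbours = {i: set() for i in range(n_nodes)}
    for u, v in contacts:
        neighbours[u].add(v)
        neighbours[v].add(u)

    def bfs(source):
        """Hop distance from source to every reachable node."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    all_dist = {s: bfs(s) for s in range(n_nodes)}
    pairs = list(combinations(range(n_nodes), 2))
    return sum(all_dist[a][b] for a, b in pairs) / len(pairs)


# Hypothetical toy example: an extended chain versus the same chain with one
# extra long-range contact between its two ends.
extended = [(0, 1), (1, 2), (2, 3)]
compact = extended + [(0, 3)]

apsp_extended = mean_shortest_path(4, extended)
apsp_compact = mean_shortest_path(4, compact)
print(apsp_extended, apsp_compact)
```

Adding the long-range contact shortens the average path, which is the direction of change the abstract associates with folding.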

    Targeting KSHV/HHV-8 Latency with COX-2 Selective Inhibitor Nimesulide: A Potential Chemotherapeutic Modality for Primary Effusion Lymphoma

    The significance of inflammation in KSHV biology and tumorigenesis prompted us to examine the role of COX-2 in primary effusion lymphoma (PEL), an aggressive AIDS-linked KSHV-associated non-Hodgkin's lymphoma (NHL) using nimesulide, a well-known COX-2 specific NSAID. We demonstrate that (1) nimesulide is efficacious in inducing proliferation arrest in PEL (KSHV+/EBV-; BCBL-1 and BC-3, KSHV+/EBV+; JSC-1), EBV-infected (KSHV-/EBV+; Raji) and non-infected (KSHV-/EBV-; Akata, Loukes, Ramos, BJAB) high malignancy human Burkitt's lymphoma (BL) as well as KSHV-/EBV+ lymphoblastoid (LCL) cell lines; (2) nimesulide is selectively toxic to KSHV infected endothelial cells (TIVE-LTC) compared to TIVE and primary endothelial cells (HMVEC-d); (3) nimesulide reduced KSHV latent gene expression, disrupted p53-LANA-1 protein complexes, and activated the p53/p21 tumor-suppressor pathway; (4) COX-2 inhibition down-regulated cell survival kinases (p-Akt and p-GSK-3β), an angiogenic factor (VEGF-C), PEL defining genes (syndecan-1, aquaporin-3, and vitamin-D3 receptor) and cell cycle proteins such as cyclins E/A and cdc25C; (5) nimesulide induced sustained cell death and G1 arrest in BCBL-1 cells; (6) nimesulide substantially reduced the colony forming capacity of BCBL-1 cells. Overall, our studies provide a comprehensive molecular framework linking COX-2 with PEL pathogenesis and identify the chemotherapeutic potential of nimesulide in treating PEL.

    Evidence-Based Annotation of the Malaria Parasite's Genome Using Comparative Expression Profiling

    A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.

    Tracking the Expression of Excitatory and Inhibitory Neurotransmission-Related Proteins and Neuroplasticity Markers after Noise Induced Hearing Loss

    Excessive exposure to loud noise can damage the cochlea and cause hearing loss. These pathologies coincide with a range of CNS changes including reorganisation of frequency representation, alterations in the pattern of spontaneous activity and changed expression of excitatory and inhibitory neurotransmitters. Moreover, damage to the cochlea is often accompanied by acoustic disorders such as hyperacusis and tinnitus, suggesting that one or more of these neuronal changes may be involved in these disorders, although the mechanisms remain unknown. We tested the hypothesis that excessive noise exposure increases expression of markers of excitation and plasticity, and decreases expression of inhibitory markers over a 32-day recovery period. Adult rats (n = 25) were monaurally exposed to a loud noise (16 kHz, 1/10th octave band pass (115 dB SPL)) for 1 hour, or left as non-exposed controls (n = 5). Animals were euthanased at either 0, 4, 8, 16 or 32 days following acoustic trauma. We used Western Blots to quantify protein levels of GABAA receptor subunit α1 (GABAAα1), Glutamic-Acid Decarboxylase-67 (GAD-67), N-Methyl-D-Aspartate receptor subunit 2A (NR2A), Calbindin (Calb1) and Growth Associated Protein 43 (GAP-43) in the Auditory Cortex (AC), Inferior Colliculus (IC) and Dorsal Cochlear Nucleus (DCN). Compared to sham-exposed controls, noise-exposed animals had significantly (p<0.05): lower levels of GABAAα1 in the contralateral AC at day-16 and day-32, lower levels of GAD-67 in the ipsilateral DCN at day-4, lower levels of Calb1 in the ipsilateral DCN at day-0, lower levels of GABAAα1 in the ipsilateral AC at day-4 and day-32. GAP-43 was reduced in the ipsilateral AC for the duration of the experiment. These complex fluctuations in protein expression suggest that for at least a month following acoustic trauma the auditory system is adapting to a new pattern of sensory input.

    Principal variable selection to explain grain yield variation in winter wheat from features extracted from UAV imagery

    Background: Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still take considerable effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select principal features extracted from UAV imagery and critical growth stages that contributed the most in explaining winter wheat grain yield. Five dates of multispectral images and seven dates of RGB images were collected by a UAV system during the spring growing season in 2018. Two classes of features (variables), totaling 172 variables, were extracted for each plot from the vegetation index and plant height maps, including pixel statistics and dynamic growth rates. A parametric algorithm, LASSO regression (the least absolute shrinkage and selection operator), and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm. Results: Both selection algorithms assigned the highest importance score to the variables related to plant height around the grain filling stage. Some variables related to vegetation indices were also selected by the algorithms, mainly at early to mid growth stages and during senescence. Compared with the yield prediction using all 172 variables derived from measured phenotypes, prediction using the selected variables performed comparably or even better. We also noticed that the prediction accuracy on the adapted NE lines (r = 0.58–0.81) was higher than on the other lines (r = 0.21–0.59) included in this study, which had different genetic backgrounds.
Conclusions: With the ultra-high resolution plot imagery obtained by the UAS-based phenotyping we are now able to derive more features, such as the variation of plant height or vegetation indices within a plot rather than just an averaged number, that are potentially very useful for breeding purposes. However, too many features or variables can be derived in this way. The promising results from this study suggest that a selected subset of those variables can achieve grain yield prediction accuracies comparable to those of the full set, while possibly allowing a better allocation of effort and resources to phenotypic data collection and processing.

    Genetic and Non-Genetic Influences during Pregnancy on Infant Global and Site Specific DNA Methylation: Role for Folate Gene Variants and Vitamin B12

    Inter-individual variation in patterns of DNA methylation at birth can be explained by the influence of environmental, genetic and stochastic factors. This study investigates the genetic and non-genetic determinants of variation in DNA methylation in human infants. Given its central role in provision of methyl groups for DNA methylation, this study focuses on aspects of folate metabolism. Global (LUMA) and gene specific (IGF2, ZNT5, IGFBP3) DNA methylation were quantified in 430 infants by Pyrosequencing®. Seven polymorphisms in 6 genes (MTHFR, MTRR, FOLH1, CβS, RFC1, SHMT) involved in folate absorption and metabolism were analysed in DNA from both infants and mothers. Red blood cell folate and serum vitamin B12 concentrations were measured as indices of vitamin status. Relationships between DNA methylation patterns and several covariates viz. sex, gestation length, maternal and infant red cell folate, maternal and infant serum vitamin B12, maternal age, smoking and genotype were tested. Length of gestation correlated positively with IGF2 methylation (rho = 0.11, p = 0.032) and inversely with ZNT5 methylation (rho = −0.13, p = 0.017). Methylation of the IGFBP3 locus correlated inversely with infant vitamin B12 concentration (rho = −0.16, p = 0.007), whilst global DNA methylation correlated inversely with maternal vitamin B12 concentrations (rho = 0.18, p = 0.044). Analysis of common genetic variants in folate pathway genes highlighted several associations including infant MTRR 66G>A genotype with DNA methylation (χ2 = 8.82, p = 0.003) and maternal MTHFR 677C>T genotype with IGF2 methylation (χ2 = 2.77, p = 0.006). These data support the hypothesis that both environmental and genetic factors involved in one-carbon metabolism influence DNA methylation in infants. Specifically, the findings highlight the importance of vitamin B12 status, infant MTRR genotype and maternal MTHFR genotype, all of which may influence the supply of methyl groups for DNA methylation. 
In addition, gestational length appears to be an important determinant of infant DNA methylation patterns.