
    Identification of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests

    The recent availability of whole-genome-scale data sets that investigate complementary and diverse aspects of transcriptional regulation has created a growing need for new and effective computational approaches to analyze and integrate these large-scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and to transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique performs excellently at identifying known regulatory motifs, including high-order interactions. In addition, we present evidence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we uncovered elaborate transcriptional regulation refinement mechanisms involving the PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of different motif dosages and different combinatorial motif control that promote regulatory specificity in rRNA metabolism across physiological processes.
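The abstract does not state the splitting rule used for the multivariate response. One common way to extend a regression tree to a vector-valued response, such as a time-course expression profile, is to sum the squared-error impurity over all response dimensions. The sketch below assumes that criterion and shows the split search for a single promoter feature; the function names and toy data are hypothetical, and a full random forest would add bootstrap resampling and random feature subsets on top of this.

```python
from typing import List, Sequence, Tuple

Profile = Sequence[float]  # one gene's expression values across time points


def multivariate_sse(profiles: List[Profile]) -> float:
    """Node impurity: squared deviations from the node mean, summed over all time points."""
    if not profiles:
        return 0.0
    n = len(profiles)
    total = 0.0
    for d in range(len(profiles[0])):
        mean = sum(p[d] for p in profiles) / n
        total += sum((p[d] - mean) ** 2 for p in profiles)
    return total


def best_split(feature: Sequence[float], profiles: List[Profile]) -> Tuple[float, float]:
    """Search thresholds on one promoter feature (e.g. a motif count) for the largest impurity drop."""
    parent = multivariate_sse(profiles)
    best_threshold, best_gain = float("nan"), 0.0
    for t in sorted(set(feature))[:-1]:  # the largest value would leave the right child empty
        left = [p for x, p in zip(feature, profiles) if x <= t]
        right = [p for x, p in zip(feature, profiles) if x > t]
        gain = parent - multivariate_sse(left) - multivariate_sse(right)
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain


# Hypothetical toy data: motif counts per gene and three-point expression profiles.
motif_counts = [0, 0, 1, 2]
profiles = [[0, 0, 0], [0.1, 0, 0], [1, 2, 1], [1, 2, 1.2]]
print(best_split(motif_counts, profiles))  # threshold 0: genes without the motif vs. genes with it
```

Because the impurity pools all time points, a split is rewarded only when it separates genes whose whole profiles differ, which is what makes the resulting leaves candidates for cohesive, co-regulated gene groups.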

    Growth Strategies of Tropical Tree Species: Disentangling Light and Size Effects

    An understanding of the drivers of tree growth at the species level is required to predict likely changes in carbon stocks and biodiversity when environmental conditions change. Especially in species-rich tropical forests, it is largely unknown how species differ in the response of their growth to resource availability and individual size. We use a hierarchical Bayesian approach to quantify the impact of light availability and tree diameter on the growth of 274 woody species in a 50-ha long-term forest census plot on Barro Colorado Island, Panama. Light reaching each individual tree was estimated from yearly vertical censuses of canopy density. The hierarchical Bayesian approach made it possible to account for different sources of error, such as negative growth observations, and to include rare species, correctly weighted by their abundance. All species grew faster at higher light. Exponents of a power function relating growth to light were mostly between 0 and 1, indicating that nearly all species exhibit a decelerating increase of growth with light. In contrast, estimated growth rates at standardized conditions (5 cm dbh, 5% light) varied over a 9-fold range and reflect strong growth-strategy differentiation between the species. As a consequence, growth rankings of the species at low (2%) and high (20%) light were highly correlated. Rare species tended to grow faster and showed a greater sensitivity to light than abundant species. Overall, tree size was less important for growth than light, and about half of the species were predicted to grow faster in diameter when bigger, the other half when smaller. Together, light availability and tree diameter explained on average only 12% of the variation in growth rates. Thus, other factors such as soil characteristics, herbivory, or pathogens may contribute considerably to shaping tree growth in the tropics.
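The link between a light exponent below 1, a wide spread of standardized growth rates, and correlated rankings at 2% and 20% light can be illustrated numerically. The sketch below assumes a simple form, growth = g_std · (light / 5%)^b, where g_std is the growth rate at the 5% reference light; the paper's actual model also includes diameter and a hierarchical error structure, which are omitted here. The four species and their parameter values are hypothetical, chosen only so that g_std spans a 9-fold range.

```python
from typing import Dict, List, Tuple

REFERENCE_LIGHT = 5.0  # percent light at which the standardized growth rate is defined


def growth_at_light(light_pct: float, g_std: float, b: float) -> float:
    """Assumed power law: growth = g_std * (light / 5%) ** b."""
    return g_std * (light_pct / REFERENCE_LIGHT) ** b


def ranks(values: List[float]) -> List[int]:
    """Rank positions (0 = slowest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical species: (growth at 5% light, light exponent); not values from the study.
species: Dict[str, Tuple[float, float]] = {
    "A": (0.5, 0.3), "B": (1.5, 0.6), "C": (4.5, 0.4), "D": (2.5, 0.9),
}
low = [growth_at_light(2.0, g, b) for g, b in species.values()]
high = [growth_at_light(20.0, g, b) for g, b in species.values()]
print(spearman(low, high))  # C and D swap places, so the correlation is high but below 1
```

With 0 < b < 1 each additional unit of light adds less growth than the previous one, and because differences in g_std are large relative to differences in b, most species keep their relative position between low and high light.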

    Is EC class predictable from reaction mechanism?

    We thank the Scottish Universities Life Sciences Alliance (SULSA) and the Scottish Overseas Research Student Awards Scheme of the Scottish Funding Council (SFC) for financial support. Background: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. Results: The three descriptor sets encoding the overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. Conclusions: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification.
We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction. Publisher PDF. Peer reviewed.
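Of the three learners compared, kNN is the simplest to write down, and the evaluation scheme (held-out folds) is the same for all of them. The sketch below shows a Euclidean-distance kNN with majority voting, scored by k-fold cross-validation. It is not the authors' pipeline: the two-dimensional "descriptors" are hypothetical stand-ins for reaction or mechanism fingerprints, and the fold assignment (index modulo the number of folds) is an arbitrary choice for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def knn_predict(train_X: List[Vector], train_y: List[str], x: Vector, k: int = 3) -> str:
    """Majority vote among the k training reactions closest to x (Euclidean distance)."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X: List[Vector], y: List[str], n_folds: int = 3, k: int = 3) -> float:
    """Fraction of samples classified correctly when each fold is held out in turn."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(X)


# Hypothetical descriptors for two EC classes (EC 1 oxidoreductases, EC 3 hydrolases).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8)]
y = ["EC1"] * 6 + ["EC3"] * 6
print(cross_val_accuracy(X, y))  # 1.0 for these well-separated toy clusters
```

kNN decides purely from the closest training examples, so it cannot exploit class structure that is only visible globally; this is one way to read the abstract's remark about the role of non-local information.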

    Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously

    A PDF file containing the research data is attached, titled "Supplementary Material for 'Detection of Recurrent Copy Number Alterations in the Genome: taking among-subject heterogeneity seriously'". Background: Alterations in the number of copies of genomic DNA that are common or recurrent among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for common or recurrent CNA regions that affect only some subsets of the samples (without knowledge of the regions and subsets affected), but most methods neglect this. Results: We have developed two methods to define recurrent CNA regions from aCGH data. Our methods are unique and qualitatively different from existing approaches: they detect both regions altered across the complete set of arrays and alterations that are common only to some subsets of the samples (i.e., alterations that might characterize previously unknown groups); they take probabilities of alteration as input and return probabilities of being a common region, thus allowing researchers to modify thresholds as needed; and the two parameters of the methods have an immediate, straightforward biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss, and that researchers can adjust thresholds of immediate interpretability as needed and develop custom statistics to answer specific research questions. Conclusion: These methods represent a qualitative advance in locating recurrent CNA regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can facilitate the clustering of samples with respect to patterns of CNA.
Ultimately, the methods developed can become important tools in the search for genomic regions harboring disease-critical genes. Funding was provided by the Fundación de Investigación Médica Mutua Madrileña. Publication charges were covered by project CONSOLIDER: CSD2007-00050 of the Spanish Ministry of Science and Innovation and by RTIC COMBIOMED RD07/0067/0014 of the Spanish Health Ministry.
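The abstract describes methods that take per-sample probabilities of alteration as input and return probabilities of a region being common. The sketch below illustrates that idea in its simplest form and is not the authors' algorithm: it treats samples as independent, computes the probability that at least a given fraction of samples is altered at a probe (a Poisson-binomial tail), and reports probes where that probability exceeds a threshold. Both thresholds (`min_fraction`, `prob_threshold`) and the function names are hypothetical, and the subset-detection part of the published methods is not covered.

```python
import math
from typing import List


def prob_at_least(probs: List[float], k: int) -> float:
    """P(at least k of n independent alterations occur): Poisson-binomial tail by dynamic programming."""
    dist = [1.0]  # dist[j] = P(exactly j altered among the samples seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)  # this sample is not altered
            new[j + 1] += q * p    # this sample is altered
        dist = new
    return sum(dist[k:])


def common_probes(alteration_probs: List[List[float]], min_fraction: float,
                  prob_threshold: float) -> List[int]:
    """alteration_probs[s][i]: probability that probe i is altered in sample s.

    A probe is reported when, with probability >= prob_threshold, at least
    min_fraction of the samples carry the alteration (hypothetical rule).
    """
    n_samples = len(alteration_probs)
    k = math.ceil(min_fraction * n_samples)
    n_probes = len(alteration_probs[0])
    return [i for i in range(n_probes)
            if prob_at_least([row[i] for row in alteration_probs], k) >= prob_threshold]


# Hypothetical input: 3 samples x 2 probes; probe 0 is likely altered in most samples.
probs = [[0.9, 0.1], [0.95, 0.2], [0.9, 0.05]]
print(common_probes(probs, min_fraction=0.5, prob_threshold=0.9))  # [0]
```

Working with probabilities rather than hard calls keeps both thresholds as explicit, interpretable knobs, which is the property the abstract emphasizes.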

    Individualized markers optimize class prediction of microarray data

    BACKGROUND: Identification of molecular markers for the classification of microarray data is a challenging task. Despite the evident dissimilarity in various characteristics of biological samples belonging to the same category, most marker-selection and classification methods do not consider this variability. In general, feature selection methods aim to identify a common set of genes whose combined expression profiles can accurately predict the category of all samples. Here, we argue that this simplified approach is often unable to capture the complexity of a disease phenotype, and we propose an alternative method that takes into account the individuality of each patient sample. RESULTS: Instead of using the same features for the classification of all samples, the proposed technique starts by creating a pool of informative gene features. For each sample, the method selects a subset of these features whose expression profiles are most likely to accurately predict the sample's category. Different subsets are used for different samples, and the outcomes are combined in a hierarchical framework for the classification of all samples. Moreover, this approach can innately identify subgroups of samples within a given class that share common feature sets, thus highlighting the effect of individuality on gene expression. CONCLUSION: In addition to high classification accuracy, the proposed method offers a more individualized approach to the identification of biological markers, which may help in better understanding the molecular background of a disease and emphasizes the need for more flexible medical interventions.
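The core idea, choosing a different feature subset for each sample from a shared pool, can be sketched with a deliberately simple stand-in rule. In the sketch below, which is a hypothetical simplification and not the published method (it omits the hierarchical combination step), each sample keeps the `m` pool features on which it is most decisively closer to one class centroid, then is assigned to the nearest centroid using only those features. The class names and toy values are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroids(X: List[Vector], y: List[str]) -> Dict[str, List[float]]:
    """Per-class mean expression of each gene feature."""
    by_class: Dict[str, List[Vector]] = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in by_class.items()}


def classify_individualized(x: Vector, cents: Dict[str, List[float]],
                            pool: List[int], m: int) -> Tuple[str, List[int]]:
    """Pick the m pool features where x most clearly favours one class, then classify on them."""
    a, b = list(cents)  # this sketch handles exactly two classes
    margins = {f: abs(abs(x[f] - cents[a][f]) - abs(x[f] - cents[b][f])) for f in pool}
    chosen = sorted(pool, key=lambda f: margins[f], reverse=True)[:m]
    dist_a = sum((x[f] - cents[a][f]) ** 2 for f in chosen)
    dist_b = sum((x[f] - cents[b][f]) ** 2 for f in chosen)
    return (a if dist_a <= dist_b else b), chosen


# Hypothetical training data: two genes; tumours are high in one gene or the other.
cents = centroids([[5, 3], [3, 5], [0.5, -0.5], [-0.5, 0.5]],
                  ["tumor", "tumor", "normal", "normal"])
for sample in ([5, 1], [1, 5], [0.5, 0.2]):
    print(sample, classify_individualized(sample, cents, pool=[0, 1], m=1))
```

The two tumour-like samples are classified through different genes (feature 0 for the first, feature 1 for the second), which is the kind of per-sample variability that a single shared marker set would hide.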

    The global abundance of tree palms

    Aim: Palms are an iconic, diverse and often abundant component of tropical ecosystems that provide many ecosystem services. Being monocots, tree palms are evolutionarily, morphologically and physiologically distinct from other trees, and these differences have important consequences for ecosystem services (e.g., carbon sequestration and storage) and for responses to climate change. We quantified global patterns of tree palm relative abundance to help improve understanding of tropical forests and reduce uncertainty about these ecosystems under climate change. Location: Tropical and subtropical moist forests. Time period: Current. Major taxa studied: Palms (Arecaceae). Methods: We assembled a pantropical dataset of 2,548 forest plots (covering 1,191 ha) and quantified tree palm (i.e., ≥10 cm diameter at breast height) abundance relative to co‐occurring non‐palm trees. We compared the relative abundance of tree palms across biogeographical realms and tested for associations with palaeoclimate stability, current climate, edaphic conditions and metrics of forest structure. Results: On average, the relative abundance of tree palms was more than five times larger in Neotropical locations than in other biogeographical realms. Tree palms were absent in most locations outside the Neotropics but present in >80% of Neotropical locations. The relative abundance of tree palms was more strongly associated with local conditions (e.g., higher mean annual precipitation, lower soil fertility, shallower water table and lower plot mean wood density) than with metrics of long‐term climate stability. Life‐form diversity also influenced the patterns; palm assemblages outside the Neotropics comprise many non‐tree (e.g., climbing) palms. Finally, we show that tree palms can influence estimates of above‐ground biomass, but the magnitude and direction of the effect require additional work.
Conclusions: Tree palms are not only quintessentially tropical, but they are also overwhelmingly Neotropical. Future work to understand the contributions of tree palms to biomass estimates and carbon cycling will be particularly crucial in Neotropical forests.
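A comparison such as "more than five times larger in the Neotropics" rests on a per-plot relative abundance averaged within each realm. The abstract does not give the exact formula, so the sketch below assumes relative abundance is the share of stems ≥10 cm dbh in a plot that are tree palms (a palm-to-non-palm ratio would be an equally plausible reading). The plot records are hypothetical, not data from the study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Plot = Tuple[str, int, int]  # (realm, tree palm stems >= 10 cm dbh, non-palm stems >= 10 cm dbh)


def palm_relative_abundance(n_palm: int, n_nonpalm: int) -> float:
    """Assumed definition: fraction of stems >= 10 cm dbh in a plot that are tree palms."""
    total = n_palm + n_nonpalm
    return n_palm / total if total else 0.0


def mean_by_realm(plots: List[Plot]) -> Dict[str, float]:
    """Average the per-plot relative abundance within each biogeographical realm."""
    values: Dict[str, List[float]] = defaultdict(list)
    for realm, palms, others in plots:
        values[realm].append(palm_relative_abundance(palms, others))
    return {realm: sum(v) / len(v) for realm, v in values.items()}


# Hypothetical plots; realm labels and stem counts are illustrative only.
plots = [("Neotropics", 30, 270), ("Neotropics", 10, 190),
         ("Africa", 0, 400), ("Africa", 4, 396)]
means = mean_by_realm(plots)
print(means)
print(means["Neotropics"] / means["Africa"])  # how many times larger, in this toy example
```

Averaging per plot, rather than pooling all stems, gives each plot equal weight regardless of its stem count; which choice the study made is not stated in the abstract.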

    Gene selection for cancer classification with the help of bees
