290 research outputs found

    Predicting gene function using hierarchical multi-label decision tree ensembles

    Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

    Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors

    We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences including phylogenetic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining these sources of information, our approach results in a higher accuracy rate when compared to models that use each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

    Deciphering a subgroup of breast carcinomas with putative progression of grade during carcinogenesis revealed by comparative genomic hybridisation (CGH) and immunohistochemistry

    Distinct parallel cytogenetic pathways in breast carcinogenesis have been identified in recent years. Nevertheless, it has remained unclear which tumours may have progressed in grade or which patterns of cytogenetic alteration may define the switch from an in situ towards an invasive lesion. In order to gain more detailed insights into cytogenetic mechanisms of the pathogenesis of breast cancer, the chromosomal imbalances of 206 invasive breast cancer cases were characterised by means of comparative genomic hybridisation (CGH). CGH data were subjected to hierarchical cluster analysis and the results were further compared with immunohistochemical findings on tissue arrays from the same breast cancer cases. The combined analysis of immunohistochemical and cytogenetic data provided evidence that carcinomas with gains of 7p, and to a lesser extent losses of 9q and gains of 5p, are a distinct subgroup within the spectrum of ductal invasive grade 3 breast carcinomas. These aberrations were associated with a high degree of cytogenetic instability (16.6 alterations per case on average), 16q-losses in over 70% of these cases, strong oestrogen receptor expression and absence of strong expression of p53, c-erbB2 and Ck 5. These characteristics provide strong support for the hypothesis that these tumours may develop through stages of well- and perhaps intermediately differentiated breast cancers. Our results therefore underline the existence of several parallel and also stepwise progression pathways towards breast cancer.

    Constraining modern day silicon cycling in Lake Baikal

    Constraining the continental silicon cycle is a key requirement in attempts to understand both nutrient fluxes to the ocean and linkages between silicon and carbon cycling over different timescales. Silicon isotope data of dissolved silica (δ30SiDSi) are presented here from Lake Baikal and its catchment in central Siberia. As well as being the world's oldest and most voluminous lake, Lake Baikal lies within the seventh largest drainage basin in the world and exports significant amounts of freshwater into the Arctic Ocean. Data from river waters accounting for c. 92% of annual river inflow to the lake suggest no seasonal alteration or anthropogenic impact on river δ30SiDSi composition. The absence of a change in δ30SiDSi within the Selenga Delta, through which 62% of riverine flow passes, suggests a net balance between biogenic uptake and dissolution in this system. A key feature of this study is the use of δ30SiDSi to examine seasonal and spatial variations in DSi utilisation and export across the lake. Using an open system model against deep water δ30SiDSi values from the lake, we estimate that 20-24% of DSi entering Lake Baikal is exported into the sediment record. Whilst highlighting the impact that lakes may have upon the sequestration of continental DSi, mixed layer δ30SiDSi values from 2003 and 2013 show significant spatial variability in the magnitude of spring bloom nutrient utilisation, with lower rates in the north basin relative to the south basin.

    Histoplasma capsulatum Encodes a Dipeptidyl Peptidase Active against the Mammalian Immunoregulatory Peptide, Substance P

    The pathogenic fungus Histoplasma capsulatum secretes dipeptidyl peptidase (Dpp) IV enzyme activity and has two putative DPPIV homologs (HcDPPIVA and HcDPPIVB). We previously showed that HcDPPIVB is the gene responsible for the majority of secreted DppIV activity in H. capsulatum culture supernatant, while we could not detect any functional contribution from HcDPPIVA. In order to determine whether HcDPPIVA encodes a functional DppIV enzyme, we expressed HcDPPIVA in Pichia pastoris and purified the recombinant protein. The recombinant enzyme cleaved synthetic DppIV substrates and had similar biochemical properties to other described DppIV enzymes, with temperature and pH optima of 42°C and 8, respectively. Recombinant HcDppIVA cleaved the host immunoregulatory peptide substance P, indicating the enzyme has the potential to affect the immune response during infection. Expression of HcDPPIVA under heterologous regulatory sequences in H. capsulatum resulted in increased secreted DppIV activity, indicating that the encoded protein can be expressed and secreted by its native organism. However, HcDPPIVA was not required for virulence in a murine model of histoplasmosis. This work reports a fungal enzyme that can function to cleave the immunomodulatory host peptide substance P

    CCR2 and CXCR3 agonistic chemokines are differently expressed and regulated in human alveolar epithelial cells type II

    The attraction of leukocytes from circulation to inflamed lungs depends on the activation of both the leukocytes and the resident cells within the lung. In this study we determined gene expression and secretion patterns for monocyte chemoattractant protein-1 (MCP-1/CCL2) and T-cell specific CXCR3 agonistic chemokines (Mig/CXCL9, IP-10/CXCL10, and I-TAC/CXCL11) in TNF-α-, IFN-γ-, and IL-1β-stimulated human alveolar epithelial cells type II (AEC-II). AEC-II constitutively expressed high levels of CCL2 mRNA in vitro and in situ, and released CCL2 protein in vitro. Treatment of AEC-II with proinflammatory cytokines up-regulated both CCL2 mRNA expression and release of immunoreactive CCL2, whereas IFN-γ had no effect on CCL2 release. In contrast, CXCR3 agonistic chemokines were not detected in freshly isolated AEC-II or in the non-stimulated epithelial-like cell line A549. IFN-γ, alone or in combination with IL-1β and TNF-α, resulted in an increase in CXCL10, CXCL11, and CXCL9 mRNA expression and generation of CXCL10 protein by AEC-II or A549 cells. CXCL10 gene expression and secretion were induced in a dose-dependent manner after cytokine stimulation of AEC-II, with an order of potency IFN-γ>>IL-1β ≥ TNF-α. Additionally, we localized the CCL2 and CXCL10 mRNAs in human lung tissue explants by in situ hybridization, and demonstrated the selective effects of cytokines and dexamethasone on CCL2 and CXCL10 expression. These data suggest that the mechanisms regulating CCL2 and CXCL10 expression differ significantly, and also demonstrate that the alveolar epithelium contributes to the cytokine milieu of the lung, with the ability to respond to locally generated cytokines and to produce potent mediators of the local inflammatory response.