105 research outputs found

    Ab initio and homology based prediction of protein domains by recursive neural networks

    Background: Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures. Results: We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near-perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality. We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single domain recall of 80.3% (precision 78.1%). The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%), again within ± 20 residues, and classify single domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly. Conclusion: The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: http://distill.ucd.ie/shandy/ and we plan on running them on a multi-genomic scale and making the results public in the near future. Funding: Science Foundation Ireland; Health Research Board; UCD President's Award 2004.
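The boundary recall and precision "within ± 20 residues" quoted in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact matching rule, so this sketch assumes that a predicted boundary counts as correct if it lies within the tolerance of any true boundary, and that a true boundary counts as recovered if any prediction lies within the tolerance of it. The function name `boundary_scores` and the toy positions are hypothetical.

```python
def boundary_scores(predicted, true, tolerance=20):
    """Return (precision, recall) for domain-boundary positions.

    Assumed matching rule (not taken from the paper): a boundary matches
    if its residue index is within `tolerance` of a boundary in the other list.
    """
    # Predicted boundaries that fall near at least one true boundary.
    correct_predictions = sum(
        1 for p in predicted if any(abs(p - t) <= tolerance for t in true)
    )
    # True boundaries that have at least one nearby prediction.
    recovered_true = sum(
        1 for t in true if any(abs(p - t) <= tolerance for p in predicted)
    )
    precision = correct_predictions / len(predicted) if predicted else 0.0
    recall = recovered_true / len(true) if true else 0.0
    return precision, recall


# Toy example: three predicted boundaries, two true boundaries.
precision, recall = boundary_scores([95, 240, 400], [100, 310])
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In the toy example only the prediction at residue 95 is close enough to a true boundary (100), so one of three predictions is correct and one of two true boundaries is recovered.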

    High-level diversity of tailed phages, eukaryote-associated viruses, and virophage-like elements in the metaviromes of Antarctic soils

    The metaviromes of two distinct Antarctic hyperarid desert soil communities have been characterized. Hypolithic communities, cyanobacterium-dominated assemblages situated on the ventral surfaces of quartz pebbles embedded in the desert pavement, showed higher virus diversity than surface soils, which correlated with previous bacterial community studies. Prokaryotic viruses (i.e., phages) represented the largest viral component (particularly Mycobacterium phages) in both habitats, with an identical hierarchical sequence abundance of families of tailed phages (Siphoviridae>Myoviridae>Podoviridae). No archaeal viruses were found. Unexpectedly, cyanophages were poorly represented in both metaviromes and were phylogenetically distant from currently characterized cyanophages. Putative phage genomes were assembled and showed a high level of unaffiliated genes, mostly from hypolithic viruses. Moreover, unusual gene arrangements in which eukaryotic and prokaryotic virus-derived genes were found within identical genome segments were observed. Phycodnaviridae and Mimiviridae viruses were the second-most abundant taxa and more numerous within open soil. Novel virophage-like sequences (within the Sputnik clade) were identified. These findings highlight high-level virus diversity and novel species discovery potential within Antarctic hyperarid soils and may serve as a starting point for future studies targeting specific viral groups.

    Niche-dependent genetic diversity in Antarctic metaviromes

    The metaviromes from two different Antarctic terrestrial soil niches have been analyzed. Both hypoliths (microbial assemblages beneath translucent rocks) and surrounding open soils showed a high level of diversity of tailed phages, viruses of algae and amoeba, and virophage sequences. Comparisons of other global metaviromes with the Antarctic libraries showed a niche-dependent clustering pattern, unrelated to the geographical origin of a given metavirome. Within the Antarctic open soil metavirome, a putative circularly permuted, ~42 kb dsDNA virus genome was annotated, showing features of a temperate phage possessing a variety of conserved protein domains with no significant taxonomic affiliations in current databases. National Research Foundation (South Africa) and the Genomics Research Institute of the University of Pretoria (South Africa). http://www.tandfonline.com/loi/kbac20 2015-12-31

    Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts

    BACKGROUND: Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicated in diseases of a broad range of plant hosts and humans. By analysing the pan-genome of eight sequenced P. ananatis strains isolated from different sources we identified factors potentially underlying its ability to colonize and interact with hosts in both the plant and animal Kingdoms. RESULTS: The pan-genome of the eight compared P. ananatis strains consisted of a core genome comprised of 3,876 protein coding sequences (CDSs) and a sizeable accessory genome consisting of 1,690 CDSs. We estimate that ~106 unique CDSs would be added to the pan-genome with each additional P. ananatis genome sequenced in the future. The accessory fraction is derived mainly from integrated prophages and codes mostly for proteins of unknown function. Comparison of the translated CDSs on the P. ananatis pan-genome with the proteins encoded on all sequenced bacterial genomes currently available revealed that P. ananatis carries a number of CDSs with orthologs restricted to bacteria associated with distinct hosts, namely plant-, animal- and insect-associated bacteria. These CDSs encode proteins with putative roles in transport and metabolism of carbohydrate and amino acid substrates, adherence to host tissues, protection against plant and animal defense mechanisms and the biosynthesis of potential pathogenicity determinants including insecticidal peptides, phytotoxins and type VI secretion system effectors. CONCLUSIONS: P. ananatis has an ‘open’ pan-genome typical of bacterial species that colonize several different environments. The pan-genome incorporates a large number of genes encoding proteins that may enable P. ananatis to colonize, persist in and potentially cause disease symptoms in a wide range of plant and animal hosts. This study was partially supported by the University of Pretoria Postdoctoral Fellowship Program, National Research Foundation (NRF), the Tree Protection Co-operative Programme (TPCP), the NRF/Dept. of Science and Technology Centre of Excellence in Tree Health Biotechnology (CTHB), and the THRIP support program of the Department of Trade and Industry, South Africa. IKT and PRJB were supported by a grant from the Scottish Government’s Rural and Environmental Science and Analytical Services (RESAS) division. http://www.biomedcentral.com/1471-2164/15/404
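The core/accessory split described in this abstract can be sketched with plain set operations. This is only an illustration, not the authors' pipeline: it assumes each strain has already been reduced to a set of gene-family identifiers, and the function name `partition_pan_genome` and the toy strains are hypothetical. On the abstract's own figures, a core of 3,876 CDSs plus an accessory fraction of 1,690 CDSs gives a pan-genome of 5,566 CDSs.

```python
def partition_pan_genome(strain_genes):
    """Split gene families into core, accessory and pan-genome sets.

    strain_genes: dict mapping a strain name to the set of gene-family
    identifiers found in that strain (assumed to be precomputed).
    """
    gene_sets = list(strain_genes.values())
    # Pan-genome: every family seen in at least one strain.
    pan = set().union(*gene_sets)
    # Core genome: families present in every strain.
    core = set.intersection(*gene_sets) if gene_sets else set()
    # Accessory genome: families missing from at least one strain.
    accessory = pan - core
    return core, accessory, pan


# Toy data with three hypothetical strains.
toy_strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}
core, accessory, pan = partition_pan_genome(toy_strains)
print(sorted(core), sorted(accessory), len(pan))
```

Because the core and accessory sets are disjoint by construction, their sizes always sum to the pan-genome size, which is the relationship the abstract's counts rely on.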

    Metagenomic analysis of the viral community in Namib desert hypoliths

    Hypolithic microbial communities are specialized desert communities inhabiting the underside of translucent rocks where they are sheltered from harsh environmental conditions. Here, we present the first study of the viral fraction of these communities isolated from the hyperarid Namib Desert (coastal South Western Africa). Using next-generation sequencing of the isolated viral fraction, the diversity and taxonomic composition of hypolith communities were mapped and a functional assessment of the sequences determined. Phylotypic analysis showed that bacteriophages belonging to the order Caudovirales with the family Siphoviridae were most prevalent. A major fraction of phage types was linked by database homologies to Bacillus or Geobacillus sp. as a host. Phylogenetic analyses of terL and phoH marker genes indicated that many of the sequences were novel and distinct from known isolates and environments, an observation supported by the class distribution of identified ribonucleotide reductases. The composition of the viral hypolith fraction was not completely consistent with Namib hypolith phylotypic surveys, in which the cyanobacterial genus Chroococcidiopsis was found to be dominant. This could be attributed to a lack of sequence information about hypolith viruses/bacteria in public databases or to the hypothesis that hypolithic communities actively recruit viruses from the surrounding open soil in which Bacillaceae-infecting phages are more commonly found. http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1462-2920

    Draft genome sequences of Diplodia sapinea, Ceratocystis manginecans, and Ceratocystis moniliformis

    The draft nuclear genomes of Diplodia sapinea, Ceratocystis moniliformis s. str., and C. manginecans are presented. Diplodia sapinea is an important shoot-blight and canker pathogen of Pinus spp., C. moniliformis is a saprobe associated with wounds on a wide range of woody angiosperms and C. manginecans is a serious wilt pathogen of mango and Acacia mangium. The genome size of D. sapinea is estimated at 36.97 Mb and contains 13 020 predicted genes. Ceratocystis moniliformis includes 25.43 Mb and is predicted to encode at least 6 832 genes. This is smaller than that reported for the mango wilt pathogen C. manginecans which is 31.71 Mb and is predicted to encode at least 7 494 genes. The latter is thus more similar to C. fimbriata s. str., the type species of the genus. The genome sequences presented here provide an important resource to resolve issues pertaining to the taxonomy, biology and evolution of these fungi. The University of Pretoria, the Department of Science and Technology (DST)/National Research Foundation (NRF) Centre of Excellence in Tree Health Biotechnology, Genomics Research Institute (University of Pretoria) and Claude Leon Foundation, South Africa. http://www.imafungus.org/

    Discovery and profiling of small RNAs responsive to stress conditions in the plant pathogen Pectobacterium atrosepticum

    BACKGROUND: Small RNAs (sRNAs) have emerged as important regulatory molecules and have been studied in several bacteria. However, to date, there have been no whole-transcriptome studies on sRNAs in any of the Soft Rot Enterobacteriaceae (SRE) group of pathogens. Although the main ecological niches for these pathogens are plants, a significant part of their life cycle is undertaken outside their host within adverse soil environments. However, the mechanisms of SRE adaptation to this harsh nutrient-deficient environment are poorly understood. RESULTS: In the study reported herein, by using strand-specific RNA-seq analysis and in silico sRNA predictions, we describe the sRNA pool of Pectobacterium atrosepticum and reveal numerous sRNA candidates, including those that are induced during starvation-activated stress responses. Consequently, strand-specific RNA-seq enabled detection of 137 sRNAs and sRNA candidates under starvation conditions; 25 of these sRNAs were predicted for this bacterium in silico. Functional annotations were computationally assigned to 68 sRNAs. The expression of sRNAs in P. atrosepticum was compared under growth-promoting and starvation conditions: 68 sRNAs were differentially expressed with 47 sRNAs up-regulated under nutrient-deficient conditions. Conservation analysis using BLAST showed that most of the identified sRNAs are conserved within the SRE. Subsequently, we identified 9 novel sRNAs within the P. atrosepticum genome. CONCLUSIONS: Since many of the identified sRNAs are starvation-induced, the results of our study suggest that sRNAs play key roles in bacterial adaptive response. Finally, this work provides a basis for future experimental characterization and validation of sRNAs in plant pathogens. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2376-0) contains supplementary material, which is available to authorized users.

    Global, regional, and national burden of neurological disorders during 1990-2015 : a systematic analysis for the Global Burden of Disease Study 2015

    Background: Comparable data on the global and country-specific burden of neurological disorders and their trends are crucial for health-care planning and resource allocation. The Global Burden of Diseases, Injuries, and Risk Factors (GBD) Study provides such information but does not routinely aggregate results that are of interest to clinicians specialising in neurological conditions. In this systematic analysis, we quantified the global disease burden due to neurological disorders in 2015 and its relationship with country development level. Methods: We estimated global and country-specific prevalence, mortality, disability-adjusted life-years (DALYs), years of life lost (YLLs), and years lived with disability (YLDs) for various neurological disorders that in the GBD classification have been previously spread across multiple disease groupings. The more inclusive grouping of neurological disorders included stroke, meningitis, encephalitis, tetanus, Alzheimer's disease and other dementias, Parkinson's disease, epilepsy, multiple sclerosis, motor neuron disease, migraine, tension-type headache, medication overuse headache, brain and nervous system cancers, and a residual category of other neurological disorders. We also analysed results based on the Socio-demographic Index (SDI), a compound measure of income per capita, education, and fertility, to identify patterns associated with development and how countries fare against expected outcomes relative to their level of development. Findings: Neurological disorders ranked as the leading cause group of DALYs in 2015 (250.7 [95% uncertainty interval (UI) 229.1 to 274.7] million, comprising 10.2% of global DALYs) and the second-leading cause group of deaths (9.4 [9.1 to 9.7] million, comprising 16.8% of global deaths). The most prevalent neurological disorders were tension-type headache (1505.9 [UI 1337.3 to 1681.6] million cases), migraine (958.8 [872.1 to 1055.6] million), medication overuse headache (58.5 [50.8 to 67.4] million), and Alzheimer's disease and other dementias (46.0 [40.2 to 52.7] million). Between 1990 and 2015, the number of deaths from neurological disorders increased by 36.7%, and the number of DALYs by 7.4%. These increases occurred despite decreases in age-standardised rates of death and DALYs of 26.1% and 29.7%, respectively; stroke and communicable neurological disorders were responsible for most of these decreases. Communicable neurological disorders were the largest cause of DALYs in countries with low SDI. Stroke rates were highest at middle levels of SDI and lowest at the highest SDI. Most of the changes in DALY rates of neurological disorders with development were driven by changes in YLLs. Interpretation: Neurological disorders are an important cause of disability and death worldwide. Globally, the burden of neurological disorders has increased substantially over the past 25 years because of expanding population numbers and ageing, despite substantial decreases in mortality rates from stroke and communicable neurological disorders. The number of patients who will need care by clinicians with expertise in neurological conditions will continue to grow in coming decades. Policy makers and health-care providers should be aware of these trends to provide adequate services. Peer reviewed.

    Mapping 123 million neonatal, infant and child deaths between 2000 and 2017

    Since 2000, many countries have achieved considerable success in improving child survival, but localized progress remains unclear. To inform efforts towards United Nations Sustainable Development Goal 3.2 (to end preventable child deaths by 2030), we need consistently estimated data at the subnational level regarding child mortality rates and trends. Here we quantified, for the period 2000–2017, the subnational variation in mortality rates and number of deaths of neonates, infants and children under 5 years of age within 99 low- and middle-income countries using a geostatistical survival model. We estimated that 32% of children under 5 in these countries lived in districts that had attained rates of 25 or fewer child deaths per 1,000 live births by 2017, and that 58% of child deaths between 2000 and 2017 in these countries could have been averted in the absence of geographical inequality. This study enables the identification of high-mortality clusters, patterns of progress and geographical inequalities to inform appropriate investments and implementations that will help to improve the health of all populations.

    Mapping development and health effects of cooking with solid fuels in low-income and middle-income countries, 2000-18 : a geospatial modelling study

    Background: More than 3 billion people do not have access to clean energy and primarily use solid fuels to cook. Use of solid fuels generates household air pollution, which was associated with more than 2 million deaths in 2019. Although local patterns in cooking vary systematically, subnational trends in use of solid fuels have yet to be comprehensively analysed. We estimated the prevalence of solid-fuel use with high spatial resolution to explore subnational inequalities, assess local progress, and assess the effects on health in low-income and middle-income countries (LMICs) without universal access to clean fuels. Methods: We did a geospatial modelling study to map the prevalence of solid-fuel use for cooking at a 5 km × 5 km resolution in 98 LMICs based on 2.1 million household observations of the primary cooking fuel used from 663 population-based household surveys over the years 2000 to 2018. We used observed temporal patterns to forecast household air pollution in 2030 and to assess the probability of attaining the Sustainable Development Goal (SDG) target indicator for clean cooking. We aligned our estimates of household air pollution to geospatial estimates of ambient air pollution to establish the risk transition occurring in LMICs. Finally, we quantified the effect of residual primary solid-fuel use for cooking on child health by doing a counterfactual risk assessment to estimate the proportion of deaths from lower respiratory tract infections in children younger than 5 years that could be associated with household air pollution. Findings: Although primary reliance on solid-fuel use for cooking has declined globally, it remains widespread. 593 million people live in districts where the prevalence of solid-fuel use for cooking exceeds 95%. 66% of people in LMICs live in districts that are not on track to meet the SDG target for universal access to clean energy by 2030. Household air pollution continues to be a major contributor to particulate exposure in LMICs, and rising ambient air pollution is undermining potential gains from reductions in the prevalence of solid-fuel use for cooking in many countries. We estimated that, in 2018, 205 000 (95% uncertainty interval 147 000–257 000) children younger than 5 years died from lower respiratory tract infections that could be attributed to household air pollution. Interpretation: Efforts to accelerate the adoption of clean cooking fuels need to be substantially increased and recalibrated to account for subnational inequalities, because there are substantial opportunities to improve air quality and avert child mortality associated with household air pollution. Copyright © 2022 The Author(s). Published by Elsevier Ltd. Peer reviewed.