33 research outputs found

    Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning

    Small proteins, encoded by small open reading frames, are only beginning to emerge with the current advancement of omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating other protein activities, controlling cell cycles, and affecting disease physiology. In prokaryotes such as bacteria, the sequence space and functional groups of small proteins are largely unexplored. Most bacterial species from a natural community cannot be easily isolated or cultured, so their peptides are better characterized in a metagenomic manner. The bacterial peptides identified from metagenomic samples can not only enrich the pool of small proteins but can also reveal community-specific microbial ecology information from a small protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) has been developed as a comprehensive toolkit to explore the small protein universe from metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. metaBP also integrates general protein annotation tools as well as our small protein-specific machine learning module, metaBP-ML, to construct a full landscape for bacterial peptides. metaBP-ML shows advantages for discovering functions of bacterial peptides in a microbial community and increases the yield of annotations by up to fivefold. The metaBP toolkit demonstrates its novelty in adopting protein-level assembly to discover small proteins, integrating a protein-clustering tool in the new and flexible RBiotools environment, and presenting the first small protein landscape through metaBP-ML. Taken together, metaBP (and metaBP-ML) can profile functional bacterial peptides with potentially diverse mutations from metagenomic samples, in order to depict a unique landscape of small proteins from a microbial community.

    Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system

    Background: The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history.
    Methods: The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code.
    Results: Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts.
    Conclusion: The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving different spatial resolution or other syndromes can yield further improvement.
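    The stratified 28-day baseline described in the conclusion above can be illustrated with a minimal sketch. It is not the study's implementation: the function names are hypothetical, holidays are ignored (only weekday versus weekend is distinguished), and a single subregion's daily counts are assumed to be available as a date-to-count mapping.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Mapping


def is_weekend(d: date) -> bool:
    """Saturday (5) and Sunday (6) count as weekend; holidays are NOT handled here."""
    return d.weekday() >= 5


def stratified_baseline(counts: Mapping[date, int], target: date, window: int = 28) -> float:
    """Expected count for `target`: the mean over the previous `window` days,
    restricted to days of the same type (weekday or weekend) as `target`."""
    baseline_days = [target - timedelta(days=k) for k in range(1, window + 1)]
    same_type = [
        counts[d]
        for d in baseline_days
        if d in counts and is_weekend(d) == is_weekend(target)
    ]
    if not same_type:
        raise ValueError("no baseline days of the same day type are available")
    return float(mean(same_type))


# Toy data (invented for illustration): 10 visits on weekdays, 4 on weekends.
toy_counts = {
    date(2024, 1, 1) + timedelta(days=i): (4 if is_weekend(date(2024, 1, 1) + timedelta(days=i)) else 10)
    for i in range(28)
}
monday_expected = stratified_baseline(toy_counts, date(2024, 1, 29))
```

    In a scan-statistic setting, an expected value of this kind would be compared against the observed count for each candidate cluster; that comparison step is outside this sketch.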

    Insights from Comparative Genomics of the Genus Salmonella

    Comparative genomics has become a standard approach to gain insights into the interrelationships of microorganisms. Here, we have applied various bioinformatic techniques to compare over 200 Salmonella genomes. First, we present a tree of all the different sequenced members of the Enterobacteriaceae family, based on comparison of average amino acid identities. This technique was also applied to zoom in on the genomes of the genus Salmonella. The pan and core genomes of this genus were established and compared to experimental data available in the literature that identified essential genes. Difficulties and shortcomings of both approaches are discussed. Metabolic pathways unique to Salmonella were identified. Finally, we present an analysis of genes coding for small RNAs, an important part of the genetic repertoire of bacteria that is often ignored. The findings reported here are discussed and compared with the available literature.

    Insights from 20 years of bacterial genome sequencing

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
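    The pan- and core-genome calculation mentioned in the abstract above reduces to set operations once each genome has been summarized as a set of gene families. The sketch below assumes that clustering of proteins into families has already been done by some upstream tool; the function name, genome labels, and family identifiers are all hypothetical and are not taken from the review.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan genome, core genome) for a collection of genomes.

    The pan genome is the union of all gene families seen in any genome;
    the core genome is the intersection, i.e. families present in every genome.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    family_sets = [set(families) for families in genomes.values()]
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Toy input (invented identifiers): three genomes sharing two families.
toy_genomes = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB", "famD"},
    "genome_3": {"famA", "famB"},
}
pan, core = pan_and_core(toy_genomes)
```

    With real data the core genome shrinks and the pan genome grows as more genomes are added, which is why the counts reported for large collections depend on how many genomes are compared.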

    Nanostructure-specific X-ray tomography reveals myelin levels, integrity and axon orientations in mouse and human nervous tissue

    Myelin insulates neuronal axons and enables fast signal transmission, constituting a key component of brain development, aging and disease. Yet, myelin-specific imaging of macroscopic samples remains a challenge. Here, we exploit myelin’s nanostructural periodicity, and use small-angle X-ray scattering tensor tomography (SAXS-TT) to simultaneously quantify myelin levels, nanostructural integrity and axon orientations in nervous tissue. Proof-of-principle is demonstrated in whole mouse brain, mouse spinal cord and human white and gray matter samples. Outcomes are validated by 2D/3D histology and compared to MRI measurements sensitive to myelin and axon orientations. Specificity to nanostructure is exemplified by concomitantly imaging different myelin types with distinct periodicities. Finally, we illustrate the method’s sensitivity towards myelin-related diseases by quantifying myelin alterations in dysmyelinated mouse brain. This non-destructive, stain-free molecular imaging approach enables quantitative studies of myelination within and across samples during development, aging, disease and treatment, and is applicable to other ordered biomolecules or nanostructures.

    Ebolavirus comparative genomics

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
    This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
    Affiliation: Jun, Se Ran. Oak Ridge National Laboratory; United States. University of Tennessee; United States
    Affiliation: Leuze, Michael R. Oak Ridge National Laboratory; United States
    Affiliation: Nookaew, Intawat. Oak Ridge National Laboratory; United States
    Affiliation: Uberbacher, Edward C. Oak Ridge National Laboratory; United States
    Affiliation: Land, Miriam. Oak Ridge National Laboratory; United States
    Affiliation: Zhang, Qian. Oak Ridge National Laboratory; United States. University of Tennessee; United States
    Affiliation: Wanchai, Visanu. Oak Ridge National Laboratory; United States
    Affiliation: Chai, Juanjuan. Oak Ridge National Laboratory; United States
    Affiliation: Nielsen, Morten. Technical University of Denmark; Denmark. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Trolle, Thomas. Technical University of Denmark; Denmark
    Affiliation: Lund, Ole. Technical University of Denmark; Denmark
    Affiliation: Buzard, Gregory S. Booz Allen Hamilton; United States
    Affiliation: Pedersen, Thomas D. Technical University of Denmark; Denmark. Assays; Denmark
    Affiliation: Wassenaar, Trudy M. Molecular Microbiology and Genomics Consultants; Germany
    Affiliation: Ussery, David W. Oak Ridge National Laboratory; United States. University of Tennessee; United States. Technical University of Denmark; Denmark

    Shewanella knowledgebase: integration of the experimental data and computational predictions suggests a biological role for transcription of intergenic regions

    Shewanellae are facultative γ-proteobacteria whose remarkable respiratory versatility has resulted in interest in their utility for bioremediation of heavy metals and radionuclides and for energy generation in microbial fuel cells. Extensive experimental efforts over the last several years and the availability of 21 sequenced Shewanella genomes made it possible to collect and integrate a wealth of information on the genus into one public resource providing new avenues for making biological discoveries and for developing a system level understanding of the cellular processes. The Shewanella knowledgebase was established in 2005 to provide a framework for integrated genome-based studies on Shewanella ecophysiology. The present version of the knowledgebase provides access to a diverse set of experimental and genomic data along with tools for curation of genome annotations and visualization and integration of genomic data with experimental data. As a demonstration of the utility of this resource, we examined a single microarray data set from Shewanella oneidensis MR-1 for new insights into regulatory processes. The integrated analysis of the data predicted a new type of bacterial transcriptional regulation involving co-transcription of the intergenic region with the downstream gene and suggested a biological role for co-transcription that likely prevents the binding of a regulator of the upstream gene to the regulator binding site located in the intergenic region.

    Petri Net Model of a Dynamically Partitioned Multiprocessor System

    A multiprocessor system can be subdivided into partitions of processors, each of which can be dedicated to the execution of a parallel program. The partitioning of the system can be done statically at system configuration time, adaptively prior to the execution time, or dynamically during execution time. Since, in a dynamically partitioned multiprocessor system, partitioning can occur anytime during the execution of a program, designing an analytical model for such a system is a difficult task. In this paper a Petri net model of a dynamically partitioned multiprocessor system is presented. The workload consists of parallel programs which are characterized by their execution signatures. Repartitioning overhead is an important parameter and is modeled explicitly. The model is used to perform a series of sensitivity analysis experiments which give insight into the behavior of such systems. Several dynamic processor allocation policies have been implemented. Equal Slope and Shortest Job Fi..

    A hybrid Laguerre method
