142 research outputs found

    The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat viruses

    Hemorrhagic fever viruses (HFVs) are a diverse set of over 80 viral species, found in 10 different genera comprising five different families: arena-, bunya-, flavi-, filo- and togaviridae. All these viruses are highly variable and evolve rapidly, making them elusive targets for the immune system and for vaccine and drug design. About 55 000 HFV sequences exist in the public domain today. A central website that provides annotated sequences and analysis tools will be helpful to HFV researchers worldwide. The HFV sequence database collects and stores sequence data and provides a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database [Kuiken, C., B. Korber, and R.W. Shafer, HIV sequence databases. AIDS Rev, 2003. 5: p. 52–61]. The database uses an algorithm that aligns each sequence to a species-wide reference sequence. The reference is taken from the NCBI RefSeq database [Sayers et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.]; if a reference sequence is not available, a BLAST search finds the best candidate. Using this method, sequences in each genus can be retrieved pre-aligned. The HFV website can be accessed via http://hfv.lanl.gov
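
    The abstract does not say which alignment algorithm the database uses. As a rough illustration of aligning one sequence to a reference, the sketch below uses textbook Needleman-Wunsch global alignment. The function name `global_align` and the scoring values are assumptions made for this example, not details of the HFV database.

```python
def global_align(seq, ref, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of seq against ref.

    Returns (aligned_seq, aligned_ref), padded with '-' for gaps.
    Scoring values are illustrative defaults, not taken from the source.
    """
    n, m = len(seq), len(ref)
    # score[i][j] = best score for aligning seq[:i] with ref[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if seq[i - 1] == ref[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in the reference
                score[i][j - 1] + gap,            # gap in the sequence
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(seq[i - 1]); b.append(ref[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(seq[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(ref[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


aligned_seq, aligned_ref = global_align("ACT", "ACGT")
print(aligned_seq)  # AC-T
print(aligned_ref)  # ACGT
```

    Once every sequence is aligned to the same reference, all sequences share the reference's coordinate system, which is what would make retrieval of pre-aligned sets possible.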

    Combining Interactomes from Multiple Organisms: a Case Study on Human-Mouse

    The amount and quality of available data on different organisms vary greatly. While model organisms benefit from extensive experimental studies, there is often a lack of detailed experimental data for more specific organisms. Additionally, even among model organisms there are noticeable differences in the amount and type of data available, due to the different suitability of experiments in different organisms. The combination of interactomes for closely related species represents a viable tool to increase the amount of protein-protein interaction data for a given organism. The Human-Mouse case study is particularly relevant, as many experiments cannot be carried out on humans. This paper describes a general method to construct a combined interactome from different organisms. CONACYT – Consejo Nacional de Ciencia y Tecnología, PROCIENCI
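
    The abstract does not describe the paper's method in detail. One common way to combine interactomes is to map interactions through an ortholog table, and the sketch below illustrates that idea only. The function `combine_interactomes`, the ortholog dictionary, and the toy gene pairs are all assumptions for illustration.

```python
def combine_interactomes(human_ppi, mouse_ppi, mouse_to_human):
    """Union of human interactions and mouse interactions mapped via orthologs.

    human_ppi, mouse_ppi: iterables of (protein_a, protein_b) pairs.
    mouse_to_human: dict mapping a mouse gene to its human ortholog
                    (hypothetical one-to-one table for this sketch).
    Interactions are undirected, so each is stored as a frozenset.
    """
    combined = {frozenset(pair) for pair in human_ppi}
    for a, b in mouse_ppi:
        ha, hb = mouse_to_human.get(a), mouse_to_human.get(b)
        # Keep the pair only if both partners have a human ortholog;
        # self-interactions are dropped in this simplified version.
        if ha and hb and ha != hb:
            combined.add(frozenset((ha, hb)))
    return combined


# Toy data, for illustration only.
human = [("TP53", "MDM2")]
mouse = [("Trp53", "Mdm2"), ("Trp53", "Ep300"), ("Xyz", "Mdm2")]
orthologs = {"Trp53": "TP53", "Mdm2": "MDM2", "Ep300": "EP300"}
print(len(combine_interactomes(human, mouse, orthologs)))  # 2
```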

    The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics

    The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates. As of September 2008, the database describes the present knowledge on 113 glycoside hydrolase, 91 glycosyltransferase, 19 polysaccharide lyase, 15 carbohydrate esterase and 52 carbohydrate-binding module families. These families are created based on experimentally characterized proteins and are populated by sequences from public databases with significant similarity. Protein biochemical information is continuously curated based on the available literature and structural information. Over 6400 proteins have assigned EC numbers and 700 proteins have a PDB structure. The classification (i) reflects the structural features of these enzymes better than their sole substrate specificity, (ii) helps to reveal the evolutionary relationships between these enzymes and (iii) provides a convenient framework to understand mechanistic properties. This resource has been available for over 10 years to the scientific community, contributing to information dissemination and providing a transversal nomenclature to glycobiologists. More recently, this resource has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation. The CAZy resource resides at URL: http://www.cazy.org/

    Doctor of Philosophy

    Accurate interpretation of seismic travel times and amplitudes at both the exploration and global scales is complicated by the band-limited nature of seismic data. We present a stochastic method, Viterbi sparse spike detection (VSSD), to reduce a seismic waveform into a most probable constituent spike train. Model waveforms are constructed from a set of candidate spike trains convolved with a source wavelet estimate. For each model waveform, a profile hidden Markov model (HMM) is constructed to represent the waveform as a stochastic generative model with a linear topology corresponding to a sequence of samples. The Viterbi algorithm is employed to simultaneously find the optimal nonlinear alignment between a model waveform and the seismic data, and to assign a score to each candidate spike train. The most probable travel times and amplitudes are inferred from the alignments of the highest scoring models. Our analyses show that the method can resolve closely spaced arrivals below traditional resolution limits and that travel time estimates are robust in the presence of random noise and source wavelet errors. We applied the VSSD method to constrain the elastic properties of an ultralow-velocity zone (ULVZ) at the core-mantle boundary beneath the Coral Sea. We analyzed vertical component short period ScP waveforms for 16 earthquakes occurring in the Tonga-Fiji trench recorded at the Alice Springs Array (ASAR) in central Australia. These waveforms show strong pre- and postcursory seismic arrivals consistent with ULVZ layering. We used the VSSD method to measure differential travel times and amplitudes of the postcursor arrival ScSP and the precursor arrival SPcP relative to ScP. We compare our measurements to a database of approximately 340,000 synthetic seismograms, finding that these data are best fit by a ULVZ model with an S-wave velocity reduction of 24%, a P-wave velocity reduction of 23%, a thickness of 8.5 km, and a density increase of 6%. We simultaneously constrain both P- and S-wave velocity reductions as a 1:1 ratio inside this ULVZ. This 1:1 ratio is not consistent with a partial melt origin to ULVZs. Rather, we demonstrate that a compositional origin is more likely.
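
    For readers unfamiliar with the Viterbi algorithm mentioned above, the sketch below shows the generic dynamic program on a tiny discrete HMM. It is not the VSSD profile HMM: the two states, the three observation symbols, and all probabilities are toy assumptions chosen only to make the recursion concrete.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for a discrete HMM, computed in log space.

    Returns (path, log_probability_of_path).
    """
    # Initialise with the first observation.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: V[-1][r] + math.log(trans_p[r][s]))
            col[s] = V[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, V[-1][last]


# Toy model (all values are illustrative assumptions).
states = ["noise", "spike"]
start_p = {"noise": 0.6, "spike": 0.4}
trans_p = {"noise": {"noise": 0.7, "spike": 0.3},
           "spike": {"noise": 0.4, "spike": 0.6}}
emit_p = {"noise": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "spike": {"low": 0.1, "mid": 0.3, "high": 0.6}}

path, logp = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['noise', 'noise', 'spike']
```

    In the method described in the abstract, the score returned by this kind of recursion is what ranks each candidate spike train against the data.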

    Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats

    Background: Many parasitic organisms, eukaryotes as well as bacteria, possess surface antigens with amino acid repeats. Making up the interface between host and pathogen, such repetitive proteins may be virulence factors involved in immune evasion or cytoadherence. They find immunological applications in serodiagnostics and vaccine development. Here we use proteins which contain perfect repeats as a basis for comparative genomics between parasitic and free-living organisms. Results: We have developed Reptile (http://reptile.unibe.ch), a program for proteome-wide probabilistic description of perfect repeats in proteins. Parasite proteomes exhibited a large variance regarding the proportion of repeat-containing proteins. Interestingly, there was a good correlation between the percentage of highly repetitive proteins and mean protein length in parasite proteomes, but not at all in the proteomes of free-living eukaryotes. Reptile combined with programs for the prediction of transmembrane domains and GPI-anchoring resulted in an effective tool for in silico identification of potential surface antigens and virulence factors from parasites. Conclusion: Systemic surveys for perfect amino acid repeats allowed basic comparisons between free-living and parasitic organisms that were directly applicable to predict proteins of serological and parasitological importance. An on-line tool is available at http://genomics.unibe.ch/dora.
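
    To make the notion of a perfect repeat concrete, the sketch below finds exact tandem repeats in a protein string with a regular expression. This is a plain detector, not Reptile's probabilistic description; the function name `perfect_repeats`, its thresholds, and the example sequence are assumptions for illustration.

```python
import re


def perfect_repeats(protein, min_unit=2, min_copies=2):
    """Find exact tandem repeats in a protein sequence.

    Returns a list of (start_index, repeat_unit, copy_number) tuples.
    The shortest repeating unit is preferred (lazy match on the unit),
    and as many consecutive copies as possible are taken.
    """
    pattern = re.compile(r"(.{%d,}?)\1{%d,}" % (min_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(protein):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence with a PA dipeptide repeated four times.
print(perfect_repeats("MKPAPAPAPAG"))  # [(2, 'PA', 4)]
```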

    Species-level classification of the vaginal microbiome

    Background: The application of next-generation sequencing to the study of the vaginal microbiome is revealing the spectrum of microbial communities that inhabit the human vagina. High-resolution identification of bacterial taxa, minimally to the species level, is necessary to fully understand the association of the vaginal microbiome with bacterial vaginosis, sexually transmitted infections, pregnancy complications, menopause, and other physiological and infectious conditions. However, most current taxonomic assignment strategies based on metagenomic 16S rDNA sequence analysis provide at best a genus-level resolution. While surveys of 16S rRNA gene sequences are common in microbiome studies, few well-curated, body-site-specific reference databases of 16S rRNA gene sequences are available, and no such resource is available for vaginal microbiome studies. Results: We constructed the Vaginal 16S rDNA Reference Database, a comprehensive and non-redundant database of 16S rDNA reference sequences for bacterial taxa likely to be associated with vaginal health, and we developed STIRRUPS, a new method that employs the USEARCH algorithm with a curated reference database for rapid species-level classification of 16S rDNA partial sequences. The method was applied to two datasets of V1-V3 16S rDNA reads: one generated from a mock community containing DNA from six bacterial strains associated with vaginal health, and a second generated from over 1,000 mid-vaginal samples collected as part of the Vaginal Human Microbiome Project at Virginia Commonwealth University. In both datasets, STIRRUPS, used in conjunction with the Vaginal 16S rDNA Reference Database, classified more than 95% of processed reads to a species-level taxon using a 97% global identity threshold for assignment. Conclusions: This database and method provide accurate species-level classifications of metagenomic 16S rDNA sequence reads that will be useful for analysis and comparison of microbiome profiles from vaginal samples. STIRRUPS can be used to classify 16S rDNA sequence reads from other ecological niches if an appropriate reference database of 16S rDNA sequences is available.
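
    The thresholding step can be illustrated in a few lines. The sketch below assumes that identities between each read and the reference sequences have already been computed by a search tool; it does not call USEARCH and is not the STIRRUPS implementation. The function `classify_reads`, the input layout, and the toy identities are assumptions.

```python
def classify_reads(hits, threshold=0.97):
    """Assign each read to its best-matching taxon if identity >= threshold.

    hits: dict mapping read_id -> list of (taxon, identity) candidates,
          where identity is a fraction between 0 and 1 (hypothetical layout).
    Returns a dict mapping read_id -> taxon, or None when no candidate
    reaches the threshold.
    """
    assignments = {}
    for read_id, candidates in hits.items():
        best = max(candidates, key=lambda c: c[1], default=None)
        if best is not None and best[1] >= threshold:
            assignments[read_id] = best[0]
        else:
            assignments[read_id] = None
    return assignments


# Toy identities, for illustration only.
hits = {
    "read1": [("Lactobacillus crispatus", 0.991), ("Lactobacillus iners", 0.942)],
    "read2": [("Lactobacillus iners", 0.960)],
    "read3": [],
}
print(classify_reads(hits))
```

    Here `read1` is assigned to its top hit, while `read2` and `read3` stay unclassified because no candidate reaches the 97% threshold.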

    Exonization of active mouse L1s: a driver of transcriptome evolution?

    Background: Long interspersed nuclear elements (LINE-1s, L1s) have been recently implicated in the regulation of mammalian transcriptomes. Results: Here, we show that members of the three active mouse L1 subfamilies (A, GF and TF) contain, in addition to those on their sense strands, conserved functional splice sites on their antisense strands, which trigger multiple exonization events. The latter is particularly intriguing in the light of the strong antisense orientation bias of intronic L1s, implying that the toleration of antisense insertions results in an increased potential for exonization. Conclusion: In a genome-wide analysis, we have uncovered evidence suggesting that the mobility of the large number of retrotransposition-competent mouse L1s (~2400 potentially active L1s in NCBIm35) has significant potential to shape the mouse transcriptome by continuously generating insertions into transcriptional units.

    Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

    Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods, i.e., measures of similarity between query and target sequences, provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional “semantic space.” Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.
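
    Once proteins have coordinates in an embedding space, a database search reduces to ranking targets by proximity to the query. The sketch below shows only that ranking step on toy two-dimensional vectors. It does not learn an embedding, and the use of cosine similarity, the function names, and the vectors are assumptions rather than details of ProtEmbed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embedded):
    """Return protein names sorted from most to least similar to the query.

    embedded: dict mapping protein name -> embedding vector (toy values here).
    """
    return sorted(embedded, key=lambda name: cosine(query, embedded[name]), reverse=True)


# Toy embedding, for illustration only.
embedded = {"protA": [1.0, 0.0], "protB": [0.9, 0.1], "protC": [0.0, 1.0]}
print(rank_by_similarity([1.0, 0.0], embedded))  # ['protA', 'protB', 'protC']
```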