229 research outputs found

    Indexing Strategies for Rapid Searches of Short Words in Genome Sequences

    Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries
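The two strategies named in the abstract above (index the database, or index the query set) can be illustrated with a minimal sketch. This is not the fetchGWI or tagger implementation: the function names are hypothetical, and the sketch assumes exact matches only, a single strand, and one fixed word length `k`.

```python
from collections import defaultdict


def build_word_index(sequence, k):
    """Map every length-k word in `sequence` to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def search_by_database_index(index, queries):
    """Strategy 1: look up each query word in a prebuilt database index."""
    return {q: list(index.get(q, [])) for q in queries}


def search_by_query_index(sequence, queries, k):
    """Strategy 2: index the query set, then scan the database once."""
    hits = {q: [] for q in queries}  # the dict doubles as the query index
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in hits:
            hits[word].append(i)
    return hits


# Toy data (hypothetical): a 12-nucleotide "genome" and three 4-mer probes.
genome = "ACGTACGTTACG"
probes = ["ACGT", "TACG", "GGGG"]
db_hits = search_by_database_index(build_word_index(genome, 4), probes)
query_hits = search_by_query_index(genome, probes, 4)
```

Both functions return the same positions. In this sketch the database index pays its cost once and answers each probe with a dictionary lookup, while the query index avoids storing the genome's words and needs only one pass over the sequence.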

    Towards Alignment Independent Quantitative Assessment of Homology Detection

    Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the cutoff value could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimation of the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods, and provides the users with intuitive quantitative assessment of protein alignment results. Herein, we present a computational approach that estimates the amount of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set independently of the alignments' quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach, which is independent of the alignments themselves, works well for estimating the number of homologous proteins in a wide range of homology values. In summary, the presented method can accompany homology searches and method development, provides validation to search results, and allows tuning of tools and methods

    Characterization of globulin storage proteins of a low prolamin cereal species in relation to celiac disease

    Brachypodium distachyon, a small annual grass with seed storage globulins as primary protein reserves, was used in our study to analyse the toxic nature of non-prolamin seed storage proteins related to celiac disease. The main storage proteins of B. distachyon are the 7S globulin type proteins and the 11S, 12S seed storage globulins similar to oat and rice. Immunoblot analyses using serum samples from celiac disease patients were carried out followed by the identification of immune-responsive proteins using mass spectrometry. Serum samples from celiac patients on a gluten-free diet, from patients with Crohn's disease and healthy subjects, were used as controls. The identified proteins with intense serum-IgA reactivity belong to the 7S and 11-12S seed globulin family. Structure prediction and epitope prediction analyses confirmed the presence of celiac disease-related linear B cell epitope homologs and the presence of peptide regions with strong HLA-DQ8 and DQ2 binding capabilities. These results highlight that both MHC-II presentation and B cell response may be developed not only to prolamins but also to seed storage globulins. This is the first study of the non-prolamin type seed storage proteins of Brachypodium from the perspective of celiac disease

    A Score of the Ability of a Three-Dimensional Protein Model to Retrieve Its Own Sequence as a Quantitative Measure of Its Quality and Appropriateness

    BACKGROUND: Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. PRINCIPAL FINDINGS: The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449-460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence sets showed no conservation of amino acids at active sites or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the sequences that the models retrieved from the database belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details of the folding patterns that seem to be tightly linked to the fitness of a structural framework for a specific biological function. CONCLUSION: Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone

    Noisy Splicing Drives mRNA Isoform Diversity in Human Cells

    While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution

    Molecular characterization of a novel ssRNA ourmia-like virus from the rice blast fungus Magnaporthe oryzae

    In this study we characterize a novel positive-sense, single-stranded RNA (ssRNA) mycovirus isolated from the rice field isolate of Magnaporthe oryzae Guy11. The ssRNA contains a single open reading frame (ORF) of 2,373 nucleotides in length and encodes an RNA-dependent RNA polymerase (RdRp) closely related to ourmiaviruses (plant viruses) and ourmia-like mycoviruses. Accordingly, we name this virus Magnaporthe oryzae ourmia-like virus 1 (MOLV1). Although phylogenetic analysis suggests that MOLV1 is closely related to ourmia and ourmia-like viruses, it has some features never reported before within the Ourmiavirus genus. 3' RLM-RACE (RNA ligase-mediated rapid amplification of cDNA ends) and extension poly(A) tests (ePAT) suggest that the MOLV1 genome contains a poly(A) tail, whereas the three cytosine and the three guanine residues present in 5' and 3' untranslated regions (UTRs) of ourmia viruses are not observed in the MOLV1 sequence. The discovery of this novel viral genome supports the hypothesis that plant pathogenic fungi may have acquired this type of virus from their host plants

    Molecular and cellular mechanisms underlying the evolution of form and function in the amniote jaw.

    The amniote jaw complex is a remarkable amalgamation of derivatives from distinct embryonic cell lineages. During development, the cells in these lineages experience concerted movements, migrations, and signaling interactions that take them from their initial origins to their final destinations and imbue their derivatives with aspects of form including their axial orientation, anatomical identity, size, and shape. Perturbations along the way can produce defects and disease, but also generate the variation necessary for jaw evolution and adaptation. We focus on molecular and cellular mechanisms that regulate form in the amniote jaw complex, and that enable structural and functional integration. Special emphasis is placed on the role of cranial neural crest mesenchyme (NCM) during the species-specific patterning of bone, cartilage, tendon, muscle, and other jaw tissues. We also address the effects of biomechanical forces during jaw development and discuss ways in which certain molecular and cellular responses add adaptive and evolutionary plasticity to jaw morphology. Overall, we highlight how variation in molecular and cellular programs can promote the phenomenal diversity and functional morphology achieved during amniote jaw evolution or lead to the range of jaw defects and disease that affect the human condition

    Origin of Saxitoxin Biosynthetic Genes in Cyanobacteria

    BACKGROUND: Paralytic shellfish poisoning (PSP) is a potentially fatal syndrome associated with the consumption of shellfish that have accumulated saxitoxin (STX). STX is produced by microscopic marine dinoflagellate algae. Little is known about the origin and spread of saxitoxin genes in these under-studied eukaryotes. Fortuitously, some freshwater cyanobacteria also produce STX, providing an ideal model for studying its biosynthesis. Here we focus on saxitoxin-producing cyanobacteria and their non-toxic sisters to elucidate the origin of genes involved in the putative STX biosynthetic pathway. METHODOLOGY/PRINCIPAL FINDINGS: We generated a draft genome assembly of the saxitoxin-producing (STX+) cyanobacterium Anabaena circinalis ACBU02 and searched for 26 candidate saxitoxin-genes (named sxtA to sxtZ) that were recently identified in the toxic strain Cylindrospermopsis raciborskii T3. We also generated a draft assembly of the non-toxic (STX-) sister Anabaena circinalis ACFR02 to aid the identification of saxitoxin-specific genes. Comparative phylogenomic analyses revealed that nine putative STX genes were horizontally transferred from non-cyanobacterial sources, whereas one key gene (sxtA) originated in STX+ cyanobacteria via two independent horizontal transfers followed by fusion. In total, of the 26 candidate saxitoxin-genes, 13 are of cyanobacterial provenance and are monophyletic among the STX+ taxa, four are shared amongst STX+ and STX- cyanobacteria, and the remaining nine genes are specific to STX+ cyanobacteria. CONCLUSIONS/SIGNIFICANCE: Our results provide evidence that the assembly of STX genes in ACBU02 involved multiple HGT events from different sources followed presumably by coordination of the expression of foreign and native genes in the common ancestor of STX+ cyanobacteria. The ability to produce saxitoxin was subsequently lost multiple independent times resulting in a nested relationship of STX+ and STX- strains among Anabaena circinalis strains

    Gram Negative Wound Infection in Hospitalised Adult Burn Patients-Systematic Review and Metanalysis-

    BACKGROUND: Gram-negative infection is a major determinant of morbidity and survival. Traditional teaching suggests that burn wound infections in different centres are caused by differing sets of causative organisms. This study established whether Gram-negative burn wound isolates associated with clinical wound infection differ between burn centres. METHODS: Studies investigating adult hospitalised patients (2000-2010) were critically appraised and qualified to a levels of evidence hierarchy. The contribution of bacterial pathogen type and burn centre to the variance in standardised incidence of Gram-negative burn wound infection was analysed using two-way analysis of variance. PRIMARY FINDINGS: Pseudomonas aeruginosa, Klebsiella pneumoniae, Acinetobacter baumannii, Enterobacter spp., Proteus spp. and Escherichia coli emerged as the commonest Gram-negative burn wound pathogens. Individual pathogens' incidence did not differ significantly between burn centres (F (4, 20) = 1.1, p = 0.3797; r2 = 9.84). INTERPRETATION: Gram-negative infections predominate in burn surgery. This study is the first to establish that burn wound infections do not differ significantly between burn centres. It is the first study to report the pathogens responsible for the majority of Gram-negative infections in these patients. Whilst burn wound infection is not exclusive to these bacteria, it is hoped that reporting the presence of this group of common Gram-negative "target organisms" will facilitate clinical practice and target research towards a defined clinical demand.

    The novel homozygous KCNJ10 c.986T>C (p.(Leu329Pro)) variant is pathogenic for the SeSAME/EAST homologue in Malinois dogs.

    SeSAME/EAST syndrome is a multisystemic disorder in humans, characterised by seizures, sensorineural deafness, ataxia, developmental delay and electrolyte imbalance. It is exclusively caused by homozygous or compound heterozygous variations in the KCNJ10 gene. Here we describe a similar syndrome in two families belonging to the Malinois dog breed, based on clinical, neurological, electrodiagnostic and histopathological examination. Genetic analysis detected a novel pathogenic KCNJ10 c.986T>C (p.(Leu329Pro)) variant that is inherited in an autosomal recessive manner. This variant has an allele frequency of 2.9% in the Belgian Malinois population, but is not found in closely related dog breeds or in dog breeds where similar symptoms have already been described. The canine phenotype is remarkably similar to that in humans, including ataxia and seizures. In addition, clinical and electrophysiological signs of neuromyotonia were observed in half of the dogs. Because there is currently no cure and treatment is nonspecific and unsatisfactory, this canine translational model could be used for further elucidating the genotype/phenotype correlation of this monogenic multisystem disorder and as an excellent intermediate step for drug safety testing and efficacy evaluations before initiating human studies