316 research outputs found

    Frequency Analysis Techniques for Identification of Viral Genetic Data

    Environmental metagenomic samples and samples obtained in an attempt to identify a pathogen associated with the emergence of a novel infectious disease are important sources of novel microorganisms. The low costs and high throughput of sequencing technologies are expected to allow the genetic material in those samples to be sequenced and the genomes of the novel microorganisms to be identified by alignment to those in a database of known genomes. Yet, for various biological and technical reasons, such alignment might not always be possible. We investigate a frequency analysis technique which on the one hand allows for the identification of genetic material without relying on alignment and on the other hand makes possible the discovery of nonoverlapping contigs from the same organism. The technique is based on obtaining signatures of the genetic data and defining a distance/similarity measure between signatures. More precisely, the signatures of the genetic data are the frequencies of k-mers occurring in them, with k being a natural number. We considered an entropy-based distance between signatures, similar to the Kullback-Leibler distance in information theory, and investigated its ability to categorize negative-sense single-stranded RNA (ssRNA) viral genetic data. Our conclusion is that in this viral context, the technique provides a viable way of discovering genetic relationships without relying on alignment. We envision that our approach will be applicable to other microbial genetic contexts, e.g., other types of viruses, and will be an important tool in the discovery of novel microorganisms.
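
The signature-and-distance idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact distance, so the symmetrized Kullback-Leibler divergence below is an assumption standing in for the authors' entropy-based measure; the function names, the RNA alphabet, and the pseudocount are likewise hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

ALPHABET = "ACGU"  # assumption: RNA alphabet; T is mapped to U below


def kmer_signature(seq, k, pseudocount=1.0):
    """Return the frequency of every possible k-mer in seq.

    A pseudocount is added to each k-mer so that no frequency is zero,
    which keeps the logarithms in the divergence finite.
    Windows containing characters outside ALPHABET are ignored.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[m] * log2(p[m] / q[m]) for m in p)


def symmetric_kl(p, q):
    """Symmetrized KL divergence (an assumed stand-in for the paper's distance)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    a = kmer_signature("ACGUACGUAAGGCUUA", k=2)
    b = kmer_signature("GGGGCCCCGGCCGGCC", k=2)
    print(symmetric_kl(a, b))
```

Because signatures are computed per sequence, two contigs can be compared even when they do not overlap, which is the property the abstract highlights.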

    Fixation, transient landscape and diffusion's dilemma in stochastic evolutionary game dynamics

    Agent-based stochastic models for finite populations have recently received much attention in the game theory of evolutionary dynamics. Both the ultimate fixation and the pre-fixation transient behavior are important to a full understanding of the dynamics. In this paper, we study the transient dynamics of the well-mixed Moran process by constructing a landscape function. It is shown that the landscape plays the role of a central theoretical "device" that integrates several lines of inquiry: the stable behavior of the replicator dynamics, the long-time fixation, and the continuous diffusion approximation associated with an asymptotically large population. Several issues relating to the transient dynamics are discussed: (i) the multiple-time-scale phenomenon associated with intra- and inter-attractoral dynamics; (ii) a discontinuous transition in the stochastically stationary process akin to the Maxwell construction in equilibrium statistical physics; and (iii) the dilemma that the diffusion approximation faces as a continuous approximation of the discrete evolutionary dynamics. It is found that rare events with exponentially small probabilities, corresponding to uphill movements and barrier crossing in a landscape with multiple wells made possible by strong nonlinear dynamics, play an important role in understanding the origin of complexity in evolutionary, nonlinear biological systems. Comment: 34 pages, 4 figures
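
As a rough illustration of the fixation side of this abstract, the sketch below computes the probability that a single mutant takes over a well-mixed Moran population, using the standard birth-death formula. It does not reproduce the paper's landscape function, which the abstract does not define. The function names and the payoff-matrix helper are hypothetical, and the helper assumes the common convention that an individual does not play against itself.

```python
def fixation_probability(N, fitness_a, fitness_b):
    """Probability that one A mutant fixes in a well-mixed Moran process.

    fitness_a(i) and fitness_b(i) give the (positive) fitness of types A
    and B when the population of size N contains i individuals of type A.
    Uses rho = 1 / (1 + sum_{k=1}^{N-1} prod_{i=1}^{k} g_i / f_i).
    """
    total = 1.0
    running_product = 1.0
    for i in range(1, N):
        running_product *= fitness_b(i) / fitness_a(i)
        total += running_product
    return 1.0 / total


def game_fitness(N, a, b, c, d):
    """Frequency-dependent fitness from a 2x2 payoff matrix [[a, b], [c, d]].

    Assumption: payoffs are averaged over the N - 1 other individuals
    (self-interaction excluded) and used directly as fitness.
    """
    def fitness_a(i):
        return (a * (i - 1) + b * (N - i)) / (N - 1)

    def fitness_b(i):
        return (c * i + d * (N - i - 1)) / (N - 1)

    return fitness_a, fitness_b


if __name__ == "__main__":
    f, g = game_fitness(50, a=3.0, b=1.0, c=2.0, d=2.0)
    print(fixation_probability(50, f, g))
```

In the neutral case (equal fitness for both types) the function returns 1/N, which is a convenient sanity check.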

    A systems approach to model natural variation in reactive properties of bacterial ribosomes

    Background: Natural variation in protein output from translation in bacteria and archaea may be an organism-specific property of the ribosome. This paper adopts a systems approach to model the protein output as a measure of specific ribosome reactive properties in a ribosome-mediated translation apparatus. We use the steady-state assumption to define a transition state complex for the ribosome, coupled with mRNA, tRNA, amino acids and reaction factors, as a subsystem that allows a focus on the completed translational output as a measure of specific properties of the ribosome. Results: In analogy to the steady-state reaction of an enzyme complex, we propose a steady-state translation complex for mRNA from any gene, and derive a maximum specific translation activity, $T_{a(\max)}$, as a property of the ribosomal reaction complex. $T_{a(\max)}$ has units of $a$-protein output per time per $a$-specific mRNA. A related property of the ribosome, $\tilde{T}_{a(\max)}$, has units of $a$-protein per time per total RNA, with the relationship $\tilde{T}_{a(\max)} = \rho_a T_{a(\max)}$, where $\rho_a$ represents the fraction of total RNA committed to translation output of $P_a$ from gene $a$ message. $T_{a(\max)}$ as a ribosome property is analogous to $k_{\mathrm{cat}}$ for a purified enzyme, and $\tilde{T}_{a(\max)}$ is analogous to enzyme specific activity in a crude extract. Conclusion: Analogy to an enzyme reaction complex led us to a ribosome reaction model for measuring specific translation activity of a bacterial ribosome. We propose to use this model to design experimental tests of our hypothesis that specific translation activity is a ribosomal property that is subject to natural variation and natural selection much like $V_{\max}$ and $K_m$ for any specific enzyme.

    Luminally expressed gastrointestinal biomarkers

    Introduction: A biomarker is a measurable indicator of normal biologic processes, pathogenic processes or pharmacological responses. The identification of a useful biomarker is challenging, with several hurdles to overcome before clinical adoption. This review gives a general overview of a range of biomarkers associated with inflammatory bowel disease or colorectal cancer along the gastrointestinal tract. Areas covered: These markers include those that are already clinically accepted: inflammatory markers such as faecal calprotectin (FC), S100A12 (Calgranulin C) and Fatty Acid Binding Proteins (FABP); malignancy markers such as Faecal Occult Blood, Mucins, Stool DNA and Faecal microRNA (miRNA); other markers such as Faecal Elastase, Faecal alpha-1-antitrypsin and Alpha2-macroglobulin; and possible future markers such as microbiota, volatile organic compounds and pH. Expert commentary: Currently, a few biomarkers, such as FC, have been sufficiently validated for routine clinical use. However, many of these biomarkers continue to be limited in sensitivity and specificity for various GI diseases. Emerging biomarkers have the potential to improve diagnosis and monitoring, but further study is required to determine efficacy and validate clinical utility.

    Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3

    Background: Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described. Results: Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates. 
    Conclusion: We found that large introns can promote AS via exon-skipping and exon turnover during evolution, likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.

    Island method for estimating the statistical significance of profile-profile alignment scores

    Background: In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form. However, estimating distribution parameters by random simulations turns out to be computationally very expensive. Results: We demonstrate that the background distribution of profile-profile alignment scores heavily depends on profiles' composition and thus the distribution parameters must be estimated independently, for each pair of profiles of interest. We also show that accurate estimates of statistical parameters can be obtained using the "island statistics" for profile-profile alignments. Conclusion: The island statistics can be generalized to profile-profile alignments to provide an efficient method for the alignment score normalization. Since multiple island scores can be extracted from a single comparison of two profiles, the island method has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.
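
To make the Gumbel assumption concrete, the sketch below fits a Gumbel distribution to a set of background scores by the method of moments and converts an observed score into a p-value. This corresponds to the simple "fit to simulated scores" baseline, not to the island method itself, whose details the abstract does not give. The function names and the location/rate parameterization (`mu`, `lam`) are assumptions made for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Returns (mu, lam) where mu is the location and lam = 1 / scale, using
    mean = mu + gamma / lam and variance = pi**2 / (6 * lam**2).
    """
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    lam = math.pi / (std * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def gumbel_p_value(score, mu, lam):
    """P(S >= score) under a Gumbel distribution with location mu and rate lam."""
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))


if __name__ == "__main__":
    # Hypothetical background scores, e.g. from shuffled profile comparisons.
    background = [18.2, 21.5, 19.9, 25.1, 22.3, 20.4, 30.7, 23.8, 19.1, 24.6]
    mu, lam = fit_gumbel_moments(background)
    print(mu, lam, gumbel_p_value(35.0, mu, lam))
```

Since the abstract reports that the parameters depend strongly on profile composition, a fit of this kind would have to be repeated for each profile pair, which is the cost the island method is meant to reduce.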

    How accurately is ncRNA aligned within whole-genome multiple alignments?

    Background: Multiple alignment of homologous DNA sequences is of great interest to biologists since it provides a window into evolutionary processes. At present, the accuracy of whole-genome multiple alignments, particularly in noncoding regions, has not been thoroughly evaluated. Results: We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment. Conclusion: MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions.

    Minimal Functional Sites Allow a Classification of Zinc Sites in Proteins

    Zinc is indispensable to all forms of life as it is an essential component of many different proteins involved in a wide range of biological processes. As with other metals, zinc in proteins can play different roles that depend on the features of the metal-binding site. In this work, we describe zinc sites in proteins with known structure by means of three-dimensional templates that can be automatically extracted from PDB files and consist of the protein structure around the metal, including the zinc ligands and the residues in close spatial proximity to the ligands. This definition is devised to intrinsically capture the features of the local protein environment that can affect metal function, and corresponds to what we call a minimal functional site (MFS). We used MFSs to classify all zinc sites whose structures are available in the PDB and combined this classification with functional annotation as available in the literature. We classified 77% of zinc sites into ten clusters, each grouping zinc sites with structures that are highly similar, and an additional 16% into seven pseudo-clusters, each grouping zinc sites with structures that are only broadly similar. Sites where zinc plays a structural role are predominant in eight clusters and in two pseudo-clusters, while sites where zinc plays a catalytic role are predominant in two clusters and in five pseudo-clusters. We also analyzed the amino acid composition of the coordination sphere of zinc as a function of its role in the protein, highlighting trends and exceptions. In a period when the number of known zinc proteins is expected to grow further with the increasing awareness of the cellular mechanisms of zinc homeostasis, this classification represents a valuable basis for structure-function studies of zinc proteins, with broad applications in biochemistry, molecular pharmacology and de novo protein design.