
    SSE: a nucleotide and amino acid sequence analysis platform

    Background: There is an increasing need to develop bioinformatic tools to organise and analyse the rapidly growing amount of nucleotide and amino acid sequence data in organisms ranging from viruses to eukaryotes.

    Finding: A simple sequence editor (SSE) was developed to create an integrated environment where sequences can be aligned, annotated, classified and directly analysed by a number of built-in bioinformatic programs. SSE incorporates a sequence editor for the creation of sequence alignments, a process assisted by integrated CLUSTAL/MUSCLE alignment programs and automated removal of indels. Sequences can be fully annotated and classified into groups, with access to analytical programs that analyse diversity, recombination and RNA secondary structure. Methods for analysing sequence diversity include measures of divergence and evolutionary distances, identity plots to detect regions of nucleotide or amino acid homology, reconstruction of sequence changes, mono-, di- and higher order nucleotide compositional biases and codon usage.

    Association Index calculations, GroupScans, Bootscanning and TreeOrder scans perform phylogenetic analyses that reconcile group membership with tree branching orders and provide powerful methods for examining segregation of alleles and detection of recombination events. Phylogeny changes across alignments and scoring of branching order differences between trees using the Robinson-Foulds algorithm allow effective visualisation of the sites of recombination events.

    RNA secondary and tertiary structures play important roles in gene expression and RNA virus replication. For the latter, persistence of infection is additionally associated with pervasive RNA secondary structure throughout viral genomic RNA that modulates interactions with innate cell defences. SSE provides several programs to scan alignments for RNA secondary structure through folding energy thermodynamic calculations and phylogenetic methods (detection of co-variant changes, and structure conservation between divergent sequences). These analyses complement methods based on detection of sequence constraints, such as suppression of synonymous site variability.

    For each program, results can be plotted in real time during analysis through an integrated graphics package, providing publication quality graphs. Results can also be directed to tabulated datafiles for import into spreadsheet or database programs for further analysis.

    Conclusions: SSE combines sequence editor functions with analytical tools in a comprehensive and user-friendly package that assists considerably in bioinformatic and evolution research.
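
    The scoring of branching order differences between trees mentioned in the SSE abstract can be illustrated with a minimal sketch of the Robinson-Foulds distance. This is not SSE's implementation: the nested-tuple tree encoding and the function names `leaf_set`, `splits` and `robinson_foulds` are assumptions made for illustration only. The sketch treats trees as unrooted and counts the non-trivial leaf bipartitions found in one tree but not the other.

```python
# Illustrative sketch only; not taken from SSE.
# A tree is a leaf name (str) or a tuple of subtrees, e.g. (("A", "B"), ("C", "D")).

def leaf_set(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # reference leaf used to pick one canonical side per split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        rest = all_leaves - clade
        # Skip trivial splits that separate zero or one leaf from the rest.
        if len(clade) > 1 and len(rest) > 1:
            # Store the side that does NOT contain the reference leaf.
            found.add(rest if ref in clade else clade)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", "D"))
    t2 = (("A", "C"), ("B", "D"))
    print(robinson_foulds(t1, t1))  # 0: identical topologies
    print(robinson_foulds(t1, t2))  # 2: each tree has one split the other lacks
```

    Computing this distance between trees built from successive windows of an alignment gives one number per window pair, which is the kind of quantity that can be plotted to locate changes in phylogeny along the alignment.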

    Diversity of tRNA genes in eukaryotes

    We compare the diversity of chromosomal-encoded transfer RNA (tRNA) genes from 11 eukaryotes as identified by tRNAscan-SE of their respective genomes. They include the budding and fission yeast, worm, fruit fly, fugu, chicken, dog, rat, mouse, chimp and human. The number of tRNA genes is between 170 and 570, and the number of tRNA isoacceptors ranges from 41 to 55. Unexpectedly, the number of tRNA genes having the same anticodon but different sequences elsewhere in the tRNA body (defined here as tRNA isodecoder genes) varies significantly (10–246). tRNA isodecoder genes allow up to 274 different tRNA species to be produced from 446 genes in humans, but only up to 51 from 275 genes in the budding yeast. The fraction of tRNA isodecoder genes among all tRNA genes increases across the phylogenetic spectrum. A large number of sequence differences in human tRNA isodecoder genes occurs in the internal promoter regions for RNA polymerase III. We also describe a systematic, ligation-based method to detect and quantify tRNA isodecoder molecules in human samples, and show differential expression of three tRNA isodecoders in six human tissues. The large number of tRNA isodecoder genes in eukaryotes suggests that tRNA function may be more diverse than previously appreciated.
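
    The distinction drawn above between gene count, anticodon count and distinct tRNA species can be made concrete with a small sketch. It assumes, hypothetically, that gene predictions have already been parsed into `(anticodon, sequence)` pairs; the function name `summarize_trna_genes` and the toy sequences are invented for illustration and are not the authors' pipeline or real tRNA data.

```python
from collections import defaultdict


def summarize_trna_genes(genes):
    """Summarize (anticodon, sequence) pairs, one pair per predicted tRNA gene.

    Genes sharing an anticodon but differing in sequence are counted as
    separate tRNA species (the isodecoder idea described in the abstract).
    """
    variants = defaultdict(set)  # anticodon -> set of distinct gene sequences
    total = 0
    for anticodon, sequence in genes:
        variants[anticodon.upper()].add(sequence.upper())
        total += 1
    return {
        "genes": total,
        "anticodons": len(variants),
        "distinct_species": sum(len(seqs) for seqs in variants.values()),
    }


if __name__ == "__main__":
    # Toy records: made-up short strings standing in for full gene sequences.
    toy_genes = [
        ("CAT", "GGCTA"),
        ("CAT", "GGCTA"),  # exact duplicate gene: same species
        ("CAT", "GGCTT"),  # same anticodon, different body: a second species
        ("GAA", "GCCGA"),
    ]
    print(summarize_trna_genes(toy_genes))
```

    On the toy input, four genes collapse to two anticodons and three distinct species, mirroring on a small scale how the human figures quoted above (446 genes, up to 274 species) can differ.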

    The Origin of 2 Sexes Through Optimization of Recombination Entropy Against Time and Energy

    Sexual reproduction in Nature requires two sexes, which raises the question of why the reproductive scheme did not evolve to have three or more sexes. Here we construct a constrained optimization model based on communication theory to analyze trade-offs among reproductive schemes with an arbitrary number of sexes. On the one hand, more sexes lead to higher reproductive diversity; on the other hand, they incur greater cost in time and energy for reproductive success. Our model shows that the two-sex reproduction scheme maximizes the recombination entropy-to-cost ratio, and hence is the optimal solution to the problem. Comment: 10 pages, 5 figures. To appear in Bulletin of Mathematical Biology.

    Integrating Horizontal Gene Transfer and Common Descent to Depict Evolution and Contrast It with “Common Design”

    Horizontal gene transfer (HGT) and common descent interact in space and time. Because events of HGT co-occur with phylogenetic evolution, it is difficult to depict evolutionary patterns graphically. Tree-like representations of life's diversification are useful, but they ignore the significance of HGT in evolutionary history, particularly of unicellular organisms, ancestors of multicellular life. Here we integrate the reticulated-tree model, ring of life, symbiogenesis whole-organism model, and eliminative pattern pluralism to represent evolution. Using Entamoeba histolytica alcohol dehydrogenase 2 (EhADH2), a bifunctional enzyme in the glycolytic pathway of amoeba, we illustrate how EhADH2 could be the product of both horizontally acquired features from ancestral prokaryotes (i.e. aldehyde dehydrogenase [ALDH] and alcohol dehydrogenase [ADH]), and subsequent functional integration of these enzymes into EhADH2, which is now inherited by amoeba via common descent. Natural selection has driven the evolution of EhADH2 active sites, which require specific amino acids (cysteine 252 in the ALDH domain; histidine 754 in the ADH domain), iron and NAD+ as cofactors, and the substrates acetyl-CoA for ALDH and acetaldehyde for ADH. Alternative views invoking “common design” (i.e. the non-naturalistic emergence of major taxa independent from ancestry) to explain the interaction between horizontal and vertical evolution are unfounded.

    Blueprint for a high-performance biomaterial: full-length spider dragline silk genes.

    Spider dragline (major ampullate) silk outperforms virtually all other natural and manmade materials in terms of tensile strength and toughness. For this reason, the mass-production of artificial spider silks through transgenic technologies has been a major goal of biomimetics research. Although all known arthropod silk proteins are extremely large (>200 kilodaltons), recombinant spider silks have been designed from short and incomplete cDNAs, the only available sequences. Here we describe the first full-length spider silk gene sequences and their flanking regions. These genes encode the MaSp1 and MaSp2 proteins that compose the black widow's high-performance dragline silk. Each gene includes a single enormous exon (>9000 base pairs) that translates into a highly repetitive polypeptide. Patterns of variation among sequence repeats at the amino acid and nucleotide levels indicate that the interaction of selection, intergenic recombination, and intragenic recombination governs the evolution of these highly unusual, modular proteins. Phylogenetic footprinting revealed putative regulatory elements in non-coding flanking sequences. Conservation of both upstream and downstream flanking sequences was especially striking between the two paralogous black widow major ampullate silk genes. Because these genes are co-expressed within the same silk gland, there may have been selection for similarity in regulatory regions. Our new data provide complete templates for synthesis of recombinant silk proteins that significantly improve the degree to which artificial silks mimic natural spider dragline fibers.

    Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding

    Genetic coding is generally thought to have required ribozymes whose functions were taken over by polypeptide aminoacyl-tRNA synthetases (aaRS). Two discoveries about aaRS and their interactions with tRNA substrates now furnish a unifying rationale for the opposite conclusion: that the key processes of the Central Dogma of molecular biology emerged simultaneously and naturally from simple origins in a peptide-RNA partnership, eliminating the epistemological utility of a prior RNA world. First, the two aaRS classes likely arose from opposite strands of the same ancestral gene, implying a simple genetic alphabet. The resulting inversion symmetries in aaRS structural biology would have stabilized the initial and subsequent differentiation of coding specificities, rapidly promoting diversity in the proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing reflexive, nanoenvironmental sensing in protein aaRS. Bootstrapping of increasingly detailed coding is thus intrinsic to polypeptide aaRS, but impossible in an RNA world. These notions underline the following concepts that contradict gradual replacement of ribozymal aaRS by polypeptide aaRS: (i) aaRS enzymes must be interdependent; (ii) reflexivity intrinsic to polypeptide aaRS production dynamics promotes bootstrapping; (iii) takeover of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; (iv) the Central Dogma's emergence is most probable when replication and translation error rates remain comparable. These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled gene-replicase-translatase system of genetic coding that would have continuously preserved the functional meaning of genetically encoded protein genes whose phylogenetic relationships match those observed today.

    The use of information theory in evolutionary biology

    Information is a key concept in evolutionary biology. Information is stored in the genomes of biological organisms, and used to generate the organism as well as to maintain and control it. Information is also "that which evolves". When a population adapts to a local environment, information about this environment is fixed in a representative genome. However, when an environment changes, information can be lost. At the same time, information is processed by animal brains to survive in complex environments, and the capacity for information processing also evolves. Here I review applications of information theory to the evolution of proteins as well as to the evolution of information processing in simulated agents that adapt to perform a complex task. Comment: 25 pages, 7 figures. To appear in "The Year in Evolutionary Biology", of the Annals of the NY Academy of Sciences.

    The Challenge of Regulation in a Minimal Photoautotroph: Non-Coding RNAs in Prochlorococcus

    Prochlorococcus, an extremely small cyanobacterium that is very abundant in the world's oceans, has a very streamlined genome. On average, these cells have about 2,000 genes and very few regulatory proteins. The limited capability of regulation is thought to be a result of selection imposed by a relatively stable environment in combination with a very small genome. Furthermore, only ten non-coding RNAs (ncRNAs), which play crucial regulatory roles in all forms of life, have been described in Prochlorococcus. Most strains also lack the RNA chaperone Hfq, raising the question of how important this mode of regulation is for these cells. To explore this question, we examined the transcription of intergenic regions of Prochlorococcus MED4 cells subjected to a number of different stress conditions: changes in light qualities and quantities, phage infection, or phosphorus starvation. Analysis of Affymetrix microarray expression data from intergenic regions revealed 276 novel transcriptional units. Among these were 12 new ncRNAs, 24 antisense RNAs (asRNAs), as well as 113 short mRNAs. Two additional ncRNAs were identified by homology, and all 14 new ncRNAs were independently verified by Northern hybridization and 5′ RACE. Unlike its reduced suite of regulatory proteins, the number of ncRNAs relative to genome size in Prochlorococcus is comparable to that found in other bacteria, suggesting that RNA regulators likely play a major role in regulation in this group. Moreover, the ncRNAs are concentrated in previously identified genomic islands, which carry genes of significance to the ecology of this organism, many of which are not of cyanobacterial origin. Expression profiles of some of these ncRNAs suggest involvement in light stress adaptation and/or the response to phage infection, consistent with their location in the hypervariable genomic islands.