
    The inherent occurrence of complex intron-rich spliceosomal split genes, including regulatory and splicing elements, within pre-biotic random genetic sequences

    Growing evidence indicates that complex intron-rich split genes and an advanced spliceosome existed in the earliest eukaryote, and possibly in the first life form. We sought to examine how these split genes could have originated in the prebiotic system. We previously found that split coding sequences for complex proteins occur in abundance in random DNA sequences (P. Senapathy et al., accompanying paper). This study demonstrates that a full complement of exons, introns, and regulatory and splicing elements could also have occurred inherently, by chance, within prebiotic chemistry. By comparing the characteristics of split genes found in computer-generated random genetic sequences with those of several extant eukaryotes, we show that an abundance of intron-rich split genes akin to those present in modern eukaryotes could have existed in the prebiotic system. These findings answer the post-genomic question of why the earliest life form contained highly complex intron-rich split genes and, in conjunction with our companion study, show how these genes could encode a complex spliceosome.

    Origin of biological information: Inherent occurrence of intron-rich split genes, coding for complex extant proteins, within pre-biotic random genetic sequences

    The origin of biological information is an unexplained phenomenon. Prior research on the origin of proteins, based on the assumption that the first genes were contiguous prokaryotic sequences, has not succeeded. Instead, it has been established that contiguous protein-coding genes are not found in any practical amount of random genetic sequence. We found that complex eukaryotic proteins could be inherently encoded in split genes that could exist by chance within mere micrograms to milligrams of random DNA. Using protein amino acid sequence variability, codon degeneracy, and stringent exon-length restrictions, we demonstrate that split genes for proteins of extant eukaryotes occur extensively in random genetic sequences. The results provide evidence that an abundance of split genes encoding advanced proteins in a small amount of prebiotic genetic material could have ignited the evolution of the eukaryotic genome.
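
    To give a sense of scale for "micrograms to milligrams of random DNA", the sketch below converts a mass of single-stranded DNA into an approximate nucleotide count. It assumes an average residue mass of about 330 g/mol (a common approximation, not a figure from the abstract), and the helper name is hypothetical; it is a back-of-the-envelope calculation, not the authors' method.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

# Assumption: average mass of one nucleotide residue in single-stranded DNA is ~330 g/mol.
NUCLEOTIDE_MW = 330.0


def nucleotides_in_mass(grams: float, mw: float = NUCLEOTIDE_MW) -> float:
    """Approximate number of nucleotides in `grams` of single-stranded DNA (hypothetical helper)."""
    return grams / mw * AVOGADRO


if __name__ == "__main__":
    for label, grams in [("1 microgram", 1e-6), ("1 milligram", 1e-3)]:
        print(f"{label}: ~{nucleotides_in_mass(grams):.2e} nucleotides")
```

    Under this assumption a single microgram holds on the order of 10^15 nucleotides, and a milligram a thousand times more, which is the scale at which the abstract's chance-occurrence argument operates.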

    Origination of the Split Structure of Spliceosomal Genes from Random Genetic Sequences

    The mechanism by which the protein-coding portions of eukaryotic genes came to be separated by long non-coding stretches of DNA, and the purpose of this perplexing arrangement, have remained unresolved fundamental biological problems for three decades. We report here a plausible solution to this problem based on an analysis of open reading frame (ORF) length constraints in the genomes of nine diverse species. If primordial nucleic acid sequences were random, functional proteins that are innately long would not be encoded, because stop codons occur frequently in random sequence. The best possible way a long protein-coding sequence could have been derived from random DNA (or RNA) was by evolving a split structure. The systematic analyses of nine complete genome sequences presented here suggest that the major structural features of split genes may have evolved from the intrinsic occurrence of split protein-coding genes in primordial random nucleotide sequences. The results also suggest that intron-rich genes containing short exons may have been the original form of genes occurring intrinsically in random DNA, and that intron-poor genes containing long exons were perhaps derived from these original intron-rich genes.
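
    The stop-codon argument can be checked with a short calculation. Assuming equal base frequencies and the standard genetic code (3 of 64 codons are stops), stop-free runs in one reading frame are geometrically distributed: the chance of n consecutive sense codons is (61/64)^n, the mean run is 61/3, or about 20 codons, and a 300-codon stretch without a stop has a probability of only about 5.6 × 10^-7. The sketch below computes these values and compares them with a simulated random sequence. The function names are hypothetical and this is an illustration of the premise, not the ORF analysis used in the study.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
P_STOP = 3 / 64  # assumes all four bases are equally likely


def expected_stop_free_run(p_stop: float = P_STOP) -> float:
    """Mean number of sense codons before the first stop (geometric distribution)."""
    return (1 - p_stop) / p_stop


def prob_no_stop(n_codons: int, p_stop: float = P_STOP) -> float:
    """Probability that n consecutive random codons contain no stop codon."""
    return (1 - p_stop) ** n_codons


def stop_free_runs(seq: str, frame: int = 0) -> list[int]:
    """Lengths (in codons) of stop-free stretches that end at a stop codon in one frame.

    A trailing stretch with no terminating stop is not counted.
    """
    runs, current = [], 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            runs.append(current)
            current = 0
        else:
            current += 1
    return runs


if __name__ == "__main__":
    rng = random.Random(42)
    seq = "".join(rng.choice("ACGT") for _ in range(300_000))
    runs = stop_free_runs(seq)
    print(f"theoretical mean run: {expected_stop_free_run():.1f} codons")
    print(f"observed mean run:    {sum(runs) / len(runs):.1f} codons")
    print(f"longest observed run: {max(runs)} codons")
    print(f"P(300 codons without a stop) = {prob_no_stop(300):.1e}")
```

    The simulated runs cluster around 20 codons and rarely approach the length of a typical protein, which is why the abstract argues that long coding sequences in random DNA are more plausibly assembled from short, split pieces.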

    ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

    We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon–domain data through a sophisticated web interface with the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of the exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon–intron structures; (iii) graphical analysis of multiple sequence alignments of the amino acid and coding nucleotide sequences of homologous protein domains from the seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of the exon–intron structures of alternative transcripts of a gene, correlated with variations in the domain architecture of the corresponding protein isoforms. These analytical features are well suited for detailed investigations of the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. The ExDom database is freely accessible at: http://66.170.16.154/ExDom/
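
    Correlating exon–intron structure with domain architecture boils down to projecting exon junctions onto protein coordinates. The sketch below assumes coding-exon lengths in nucleotides (the coding portion only) and domain coordinates in 1-based, inclusive residues. It reports each junction's codon offset and intron phase, then lists the junctions that fall inside a domain. The `Domain` type, helper names and example gene are hypothetical, and the sketch does not show how ExDom itself computes its mappings.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """A protein domain in 1-based, inclusive residue coordinates (hypothetical type)."""
    name: str
    start: int
    end: int


def exon_boundaries(cds_exon_lengths: list[int]) -> list[tuple[float, int]]:
    """Map each exon-exon junction onto the protein.

    Returns (offset, phase) pairs. The offset is the junction's position along the
    protein in codons and is non-integer when the intron splits a codon. The phase
    is the intron phase (0, 1 or 2). The last exon has no downstream junction.
    """
    junctions, total_nt = [], 0
    for length in cds_exon_lengths[:-1]:
        total_nt += length
        junctions.append((total_nt / 3, total_nt % 3))
    return junctions


def junctions_in_domain(domain: Domain, junctions: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """Junctions falling strictly inside the domain's coding span."""
    # Residue r is encoded by codon offsets (r - 1, r], so the domain spans (start - 1, end).
    return [j for j in junctions if domain.start - 1 < j[0] < domain.end]


if __name__ == "__main__":
    # Illustrative, made-up gene: four coding exons of 90, 100, 110 and 60 nt.
    junctions = exon_boundaries([90, 100, 110, 60])
    kinase = Domain("kinase-like", 25, 80)
    for offset, phase in junctions_in_domain(kinase, junctions):
        print(f"{kinase.name}: junction at codon offset {offset:.2f} (phase {phase})")
```

    Reporting the phase with each junction makes it easy to distinguish introns that fall between codons (phase 0) from those that split a codon, which is one of the properties typically compared across homologous domains.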

    Targeted Genome-Wide Enrichment of Functional Regions

    Only a small fraction of a large genome, such as the human genome, consists of functional regions such as exons, promoters, and polyA sites. A platform technique for selectively enriching functional genomic regions would enable several next-generation sequencing applications, including the discovery of causal mutations for disease and drug response. Here, we describe a platform technique, termed “functional genomic fingerprinting” (FGF), for the multiplexed genome-wide isolation and analysis of targeted regions such as the exome, the promoterome, or exon splice enhancers. The technique employs uniquely designed Fixed-Randomized primers: the fixed part of each primer corresponds to a targeted sequence, while the randomized part contains all possible sequence permutations. The Fixed-Randomized primers bind with full sequence complementarity at the multiple sites in the genome where the fixed sequence (such as a splice signal) occurs, and multiplex-amplify the many regions bounded by the fixed sequences (e.g., exons). Validation of this technique using the cardiac myosin binding protein-C (MYBPC3) gene as an example strongly supports its applicability and efficacy. Further, assisted by genome-wide computational analyses of such sequences, the FGF technique may provide a unique platform for high-throughput sample production and analysis of targeted genomic regions by next-generation sequencing, with powerful applications in discovering disease and drug response genes.
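
    The primer design and the idea of amplifying regions bounded by a fixed sequence can be illustrated in silico. The sketch below builds a Fixed-Randomized primer pool (every randomized 5' part of a chosen length joined to one fixed 3' part, giving 4^n primers). It then predicts candidate amplicons as plus-strand spans that start at the fixed motif (a forward-primer site) and end at the nearest downstream reverse complement of the motif (a reverse-primer site on the minus strand). Several parts of this are assumptions rather than details from the abstract: the primer layout (randomized 5', fixed 3'), the example motif `AGGT`, the `max_len` cutoff and all function names are hypothetical. The sketch also ignores annealing thermodynamics, mismatches and polymerase limits, so it is not the authors' protocol.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pool(fixed: str, n_random: int) -> list[str]:
    """Every randomized 5' part of length n_random joined to one fixed 3' part (4**n_random primers)."""
    return ["".join(p) + fixed for p in product("ACGT", repeat=n_random)]


def find_all(seq: str, motif: str) -> list[int]:
    """Start indices of all (possibly overlapping) occurrences of motif in seq, in ascending order."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def predicted_amplicons(genome: str, fixed: str, max_len: int) -> list[tuple[int, int]]:
    """Plus-strand spans from each fixed-motif site to the nearest downstream reverse-complement site."""
    fwd = find_all(genome, fixed)
    rev = find_all(genome, revcomp(fixed))
    amplicons = []
    for f in fwd:
        # The reverse-primer site must not overlap the forward-primer site.
        downstream = [r for r in rev if r >= f + len(fixed)]
        if downstream:
            end = downstream[0] + len(fixed)
            if end - f <= max_len:
                amplicons.append((f, end))
    return amplicons


if __name__ == "__main__":
    print(len(primer_pool("AGGT", 6)), "primers in the pool")
    print(predicted_amplicons("AAAGGTCCCCCACCTAAA", "AGGT", max_len=50))
```

    The pool grows as 4^n with the randomized length, which is the trade-off a real design would weigh against the specificity gained from a longer fixed part. In a genome-wide setting, the same scan would be run over each chromosome to estimate how many regions a given fixed sequence would enrich.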

    RoBuST: an integrated genomics resource for the root and bulb crop families Apiaceae and Alliaceae

    Background: Root and bulb vegetables (RBV) include carrots, celeriac (root celery) and parsnips (Apiaceae), and onions, garlic and leek (Alliaceae), food crops that are grown and consumed worldwide. Few data analysis platforms currently focus their data collection, annotation and integration efforts on RBV plant groups. Scientists working on RBV include breeders, geneticists, taxonomists, plant pathologists and plant physiologists, who use genomic data for a wide range of activities, including developing molecular genetic maps, delineating taxonomic relationships, and investigating molecular aspects of gene expression in biochemical pathways and disease responses. Because these genomic data come from such diverse areas of plant science, a community resource focused on RBV data types would be of great interest to this scientific community.
    Description: The RoBuST database was developed as a platform for collecting and organizing genomic information useful to RBV researchers. The current release of RoBuST contains genomic data for 294 Alliaceae and 816 Apiaceae plant species and offers the following features: (1) comprehensive sequence annotations of 3,663 genes, 5,959 RNAs, 22,723 ESTs and 11,438 regulatory sequence elements from the Apiaceae and Alliaceae plant families; (2) graphical tools for visualizing and analyzing sequence data; (3) access to traits, biosynthetic pathways, genetic linkage maps and molecular taxonomy data associated with Alliaceae and Apiaceae plants; and (4) a comprehensive plant splice signal repository of 659,369 splice signals collected from 6,015 plant species for comparative analysis of plant splicing patterns.
    Conclusions: RoBuST, available at http://robust.genome.com, provides an integrated platform for researchers to explore and analyze genomic data associated with root and bulb vegetables.