79,619 research outputs found

    On Gene Prediction by Cross-Species Comparative Sequence Analysis.

    The availability of large fragments of genomic DNA makes it possible to apply comparative genomics for identification of protein-coding regions. We have conducted a comparative analysis of homologous genomic sequences of organisms with different evolutionary distances and found conservation of non-coding regions between closely related organisms. In contrast, more distantly related organisms show much less intron similarity but also less conservation of exon structures. We sought to illuminate the impact of evolutionary distance on the performance of our gene-finding program, which is based on cross-species sequence comparison. Based on our findings and on training data sets, we propose a model by which coding sequence can be identified by comparing sequences of multiple species, both close and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and results are compared to those obtained by other popular gene prediction programs. Provided sequences can be found from other species at appropriate evolutionary distances, this approach could be applied in newly sequenced organisms where no species-dependent statistical models are available.

    Comparative motif discovery combined with comparative transcriptomics yields accurate targetome and enhancer predictions

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date. After six months, it is available under a Creative Commons License Attribution-NonCommercial 3.0 Unported License. The identification of transcription factor binding sites, enhancers, and transcriptional target genes often relies on the integration of gene expression profiling and computational cis-regulatory sequence analysis. Methods for the prediction of cis-regulatory elements can take advantage of comparative genomics to increase signal-to-noise levels. However, gene expression data are usually derived from only one species. Here we investigate tissue-specific cross-species gene expression profiling by high-throughput sequencing, combined with cross-species motif discovery. First, we compared different methods for expression level quantification and cross-species integration using Tag-seq data. Using the optimal pipeline, we derived a set of genes with conserved expression during retinal determination across Drosophila melanogaster, Drosophila yakuba, and Drosophila virilis. These genes are enriched for binding sites of eye-related transcription factors including the zinc-finger Glass, a master regulator of photoreceptor differentiation. Validation of predicted Glass targets using RNA-seq in homozygous glass mutants confirms that the majority of our predictions are expressed downstream from Glass. Finally, we tested nine candidate enhancers by in vivo reporter assays and found eight of them to drive GFP in the eye disc, of which seven colocalize with the Glass protein, namely, scrt, chp, dpr10, CG6329, retn, Lim3, and dmrt99B.
    In conclusion, we show for the first time the combined use of cross-species expression profiling with cross-species motif discovery as a method to define a core developmental program, and we augment the candidate Glass targetome from a single known target gene, lozenge, to at least 62 conserved transcriptional targets. This work is funded by research grants from Research Foundation Flanders (FWO, grant G.0704.11N), University of Leuven (CREA/10/014 and PF/10/016), and Human Frontiers Science Program (RGY0070/2011). M.N.S. is funded by a PhD fellowship from FWO. Peer reviewed.

    Systematic identification of functional plant modules through the integration of complementary data sources

    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.

    Functional analysis and transcriptional output of the Göttingen minipig genome

    In the past decade the Göttingen minipig has gained increasing recognition as an animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models, thereby reducing drug attrition, which is an important challenge in the process of drug development. Here we present a new chromosome-level version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as a basis for translational research. We relied on mapping and assembly of WGS (whole-genome shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten times fewer pseudogenized genes than other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations, suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa-specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar, suggesting functional significance. Using a new minipig-specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs.
    We underline this relationship for minipig and human liver, where we could demonstrate similar expression levels for most phase I drug-metabolizing enzymes. Higher expression levels and metabolic activities were found for FMO1, AKR/CRs and for phase II drug-metabolizing enzymes in minipig as compared to human. The variability of gene expression in equivalent human and minipig tissues is considerably higher in minipig organs, which is important for study design in case a human target belongs to this variable category in the minipig. The first analysis of gene expression in multiple tissues during development from young to adult shows that the majority of transcriptional programs are concluded four weeks after birth. This finding is in line with the advanced state of human postnatal organ development at comparative age categories and further supports the minipig as a model for pediatric drug safety studies. Genome-based assessment of sequence conservation combined with gene expression data in several tissues improves the translational value of the minipig for human drug development. The genome and gene expression data presented here are important resources for researchers using the minipig as a model for biomedical research or commercial breeding. The potential impact of our data for comparative genomics, translational research, and experimental medicine is discussed.

    Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

    Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.

    Genomic and structural investigation on dolphin morbillivirus (DMV) in Mediterranean fin whales (Balaenoptera physalus).

    Dolphin morbillivirus (DMV) has been deemed one of the most relevant threats to fin whales (Balaenoptera physalus), being responsible for a mortality outbreak in the Mediterranean Sea in recent years. Knowledge of the complete viral genome is essential to understand any structural changes that could modify virus pathogenesis and viral tissue tropism. We report the complete DMV sequence of N, P/V/C, M, F and H genes identified from a fin whale and the comparison of primary to quaternary structure of proteins between this fin whale strain and some of those isolated during the 1990-'92 and the 2006-'08 epidemics. Some relevant substitutions were detected, particularly Asn52Ser located on F protein and Ile21Thr on N protein. Comparing mutations found in the fin whale DMV with those occurring in viral strains of other cetacean species, some of them were proven to be the result of diversifying selection, which allows speculation on their role in host adaptation and on the way they could affect the interaction between viral attachment and fusion with the target host cells.

    Genome analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea

    Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38–39 Mb genomes include 11,860–14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared t

    Simple sequence repeats in zebra finch (Taeniopygia guttata) expressed sequence tags: a new resource for evolutionary genetic studies of passerines

    Background: Passerines (perching birds) are widely studied across many biological disciplines including ecology, population biology, neurobiology, behavioural ecology and evolutionary biology. However, understanding the molecular basis of relevant traits is hampered by the paucity of passerine genomics tools. Efforts to address this problem are underway, and the zebra finch (Taeniopygia guttata) will be the first passerine to have its genome sequenced. Here we describe a bioinformatic analysis of zebra finch expressed sequence tag (EST) GenBank entries. Results: A total of 48,862 ESTs were downloaded from GenBank and assembled into contigs, representing an estimated 17,404 unique sequences. The unique sequence set contained 638 simple sequence repeats (SSRs) or microsatellites of length ≥20 bp and purity ≥90% and 144 simple sequence repeats of length ≥30 bp. A chromosomal location for the majority of SSRs was predicted by BLASTing against assembly 2.1 of the chicken genome sequence. The relative exonic location (5' untranslated region, coding region or 3' untranslated region) was predicted for 218 of the SSRs, by BLAST search against the ENSEMBL chicken peptide database. Ten loci were examined for polymorphism in two zebra finch populations and two populations of a distantly related passerine, the house sparrow Passer domesticus. Linkage was confirmed for four loci that were predicted to reside on the passerine homologue of chicken chromosome 7. Conclusion: We show that SSRs are abundant within zebra finch ESTs, and that their genomic location can be predicted from sequence similarity with the assembled chicken genome sequence. We demonstrate that a useful proportion of zebra finch EST-SSRs are likely to be polymorphic, and that they can be used to build a linkage map. Finally, we show that many zebra finch EST-SSRs are likely to be useful in evolutionary genetic studies of other passerines.

    Discovery of a second SALMFamide gene in the sea urchin Strongylocentrotus purpuratus reveals that L-type and F-type SALMFamide neuropeptides coexist in an echinoderm species

    NOTICE: this is the author’s version of a work that was accepted for publication in MARINE GENOMICS. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in MARINE GENOMICS, [VOL 3, ISSUE 2, (2010)] DOI: 10.1016/j.margen.2010.08.00