
38 research outputs found

A Plasmodium falciparum FcB1-schizont-EST collection providing clues to schizont specific gene structure and polymorphism

Author: Artiguenave François
Bréhélin Laurent
Charneau Sébastien
Da Silva Corinne
Florent Isabelle
Gascuel Olivier
Grellier Philippe
Guillaume Elodie
Maréchal Eric
Porcel Betina M
Wincker Patrick
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The Plasmodium falciparum genome (3D7 strain), published in 2002, revealed ~5,400 genes, mostly based on in silico predictions. Experimental data are therefore required for structural and functional assessments of P. falciparum genes and expression, and polymorphic data are further necessary to exploit genomic information to further qualify therapeutic target candidates. Here, we undertook a large-scale analysis of a P. falciparum FcB1-schizont-EST library previously constructed by suppression subtractive hybridization (SSH) to study genes expressed during merozoite morphogenesis, with the aim of: 1) obtaining an exhaustive collection of schizont-specific ESTs, 2) experimentally validating or correcting P. falciparum gene models and 3) pinpointing genes displaying protein polymorphism between the FcB1 and 3D7 strains. Results: A total of 22,125 clones randomly picked from the SSH library were sequenced, yielding 21,805 usable ESTs that were then clustered on the P. falciparum genome. This allowed identification of 243 protein-coding genes, including 121 previously annotated as hypothetical. Statistical analysis of GO terms, when available, indicated significant enrichment in genes involved in "entry into host-cells" and "actin cytoskeleton". Although most ESTs do not span full-length gene reading frames, detailed sequence comparison of FcB1-ESTs versus 3D7 genomic sequences allowed the confirmation of exon/intron boundaries in 29 genes, the detection of new boundaries in 14 genes and identification of protein polymorphism for 21 genes. In addition, a large number of non-protein-coding ESTs were identified, mainly matching the two A-type rRNA units (on chromosomes 5 and 7) and, to a lesser extent, two atypical rRNA loci (on chromosomes 1 and 8), TARE subtelomeric regions (several chromosomes) and the recently described telomerase RNA gene (chromosome 9). Conclusion: This FcB1-schizont-EST analysis confirmed the actual expression of 243 protein-coding genes, allowing the correction of structural annotations for a quarter of these sequences. In addition, this analysis demonstrated the actual transcription of several remarkable non-protein-coding loci: 2 atypical rRNA, TARE region and telomerase RNA gene. Together with other collections of P. falciparum ESTs, usually generated from mixed parasite stages, this collection of FcB1-schizont-ESTs provides valuable data to gain further insight into P. falciparum gene structure, polymorphism and expression.

HAL Evry

Crossref

Hal - Université Grenoble Alpes

Springer - Publisher Connector

Directory of Open Access Journals

The Eukaryote Genome Annotation Platform at Genoscope

Author: Benjamin Noel
Betina M. Porcel
Claude Scarpelli
Corinne Da Silva
Franç
France Denoeud
Franck Aniere
Jean Weissenbach
Jean-Marc Aury
Olivier Jaillon
Patrick Wincker
Sylvain Bonneval
Publication date: 24/07/2009

The Genoscope annotation workflow for eukaryote genomes relies on evidence from ab initio gene model predictions combined with homology searches, using collections of expressed sequences (full-length cDNAs, ESTs or massive-scale mRNA sequences from the same or closely related organisms), proteins or other genomic sequences. Global analysis of these draft or complete sequences then combines both approaches by integrating the gene prediction data with GAZE, which is capable of identifying a majority of the existing gene features. Although of very good quality, the gene models remain tentative at the end of the process. Even though computational predictors are useful for large-scale annotation in global genomics analysis, there is no complete genome for which all gene structures, in terms of exons, introns and coding regions, have been experimentally confirmed.

Finished genomes can provide exciting insights into genome organization and evolution. Additional experimental data generated by genome sequencing projects assist genome annotation, with the aim of better understanding the biology of the organism. Gene models and annotation can therefore be improved by human curation, which finds errors and resolves incongruous evidence in the automatic annotation of the genome.

We now provide collaborators carrying out sequencing projects with a distributed annotation platform allowing expert evaluation of the annotation, in addition to our automated gene prediction pipeline.

To maximize the participation of the scientific community, an annotation tool for revising annotations has been set up using components of the Generic Model Organism Database toolkit, which provides tools for managing organism databases. A CHADO database, linked to an Apollo graphical interface, permits users to correct gene structures and store them in a dedicated organism database, as we will show with a few examples. Such a tool would facilitate connecting and comparing predicted annotations with existing biological data, becoming the repository of completely annotated finished genome sequences.

Crossref

Nature Precedings

Digital expression profiling of novel diatom transcripts provides insight into their biological functions

Author: Allen Andrew E
Armbrust E Virginia
Bowler Chris
Cadoret Jean-Paul
De Martino Alessandra
Heijde Marc
Jabbari Kamel
Kaas Raymond
Katinka Michaël
La Roche Julie
Lopez Pascal J
Maheswari Uma
Martin-Jézéquel Véronique
Meichenin Agnès
Mock Thomas
Petit Jean-Louis
Porcel Betina M
Schnitzler Parker Micaela
Vardi Assaf
Weissenbach Jean
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Diatoms represent the predominant group of eukaryotic phytoplankton in the oceans and are responsible for around 20% of global photosynthesis. Two whole genome sequences are now available. Notwithstanding, our knowledge of diatom biology remains limited because only around half of their genes can be ascribed a function based on homology-based methods. High-throughput tools are needed, therefore, to associate functions with diatom-specific genes. Results: We have performed a systematic analysis of 130,000 ESTs derived from Phaeodactylum tricornutum cells grown in 16 different conditions. These include different sources of nitrogen, different concentrations of carbon dioxide, silicate and iron, and abiotic stresses such as low temperature and low salinity. Based on unbiased statistical methods, we have catalogued transcripts with similar expression profiles and identified transcripts differentially expressed in response to specific treatments. Functional annotation of these transcripts provides insights into expression patterns of genes involved in various metabolic and regulatory pathways and into the roles of novel genes with unknown functions. Specific growth conditions could be associated with enhanced gene diversity, known gene product functions, and over-representation of novel transcripts. Comparative analysis of data from the other sequenced diatom, Thalassiosira pseudonana, helped identify several unique diatom genes that are specifically regulated under particular conditions, thus facilitating studies of gene function, genome annotation and the molecular basis of species diversity. Conclusions: The digital gene expression database represents a new resource for identifying candidate diatom-specific genes involved in processes of major ecological relevance.

OceanRep

HAL Evry

Springer - Publisher Connector

HAL-Inserm

Ghent University Academic Bibliography

PubMed Central

ArchiMer - Institutional Archive of Ifremer

HAL-CEA

University of East Anglia digital repository

Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp

Author: Alberti Adriana
Alves-de-Souza Catharina
Aury Jean-Marc
Barbeyron Tristan
Bigeard Estelle
Cai Ruibo
Corre Erwan
Da Silva Corinne
Farhat Sarah
Florent Isabelle
Guillou Laure
Istace Benjamin
Kayal Ehsan
Labadie Karine
Le Phuong
Marie Dominique
Maumus Florian
Mercier Jonathan
Noel Benjamin
Porcel Betina M.
Rombauts Stephane
Rouzé Pierre
Rukwavu Tsinda
Szymczak Jeremy
Tonon Thierry
Van de Peer Yves
Wincker Patrick
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Background: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. Results: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions, and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes to be encoded in unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. Conclusion: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.

HAL Evry

Ghent University Academic Bibliography

HAL Descartes

HAL-INSU

HAL-CEA

White Rose Research Online

Hal-Diderot

UPSpace at the University of Pretoria

Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp

Author: Alberti Adriana
Alves-de-Souza Catharina
Aury Jean-Marc
Barbeyron Tristan
Bigeard Estelle
Cai Ruibo
Corre Erwan
Da Silva Corinne
Farhat Sarah
Florent Isabelle
Guillou Laure
Istace Benjamin
Kayal Ehsan
Labadie Karine
Le Phuong
Marie Dominique
Maumus Florian
Mercier Jonathan
Noel Benjamin
Porcel Betina M.
Rombauts Stephane
Rouze Pierre
Rukwavu Tsinda
Szymczak Jeremy
Tonon Thierry
Van de Peer Yves
Wincker Patrick
Publication venue: Springer Science and Business Media LLC
Publication date: 06/01/2021

BACKGROUND: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. RESULTS: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions, and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes to be encoded in unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. CONCLUSION: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.

ADDITIONAL FILE 1: FIGURE S1. Phylogeny of Alveolata. Proteomes from 89 alveolate genomes and transcriptome assemblies from the MMETSP project (https://zenodo.org/record/257026/files/) were used to create orthologous groups using orthofinder v2.2 with the diamond BLAST similarity search. Single ortholog alignments were pruned using PhyloTreePruner v.1.0 (minimum taxa to keep 44 and support value 0.9) and realigned using mafft v7 and filtered with Gblocks v.0.91b (-b5=a -p=n). Filtered alignments were concatenated using seqCat.pl and a phylogenetic tree was produced under a Maximum Likelihood framework using RAxML v8.2.9 with the PROTGAMMALGF model of sequence evolution and 101 bootstraps. Asterisks represent support values of 95 and above. A detailed method can be found in Kayal et al. 2018 BMC Evol. Biol. (https://doi.org/10.1186/s12862-018-1142-0). The full tree can be found at http://mmo.sb-roscoff.fr/jbrowseAmoebophrya/. FIGURE S2. SSU rDNA sequence identity (in percentage, relative to A25 and A120 compared to other species). FIGURE S3. Distribution of k-mer in A25 and A120 genomes. FIGURE S4. Classification of repeated elements in 3 Amoebophrya genomes (AT5, A25, and A120) using REPET. The x-axis represents the cumulated number of bases of repeated elements in the genome. FIGURE S5. Conserved motif of the putative splice leader (SL) in A25 and A120. FIGURE S6. Alignments of the gene encoding the putative spliced leader (SL) in A25 and A120. FIGURE S7. Gene orientation change rate in 3 Amoebophrya genomes. FIGURE S8. Number of orthologous genes shared by selected taxa. FIGURE S9. Boxplot of the dN/dS ratios of orthologous genes between A25 and A120, calculated using the model average method (MA). FIGURE S10. Synteny dot-plot obtained by comparison between Amoebophrya A25 and AT5 genomes. FIGURE S11. Synteny dot-plot obtained by comparison between Amoebophrya A120 and AT5 genomes. FIGURE S12. Intron length distribution. FIGURE S13. GC content distribution. FIGURE S14. Multiple alignments of U2 snRNAs. FIGURE S15. Multiple alignments of U4 snRNAs. FIGURE S16. Multiple alignments of U5 snRNAs. FIGURE S17. Multiple alignments of U6 snRNAs. FIGURE S18. Secondary structure of Amoebophrya snRNA. FIGURE S19. Example of introner elements (IEs) in Amoebophrya. FIGURE S20. Distribution of the direct repeats with size ranging between 3 and 8 nucleotides in A25. FIGURE S21. Distribution of the direct repeats with size ranging between 3 and 8 nucleotides in A120. FIGURE S22. Composition of direct repeats in introner elements. The diversity in composition of the three (a, b, c) most abundant direct repeats in introner elements in A25 (up) and A120 (down). FIGURE S23. Terminal inverted repeat locations around the splicing sites in A25 and A120. The position of inverted repeats according to the location of the splice sites in A25 and A120. Left, the inverted repeats of A120 are located at nucleotides 1–5 upstream and downstream of the splice sites. Right, the inverted repeats of A25 are located at nucleotides 1–6 upstream and downstream of the splice sites. FIGURE S24. The flowchart for the in silico search of introner elements. FIGURE S25. Hierarchical clustering analysis (pairwise similarity and OrthoMCL) of all intron families and of the inverted repeats in A25 and A120. FIGURE S26. Percentage of genes with assigned functions in relation to intron composition. FIGURE S27. Difference in the proportion of IE-containing genes compared to their KEGG assignment in A25 and A120. FIGURE S28. Distribution of conserved introns. TABLE S1. RCC number, date and site of isolation of strains considered in this study. TABLE S2. Metrics of Nanopore runs for the two Amoebophrya strains. TABLE S3. Search for pathways involved in plastidial functions that are entirely independent of plastid-encoded gene content. TABLE S4. Number of the different types of introns identified in A25 and A120 genomes. TABLE S5. Search for RNA editing in A25 and A120 introns. TABLE S6. Putative Amoebophrya A25 and A120 snRNP homologs. TABLE S7. Classification into families of non-canonical introns in A25 and A120. TABLE S8. RNAseq read assembly statistics for the Amoebophrya A25 and A120 samples corresponding to the different times of infection and to the free-living stage (dinospore only). TABLE S9. Total number of contigs belonging to samples from different stages of infection and the proportion of them that were aligned against the genomes of both Amoebophrya A25 and A120. ND corresponds to “not determined” when no measurement was done. TABLE S10. Metabolic pathways screened in A25 and A120 proteomes.

This research was funded by the ANR (Agence Nationale de la Recherche) Grant ANR-14-CE02-0007 HAPAR, the CEA and the Région Bretagne (RC doctoral grant ARED PARASITE 9450 and EK postdoctoral grant SAD HAPAR 9229), and the CNRS (X-life SEAgOInG).

UPSpace at the University of Pretoria

The streamlined genome of Phytomonas spp. relative to human pathogenic kinetoplastids reveals a parasite tailored for plants

Author: A Alonso
A Brighouse
A Dereeper
A Krogh
A ten Have
AC Ivens
AD Uttaro
AH Fairlamb
AL Santos
AP Jackson
Arnaud Couloux
AV Andreeva
B Andersson
B Liu
B Szoor
Balázs Szöőr
Benjamin Noel
Betina M. Porcel
C Donovan
C Louise
C Marin
CG Elias
CM d'Avila-Levy
Corinne Da Silva
CS Peacock
D Malvy
DA Maslov
Dan Zilberstein
David A. Campbell
E Arner
E Birney
E Kemen
E Muller
E Pennisi
E Peyretaillade
EM Gertz
EP Camargo
F Bringaud
F Chaumont
F Raymond
F Stegmeier
FR Opperdoes
France Denoeud
Fred Opperdoes
Frédéric Bringaud
G Lopez
G Mair
G Stahel
G Widmer
GE Canepa
GR Wyatt
H Vermeulen
IL Mauricio
J Amselem
J Castresana
J Kamper
J Lukes
J Votypka
JD Bendtsen
Jean-Marc Aury
Jeremy C. Mottram
JM Aury
JM Silverman
John M. McDowell
JP Ackers
JP Daniels
JR Coura
JR Naglik
Julie Poulain
Julius Lukeš
K Miranda
Kamel Jabbari
KD Stuart
KL Howe
L Koreny
L Li
L Simpson
LT Guerreiro
M Akerman
M Aslett
M Berriman
M den Boer
M Dollet
M Elias
M Kube
M Parsons
M Sanchez-Moreno
M Smith
Mark C. Field
MB Rogers
MC Lee
MH Saier Jr
Michael Katinka
Michel Dollet
ML Ammerman
Mohammed-Amine Madoui
MV Parthasarathy
Nancy R. Sturm
Nicholas J. Dickens
NM El-Sayed
NR Sturm
NS Akopyants
P Horvath
P Nawathean
Patrick Wincker
Pavel Flegontov
PT Manna
R Binet
R Brenchley
R Desmier De Chenon
R Docampo
R Li
R Magan
R Weeks
RM Corrales
Roberto Docampo
Roxana Cintron
S Marche
S Martinez-Calvillo
S Raffaele
Sandrine Fabre
SF Altschul
Shulamit Michaeli
SK Natesan
SM Molinas
T Furuya
Tansy C. Hammarton
TM Lowe
TN Siegel
V Hannaert
V. Lila Koumandou
W Busch
WJ Kent
X Bai
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

Members of the family Trypanosomatidae infect many organisms, including animals, plants and humans. Plant-infecting trypanosomes are grouped under the single genus Phytomonas, failing to reflect the wide biological and pathological diversity of these protists. While some Phytomonas spp. multiply in the latex of plants, or in fruit or seeds without apparent pathogenicity, others colonize the phloem sap and afflict plants of substantial economic value, including the coffee tree, coconut and oil palms. Plant trypanosomes have not been studied extensively at the genome level, a major gap in understanding and controlling pathogenesis. We describe the genome sequences of two plant trypanosomatids, one pathogenic isolate from a Guianan coconut and one non-symptomatic isolate from Euphorbia collected in France. Although these parasites have extremely distinct pathogenic impacts, very few genes are unique to either, with the vast majority of genes shared by both isolates. Significantly, both Phytomonas spp. genomes consist essentially of single copy genes for the bulk of their metabolic enzymes, whereas other trypanosomatids e.g. Leishmania and Trypanosoma possess multiple paralogous genes or families. Indeed, comparison with other trypanosomatid genomes revealed a highly streamlined genome, encoding for a minimized metabolic system while conserving the major pathways, and with retention of a full complement of endomembrane organelles, but with no evidence for functional complexity. Identification of the metabolic genes of Phytomonas provides opportunities for establishing in vitro culturing of these fastidious parasites and new tools for the control of agricultural plant disease. © 2014 Porcel et al

HAL Evry

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia

Author: Anthouard Véronique
Aury Jean-Marc
Daubin Vincent
Duret Laurent
Jaillon Olivier
Jubin Claire
Noel Benjamin
Plattner Helmut
Porcel Betina M.
Ségurens Béatrice
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

The duplication of entire genomes has long been recognized as having great potential for evolutionary novelties, but the mechanisms underlying their resolution through gene loss are poorly understood. Here we show that in the unicellular eukaryote Paramecium tetraurelia, a ciliate, most of the nearly 40,000 genes arose through at least three successive whole-genome duplications. Phylogenetic analysis indicates that the most recent duplication coincides with an explosion of speciation events that gave rise to the P. aurelia complex of 15 sibling species. We observed that gene loss occurs over a long timescale, not as an initial massive event. Genes from the same metabolic pathway or protein complex have common patterns of gene loss, and highly expressed genes are over-retained after all duplications. The conclusion of this analysis is that many genes are maintained after whole-genome duplication not because of functional innovation but because of gene dosage constraints

KOPS - The Institutional Repository of the University of Konstanz

Trypanosoma cruzi: Specific Detection of Parasites by PCR in Infected Humans and Vectors Using a Set of Primers (BP1/BP2) Targeted to a Nuclear DNA Sequence

Author: Alvarez
Andrés M. Ruiz
Araujo
Ariel M. Silber
Avila
Betina M. Porcel
Brechot
Castro
Elsa L. Segura
Ferre
Galvao
Garson
Gonzalez
Guy
Jacqueline Búa
Kain
Kirchhoff
Long
Maniatis
Meredith
Moncayo
Moser
Murthy
Ouaissi
Porcel
Ramos
Segura
Smith
Souto
Sturm
Taibi
Teixeira
Wincker
Publication venue: Elsevier BV

Crossref

Numerous Novel Annotations of the Human Genome Sequence Supported by a 5′-End–Enriched cDNA Collection

Author: Castelli Vanina
Cruaud Corinne
De Berardinis Veronique
Delfour Olivier
Friedlander Lucie
Gyapay Gabor
Porcel Betina M.
Salanoubat Marcel
Saurin William
Scarpelli Claude
Schächter Vincent
Ureta-Vidal Abel
Weissenbach Jean
Wincker Patrick
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/03/2004

A collection of 90,000 human cDNA clones generated to increase the fraction of “full-length” cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5′ end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that ∼380 gene models described in LocusLink could be extended at their 5′ end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, thus representing a resource substantially improving the human genome annotation.

Crossref

PubMed Central