35 research outputs found

    Systematic analysis of mRNA 5' coding sequence incompleteness in Danio rerio: an automated EST-based approach

    BACKGROUND: All standard methods for cDNA cloning are affected by a potential inability to effectively clone the 5' region of mRNA. The aim of this work was to estimate mRNA open reading frame (ORF) 5' region sequence completeness in the model organism Danio rerio (zebrafish). RESULTS: We implemented a novel automated approach (5'_ORF_Extender) that systematically compares available expressed sequence tags (ESTs) with all the zebrafish experimentally determined mRNA sequences, identifies additional sequence stretches at 5' region and scans for the presence of all conditions needed to define a new, extended putative ORF. Our software was able to identify 285 (3.3%) mRNAs with putatively incomplete ORFs at 5' region and, in three example cases selected (selt1a, unc119.2, nppa), the extended coding region at 5' end was cloned by reverse transcription-polymerase chain reaction (RT-PCR). CONCLUSION: The implemented method, which could also be useful for the analysis of other genomes, allowed us to describe the relevance of the "5' end mRNA artifact" problem for genomic annotation and functional genomic experiment design in zebrafish. OPEN PEER REVIEW: This article was reviewed by Alexey V. Kochetov (nominated by Mikhail Gelfand), Shamil Sunyaev, and Gáspár Jékely. For the full reviews, please go to the Reviewers' Comments section.

    Uncertainty principle of genetic information in a living cell

    BACKGROUND: Formal description of a cell's genetic information should provide the number of DNA molecules in that cell and their complete nucleotide sequences. We pose the formal problem: can the genome sequence forming the genotype of a given living cell be known with absolute certainty so that the cell's behaviour (phenotype) can be correlated to that genetic information? To answer this question, we propose a series of thought experiments. RESULTS: We show that the genome sequence of any actual living cell cannot physically be known with absolute certainty, independently of the method used. There is an associated uncertainty, in terms of base pairs, equal to or greater than μs (where μ is the mutation rate of the cell type and s is the cell's genome size). CONCLUSION: This finding establishes an "uncertainty principle" in genetics for the first time, and its analogy with the Heisenberg uncertainty principle in physics is discussed. The genetic information that makes living cells work is thus better represented by a probabilistic model than as a completely defined object.
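The bound μs stated in this abstract can be evaluated directly. The following is a minimal sketch: the function name is hypothetical, the numeric inputs are illustrative placeholders rather than values from the paper, and μ is assumed to be expressed per base pair so that the product is a number of base pairs.

```python
def min_sequence_uncertainty(mutation_rate: float, genome_size: int) -> float:
    """Lower bound, in base pairs, on genome-sequence uncertainty: mu * s.

    mutation_rate: mu, assumed here to be given per base pair.
    genome_size:   s, the genome size in base pairs.
    """
    if mutation_rate < 0 or genome_size < 0:
        raise ValueError("mutation_rate and genome_size must be non-negative")
    return mutation_rate * genome_size


# Illustrative inputs only (assumptions, not data from the abstract).
example_mu = 1e-9
example_s = 3_000_000_000
example_bound = min_sequence_uncertainty(example_mu, example_s)
print(f"uncertainty >= {example_bound:.1f} bp")
```

With these placeholder inputs the bound is about 3 base pairs; the point is only that the bound scales linearly with both μ and s.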

    Sequence, "subtle" alternative splicing and expression of the CYYR1 (cysteine/tyrosine-rich 1) mRNA in human neuroendocrine tumors

    BACKGROUND: CYYR1 is a recently identified gene located on human chromosome 21 whose product has no similarity to any known protein and is of unknown function. Analysis of expressed sequence tags (ESTs) has revealed high human CYYR1 expression in cells belonging to the diffuse neuroendocrine system (DNES). These cells may be the origin of neuroendocrine (NE) tumors. The aim of this study was to conduct an initial analysis of sequence, splicing and expression of the CYYR1 mRNA in human NE tumors. METHODS: The CYYR1 mRNA coding sequence (CDS) was studied in 32 NE tumors by RT-PCR and sequence analysis. A subtle alternative splicing event was identified that generates two isoforms of CYYR1 mRNA, differing in the absence (CAG(-) isoform, the first described mRNA for the CYYR1 locus) or the presence (CAG(+) isoform) of a CAG codon. When present, this codon determines the presence of an alanine residue at the exon 3/exon 4 junction of the CYYR1 mRNA. The amounts of the two mRNA isoforms were determined by quantitative relative RT-PCR in 29 NE tumors, 2 non-neuroendocrine tumors and 10 normal tissues. A bioinformatic analysis was performed to search for the existence of the two CYYR1 isoforms in other species. RESULTS: The CYYR1 CDS did not show differences compared to the reference sequence in any of the samples, with the exception of an NE tumor arising in the neck region. Sequence analysis of this tumor identified a change at CDS position 333 (T instead of C), leading to the amino acid mutation P111S. NE tumor samples showed no significant difference in either CYYR1 CAG(-) or CAG(+) isoform expression compared to control tissues. The CYYR1 CAG(-) isoform was significantly more expressed than the CAG(+) isoform in NE tumors as well as in the control samples investigated. Bioinformatic analysis revealed that only the genomic sequence of Pan troglodytes CYYR1 is consistent with the possible existence of the two described mRNA isoforms. CONCLUSION: A new "subtle" splicing isoform (CAG(+)) of CYYR1 mRNA was identified, and the sequence and expression of this gene were characterized in a large series of NE tumors.

    TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources

    BACKGROUND: Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. RESULTS: TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. CONCLUSIONS: TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes.
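For orientation, the sketch below shows standard quantile normalization across samples. It is not TRAM's "scaled quantile" variant, whose details the abstract does not give; the function name is hypothetical. The standard method needs every sample to contain the same number of values, which is presumably the limitation the TRAM variant addresses for platforms with very different gene counts.

```python
from statistics import mean


def quantile_normalize(samples):
    """Standard quantile normalization of equal-length samples.

    samples: list of lists of expression values, one inner list per sample.
    Returns new lists in which every sample shares the same distribution:
    the value at rank r is replaced by the mean of all rank-r values.
    Ties are broken by original position (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("standard quantile normalization needs equal-length samples")
    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(column) for column in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        # Indices of s ordered by value; the position in this order is the rank.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# -> [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```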

    Genome-scale analysis of human mRNA 5' coding sequences based on expressed sequence tag (EST) database.

    The "5' end mRNA artifact" issue refers to the incorrect assignment of the first AUG codon in an mRNA, due to the incomplete determination of its 5' end sequence. We performed a systematic identification of coding regions at the 5' end of all known human mRNAs, using an automated expressed sequence tag (EST)-based approach. Following parsing of more than 7 million BLAT alignments, we found 477 human loci, out of 18,665 analyzed, in which an extension of the mRNA 5' coding region was identified. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2 cDNAs, and the consequences for the functional studies of these loci are discussed. We also generated a list of 20,775 human mRNAs where the presence of an in-frame stop codon upstream of the known start codon indicates completeness of the coding sequence at the 5' end in the current form. Authors: Casadei R.; Piovesan A.; Vitale L.; Facchin F.; Pelleri M.C.; Canaider S.; Bianconi E.; Frabetti F.; Strippoli P.
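The two outcomes described in this abstract (an upstream in-frame stop codon indicating a complete 5' coding sequence, or an upstream extension of the coding region) can be illustrated with a simplified in-frame scan. This is a sketch under stated assumptions, not the authors' pipeline: the function name, return labels and decision rule are hypothetical, and the EST/BLAT alignment step that supplies the extra 5' sequence is not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_five_prime(sequence: str, start: int):
    """Scan upstream of the annotated start codon, staying in frame.

    sequence: mRNA/cDNA sequence, possibly extended at 5' with EST data.
    start:    0-based index of the annotated start codon (must be ATG).

    Returns one of:
      ("extended", pos)      an in-frame ATG lies upstream with no stop between
      ("complete", None)     an in-frame stop is met before any upstream ATG
      ("undetermined", None) the frame stays open up to the 5' end
    """
    seq = sequence.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no ATG at the annotated start position")
    candidate = None
    stop_found = False
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            stop_found = True
            break
        if codon == "ATG":
            candidate = pos  # keep the most upstream in-frame ATG seen so far
        pos -= 3
    if candidate is not None:
        return ("extended", candidate)
    if stop_found:
        return ("complete", None)
    return ("undetermined", None)


print(classify_five_prime("TAAGGGATGCCC", 6))  # -> ('complete', None)
print(classify_five_prime("ATGGGGATGCCC", 6))  # -> ('extended', 0)
```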

    Alternative splicing as a source of mRNA isoforms differing by a single CAG base triplet: a systematic analysis in the human genome

    During our study of the human CYYR1 gene (Vitale et al., 2002), located on chromosome 21, we isolated and cloned into a vector a cDNA corresponding to a splicing isoform that differs by 3 bases (CAG) from the form we had previously identified and described. The cDNA sequence of this isoform predicts a polypeptide product with one additional amino acid (alanine) inserted between P111 and G112. Semiquantitative RT-PCR (reverse transcription-polymerase chain reaction) showed that the two isoforms are differentially expressed in the normal tissues studied. Genomic analysis revealed that the alternative splicing originates from the terminal sequence of intron 3 of the CYYR1 gene (CAGCAG), which contains two canonical "AG" splice acceptor sites three bases apart, both usable by the cell. We then carried out a systematic review, based on bioinformatic analysis and in some cases also on RT-PCR cloning, of human mRNAs that occur in two possible isoforms differing by the presence or absence of a CAG triplet. In particular, using a dedicated program that we developed, we analyzed the sequences of about 30,000 human introns (SRS database), showing that in 39 cases (0.13%) a human intron ends with the sequence CAGCAG. In 10 cases, analysis of the EST (expressed sequence tag) database showed that mRNAs differing by the CAG triplet actually exist. In some cases the additional triplet lies in the coding region, allowing prediction of a variant product whose amino acid sequence is 1 amino acid longer. We demonstrated alternative splicing by RT-PCR for the genes SGNE1 (secretory granule, neuroendocrine protein 1), G6PD (glucose-6-phosphate dehydrogenase) and PKD1 (polycystic kidney disease 1 gene). For IGF1R (encoding the insulin-like growth factor 1 receptor), previous literature data have shown that the polypeptide products translated from the two mRNA isoforms, which differ by a single amino acid, can have different biological functions (in terms of oncogenicity in cell transfection assays). The systematic identification of minimally different splicing isoforms that produce variant polypeptide chains, besides allowing the rapid identification of new splicing isoforms of long-known human genes with fundamental biological roles, broadens the notion of the variability of gene transcription products that can originate from a single locus, and paves the way for the search for similar mechanisms operating at splice donor sites or at other analogous sequences at acceptor sites.
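The intron screen described in this abstract (introns ending in CAGCAG, which offer two AG acceptor sites three bases apart) can be sketched as follows. The function names and toy sequences are hypothetical; this is not the authors' program, and it only checks the terminal hexamer without evaluating splice-site strength or EST support.

```python
def ends_with_tandem_acceptor(intron: str) -> bool:
    """True if the intron ends with CAGCAG, i.e. two AG acceptors 3 bases apart."""
    return intron.upper().replace("U", "T").endswith("CAGCAG")


def splice_isoforms(upstream_exon: str, intron: str, downstream_exon: str):
    """Return the (CAG(-), CAG(+)) spliced sequences for a CAGCAG intron.

    Using the distal AG removes the whole intron (CAG(-) isoform).
    Using the proximal AG leaves the final CAG in the mRNA (CAG(+) isoform).
    """
    if not ends_with_tandem_acceptor(intron):
        raise ValueError("intron does not end with CAGCAG")
    cag_minus = upstream_exon + downstream_exon
    cag_plus = upstream_exon + "CAG" + downstream_exon
    return cag_minus, cag_plus


# Toy sequences, for illustration only.
toy_introns = ["GTAAGTTTTCAGCAG", "GTAAGTTTTTTTCAG"]
hits = [i for i in toy_introns if ends_with_tandem_acceptor(i)]
print(len(hits), "of", len(toy_introns), "toy introns end with CAGCAG")
print(splice_isoforms("AAACC", toy_introns[0], "GGTTT"))
# -> ('AAACCGGTTT', 'AAACCCAGGGTTT')
```

The fraction reported in the abstract follows the same counting logic: 39 hits out of about 30,000 introns is 0.13%.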

    GeneRecords: A relational database for GenBank flat file parsing and data manipulation in personal computers

    Summary: Extracting the desired data from a database entry for later analysis is a constant need in the biological sequence analysis community. GeneRecords 1.0 is a solution for GenBank flat file parsing: it implements a structured representation of each feature and feature qualifier in GenBank, following import into a common database management system usable on a personal computer (Macintosh and Windows environments). This collection of related databases enables the local management of GenBank records, allowing indexing, retrieval and analysis of both information and sequences on a personal computer. © Oxford University Press 2004; all rights reserved
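To make the idea of a structured representation of features and feature qualifiers concrete, here is a minimal sketch of parsing the FEATURES table of a GenBank flat file into records. It is not GeneRecords' implementation: the function names and the toy record are hypothetical, it assumes the usual column layout (feature key starting at column 6, location and qualifiers at column 22), and it ignores multi-line qualifier values.

```python
def parse_feature_table(flat_file_text: str):
    """Return a list of {"key", "location", "qualifiers"} dicts for one record."""
    features = []
    in_features = False
    for line in flat_file_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            break  # next top-level section (e.g. ORIGIN) ends the table
        key = line[5:21].strip()
        rest = line[21:].strip()
        if key:
            features.append({"key": key, "location": rest, "qualifiers": {}})
        elif rest.startswith("/") and features:
            name, _, value = rest[1:].partition("=")
            features[-1]["qualifiers"][name] = value.strip('"')
        # Continuation lines of long values are skipped in this sketch.
    return features


def _feature_line(key: str, text: str) -> str:
    """Build a feature-table line with the assumed column layout."""
    return " " * 5 + key.ljust(16) + text


# Hypothetical toy record, for illustration only.
example_record = "\n".join([
    "LOCUS       EXAMPLE",
    "FEATURES             Location/Qualifiers",
    _feature_line("gene", "1..300"),
    _feature_line("", '/gene="cyyr1"'),
    _feature_line("CDS", "10..250"),
    _feature_line("", '/product="hypothetical protein"'),
    "ORIGIN",
])

for feature in parse_feature_table(example_record):
    print(feature["key"], feature["location"], feature["qualifiers"])
```

Each resulting dictionary maps naturally onto a row in a features table, with qualifiers stored in a related table keyed by feature.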