26 research outputs found

    Systematic analysis of mRNA 5' coding sequence incompleteness in Danio rerio: an automated EST-based approach

    Background: All standard methods for cDNA cloning are affected by a potential inability to effectively clone the 5' region of mRNA. The aim of this work was to estimate mRNA open reading frame (ORF) 5' region sequence completeness in the model organism Danio rerio (zebrafish). Results: We implemented a novel automated approach (5'_ORF_Extender) that systematically compares available expressed sequence tags (ESTs) with all the zebrafish experimentally determined mRNA sequences, identifies additional sequence stretches at 5' region and scans for the presence of all conditions needed to define a new, extended putative ORF. Our software was able to identify 285 (3.3%) mRNAs with putatively incomplete ORFs at 5' region and, in three example cases selected (selt1a, unc119.2, nppa), the extended coding region at 5' end was cloned by reverse transcription-polymerase chain reaction (RT-PCR). Conclusion: The implemented method, which could also be useful for the analysis of other genomes, allowed us to describe the relevance of the "5' end mRNA artifact" problem for genomic annotation and functional genomic experiment design in zebrafish. Open peer review: This article was reviewed by Alexey V. Kochetov (nominated by Mikhail Gelfand), Shamil Sunyaev, and Gáspár Jékely. For the full reviews, please go to the Reviewers' Comments section.

    Sequence, "subtle" alternative splicing and expression of the CYYR1 (cysteine/tyrosine-rich 1) mRNA in human neuroendocrine tumors

    BACKGROUND: CYYR1 is a recently identified gene located on human chromosome 21 whose product has no similarity to any known protein and is of unknown function. Analysis of expressed sequence tags (ESTs) has revealed high human CYYR1 expression in cells belonging to the diffuse neuroendocrine system (DNES). These cells may be the origin of neuroendocrine (NE) tumors. The aim of this study was to conduct an initial analysis of sequence, splicing and expression of the CYYR1 mRNA in human NE tumors. METHODS: The CYYR1 mRNA coding sequence (CDS) was studied in 32 NE tumors by RT-PCR and sequence analysis. A subtle alternative splicing event was identified that generates two isoforms of CYYR1 mRNA differing in the absence (CAG(-) isoform, the first described mRNA for the CYYR1 locus) or the presence (CAG(+) isoform) of a CAG codon. When present, this codon determines the presence of an alanine residue at the exon 3/exon 4 junction of the CYYR1 mRNA. The amounts of the two mRNA isoforms were determined by quantitative relative RT-PCR in 29 NE tumors, 2 non-neuroendocrine tumors and 10 normal tissues. A bioinformatic analysis was performed to search for the existence of the two CYYR1 isoforms in other species. RESULTS: The CYYR1 CDS did not show differences compared to the reference sequence in any of the samples, with the exception of an NE tumor arising in the neck region. Sequence analysis of this tumor identified a change in the CDS 333 position (T instead of C), leading to the amino acid mutation P111S. NE tumor samples showed no significant difference in either CYYR1 CAG(-) or CAG(+) isoform expression compared to control tissues. The CYYR1 CAG(-) isoform was expressed at significantly higher levels than the CAG(+) isoform in NE tumors as well as in the control samples investigated. Bioinformatic analysis revealed that only the genomic sequence of Pan troglodytes CYYR1 is consistent with the possible existence of the two described mRNA isoforms. CONCLUSION: A new "subtle" splicing isoform (CAG(+)) of CYYR1 mRNA was identified, and the sequence and expression of this gene were defined in a large series of NE tumors.

    TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources

    Background: Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. Results: TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. Conclusions: TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes.
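
The 'Cluster' mode described in the TRAM abstract, a search for consecutive over-expressed genes, can be illustrated with a minimal sketch. This is not TRAM's implementation (TRAM is a FileMaker Pro database application, and its statistics are not reproduced here). The function name `overexpressed_runs`, the `min_genes` parameter, the gene labels and the threshold are all hypothetical. The sketch only assumes that genes are already sorted by chromosomal position and that each has a single expression value.

```python
from typing import List, Tuple


def overexpressed_runs(
    genes: List[Tuple[str, float]],
    threshold: float,
    min_genes: int = 3,
) -> List[List[str]]:
    """Return maximal runs of consecutive genes whose value exceeds threshold.

    genes: (symbol, expression value) pairs, assumed to be ordered by
    position along one chromosome. Runs shorter than min_genes are dropped.
    """
    runs: List[List[str]] = []
    current: List[str] = []
    for symbol, value in genes:
        if value > threshold:
            current.append(symbol)  # extend the current run
        else:
            if len(current) >= min_genes:
                runs.append(current)  # close a run that is long enough
            current = []
    if len(current) >= min_genes:  # a run may reach the chromosome end
        runs.append(current)
    return runs


# Hypothetical expression values for seven neighbouring genes.
example = [("A", 1.0), ("B", 3.0), ("C", 4.0), ("D", 5.0),
           ("E", 0.5), ("F", 6.0), ("G", 7.0)]
print(overexpressed_runs(example, threshold=2.0))
```

Under-expressed clusters could be found the same way by reversing the comparison. Assessing significance against the genes of the same chromosome or of the whole genome, as the abstract describes, would need a separate step.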

    Genome-scale analysis of human mRNA 5' coding sequences based on expressed sequence tag (EST) database.

    The "5' end mRNA artifact" issue refers to the incorrect assignment of the first AUG codon in an mRNA, due to the incomplete determination of its 5' end sequence. We performed a systematic identification of coding regions at the 5' end of all human known mRNAs, using an automated expressed sequence tag (EST)-based approach. Following parsing of more than 7 million BLAT alignments, we found 477 human loci, out of 18,665 analyzed, in which an extension of the mRNA 5' coding region was identified. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2 cDNAs, and the consequences for the functional studies of these loci are discussed. We also generated a list of 20,775 human mRNAs where the presence of an in-frame stop codon upstream of the known start codon indicates completeness of the coding sequence at 5' in the current form.
    Casadei R.; Piovesan A.; Vitale L.; Facchin F.; Pelleri M.C.; Canaider S.; Bianconi E.; Frabetti F.; Strippoli P.
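
The completeness criterion in this abstract, an in-frame stop codon upstream of the known start codon, can be sketched in a few lines. This is an illustrative check only, not the authors' software. The function name `upstream_in_frame_stop` and the toy sequences are hypothetical. The input is assumed to be a cDNA sequence in the DNA alphabet with a known 0-based start codon position.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_stop(mrna: str, cds_start: int) -> bool:
    """Return True if a stop codon lies upstream of, and in frame with, the ATG.

    mrna: cDNA sequence written in the DNA alphabet (T instead of U).
    cds_start: 0-based index of the A of the annotated start codon.
    """
    seq = mrna.upper()
    # Step backwards from the start codon three bases at a time so that
    # every codon examined is in the same reading frame as the ATG.
    for pos in range(cds_start - 3, -1, -3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return True
    return False


# Toy sequences: the first has an in-frame TAG upstream of the ATG,
# the second has a TAG that is out of frame.
print(upstream_in_frame_stop("CCTAGGCCATGAAA", 8))
print(upstream_in_frame_stop("GTAGCCATGAAA", 6))
```

When the function returns True, no upstream extension of the reading frame can reach the annotated start codon, which is the sense in which the abstract calls the coding sequence complete at its 5' end.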

    Alternative splicing as a source of mRNA isoforms differing by a single CAG base triplet: systematic analysis in the human genome

    During our study of the human CYYR1 gene (Vitale et al., 2002), located on chromosome 21, we isolated and cloned into a vector a cDNA corresponding to a splicing isoform that differs by 3 bases (CAG) from the form we had previously identified and described. The cDNA sequence of this isoform predicts a polypeptide product with one additional amino acid (alanine) inserted between P111 and G112. Semiquantitative RT-PCR (reverse transcription-polymerase chain reaction) showed that the two isoforms are differentially expressed in the various normal tissues studied. Genomic analysis revealed that the alternative splicing originates from the terminal sequence of intron 3 of the CYYR1 gene (CAGCAG), which contains two canonical "AG" splice acceptor sites three bases apart, both of which can be used by the cell. We therefore carried out a systematic review, based on bioinformatic analysis and in some cases also on cloning by RT-PCR, of human mRNAs that occur in two possible isoforms differing by the presence or absence of a CAG triplet. In particular, using a dedicated program that we developed, we analyzed the sequences of about 30,000 human introns (SRS database) and showed that in 39 cases (0.13%) a human intron ends with the sequence CAGCAG. In 10 cases, analysis of the EST (expressed sequence tag) database showed that mRNAs differing by the CAG triplet actually exist. In some cases the additional triplet lies in the coding region, so that the variant product is predicted to have an amino acid sequence 1 amino acid longer. We demonstrated the presence of alternative splicing by RT-PCR for the genes SGNE1 (secretory granule, neuroendocrine protein 1), G6PD (glucose-6-phosphate dehydrogenase) and PKD1 (polycystic kidney disease 1 gene). In the case of IGF1R (encoding the insulin-like growth factor 1 receptor), previous literature data have shown that the polypeptide products translated from the two mRNA isoforms, which differ by a single amino acid, can perform different biological functions (in terms of oncogenicity in cell transfection assays). The systematic identification of minimally different splicing isoforms that produce variant polypeptide chains has allowed the rapid identification of new splicing isoforms of long-known human genes with fundamental biological roles. It also broadens the notion of the variability of the transcription products that can originate from a single locus, and paves the way for the search for similar mechanisms operating at splice donor sites or involving other analogous sequences at splice acceptor sites.
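
The intron screen described in this abstract amounts to counting introns whose 3' end reads CAGCAG. The sketch below illustrates that count; it is not the program the authors developed. The function names and the toy intron sequences are hypothetical, and reading introns from the SRS database is not shown. The final line checks the reported proportion, taking "about 30,000" as exactly 30,000.

```python
from typing import Iterable, Tuple


def ends_with_tandem_acceptor(intron: str) -> bool:
    """True if the intron ends with CAGCAG, i.e. two AG acceptors 3 bases apart."""
    return intron.upper().endswith("CAGCAG")


def tandem_acceptor_fraction(introns: Iterable[str]) -> Tuple[int, float]:
    """Return (number of introns ending in CAGCAG, fraction of all introns)."""
    intron_list = list(introns)
    hits = sum(ends_with_tandem_acceptor(seq) for seq in intron_list)
    fraction = hits / len(intron_list) if intron_list else 0.0
    return hits, fraction


# Toy intron sequences (hypothetical); two of the three end with CAGCAG.
toy_introns = ["GTAAGTTTTCAGCAG", "GTAAGTTTTTTCAG", "gtaagtcagcag"]
print(tandem_acceptor_fraction(toy_introns))

# Proportion reported in the abstract: 39 introns out of about 30,000.
print(round(39 / 30000 * 100, 2))
```

A suffix match is enough here because both candidate acceptor sites sit at the very end of the intron. Confirming that both sites are actually used still requires EST or RT-PCR evidence, as the abstract notes.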

    GeneRecords: A relational database for GenBank flat file parsing and data manipulation in personal computers

    Summary: Extracting the desired data from a database entry for later analysis is a constant need in the biological sequence analysis community. GeneRecords 1.0 is a solution for GenBank biological flat file parsing: it implements a structured representation of each feature and feature qualifier in GenBank following import into a common database management system usable on a personal computer (Macintosh and Windows environments). This collection of related databases enables the local management of GenBank records, allowing indexing, retrieval and analysis of both information and sequences on a personal computer. © Oxford University Press 2004; all rights reserved

    Gene expression profile analysis in human T lymphocytes from patients with Down syndrome

    Down Syndrome (DS) is caused by the presence of three copies of the whole human chromosome 21 (HC21) or of a restricted HC21 region; the phenotype is likely to originate from the altered expression of genes on HC21. We applied the cDNA microarray method to the study of gene expression in human T lymphocytes with trisomy 21 in comparison to normal cells. Two patients with DS were investigated, along with two normal subjects as controls, all tested in independent, duplicated cell culture experiments. The most consistent finding was the overexpression of the superoxide dismutase gene (SOD1), located on 21q, and of the MHC DR beta 3 (HLA-DRB3), GABA receptor A gamma 2 (GABRG2), acetyl-Coenzyme A acetyltransferase 2 (ACAT2) and ras suppressor protein 1 (RSU1) genes. When the data were clustered according to chromosome localization, the HC21 gene set showed, on average, the highest expression in DS cells in all the experiments. Moreover, separate clustering of patients and controls was obtained when analysis was restricted to HC21 gene expression values. These findings reinforce the specific gene dosage theory for the pathogenesis of the DS phenotype, and show a consistent overexpression of the SOD1 gene on 21q. © University College London 2004

    Characterization of human gene locus CYYR1: a complex multi-transcript system

    Cysteine/tyrosine-rich 1 (CYYR1) is a gene we previously identified on human chromosome 21 starting from an in-depth bioinformatics analysis of chromosome 21 segment 40/105 (21q21.3), where no coding region had previously been predicted. CYYR1 was initially characterized as a four-exon gene, whose brain-derived cDNA sequencing predicts a 154-amino acid product. In this study we provide, with in silico and in vitro analyses, the first detailed description of the human CYYR1 locus. The analysis of this locus revealed that it is composed of a multi-transcript system, which includes at least seven alternatively spliced CYYR1 isoforms and a new CYYR1 antisense gene (named CYYR1-AS1). In particular, we cloned, for the first time, the following isoforms: CYYR1-1,2,3,4b and CYYR1-1,2,3b, which present a different 3' transcribed region, with a consequently different carboxy-terminus of the predicted proteins; CYYR1-1,2,4, which lacks exon 3; CYYR1-1,2,2bis,3,4, which presents an additional exon between exon 2 and exon 3; and CYYR1-1b,2,3,4, which presents a different 5' untranslated region when compared to CYYR1. The complexity of the locus is increased by the presence of an antisense transcript: we cloned a long transcript overlapping with CYYR1 as an antisense RNA, probably a non-coding RNA. Expression analysis performed in different normal tissues, tumour cell lines as well as in trisomy 21 and euploid fibroblasts has confirmed a quantitative and qualitative variability in the expression pattern of the multi-transcript locus, suggesting a possible role in complex diseases that should be further investigated.