Search CORE

68 research outputs found

ChloroplastDB: the Chloroplast Genome Database

Author: Cui Liying
dePamphilis Claude W.
Jansen Robert K.
Leebens-Mack Jim
Makalowska Izabela
Richter Alexander
Veeraraghavan Narayanan
Wall Kerr
Publication venue: Oxford University Press
Publication date: 28/12/2005

The Chloroplast Genome Database (ChloroplastDB) is an interactive, web-based database for fully sequenced plastid genomes, containing genomic, protein, DNA and RNA sequences, gene locations, RNA-editing sites, putative protein families and alignments. With recent technical advances, the rate of generating new organelle genomes has increased dramatically. However, the established ontology for chloroplast genes and gene features has not been uniformly applied to all chloroplast genomes available in the sequence databases. For example, annotations for some published genome sequences have not been updated to follow current gene naming conventions. ChloroplastDB provides unified annotations, gene name search, BLAST and download functions for chloroplast-encoded genes and genomic sequences. A user can retrieve all orthologous sequences with one search, regardless of their gene names in GenBank. This feature alone greatly facilitates comparative research on sequence evolution, including changes in gene content, codon usage, gene structure and post-transcriptional modifications such as RNA editing. Orthologous protein sets are classified by TribeMCL, and each set is assigned a standard gene name. Over the next few years, as the number of sequenced chloroplast genomes increases rapidly, the tools available in ChloroplastDB will allow researchers to easily identify and compile target data for comparative analysis of chloroplast genes and genomes.
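TribeMCL groups proteins by running the Markov Cluster (MCL) algorithm on all-vs-all sequence similarity scores. The sketch below is a toy, pure-Python MCL (expansion by squaring the matrix, then inflation and column renormalization), not ChloroplastDB's actual pipeline; the function names, the inflation value of 2.0 and the hand-made similarity matrix are assumptions standing in for BLAST-derived scores.

```python
# Toy Markov clustering (MCL), in the spirit of TribeMCL.
# Assumption: `sim` is a symmetric, non-negative similarity matrix
# (in practice derived from all-vs-all BLAST scores).

def _normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(sim, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-9):
    """Return clusters as sorted lists of node indices."""
    n = len(sim)
    # Add self-loops, then turn similarities into transition probabilities.
    m = _normalize_columns(
        [[sim[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        prev = m
        m = _matmul(m, m)                                  # expansion
        m = [[v ** inflation for v in row] for row in m]   # inflation
        m = [[v if v > prune else 0.0 for v in row] for row in m]
        m = _normalize_columns(m)
        if max(abs(m[i][j] - prev[i][j])
               for i in range(n) for j in range(n)) < tol:
            break
    # Attractor rows keep mass; their non-zero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > tol)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Two unconnected "protein families" of three members each.
sim = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
print(mcl(sim))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values tend to split the graph into more, tighter clusters; the value used for ChloroplastDB is not stated here.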

Crossref

PubMed Central

Identification and characterization of a novel ubiquitous nucleolar protein ‘NARR’ encoded by a gene overlapping the rab34 oncogene

Author: Aebersold
Alexandre Zougman
Andersen
Andersen
Boguski
Bork
Boulon
Bridger
Bristow
Chen
Cheutin
Cox
Fearnley
Freedberg
Grummt
Herzog
Jacek R. Wiśniewski
Johns
Kochetov
Kozak
Kuster
Larsen
Li
Makalowska
Malumbres
Mann
Matthias Mann
McStay
Ong
Oyama
Park
Parreiras
Pederson
Peleg
Pendle
Rappsilber
Ruhl
Scherl
Schnier
Schulze
Shevchenko
Smollett
Valdez
Vermeulen
Wang
Wisniewski
Wisniewski
Wisniewski
Wisniewski
Zanelli
Zhou
Zougman
Zougman
Publication venue: Oxford University Press
Publication date: 01/01/2011

There are only a few reports on protein products originating from overlapping mammalian genes, even though computational predictions suggest that an appreciable fraction of mammalian genes could potentially overlap. Mass spectrometry-based proteomics has now acquired the tools to probe proteins in an unbiased manner, providing direct evidence of the output of the genomic and gene expression machinery. In particular, proteomics can refine gene predictions and discover novel gene-processing events and gene arrangements. Here, we report the mass spectrometric discovery and biochemical validation of a novel protein encoded by a gene overlapping the rab34 oncogene. The novel protein is highly conserved in mammals. In humans, it contains 13 distinct Nine-Amino acid Residue-Repeats (NARR) with the consensus sequence PRVIV(S/T)PR, in which the serine or threonine residues are phosphorylated during M-phase. NARR is ubiquitously expressed and resides in nucleoli, where it colocalizes with ribosomal DNA (rDNA) gene clusters. Its distribution only partially overlaps with that of upstream binding factor, one of the main regulators of RNA Polymerase I activity, and is entirely uncoupled from it in mitotic cells and upon inhibition of transcription. NARR only partially colocalizes with fibrillarin, the pre-ribosomal RNA-processing protein, positioning NARR in a separate niche within the rDNA cluster.
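The consensus PRVIV(S/T)PR maps directly onto a regular expression, which makes it easy to locate exact copies of the repeat and the S/T residue that is phosphorylated. A minimal sketch, assuming only exact consensus matches are wanted (the real repeats are described as distinct, so a full scan would need to tolerate variant residues); the function name and the test sequence are hypothetical, not the NARR protein.

```python
import re

# Zero-width lookahead so that adjacent repeats sharing residues are all found.
NARR_REPEAT = re.compile(r"(?=(PRVIV[ST]PR))")

def find_repeats(protein):
    """Return (start, motif, site) for each exact consensus match.

    start: 0-based index of the motif; site: index of the S/T residue
    (position 6 of the motif), the one reported as M-phase phosphorylated.
    """
    return [(m.start(), m.group(1), m.start() + 5)
            for m in NARR_REPEAT.finditer(protein)]

# Hypothetical test sequence, not the real NARR protein.
print(find_repeats("MAPRVIVSPRAGPRVIVTPRK"))
# [(2, 'PRVIVSPR', 7), (12, 'PRVIVTPR', 17)]
```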

Crossref

PubMed Central

Integrative analysis of the human cis-antisense gene pairs, miRNAs and their transcription regulation patterns

Author: Alfano
Borsani
Buratowski
Carlile
Chen
Coriton
Enerly
Engstrom
Faghihi
Forman
Galante
Gallagher
Ge
Guo
Hall
Hayashi
He
Henderson
Ivshina
Jiangtao Zhou
Johnson
Kapranov
Kasashima
Katayama
Kim
Kuznetsov
Kuznetsov
Landgraf
Lehner
Li
Li
Lian
Makalowska
Morris
Munroe
Nakamura
Ogawa
Oleg V. Grinchuk
Orfanelli
Orlov
Pawlicki
Peters
Piroon Jenjaroenpun
Platts
Poongothai
Rogozin
Scharer
Seno
Shendure
Sotiropoulou
Sun
Tokumaru
Vanhoutteghem
Veeramachaneni
Vladimir A. Kuznetsov
Volinia
Wang
Werner
Wilusz
Xie
Xu
Yelin
Yu
Yuriy L. Orlov
Zhang
Zhang
Publication venue: Oxford University Press

Cis-antisense gene pairs (CASGPs) can transcribe mRNAs from opposite strands of a given locus. To classify and understand diverse CASGP phenomena in the human genome, we compiled a genome-wide catalog of CASGPs and integrated these sequences with microarray, SAGE and miRNA data. Using the concept of overlapping regions and clustering of sense-antisense (SA) transcripts by chromosome coordinates, we identified up to 9000 overlapping antisense loci. Four thousand three hundred and seventy-four of these CASGPs form 1759 complex gene architectures. We found that ∼35% (6347/18160) of RefSeq genes are overlapped by antisense transcripts. About 30% of Affymetrix U133 microarray initial sequences map to transcripts of ∼35% of CASGPs and reveal mostly concordant expression in CASGPs. We found a strong, significant overrepresentation of human miRNA genes in loci of CASGPs. We developed a data-driven model of cross-talk between co-expressed CASGPs and the DICER1-mediated miRNA pathway in normal spermatogenesis and in severe teratozoospermia. Specifically, we revealed a complex SA structural–functional gene module comprising the protein-coding genes WDR6, DALRD3 and NDUFAF3 and the ncRNA precursors mir-425 and mir-191, which could downregulate the ncRNA pathway in normal spermatogenesis through direct targeting of DICER1 and basonuclin 2 transcripts by mir-425 and mir-191; this mechanism is switched off in severe teratozoospermia. The database is available from http://globalisland.bii.a-star.edu.sg/∼jiangtao/sas/index3.php?link =abou
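The core test behind a CASGP catalog is geometric: two transcripts on the same chromosome, on opposite strands, whose coordinate intervals overlap. A minimal sketch of that test as a sort-and-sweep over gene coordinates, assuming half-open 0-based intervals; it reports pairwise overlaps only and does not build the multi-gene clusters ("complex gene architectures") the study describes. The `Gene` type, function name and coordinates are hypothetical.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"

def cis_antisense_pairs(genes):
    """Return (name_a, name_b, overlap_bp) for overlapping opposite-strand genes."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        active = []  # earlier genes that may still overlap later ones
        for g in chrom_genes:
            # Drop genes that end before the current gene starts.
            active = [a for a in active if a.end > g.start]
            for a in active:
                if a.strand != g.strand:
                    # a.start <= g.start < a.end, so the overlap is positive.
                    pairs.append((a.name, g.name, min(a.end, g.end) - g.start))
            active.append(g)
    return pairs

genes = [Gene("geneA", "chr1", 100, 500, "+"), Gene("geneB", "chr1", 400, 900, "-"),
         Gene("geneC", "chr1", 950, 1200, "-"), Gene("geneD", "chr2", 100, 300, "-"),
         Gene("geneE", "chr2", 200, 250, "+")]
print(cis_antisense_pairs(genes))
# [('geneA', 'geneB', 100), ('geneD', 'geneE', 50)]
```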

Crossref

PubMed Central

A Method for the Simultaneous Estimation of Selection Intensities in Overlapping Genes

Author: A Narechania
A Pavesi
A Pavesi
AL Hughes
AL Hughes
AM Pedersen
BG Barrell
CE Jones
Dan Graur
DC Krakauer
EC Holmes
F Lillo
Giddy Landan
H Okamoto
HL Zaaijer
I Makalowska
IB Rogozin
J Hein
J Montoya
J Zhang
JC Obenauer
KR Sakharkar
KS Li
L Campitelli
M Nei
N Goldman
Niv Sabath
Oliver G. Pybus
P Pamilo
PK Keese
PR Cooper
R Belshaw
R Nielsen
RA Smith
S de Groot
S de Groot
S Guyader
S McCauley
S McCauley
S Normark
SB Needleman
T Miyata
WH Li
Y Bao
Y Suzuki
Z Yang
Z Yang
Z Yang
ZI Johnson
Publication venue: Public Library of Science
Publication date: 01/01/2008

Inferring the intensity of positive selection in protein-coding genes is important because it sheds light on the process of adaptation. Recently, it has been reported that overlapping genes, which are ubiquitous in all domains of life, seem to exhibit inordinate degrees of positive selection. Here, we present a new method for the simultaneous estimation of selection intensities in overlapping genes. We show that the appearance of positive selection is caused by assuming that selection operates independently on each gene in an overlapping pair, thereby ignoring the unique evolutionary constraints on overlapping coding regions. Our method uses an exact evolutionary model, thereby obviating the need for approximation or intensive computation. We test the method by simulating the evolution of overlapping genes of different types as well as under diverse evolutionary scenarios. Our results indicate that the independent estimation approach leads to the false appearance of positive selection even though the gene is in reality subject to negative selection. Finally, we use our method to estimate selection in two influenza A genes for which positive selection was previously inferred. We find no evidence for positive selection in either case.
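The constraint at issue is that a single substitution in an overlapping region is read in two frames at once, so a change that is synonymous in one gene can be nonsynonymous in the other. The sketch below only illustrates that coupling and is not the authors' estimation method. It assumes two overlapping frames on the same strand (offsets 0 and +1), the standard genetic code with stop codons treated as an ordinary residue, and a made-up sequence; the function names are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def effect(seq, pos, new_base, frame):
    """'syn' or 'nonsyn' for a point mutation read in `frame`, or None
    if `pos` does not fall in a complete codon of that frame."""
    if pos < frame:
        return None
    start = frame + (pos - frame) // 3 * 3
    if start + 3 > len(seq):
        return None
    old = seq[start:start + 3]
    new = old[:pos - start] + new_base + old[pos - start + 1:]
    return "syn" if CODE[old] == CODE[new] else "nonsyn"

def joint_effects(seq, frame_a=0, frame_b=1):
    """Tally every single-base substitution by its effect in both frames."""
    tally = Counter()
    for pos, base in enumerate(seq):
        for alt in BASES:
            if alt == base:
                continue
            a = effect(seq, pos, alt, frame_a)
            b = effect(seq, pos, alt, frame_b)
            if a and b:  # only positions covered by codons in both frames
                tally[(a, b)] += 1
    return tally

seq = "ATGGCCAAA"
print(effect(seq, 5, "T", 0), effect(seq, 5, "T", 1))  # syn nonsyn
```

Here GCC to GCT is silent (Ala) in frame 0, while the same change turns CCA (Pro) into CTA (Leu) in frame +1, which is why treating the two genes as independent misattributes the selective pressure.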

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Regulation of MYCN expression in human neuroblastoma cells

Author: A Cayre
AP de Brouwer
Arjan PM de Brouwer
B Lehner
BC Armstrong
CA Thrash-Bingham
Christina A Hulsbergen-van de Kaa
CL Marcelis
D Scott
F Rossignol
Frank N van Leeuwen
G Lavorgna
Gosse J Adema
GW Krystal
H Shimada
H van Bokhoven
Hans van Bokhoven
HJ Nickerson
I Jolanda M de Vries
I Makalowska
J Hoebeeck
J Shendure
Joannes FM Jacobs
K De Preter
KJ Livak
KK Matthay
LW Stanton
M Baguma-Nibasheka
M Schwab
M Schwab
MW Pfaffl
Peter M Hoogerbrugge
R Corvi
R Yelin
S Misra
S Rozen
S Tanaka
SB Bordow
XX Tang
Publication venue: BioMed Central
Publication date: 01/01/2009

Contains full text: 81722.pdf (publisher's version) (Open Access)
BACKGROUND: Amplification of the MYCN gene in neuroblastoma (NB) is associated with a poor prognosis. However, MYCN amplification does not automatically result in higher expression of MYCN in children with NB. We hypothesized that the discrepancy between MYCN gene expression and prognosis in these children might be explained by the expression of either MYCN-opposite strand (MYCNOS) or the shortened MYCN isoform (DeltaMYCN) that was recently identified in fetal tissues. Both MYCNOS and DeltaMYCN are potential inhibitors of MYCN, at either the mRNA or the protein level. METHODS: Expression of MYCN, MYCNOS and DeltaMYCN was measured in human NB tissues of different stages. Transcript levels were quantified using a real-time reverse transcriptase polymerase chain reaction assay (QPCR). In addition, the relative expression of these three transcripts was compared to the number of MYCN copies, which was determined by genomic real-time PCR (gQPCR). RESULTS: Both DeltaMYCN and MYCNOS are expressed in all NBs examined. In NBs with MYCN amplification, these transcripts are expressed at significantly higher levels. The ratio of MYCN:DeltaMYCN expression was identical in all tested NBs. This indicates that DeltaMYCN and MYCN are co-regulated, which suggests that DeltaMYCN is not a regulator of MYCN in NB. However, the ratio of MYCNOS:MYCN expression is directly correlated with NB disease stage (p = 0.007). In the more advanced NB stages and in NBs with MYCN amplification, relatively more MYCNOS is present compared to MYCN. Expression of the antisense gene MYCNOS might be relevant to the progression of NB, potentially by directly inhibiting MYCN transcription through transcriptional interference at the DNA level. CONCLUSION: The MYCNOS:MYCN ratio in NBs is significantly correlated with both MYCN amplification and NB stage. Our data indicate that in NB, MYCN expression levels might be influenced by MYCNOS but not by DeltaMYCN.
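Transcript ratios such as MYCNOS:MYCN and gQPCR copy numbers are typically derived from threshold cycle (Ct) values, since each cycle of difference corresponds to roughly a doubling of template. A minimal sketch of the comparative Ct arithmetic, assuming equal, near-perfect amplification efficiency for both assays (an efficiency-corrected model would be needed otherwise); the abstract does not say which calculation the authors used, and the function names and Ct values are hypothetical.

```python
def relative_quantity(ct_target, ct_reference, efficiency=2.0):
    """Starting amount of target relative to reference, from threshold cycles.

    Assumes both assays share the same per-cycle efficiency (2.0 = perfect
    doubling), so each cycle of Ct difference is one factor of `efficiency`.
    """
    return efficiency ** (ct_reference - ct_target)

def copy_number(dct_sample, dct_calibrator, ploidy=2):
    """Genomic copy number by the comparative Ct (2^-ddCt) approach.

    dct = Ct(gene) - Ct(single-copy reference locus); the calibrator is a
    sample assumed to carry `ploidy` copies of the gene.
    """
    return ploidy * 2.0 ** -(dct_sample - dct_calibrator)

# Hypothetical values: MYCNOS crosses threshold at cycle 25, MYCN at 27.
print(relative_quantity(25.0, 27.0))  # 4.0, i.e. MYCNOS:MYCN = 4
# A tumour whose dCt is 5 cycles below a diploid calibrator.
print(copy_number(-3.0, 2.0))  # 64.0
```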

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Radboud Repository

Small RNAs and the regulation of cis-natural antisense transcripts in Arabidopsis

BACKGROUND: In spite of large intergenic spaces in plant and animal genomes, 7% to 30% of genes in the genomes encode overlapping cis-natural antisense transcripts (cis-NATs). The widespread occurrence of cis-NATs suggests an evolutionary advantage for this type of genomic arrangement. Experimental evidence for the regulation of two cis-NAT gene pairs by natural antisense transcript-generated small interfering RNAs (nat-siRNAs) via the RNA interference (RNAi) pathway has been reported in Arabidopsis. However, the extent of siRNA-mediated regulation of cis-NAT genes remains unclear in any genome. RESULTS: The hallmarks of RNAi regulation of NATs are 1) inverse regulation of the two genes in a cis-NAT pair by environmental and developmental cues and 2) generation of siRNAs by cis-NAT genes. We examined Arabidopsis transcript profiling data from public microarray databases to identify cis-NAT pairs whose sense and antisense transcripts show opposite expression changes. A subset of the cis-NAT genes displayed negatively correlated expression profiles as well as inverse differential expression changes under at least one of the examined developmental stages or treatment conditions. By searching the Arabidopsis Small RNA Project (ASRP) and Massively Parallel Signature Sequencing (MPSS) small RNA databases, as well as our stress-treated small RNA dataset, we found small RNAs that matched at least one gene in 646 out of 1008 (64%) protein-coding cis-NAT pairs, which suggests that siRNAs may regulate the expression of many cis-NAT genes. A total of 209 putative siRNAs have the potential to target more than one gene, and half of these small RNAs could target multiple members of a gene family. Furthermore, the majority of the putative siRNAs within the overlapping regions tend to target only one transcript of a given NAT pair, which is consistent with our previous finding on salt- and bacteria-induced nat-siRNAs.
In addition, we found that genes encoding plastid- or mitochondrion-targeted proteins are over-represented in the Arabidopsis cis-NATs and that 19% of sense and antisense partner genes of cis-NATs share at least one common Gene Ontology term, which suggests that they encode proteins with a possible functional connection. CONCLUSION: The negatively correlated expression patterns of sense and antisense genes, as well as the presence of siRNAs in many of the cis-NATs, suggest that siRNA regulation of cis-NATs via the RNAi pathway is an important gene regulatory mechanism for at least a subgroup of cis-NATs in Arabidopsis.
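The first hallmark, sense and antisense partners moving in opposite directions across conditions, can be screened by correlating the two expression profiles. A minimal sketch using Pearson correlation, assuming each gene has one value per condition in matching order; the -0.5 cutoff, gene IDs and values are hypothetical, and the study additionally required inverse differential expression and small RNA evidence, which this filter does not check.

```python
from math import nan, sqrt

def pearson(x, y):
    """Pearson correlation; NaN if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else nan

def anticorrelated_pairs(pairs, expression, cutoff=-0.5):
    """Keep cis-NAT pairs whose expression profiles have r <= cutoff.

    `expression` maps gene ID -> list of values over the same conditions.
    NaN correlations never pass, because NaN <= cutoff is False.
    """
    hits = []
    for sense, antisense in pairs:
        r = pearson(expression[sense], expression[antisense])
        if r <= cutoff:
            hits.append((sense, antisense, r))
    return hits

expression = {"geneA": [1, 2, 3, 4], "geneA_as": [4, 3, 2, 1],
              "geneB": [1, 2, 3, 4], "geneB_as": [1, 2, 3, 5]}
hits = anticorrelated_pairs([("geneA", "geneA_as"), ("geneB", "geneB_as")], expression)
print([(s, a) for s, a, _ in hits])  # [('geneA', 'geneA_as')]
```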

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana

Author: A Huffaker
A Huffaker
AC Marques
AH Paterson
BJ Haas
C Lemieux
C Rancurel
CH Jen
Channa Keshavaiah
Charles Spillane
D Fischer
D Weigel
DG Knowles
DJ Begun
E Birney
E Lyons
EJ Louis
GA Tuskan
GA Wilson
GA Wilson
GK Smyth
H Lin
H Parkinson
I Makalowska
IK Jordan
J Brosius
J Cai
J Kilian
JD Palmer
JE Bowers
JJ Cai
JL Bennetzen
K Horan
K Khalturin
K Schmid
KA Silverstein
L Delaye
L Gautier
L Li
M Dunaeva
M Freeling
M Long
M Lynch
M Lynch
M Morgante
M Raynal
M Schmid
M Toll-Riera
MA Campbell
MA Koch
Mark TA Donoghue
MS Barker
MT Levine
N Jiang
N Siew
N Wikstrom
O Jaillon
Q Zhou
QH Le
R Ming
RC Edgar
RC Gentleman
RM Clark
RR Weigel
S Hunter
S Lockton
S Luhua
S Ohno
S Ouyang
S Yooseph
Sandesh H Swamidatta
SF Altschul
T Domazet-Loso
TC Bosch
V Daubin
VV Kapitonov
W Aufsatz
W Wang
W Xiao
WJ Guo
WM Liu
X Yang
Y Benjamini
Y Jiang
Y Yin
Z Zhu
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: All sequenced genomes contain a proportion of lineage-specific genes, which exhibit no sequence similarity to any genes outside the lineage. Despite their prevalence, the origins and functions of most lineage-specific genes remain largely unknown. As more genomes are sequenced, opportunities for understanding the evolutionary origins and functions of lineage-specific genes are increasing. RESULTS: This study provides a comprehensive analysis of the origins of lineage-specific genes (LSGs) in Arabidopsis thaliana that are restricted to the Brassicaceae family. Lineage-specific genes within the nuclear (1761 genes) and mitochondrial (28 genes) genomes are identified. The evolutionary origins of two thirds of the lineage-specific genes within the Arabidopsis thaliana genome are also identified. Almost a quarter of lineage-specific genes originate from non-lineage-specific paralogs, while ~10% of lineage-specific genes are partly derived from DNA exapted from transposable elements (twice the proportion observed for non-lineage-specific genes). Lineage-specific genes are also enriched in genes that have overlapping CDS, which is consistent with such novel genes arising from overprinting. Over half of the subset of 958 lineage-specific genes found only in Arabidopsis thaliana have alignments to intergenic regions in Arabidopsis lyrata, consistent with either de novo origination or differential gene loss and retention, both of which could explain the lineage-specific status of these genes. A smaller number of lineage-specific genes with an incomplete open reading frame across different Arabidopsis thaliana accessions are further identified as accession-specific genes, most likely of recent origin in Arabidopsis thaliana.
Putative de novo origination of two of the Arabidopsis thaliana-only genes is identified through additional sequencing across accessions of Arabidopsis thaliana and closely related sister species lineages. We demonstrate that lineage-specific genes have high tissue specificity and low expression levels across multiple tissues and developmental stages. Finally, stress responsiveness is identified as a distinct feature of Brassicaceae-specific genes, as these LSGs are enriched for genes responsive to a wide range of abiotic stresses. CONCLUSION: Improving our understanding of the origins of lineage-specific genes is key to gaining insights into how novel genes can arise and acquire functionality in different lineages. This study comprehensively identifies all of the Brassicaceae-specific genes in Arabidopsis thaliana and identifies how the majority of such lineage-specific genes have arisen. The analysis allows the relative importance (and prevalence) of different evolutionary routes to the genesis of novel ORFs within lineages to be assessed. Insights into the functional roles of lineage-specific genes are further advanced by the identification of enrichment for stress responsiveness in lineage-specific genes, highlighting their likely importance for environmental adaptation strategies.
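Declaring a gene lineage-specific ("no sequence similarity to any genes outside the lineage") usually comes down to filtering similarity-search hits by species and significance. A minimal sketch, assuming hits are already parsed into (species, E-value) pairs per gene; the 1e-3 cutoff, gene IDs and species set are illustrative assumptions, and the study's actual databases and thresholds are not given in this abstract.

```python
def lineage_specific(genes, hits, lineage, max_evalue=1e-3):
    """Return genes with no significant hit to any species outside `lineage`.

    hits: gene ID -> iterable of (species, evalue) from a similarity search.
    A hit counts only if its E-value is at or below `max_evalue`.
    """
    result = []
    for gene in genes:
        outside = [species for species, evalue in hits.get(gene, ())
                   if evalue <= max_evalue and species not in lineage]
        if not outside:
            result.append(gene)
    return result

# Hypothetical hits; Brassica rapa is in Brassicaceae, Oryza sativa is not.
lineage = {"Arabidopsis thaliana", "Arabidopsis lyrata", "Brassica rapa"}
hits = {"g1": [("Arabidopsis lyrata", 1e-50), ("Oryza sativa", 1e-10)],
        "g2": [("Arabidopsis lyrata", 1e-40), ("Oryza sativa", 0.5)],
        "g3": []}
print(lineage_specific(["g1", "g2", "g3"], hits, lineage))  # ['g2', 'g3']
```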

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Access to Research at National University of Ireland, Galway

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

Author: Amid Clara
Apweiler Rolf
Ashurst Jennifer
Auffray Charles
Barrero Roberto A
Bellgard Matthew
Bonaldo Maria de Fatima
Bono Hidemasa
Bromberg Susan K
Brookes Anthony J
Bruford Elspeth
Carninci Piero
Chakraborty Ranajit
Chelala Claude
Chen Zhu
Couillault Christine
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Go Mitiko
Gojobori Takashi
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hayashizaki Yoshihide
Hide Winston
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Isogai Takao
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kanehisa Minoru
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makalowski Wojciech
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakai Kenta
Nakao Mitsuteru
Nigam Rajni
Nishikawa Ken
Nishikawa Tetsuo
Nomura Nobuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Oishi Michio
Okada Norihiro
Okazaki Yasushi
Okido Toshihisa
Okubo Kousaku
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Quackenbush John
Gopinath Gopal R
Ren Shuang-Xi
Richard Roberts
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakaki Yoshiyuki
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Shimizu Nobuyoshi
Shimoyama Mary
Simpson Andrew J
Soares Bento
Souza Sandro J. de
Steward Charles
Stodolsky Marvin
Strausberg Robert L
Sugano Sumio
Sugawara Hideaki
Suwa Makiko
Suzuki Mami
Suzuki Yoshiyuki
Suzuki Yutaka
Takagi Toshihisa
Takahashi Aiko
Takeda Jun-ichi
Tamiya Gen
Tamura Takuro
Tanaka Hiroshi
Tanaka Susumu
Tanino Motohiko
Tateno Yoshio
Taylor Todd
Terwilliger Joseph D
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A
Tonellato Peter
Unneberg Per
Veeramachaneni Vamsi
Wagner Lukas
Watanabe Shinya
Wiemann Stefan
Wilming Laurens
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Norikazu
Yasuda Tomohiro
Yoo Hyang-Sook
Yura Kei
Publication venue: Public Library of Science
Publication date: 01/01/2004

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains difficult because the transcripts of a single gene can vary greatly. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also demonstrated the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes.
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
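Deciding whether a full-length cDNA has "a good protein-coding open reading frame" starts with finding its longest ATG-to-stop ORF. A minimal sketch, assuming the cDNA is already oriented (so only the three forward frames are scanned) and using a hypothetical 100-codon cutoff; H-InvDB's actual curation criteria were more elaborate than a length threshold, and the function names are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf(cdna):
    """(start, end) of the longest forward-strand ATG...stop ORF, end exclusive
    and including the stop codon; None if no complete ORF exists."""
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOPS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best

def has_coding_orf(cdna, min_codons=100):
    """True if the longest ORF encodes at least `min_codons` amino acids."""
    orf = longest_orf(cdna)
    return orf is not None and (orf[1] - orf[0]) // 3 - 1 >= min_codons

print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 14): ATG AAA TTT TAG
```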

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

Research Repository

Hokkaido University Collection of Scholarly and Academic Papers

UPF Digital Repository

White Rose Research Online

MPG.PuRe

Next-generation transcriptome assembly

Author: A McPherson
A Mortazavi
AL Greninger
BC Schaefer
BJ Haas
BT Wilhelm
C Adamidi
C Trapnell
C Trapnell
DR Kelley
DR Zerbino
F Denoeud
F Ozsolak
F Ozsolak
F Ozsolak
G Robertson
H Mizuno
H Shi
I Birol
I Kozarewa
I Makalowska
J Butler
J Cocquet
J Eid
J Falgueras
J Martin
JE Crawford
Jeffrey A. Martin
JR Miller
JT Simpson
JZ Levin
K Paszkiewicz
K Wang
KD Passalacqua
KF Au
L Mamanova
LT Sam
M Burset
M Guttman
M Jager
M Kinsella
M Yassour
MG Grabherr
ML Metzker
NA Twine
PA Pevzner
R Garg
RA Dalloul
RC Taylor
S Chen
S He
S Marguerat
S Meader
S Normark
S Rodrigue
SA Tomlins
SD Jackman
SL Salzberg
T Lassmann
TD Wu
TS Schwartz
TT Perkins
U Nagalakshmi
WJ Kent
Y Fukuda
Y Katz
Y Surget-Groba
Z Chen
Z Wang
Zhong Wang
ZI Johnson
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2011

Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalog of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches (reference-based, de novo and combined strategies), along with some perspectives on transcriptome assembly in the near future.
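Many de novo assemblers for short RNA-seq reads are built on de Bruijn graphs, where each k-mer links its (k-1)-base prefix to its (k-1)-base suffix and contigs are read off as paths. The sketch below is a deliberately minimal illustration of that idea, not any particular assembler covered by the Review: it assumes error-free, same-strand reads and reports maximal non-branching paths (unitigs). Real tools also handle sequencing errors, reverse complements, uneven coverage and isoform branching; this sketch also skips isolated cycles. The function name, k and reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn_unitigs(reads, k):
    """Assemble maximal non-branching paths from a k-mer de Bruijn graph."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    succ = defaultdict(set)   # (k-1)-mer -> set of successor (k-1)-mers
    indeg = defaultdict(int)
    nodes = set()
    for km in kmers:
        a, b = km[:-1], km[1:]
        succ[a].add(b)
        indeg[b] += 1
        nodes.update((a, b))

    def is_internal(node):
        # Exactly one way in and one way out: part of a longer unitig.
        return indeg[node] == 1 and len(succ[node]) == 1

    contigs = []
    for node in sorted(nodes):
        if is_internal(node):
            continue
        for nxt in sorted(succ[node]):
            path, cur = node, nxt
            while True:
                path += cur[-1]          # each step adds one new base
                if not is_internal(cur):
                    break
                cur = next(iter(succ[cur]))
            contigs.append(path)
    return contigs

# Two overlapping hypothetical reads from the transcript ATGGCGTGCAAT.
print(de_bruijn_unitigs(["ATGGCGTG", "GCGTGCAAT"], 5))  # ['ATGGCGTGCAAT']
```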

Crossref

ZENODO

UNT Digital Library

Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Author: Amid Clara
Ashurst Jennifer
Barrero Roberto A.
Bellgard Matthew
Bono Hidemasa
Bromberg Susan K.
Brookes Anthony J.
Bruford Elspeth
Carninci Piero
Chelala Claude
Couillault Christine
de Fatima Bonaldo Maria
de Souza Sandro J.
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Gopinath Gopal R.
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O.
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakao Mitsuteru
Nigam Rajni
Nishikawa Tetsuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Okada Norihiro
Okido Toshihisa
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Ren Shuang-Xi
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Sugano Sumio
Suzuki Yoshiyuki
Suzuki Yutaka
Takeda Jun-Ichi
Tamura Takuro
Tanaka Susumu
Tanino Motohiko
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A.
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Tomohiro
Yura Kei
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2004

Online publication. Article in a peer-reviewed scientific journal. National audience.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains difficult because the transcripts of a single gene can vary greatly. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also demonstrated the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions.
We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot