Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

eScholarship - University of California

The UCSC Genome Browser Database: update 2006

Author: Baertsch R.
Barber G. P.
Bejerano G.
Clawson H.
Diekhans M.
Furey T. S.
Harte R. A.
Haussler D.
Hillman-Jackson J.
Hinrichs A. S.
Hsu F.
Karolchik D.
Kent W. J.
Kuhn R. M.
Pedersen J. S.
Pohl A.
Raney B. J.
Rosenbloom K. R.
Siepel A.
Smith K. E.
Sugnet C. W.
Sultan-Qurraie A.
Thomas D. J.
Trumbower H.
Weber R. J.
Weirauch M.
Zweig A. S.
Publication venue: Oxford University Press
Publication date: 28/12/2005

The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at

CiteSeerX

Cold Spring Harbor Laboratory Institutional Repository

Tracking and coordinating an international curation effort for the CCDS Project

Author: A. Frankish
B. Aken
Bab
Baertsch
Brogna
Buhler
C. M. Farrell
C. Wallin
Church
Crowe
D. Barrell
Eberle
Green
Hwang
J. E. Loveland
J. Harrow
Jackson
K. D. Pruitt
Kim
Kozak
L. Wilming
Lee
Luukkonen
M. Diekhans
M.-M. Suner
Morris
Natsoulis
Nicholson
Parla
Prakash
R. A. Harte
S. Searle
Silva
Simeone
The ENCODE Project Consortium
Udby
Wethmar
Wu
Publication venue: Oxford University Press
Publication date: 12/02/2013

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with the goal of producing a conservative set of high-quality protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines.
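
The idea of identifying identical CDS annotations across independent annotation builds can be illustrated with a minimal Python sketch. This is not the CCDS pipeline: the data layout (chromosome, strand, list of exon intervals), the transcript names and the function names `cds_key` and `find_identical_cds` are all hypothetical, chosen only to show that a CDS counts as "identical" when every coding interval matches exactly.

```python
from collections import defaultdict

# Hypothetical toy model: a CDS is (chromosome, strand, exon intervals).
# This only illustrates exact-match comparison; it is not the CCDS pipeline.

def cds_key(chrom, strand, exons):
    """Normalize a CDS structure so identical structures compare equal."""
    return (chrom, strand, tuple(sorted(exons)))

def find_identical_cds(*annotation_sets):
    """Return the CDS structures that appear in every annotation set."""
    sources = defaultdict(set)
    for index, annotations in enumerate(annotation_sets):
        for chrom, strand, exons in annotations.values():
            sources[cds_key(chrom, strand, exons)].add(index)
    total = len(annotation_sets)
    return sorted(key for key, seen in sources.items() if len(seen) == total)

# Two made-up annotation builds for the same assembly.
set_a = {
    "txA1": ("chr1", "+", [(100, 200), (300, 400)]),
    "txA2": ("chr2", "-", [(50, 150)]),
}
set_b = {
    "txB1": ("chr1", "+", [(300, 400), (100, 200)]),  # same CDS, exons listed in another order
    "txB2": ("chr2", "-", [(50, 160)]),               # differs by one coordinate
}

consensus = find_identical_cds(set_a, set_b)
print(consensus)
```

In this toy input only the chr1 structure is shared by both builds; the chr2 annotations differ at one boundary, which in the workflow described above is the kind of case that would go to manual review rather than receive a shared identifier.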

CiteSeerX

The UCSC Genome Browser Database: 2008 update

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
A. Thakkapallayil
Altshuler
B. Giardine
B. J. Raney
B. Rhead
Benson
Blanchette
Collins
Conrad
D. Haussler
D. Karolchik
F. Hsu
Feuk
G. P. Barber
H. Clawson
H. Trumbower
Hsu
Iafrate
J. S. Pedersen
K. E. Smith
K. M. Kober
K. R. Rosenbloom
Karolchik
Kent
Locke
M. Diekhans
M. Stanke
McCarroll
Mishra
R. A. Harte
R. Baertsch
R. M. Kuhn
Redon
Riggins
Rual
Sebat
Sharp
Sherry
Stelzl
T. Wang
The MGC Project Team
Vang
Velculescu
W. J. Kent
W. Miller
Publication venue: Oxford University Press
Publication date: 01/01/2007

The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions. National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198)

DSpace@MIT

Cold Spring Harbor Laboratory Institutional Repository

King's Research Portal

The completion of the Mammalian Gene Collection (MGC)

Author: Astashyn A.
Baertsch R.
Bhat N.
Blakesley R. W.
Bonner T. I.
Bouffard G. G.
Brejova B.
Brent M.
Brown G.
Brownstein M.
Buetow K. H.
Chuah E.
Collins F. S.
Comstock C. L.
Deng A.
Deng M.
Derge J. G.
Dickson M. C.
Diekhans M.
Farrell C.
Feingold E. A.
Garcia A. M.
Gerhard D. S.
Ghamsari L.
Gibbs R. A.
Good P. J.
Green E. D.
Grimwood J.
Gruber C. E.
Gunaratne P. H.
Hart J.
Harte R.
Haussler D.
Hirst M.
Hudson J.
Jacob H.
Jang W.
Kent J.
Kloske D.
Landrum M.
Langton L.
Lazar J.
Lebeau A.
Lewis J.
Lin C.
Ma K.
Maglott D.
Mah D.
Maidak B. L.
Mandich A.
Marsh A.
McPherson J.
Mello E.
Misquitta L.
Moksa M.
Moore T.
Mullikin J.
Muratet M.
Murphy M.
Murphy T.
Murray R. R.
Muzny D.
Myers R. M.
Pang J.
Pardes E.
Pennacchio C.
Phan L.
Pruitt K. D.
Rajput B.
Rasooly R.
Riddick L.
Robinson C.
Rodriguez A. C.
Salehi-Ashtiani K.
Schaefer C. F.
Schmutz J.
Schreiber K.
Sethupathy P.
Shapiro N.
Shenmen C. M.
Shoaf D.
Sieja S.
Siepel A.
Simmons B.
Smith M. R.
Stevens M.
Taylor G.
Temple G.
Tse K.
van Baren M. J.
Wagner L.
Ward M.
Webb D.
Weber J.
Wei C.
Wu J.
Wu W.
Yankie L.
Young A. C.
Zeng T.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/12/2009

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Identification and Classification of Conserved RNA Secondary Structures in the Human Genome

Author: Adam Siepel
Angrand PO Apiou F, Stewart AF, Dutrillaux B, Losson R, et al.
Aparicio S Chapman J, Stupka E, Putnam N, Chia JM, et al.
Bentwich I Avniel A, Karov Y, Aharonov R, Gilad S, et al.
Berezikov E Guryev V, van de Belt J, Wienholds E, Plasterk RH, et al.
Berry MJ Banu L, Chen YY, Mandel SJ, Kieffer JD, et al.
Blanchette M Kent WJ, Riemer C, Elnitski L, Smit AF, et al.
Bompfünewerer AF Flamm C, Fried C, Fritzsch G, Hofacker IL, et al.
Brudno M Do CB, Cooper GM, Kim MF, Davydov E, et al.
Chimpanzee Sequencing and Analysis Consortium
David Haussler
Eric S Lander
Gibbs RA Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, et al.
Gill Bejerano
Gregory RI Yan KP, Amuthan G, Chendrimada T, Doratotaj B, et al.
Griffiths-Jones S Moxon S, Marshall M, Khanna A, Eddy SR, et al.
Higuchi M Maas S, Single FN, Hartner J, Rozov A, et al.
Hillier LW Miller W, Birney E, Warren W, Hardison RC, et al.
Howard MT Aggarwal G, Anderson CB, Khatri S, Flanigan KM, et al.
International Human Genome Sequencing Consortium
Jakob Skou Pedersen
Jim Kent
Kate Rosenbloom
Kent WJ Sugnet CW, Furey TS, Roskin KM, Pringle TH, et al.
Kerstin Lindblad-Toh
Kryukov GV Castellano S, Novoselov SV, Lobanov AV, Zehtab O, et al.
Lagos-Quintana M Rauhut R, Yalcin A, Meyer J, Lendeckel W, et al.
Lim LP Lau NC, Weinstein EG, Abdelhakim A, Yekta S, et al.
Matsufuji S Matsufuji T, Miyazaki Y, Murakami Y, Atkins JF, et al.
Pahl PM Hodges YK, Meltesen L, Perryman MB, Horwitz KB, et al.
Richard Durbin
Schwartz S Kent WJ, Smit A, Zhang Z, Baertsch R, et al.
Siepel A Bejerano G, Pedersen JS, Hinrichs AS, Hou M, et al.
Waterston RH Lindblad-Toh K, Birney E, Rogers J, Abril JF, et al.
Webb Miller
Xie X Lu J, Kulbokas EJ, Golub TR, Mootha V, et al.
Publication venue: Public Library of Science
Publication date: 01/01/2005

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.

Public Library of Science (PLOS)

Cold Spring Harbor Laboratory Institutional Repository

Harvard University - DASH

Directory of Open Access Journals

eScholarship - University of California

Prediction of overall survival for patients with metastatic castration-resistant prostate cancer : development of a prognostic model through a crowdsourced challenge with open clinical trial data

Author: Abdallah Kald
Airola Antti
Aittokallio Tero
Anghe Catalina
Ankerst Donna P
Azima Helia
Baertsch Robert
Ballester Pedro J
Bare Chris
Bare J Christopher
Bhandari Vinayak
Bot Brian M
Buchardt Ann-Sophie
Buturovic Ljubomir
Cao Da
Chalise Prabhakar
Chang Billy HW
Cho Junwoo
Chu Tzu-Ming
Coley R Yates
Conjeti Sailesh
Correia Sara
Costello James C
Dai Junqiang
Dai Ziwei
Dang Cuong C
Dargatz Philip
Delavarkhan Sam
Deng Detian
Dhanik Ankur
Du Yu
Dunbar Maria Bekker-Nielsen
Elangovan Aparna
Ellis Shellie
Elo Laura L
Espiritu Shadrielle M
Fan Fan
Farshi Ashkan B
Freitas Ana
Fridley Brooke
Friend Stephen
Fuchs Christiane
Gofer Eyal
Golinska Agnieszka K
Graw Stefan
Greiner Russ
Guan Yuanfang
Guinney Justin
Guo Jing
Gupta Pankaj
Guyer Anna I
Han Jiawei
Hansen Niels R
Hirvonen Outi
Huang Barbara
Huang Chao
Hwang Jinseub
Ibrahim Joseph G
Jayaswa Vivek
Jeon Jouhyun
Ji Zhicheng
Juvvadi Deekshith
Jyrkkiö Sirkku
Kanigel-Winner Kimberly
Katouzian Amin
Kazanov Marat D
Khan Suleiman A
Khayyer Shahin
Kim Dalho
Koestler Devin
Kokowicz Fernanda
Kondofersky Ivan
Krautenbacher Norbert
Krstajic Damjan
Kumar Luke
Kurz Christoph
Kyan Matthew
Laajala Teemu D
Laimighofer Michael
Lee Eunjee
Lesinski Wojciech
Li Miaozhu
Li Ye
Lian Qiuyu
Liang Xiaotao
Lim Minseong
Lin Henry
Lin Xihui
Lu Jing
Mahmoudian Mehrad
Manshaei Roozbeh
Meier Richard
Miljkovic Dejan
Mirtti Tuomas
Mnich Krzysztof
Navab Nassir
Neto Elias C
Neto Elias Chaibub
Newton Yulia
Norman Thea
Pahikkala Tapio
Pal Subhabrata
Park Byeongju
Patel Jaykumar
Pathak Swetabh
Pattin Alejandrina
Peddinti Gopal
Peddinti Gopalacharyulu
Peng Jian
Petersen Anne H
Philip Robin
Piccolo Stephen R
Polewko-Klim Aneta
Pölsterl Sebastian
Rao Karthik
Ren Xiang
Rocha Miguel
Rudnicki Witold R.
Ryan Charles J
Ryu Hyunnam
Sartor Oliver
Scher Howard I
Scherb Hagen
Sehgal Raghav
Seyednasrollah Fatemeh
Shang Jingbo
Shao Bin
Shen Liji
Sher Howard
Shiga Motoki
Sokolov Artem
Song Lei
Soule Howard
Stolovitzky Gustavo
Stuart Josh
Sun Ren
Sweeney Christopher J
Söllner Julia F
Tahmasebi Nazanin
Tan Kar-Tong
Tomaziu Lisbeth
Usset Joseph
Vang Yeeleng S
Vega Roberto
Vieira Vitor
Wang David
Wang Difei
Wang Junmei
Wang Lichao
Wang Sheng
Wang Tao
Wang Yue
Winner Kimberly Kanigel
Wolfinger Russ
Wong Chris
Wu Zhenke
Xiao Jinfeng
Xie Xiaohui
Xie Yang
Xin Doris
Yang Hojin
Yu Nancy
Yu Thomas
Yu Xiang
Zahedi Sulmaz
Zanin Massimiliano
Zhang Chihao
Zhang Jingwen
Zhang Shihua
Zhang Yanchun
Zhou Fang Liz
Zhu Hongtu
Zhu Shanfeng
Zhu Yuxin
Publication date: 01/01/2016

Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease.
Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone.
Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p [...]
Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer.
Peer reviewed

Universidade do Minho: RepositoriUM