41 research outputs found

The role of Havana and communities in the manual curation of unfinished vertebrate genomes

Author: Adam Frankish
Catherine Snow
Chao-Kung Chen
Charlie Steward
Denise Carvalho-Silva
Harminder Sehra
James Gilbert
Jane Loveland
Jennifer Harrow
Jonathan Mudge
Laurens Wilming
Leo Gordon
Marie Marthe Suner
Mark Thomas
Mustapha Larbaoui
Toby Hunt
Publication date: 20/04/2009

Manual annotation (the "museum" model of annotation) relies on a small group of specialized curators to catalogue and classify genes according to their functional roles. This is both costly and time consuming and therefore is used only for model organisms with sufficient funding. Smaller research communities often have to rely on other models of annotation, mainly automated annotation (the "factory" model, e.g. Ensembl), and the "jamboree" model (in which a group of leading biologists from the community and bioinformaticians come together for a short intensive annotation workshop). At the Wellcome Trust Sanger Institute (WTSI), the Havana team provides high quality manual annotation of finished vertebrate genome sequences, namely human, mouse and zebrafish. We also perform the curation of specific finished regions such as the MHC in dog, cow and pig, whose whole genomes have been assembled from unfinished BACs or from whole genome shotgun sequences. In addition, we at Havana have also hosted annotation jamborees for the cow (Bos taurus) and pig (Sus scrofa) genomes. During those sessions, the research community had the opportunity to annotate their genes of interest under expert guidance using the custom-written, publicly available Otterlace annotation system, and the unified manual annotation guidelines. By making use of the tools and skills acquired during the cow and pig jamborees, the delegates can continue annotating their genomes remotely. For the pig genome, a highly contiguous physical map has been generated by an international effort of four laboratories (available in Pre!Ensembl) and is being used as a substrate for the swine genome sequencing project. Upcoming vertebrate genomes will be sequenced to a high depth coverage with the next generation sequencing technologies (e.g. Illumina, 454, SOLiD) but will have the drawback of not being manually finished. Manual annotation will be more accurate than the automated predictions at coping with any assembly problems derived from these high coverage but unfinished (or automatic pre-finished) genomes. Once these inherent assembly errors are corrected and the gene structures are accurately identified with manual annotation, the curated genes will be incorporated and merged with the predicted gene models in Ensembl to provide a unified view of the landscape of vertebrate genomes. I will present an introduction to our manual annotation system and our experience using it for annotation jamborees at the WTSI

Crossref

Nature Precedings

Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers

Author: Attwood Teresa K.
Blatter Marie-Claude
Blicher Thomas
Brazas Michelle D.
Brooksbank Catherine
Budd Aidan
De Las Rivas Javier
Fernandes Pedro
Jacob Joachim
Jimenez Rafael C.
Jones Phil
Lopez Rodrigo
Loveland Jane
McDowall Jennifer
Nyrönen Tommi H.
Rother Kristian
Schneider Maria V.
van Gelder Celia W. G.
Vaughan Brendan W.
Via Allegra
Walter Peter
Watson James
Publication venue: Oxford University Press
Publication date: 21/08/2011

Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of ‘high-throughput biology’, the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many Institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs and review it, respectively, to similar initiatives and collections

Crossref

PubMed Central

Copenhagen University Research Information System

Radboud Repository

Digital.CSIC

The University of Manchester - Institutional Repository

Archivio della ricerca- Università di Roma La Sapienza

Online Research Database In Technology

University of Melbourne Institutional Repository

The pig X and Y Chromosomes: structure, sequence, and evolution.

We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which are protein coding. Gene order closely matches that found in primates (including humans) and carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X Chromosome were absent from the pig, and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y-specific Chromosome sequence generated here comprises 30 megabases (Mb). A 15-Mb subset of this sequence was assembled, revealing two clusters of male-specific low copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. Many of the ancestral X-related genes previously reported in at least one mammalian Y Chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes (both single copy and amplified) on the pig Y Chromosome, to compare the pig X and Y Chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y Chromosome evolution. This work was funded by BBSRC grant BB/F021372/1. The Flow Cytometry and Cytogenetics Core Facilities at the Wellcome Trust Sanger Institute and Sanger investigators are funded by the Wellcome Trust (grant number WT098051). K.B., D.C.-S., and J.H. acknowledge support from the Wellcome Trust (WT095908), the BBSRC (BB/I025506/1), and the European Molecular Biology Laboratory. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 222664 (“Quantomics”). This is the final version of the article. It first appeared from Cold Spring Harbor Laboratory Press via http://dx.doi.org/10.1101/gr.188839.11

University of Essex Research Repository

Crossref

PubMed Central

UCL Discovery

Kent Academic Repository

Apollo (Cambridge)

iAnn: an event sharing platform for the life sciences

Summary: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. Availability: http://iann.pro/iannviewer

RERO DOC Digital Library

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

Author: Aken Bronwen L
Barnes If
Bennett Ruth
Berry Andrew E
Bruford Elspeth A
Bult Carol J
Cox Eric
Davidson Claire
Diekhans Mark
Farrell Catherine M
Frankish Adam
Girón Carlos G
Goldfarb Tamara
Gonzalez Jose M
Hunt Toby
Jackson John
Joardar Vinita
Kay Mike P
Kodali Vamsi K
Loveland Jane E
Martin Fergal J
McAndrews Monica
McGarvey Kelly M
Mudge Jonathan M
Murphy Michael
Murphy Terence
O'Leary Nuala A
Pruitt Kim D
Pujar Shashikant
Rajput Bhanu
Rangwala Sanjida H
Riddick Lillian D
Seal Ruth L
Suner Marie-Marthe
Wallin Craig
Webb David
Zhu Sophia
Publication venue: The Mouseion at the JAXlibrary
Publication date: 04/01/2018

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

The Jackson Laboratory: The Mouseion at the JAXlibrary

The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions. Funding: National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198)

DSpace@MIT

PubMed Central

King's Research Portal

Structural and functional annotation of the porcine immunome

Author: Ait-Ali Tahar
Amid Clara
Anselmo Anna
Archibald Alan L.
Astley Matthew
Badaoui Bouabid
Bed'Hom Bertrand
Beraldi Dario
Berman Daniel
Blecha Frank
Botti Sara
Bystrom Megan
Carvalho-Silva Denise
Chen Celine
Cheng Ryan Pei-Yen
Dawson Harry D.
Freeman Tom C.
Fritz Eric
Gilbert James G. R.
Giuffra Elisabetta
Hardy Matthew
Harrow Jennifer L.
Hu Zhiliang
Huang Ting-Hua
Hume David A.
Hunt Toby
Kapetanovic Ronan
Kataria Ranjit
Kay Mike
Lloyd David
Loveland Jane E.
Lunney Joan K.
Mann Katherine M.
Morozumi Takeya
Murtaugh Michael P.
Pascal Geraldine
Reecy James M.
Rogel-Gaillard Claire
Sang Yongming
Schwartz John C.
Shinkai Hiroki
Snow Catherine
Steward Charles
Thomas Mark
Toki Daisuke
Tuggle Christopher K.
Uenishi Hirohide
Wilming Laurens
Zhang Jie
Zhao Shu-Hong
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems. Results: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome. Conclusions: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

HAL Université de Tours

ProdInra

GENCODE: reference annotation for the human and mouse genomes in 2023.

Author: Arnan Carme
Banerjee Abhimanyu
Barnes If
Bennett Ruth
Berry Andrew
Bignell Alexandra
Boix Carles
Calvet Ferriol
Carbonell-Sala Sílvia
Cerdán-Vélez Daniel
Choudhary Jyoti S
Cunningham Fiona
Davidson Claire
Diekhans Mark
Donaldson Sarah
Dursun Cagatay
Fatima Reham
Flicek Paul
Frankish Adam
Gerstein Mark
Giorgetti Stefano
Giron Carlos García
Gonzalez Jose Manuel
Guigo Roderic
Gómez Laura Martínez
Hardy Matthew
Harrison Peter W
Hollis Zoe
Hourlier Thibaut
Hubbard Tim J P
Hunt Toby
James Benjamin
Jiang Yunzhe
Johnson Rory
Jungreis Irwin
Kay Mike
Kellis Manolis
Kundaje Anshul
Lagarde Julien
Loveland Jane E
Martin Fergal J
Mudge Jonathan M
Nair Surag
Ni Pengyu
Paten Benedict
Pozo Fernando
Ramalingam Vivek
Ruffier Magali
Schmitt Bianca M
Schreiber Jacob M
Sisu Cristina
Steed Emily
Sumathipala Dulika
Suner Marie-Marthe
Sycheva Irina
Tress Michael L
Uszczynska-Ratajczak Barbara
Wass Elizabeth
Wright James C
Yang Yucheng T
Yates Andrew
Zafrulla Zahoor
Publication venue: Oxford University Press (OUP)
Publication date: 24/11/2022

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org

Bern Open Repository and Information System (BORIS)

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Author: Adams Matthew S
Balderrama-Gutierrez Gabriela
Barnes If
Behera Amit K
Berry Andrew
Birol Inanc
Bostan Hamed
Brooks Angela N
Brooks Ashley M
Capella Salvador
Carbonell-Sala Sílvia
Carninci Piero
Chen Ying
Conesa Ana
De María Maite
Denslow Nancy D
Dhillon Namrita
Diekhans Mark
Du Mei RM
Fai Au Kin
Felton Colette
Fernandez-Gonzalez Jose M
Ferrández-Peral Luis
Frankish Adam
Garcia-Reyero Natàlia
Goetz Stefan
Gonzalez Jose M
Guigó Roderic
Göke Jonathan
Hafezqorani Saber
Hasan Çelik Muhammed
Hernández-Ferrer Carles
Herwig Ralf
Hunt Toby
Hunter Margaret E
Jerryd Meade Marcus
Kawaji Hideya
Kei Wan Yuk
Kondratova Liudmyla
Lagarde Julien
Laird Smith Melissa
Lee Joseph
Li Haoran
Liang Li Jian
Liang Cindy E
Lienhard Matthias
Liu Tianyuan
Loveland Jane E
Martinez-Martin Alessandra
Menor Carlos
Mestre-Tomás Jorge
Mikheenko Alla
Ming Nip Ka
Moraga Amador David A
Mortazavi Ali
Mudge Jonathan M
Mulligan Dennis
Panayotova Nedka G
Paniagua Alejandro
Pardo-Palacios Francisco J
Pertea Mihaela
Prjibelski Andrey D
Reese Fairlie
Repchevsky Dmitry
Ritchie Matthew E
Rouchka Eric
Saint-John Brandon
Sapena Enrique
Sheynkman Gloria M
Sheynkman Leon
Sim Andre D
Suner Marie-Marthe
Takahashi Hazuki
Tang Alison D
Tilgner Hagen U
Vollmers Christopher
Wang Changqing
Wang Dingjie
Williams Brian
Wold Barbara J
Wong Brandon Y
Yang Chen
Youngworth Ingrid Ashley
Publication venue: bioRxiv
Publication date: 27/07/2023

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis

UCL Discovery