
34 research outputs found

From identification to validation to gene count

Author: Aken Bronwen
Amid Clara
Carninci Piero
Ezkurdia Iakes
Frankish Adam
Gilbert James
Gingeras Thomas R.
Guigó Serra Roderic
Harrow Jennifer
HAVANA
Hubbard Tim J.
Kokocinski Felix
Searle Stephen
Tress Michael
White Simon
Publication venue: BioMed Central
Publication date: 01/01/2010

The current GENCODE gene count of ~30,000, including 21,727 protein-coding and 8,483 RNA genes, is significantly lower than the 100,000 genes anticipated by early estimates. Accurate annotation of protein-coding and non-coding genes and pseudogenes is essential in calculating the true gene count and gaining insight into human evolution. As part of the GENCODE Consortium, the HAVANA team produces high quality manual gene annotation, which forms the basis for the reference gene set being used by the ENCODE project and provides a rich annotation of alternative splice variants and assignment of functional potential. However, the protein-coding potential of some splice variants is uncertain and valid splice variants can remain unannotated if they are absent from current cDNA libraries. Recent technological developments in sequencing and mass spectrometry have created a vast amount of new transcript and protein data that facilitate the identification and validation of new and existing transcripts, while harboring their own limitations and problems.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

UPF Digital Repository

King's Research Portal

Comparison of sequencing methods and data processing pipelines for whole genome sequencing and minority single nucleotide variant (mSNV) analysis during an influenza A/H5N8 outbreak

Author: Amid C. (Clara)
Beer M. (Martin)
Bestebroer T.M. (Theo)
Brookes S.M. (Sharon M.)
Brown I.H. (Ian)
Ellis R.J. (Richard J.)
Everett H. (Helen)
Fouchier R.A.M. (Ron)
Poen M.J. (Marjolein)
Pohlmann A. (Anne)
Schapendonk C.M.E. (Claudia)
Scheuer R.D. (Rachel)
Smits S.L. (Saskia)
Publication venue: Public Library of Science (PLoS)
Publication date: 20/02/2020

As high-throughput sequencing technologies are becoming more widely adopted for analysing pathogens in disease outbreaks there needs to be assurance that the different sequencing technologies and approaches to data analysis will yield reliable and comparable results. Conversely, understanding where agreement cannot be achieved provides insight into the limitations of these approaches and also allows efforts to be focused on areas of the process that need improvement. This manuscript describes the next-generation sequencing of three closely related viruses, each analysed using different sequencing strategies, sequencing instruments and data processing pipelines. In order to determine the comparability of consensus sequences and minority (sub-consensus) single nucleotide variant (mSNV) identification, the biological samples, the sequence data from 3 sequencing platforms and the *.bam quality-trimmed alignment files of raw data of 3 influenza A/H5N8 viruses were shared. This analysis demonstrated that variation in the final result could be attributed to all stages in the process, but the most critical were the well-known homopolymer errors introduced by 454 sequencing, and the alignment processes in the different data processing pipelines which affected the consistency of mSNV detection. However, homopolymer errors aside, there was generally a good agreement between consensus sequences that were obtained for all combinations of sequencing platforms and data processing pipelines. Nevertheless, minority variant analysis will need a different level of careful standardization and awareness about the possible limitations, as shown in this study.

Erasmus University Digital Repository

Accelerating surveillance and research of antimicrobial resistance - an online repository for sharing of antimicrobial susceptibility data associated with whole-genome sequences

Author: Aarestrup F.M. (Frank)
Amid C. (Clara)
Cochrane G. (Guy)
Csabai I. (Istvan)
Hendriksen R.S. (Rene S.)
Koopmans M.P.G. (Marion)
Lund O. (Ole)
Matamoros S. (Sébastien)
Pakseresht N. (Nima)
Pataki B.Á. (Bálint Ármin)
Rossello M. (Marc)
Schultsz C. (Constance)
Silvester N. (Nicole)
The Compare Ml-Amr Group
Publication venue: Microbiology Society
Publication date: 01/05/2020

Antimicrobial resistance (AMR) is an emerging threat to modern medicine. Improved diagnostics and surveillance of resistant bacteria require the development of next-generation analysis tools and collabor

Erasmus University Digital Repository

The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions. Funding: National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198)

DSpace@MIT

PubMed Central

King's Research Portal

Structural and functional annotation of the porcine immunome

Author: Ait-Ali Tahar
Amid Clara
Anselmo Anna
Archibald Alan L.
Astley Matthew
Badaoui Bouabid
Bed'Hom Bertrand
Beraldi Dario
Berman Daniel
Blecha Frank
Botti Sara
Bystrom Megan
Carvalho-Silva Denise
Chen Celine
Cheng Ryan Pei-Yen
Dawson Harry D.
Freeman Tom C.
Fritz Eric
Gilbert James G. R.
Giuffra Elisabetta
Hardy Matthew
Harrow Jennifer L.
Hu Zhiliang
Huang Ting-Hua
Hume David A.
Hunt Toby
Kapetanovic Ronan
Kataria Ranjit
Kay Mike
Lloyd David
Loveland Jane E.
Lunney Joan K.
Mann Katherine M.
Morozumi Takeya
Murtaugh Michael P.
Pascal Geraldine
Reecy James M.
Rogel-Gaillard Claire
Sang Yongming
Schwartz John C.
Shinkai Hiroki
Snow Catherine
Steward Charles
Thomas Mark
Toki Daisuke
Tuggle Christopher K.
Uenishi Hirohide
Wilming Laurens
Zhang Jie
Zhao Shu-Hong
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.
Results: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.
Conclusions: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

HAL Université de Tours

ProdInra

Metagenomics-Based Proficiency Test of Smoked Salmon Spiked with a Mock Community

Author: Aarestrup Frank M.
Amid Clara
Brinkmann Annika
Castellani Gastone
Cotter Paul D.
Crispie Fiona
De Cesare Alessandra
Ellis Richard J.
Grützke Josephine
Le Guyader Soizick
Hakhverdyan Mikhayil
Hendriksen Rene S.
Manfreda Gerardo
Mordhorst Hanne
Mossong Joël
Nitsche Andreas
Pamp Sünje Johanna
Petersen Thomas N.
Poulsen Casper
Ragimbeau Catherine
Sala Claudia
Schaeffer Julien
Schlundt Joergen
Tay Moon Y. F.
Publication venue: MDPI AG
Publication date: 01/01/2020

An inter-laboratory proficiency test was organized to assess the ability of participants to perform shotgun metagenomic sequencing of cold smoked salmon, experimentally spiked with a mock community composed of six bacteria, one parasite, one yeast, one DNA, and two RNA viruses. Each participant applied its in-house wet-lab workflow(s) to obtain the metagenomic dataset(s), which were then collected and analyzed using MG-RAST. A total of 27 datasets were analyzed. Sample pre-processing, DNA extraction protocol, library preparation kit, and sequencing platform, influenced the abundance of specific microorganisms of the mock community. Our results highlight that despite differences in wet-lab protocols, the reads corresponding to the mock community members spiked in the cold smoked salmon, were both detected and quantified in terms of relative abundance, in the metagenomic datasets, proving the suitability of shotgun metagenomic sequencing as a genomic tool to detect microorganisms belonging to different domains in the same food matrix. The implementation of standardized wet-lab protocols would highly facilitate the comparability of shotgun metagenomic sequencing dataset across laboratories and sectors. Moreover, there is a need for clearly defining a sequencing reads threshold, to consider pathogens as detected or undetected in a food sample.

Multidisciplinary Digital Publishing Institute

T-Stór

ArchiMer - Institutional Archive of Ifremer

DR-NTU (Digital Repository of NTU)

Online Research Database In Technology

Publikationsserver des Robert Koch-Instituts

Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

Author: Acinas Silvia G.
Aiach Nathalie Giordanenco
Alberti Adriana
Albini Guillaume
Amid Clara
Aury Jean-Marc
Bazire Pascal
Belser Caroline
Beluche Odette
Bertrand Alexis
Bertrand Laurie
Besnard-Gonnet Marielle
Bordelais Isabelle
Bork Peer
Boss Emmanuel
Boutard Magali
Bowler Chris
Brum Jennifer R.
Brun Elodie
Cochrane Guy
Cornejo-Castillo Francisco M.
Cruaud Corinne
Da Silva Corinne
De Vargas Colomban
Desgranges Elodie
Dossat Carole
Dubois Maria
Duhaime Melissa
Dumont Corinne
Engelen Stefan
Ettedgui Evelyne
Fernandez Patricia
Fernández-Gómez Beatriz
Ferrera Isabel
Follows Michael
Garcia Espérance
Gas Shahinaz
Gavory Frédérick
Gorsky Gabriel
Grimsley Nigel
Guerin Thomas
Guy Julie
Hamon Chadia
Haquelle Maud
Hingamp Pascal
Ten Hoopen Petra
Hurwitz Bonnie L.
Iudicone Daniele
Jacoby E'krame
Jaillon Olivier
Kandels-Lewis Stefanie
Karp-Boss Lee
Karsenti Eric
Labadie Karine
Lebled Sandrine
Lemainque Arnaud
Lenoble Patricia
Logares Ramiro
Louesse Claudine
Mahieu Eric
Mairey Barbara
Martins Nathalie
Megret Catherine
Milani Claire
Muanga Jacqueline
Not Fabrice
Ogata Hiroyuki
Orvain Céline
Payen Emilie
Pelletier Eric
Perroud Peggy
Pesant Stéphane
Petit Emmanuelle
Poulain Julie
Poulos Bonnie T.
Poulton Nicole
Raes Jeroen
Robert Dominique
Romac Sarah
Ronsin Murielle
Royo-Llonch Marta
Samson Gaëlle
Sardet Christian
Sieracki Michael E.
Speich Sabrina
Stemmann Lars
Stepanauskas Ramunas
Sullivan Matthew B.
Sunagawa Shinichi
Vacherie Benoit
Wessner Mark
Wincker Patrick
Publication venue: Springer Science and Business Media LLC
Publication date: 11/05/2018

A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

DSpace@MIT

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

Author: Amid Clara
Apweiler Rolf
Ashurst Jennifer
Auffray Charles
Barrero Roberto A
Bellgard Matthew
Bonaldo Maria de Fatima
Bono Hidemasa
Bromberg Susan K
Brookes Anthony J
Bruford Elspeth
Carninci Piero
Chakraborty Ranajit
Chelala Claude
Chen Zhu
Couillault Christine
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Go Mitiko
Gojobori Takashi
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hayashizaki Yoshihide
Hide Winston
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Isogai Takao
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kanehisa Minoru
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makalowski Wojciech
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakai Kenta
Nakao Mitsuteru
Nigam Rajni
Nishikawa Ken
Nishikawa Tetsuo
Nomura Nobuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Oishi Michio
Okada Norihiro
Okazaki Yasushi
Okido Toshihisa
Okubo Kousaku
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Quackenbush John
Gopinath Gopal R.
Ren Shuang-Xi
Richard Roberts
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakaki Yoshiyuki
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Shimizu Nobuyoshi
Shimoyama Mary
Simpson Andrew J
Soares Bento
de Souza Sandro J.
Steward Charles
Stodolsky Marvin
Strausberg Robert L
Sugano Sumio
Sugawara Hideaki
Suwa Makiko
Suzuki Mami
Suzuki Yoshiyuki
Suzuki Yutaka
Takagi Toshihisa
Takahashi Aiko
Takeda Jun-ichi
Tamiya Gen
Tamura Takuro
Tanaka Hiroshi
Tanaka Susumu
Tanino Motohiko
Tateno Yoshio
Taylor Todd
Terwilliger Joseph D
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A
Tonellato Peter
Unneberg Per
Veeramachaneni Vamsi
Wagner Lukas
Watanabe Shinya
Wiemann Stefan
Wilming Laurens
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Norikazu
Yasuda Tomohiro
Yoo Hyang-Sook
Yura Kei
Publication venue: Public Library of Science
Publication date: 01/01/2004

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

Research Repository

Hokkaido University Collection of Scholarly and Academic Papers

UPF Digital Repository

White Rose Research Online

MPG.PuRe