Search CORE

22 research outputs found

A database and API for variation, dense genotyping and resequencing data

Author: Birney Ewan
Chen Yuan
Cunningham Fiona
Flicek Paul
McLaren William M
Rios Daniel
Stabenau Arne
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data, necessitating the redesign of existing genome-wide bioinformatics resources. Results: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. Conclusions: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor

Author: Bethan Pritchard
Daniel Rios
Fiona Cunningham
Karchin
Paul Flicek
Sherry
William McLaren
Yuan Chen
Publication venue: Oxford University Press

Summary: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and an API can now functionally annotate variants in all species supported by Ensembl and Ensembl Genomes.
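To make the idea of "predicting the effect of a variant on a transcript" concrete, here is a minimal Python sketch that classifies a single-nucleotide change inside a coding sequence. It is not the Ensembl API or the SNP Effect Predictor: the function name `classify_snv`, the 0-based coordinate convention and the consequence labels are assumptions chosen for illustration, and the sketch only handles the standard genetic code on an already-spliced, forward-strand coding sequence.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    cds: spliced coding sequence (hypothetical input, forward strand)
    pos: 0-based index of the substituted base within cds
    alt: the alternate base
    """
    offset = pos % 3                     # position of the base within its codon
    start = pos - offset                 # index of the first base of the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


cds = "ATGGCCTGGTAA"                     # Met-Ala-Trp-stop
print(classify_snv(cds, 5, "T"))         # GCC -> GCT, both alanine
print(classify_snv(cds, 8, "A"))         # TGG -> TGA, tryptophan to stop
```

A production tool also has to map genomic coordinates onto each overlapping transcript, handle strand, splice sites, indels and non-coding regions; none of that is attempted here.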

Crossref

PubMed Central

interPopula: a Python API to access the HapMap Project dataset

Author: B Peng
B Rhead
D Rios
D Smedley
F Hsu
F Rousset
GA Thorisson
IH Consortium
J Akey
JD Hunter
JE Stajich
JL Kelley
LD Stein
PJA Cock
SA Tishkoff
TE Oliphant
Tiago Antao
V Curwen
VJ Carey
Publication venue: BioMed Central
Publication date: 01/12/2010

Background: The HapMap project is a publicly available catalogue of common genetic variants that occur in humans, currently including several million SNPs across 1115 individuals spanning 11 different populations. This important database does not provide any programmatic access to the dataset; furthermore, no standard relational database interface is provided. Results: interPopula is a Python API to access the HapMap dataset. interPopula provides integration facilities with both the Python software ecosystem (e.g. Biopython and matplotlib) and other relevant human population datasets (e.g. Ensembl gene annotation and UCSC Known Genes). A set of guidelines and code examples to address possible inconsistencies across heterogeneous data sources is also provided. Conclusions: interPopula is a straightforward and flexible Python API that facilitates the construction of scripts and applications that require access to the HapMap dataset.

LSTM Online Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A database for efficient storage and management of multi panel SNP data

Author: C. V. C. Truong
E. Groeneveld
Publication venue: Copernicus GmbH
Publication date: 01/11/2013

The fast development of high-throughput genotyping has opened up new possibilities in genetics while at the same time producing immense data handling issues. A system design and proof-of-concept implementation are presented which provide efficient data storage and manipulation of single nucleotide polymorphism (SNP) genotypes in a relational database. A new strategy using SNP and individual selection vectors allows us to view SNP data as matrices or sets. These genotype sets provide an easy way to handle original and derived data, the latter at basically no storage cost. Due to its vector-based database storage, data imports and exports are much faster than those of other SNP databases. In the proof-of-concept implementation, the compressed storage scheme reduces disk space requirements by a factor of around 300. Furthermore, this design scales linearly with the number of individuals and SNPs involved. The procedure supports panels of different sizes. This allows straightforward management of different panel sizes in the same population, as occurs in animal breeding programs when higher-density panels replace previous lower-density versions.
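The abstract does not spell out the storage layout, so the following Python sketch only illustrates the general idea under stated assumptions: each individual's genotypes are packed at 2 bits per SNP into one vector, and a "genotype set" is a pair of selection vectors (which individuals, which SNP columns) over that stored data, so a derived set needs no extra genotype storage. The names `pack`, `get` and `GenotypeSet` and the 0/1/2/3 coding are hypothetical and are not taken from the paper.

```python
# Hypothetical genotype codes: 0, 1, 2 = copies of the alternate allele; 3 = missing.

def pack(genotypes):
    """Pack a list of 2-bit genotype codes into bytes (4 codes per byte)."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError("genotype code must be in 0..3")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def get(packed, i):
    """Read the i-th 2-bit code back out of a packed vector."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


class GenotypeSet:
    """A view over stored genotype vectors, defined only by selection vectors."""

    def __init__(self, rows, individuals, snps):
        self.rows = rows                # dict: individual id -> packed bytes
        self.individuals = individuals  # individual selection vector (ids)
        self.snps = snps                # SNP selection vector (column indices)

    def matrix(self):
        """Materialise the selected genotypes as a list of rows."""
        return [[get(self.rows[ind], s) for s in self.snps]
                for ind in self.individuals]


rows = {
    "animal_1": pack([0, 1, 2, 3, 1]),
    "animal_2": pack([2, 2, 0, 1, 0]),
}
# A derived set: two individuals in a chosen order, SNP columns 0 and 4 only.
subset = GenotypeSet(rows, ["animal_2", "animal_1"], [0, 4])
print(subset.matrix())
```

Because the subset holds only two short index lists, many overlapping sets can share one stored copy of the genotypes. The factor-of-300 reduction reported in the abstract refers to the authors' own scheme, not to this sketch.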

Directory of Open Access Journals

Ensembl 2011

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish, reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported, of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.

Crossref

PubMed Central

UCL Discovery

Coventry University Pure Portal

King's Research Portal

Segtor: Rapid Annotation of Genomic Coordinates and Single Nucleotide Variations Using Segment Trees

Author: Carlos Gil Ferreira
Cynthia Gibas
Edson Luiz Folador
Fabio Passetti
Gabriel Renaud
Pedro Neves
Publication venue: Public Library of Science
Publication date: 01/01/2011

Research projects often involve determining the relative position of genomic coordinates, intervals, single nucleotide variations (SNVs), insertions, deletions and translocations with respect to genes and their potential impact on protein translation. Due to the tremendous increase in throughput brought by the use of next-generation sequencing, investigators are routinely faced with the need to annotate very large datasets. We present Segtor, a tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations. Our tool uses segment trees built using the start and end coordinates of the genomic features the user wishes to use instead of storing them in a database management system. The software also produces annotation statistics to allow users to visualize how many coordinates were found within various portions of genes. Our system currently can be made to work with any species available on the UCSC Genome Browser. Segtor is a suitable tool for groups, especially those with limited access to programmers or with an interest in analyzing large numbers of individual genomes, who wish to determine the relative position of very large sets of mapped reads and subsequently annotate observed mutations between the reads and the reference. Segtor (http://lbbc.inca.gov.br/segtor/) is an open-source tool that can be freely downloaded for non-profit use. We also provide a web interface for testing purposes.
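As an illustration of the data structure named in the abstract, the sketch below builds a small segment tree from the start and end coordinates of some features and answers "which features contain this position?" queries. It is a generic textbook-style construction written for this listing, not Segtor's code; the class name `SegmentTree`, the `stab` method, the closed-interval convention and the gene names are all assumptions.

```python
from bisect import bisect_right


class SegmentTree:
    """Segment tree over feature endpoints, supporting point (stabbing) queries."""

    def __init__(self, features):
        # features: list of (start, end, name) with closed integer intervals.
        # Elementary intervals are the half-open gaps between sorted endpoints.
        self.coords = sorted({s for s, _, _ in features} |
                             {e + 1 for _, e, _ in features})
        self.n = max(len(self.coords) - 1, 1)       # number of leaves
        self.nodes = [[] for _ in range(2 * self.n)]
        index = {c: i for i, c in enumerate(self.coords)}
        for s, e, name in features:
            lo = index[s] + self.n                  # first covered leaf
            hi = index[e + 1] + self.n              # one past the last covered leaf
            # Store the feature on O(log n) nodes that exactly cover [lo, hi).
            while lo < hi:
                if lo & 1:
                    self.nodes[lo].append(name)
                    lo += 1
                if hi & 1:
                    hi -= 1
                    self.nodes[hi].append(name)
                lo >>= 1
                hi >>= 1

    def stab(self, pos):
        """Return the names of all features whose interval contains pos."""
        leaf = bisect_right(self.coords, pos) - 1
        if leaf < 0 or leaf >= len(self.coords) - 1:
            return []                               # outside every feature
        hits = []
        node = leaf + self.n
        while node >= 1:                            # walk from leaf up to the root
            hits.extend(self.nodes[node])
            node >>= 1
        return hits


genes = [(100, 200, "geneA"), (150, 300, "geneB"), (400, 500, "geneC")]
tree = SegmentTree(genes)
print(sorted(tree.stab(175)))
print(tree.stab(350))
```

Each query touches one leaf and its ancestors, so annotating many positions stays fast once the tree is built in memory, which is the trade-off the abstract contrasts with storing features in a database management system.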

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

A "Candidate-Interactome" Aggregate Analysis of Genome-Wide Association Data in Multiple Sclerosis

Author: Abraham R
Alfredsson L
Annibali V
Ardlie K
Aubin C
Baker A
Baker K
Ban M
Band G
Baranzini SE
Barcellos LF
Bellenguez C
Bergamaschi L
Bergamaschi R
Bernstein A
Berthele A
Blackburn H
Blackwell JM
Boggild M
Bomfim IL
Boneschi FM
Booth DR
Bradfield JP
Bramon E
Brassat D
Broadley SA
Brown MA
Buck D
Bumpstead SJ
Buscarinu MC
Butzkueven H
Capra R
Carroll WM
Casas JP
Cavalla P
Celius EG
Cepok S
Chiavacci R
Clerget-Darpoux F
Clysters K
Coarelli G
Comabella M
Comi G
Compston A
Corvin A
Cossburn M
Cournu-Rebeix I
Cox MB
Cozen W
Cree BAC
Cross AH
Cusi D
D'alfonso S
D'hooghe MB
Daly MJ
Davis E
de Bakker PIW
De Jager PL
Debouverie M
Deloukas P
Dilthey A
Dixon K
Dobosi R
Donnelly P
Dronov S
Dubois B
Duncanson A
Edkins S
Ellinghaus D
Elovaara I
Esposito F
Fontaine B
Fontenille C
Foote S
Fornasiero A
Franke A
Freeman C
Galimberti D
Ghezzi A
Giannoulatou E
Gillman M
Glessner J
Gomez R
Goris A
Gout O
Graham C
Grant SFA
Gray E
Guerini FR
Gwilliam R
Hafler DA
Haines JL
Hakonarson H
Hall P
Hammond N
Hamsten A
Harbo HF
Hartung H-P
Hauser SL
Hawkins C
Hawkins S
Heard RN
Heath S
Hellenthal G
Hemmer B
Hillert J
Hobart J
Hoshi M
Hunt SE
Infante-Duarte C
Ingram G
Ingram W
Islam T
Ivinson AJ
Jagodic M
Jankowski J
Jayakumar A
Kabesch M
Kemppinen A
Kermode AG
Kilpatrick TJ
Kim C
Klopp N
Kockum I
Koivisto K
Langford C
Larsson M
Lathrop M
Lechner-Scott JS
Leone MA
Leppä V
Leslie S
Liddle J
Liljedahl U
Lincoln RR
Link J
Liu J
Lorentzen ÅR
Lupoli S
Macciardi F
Mack T
Markus HS
Marriott M
Martin R
Martinelli V
Mason D
Mathew CG
McCann OT
McCauley JL
McVean G
Mechelli R
Mentch F
Mero I-L
Mihalova T
Montalban X
Mottershead J
Moutsianas L
Mycko MP
Myhr K-M
Naldi P
Oksenberg JR
Ollier W
Olsson T
Oturai AB
Page A
Palmer CNA
Palotie A
Patsopoulos NA
Pelletier J
Peltonen L
Perez ML
Pericak-Vance MA
Piccio L
Pickersgill T
Piehl F
Pirinen M
Plomin R
Pobywajlo S
Policano C
Potter SC
Quach HL
Ramsay PP
Rautanen A
Ravindrarajah R
Reunanen M
Reynolds R
Ricigliano VAG
Ricketts M
Rioux JD
Ristori G
Robertson N
Rodegher M
Roesner S
Romano S
Rubio JP
Rückert I-M
Saarela J
Salvetti M
Salvi E
Santaniello A
Sawcer S
Schaefer CA
Schreiber S
Schulze C
Scott RJ
Sellebjerg F
Selmaj KW
Sexton D
Shen L
Simms-Acuna B
Skidmore S
Sleiman PMA
Smestad C
Spencer CCA
Spurkland A
Stankovich J
Stewart GJ
Strange A
Strange RC
Su Z
Sulonen A-M
Sundqvist E
Syvänen A-C
Søndergaard HB
Sørensen PS
Taddeo F
Taylor B
Tienari P
Tourbah A
Trembath RC
Tronczynska E
Tubridy N
Umeton R
Vickery J
Villoslada P
Viswanathan AC
Vittori D
Waller MJ
Wang K
Wason J
Weston P
Whittaker P
Wichmann H-E
Widaa S
Willoughby E
Winkelmann J
Wittig M
Wood NW
Yaouanq J
Zajicek J
Zhang H
Zipp F
Zuvich R
Publication date: 16/05/2013

Though difficult, the study of gene-environment interactions in multifactorial diseases is crucial for interpreting the relevance of non-heritable factors and helps avoid overlooking genetic associations with small but measurable effects. We propose a “candidate interactome” (i.e. a group of genes whose products are known to physically interact with environmental factors that may be relevant for disease pathogenesis) analysis of genome-wide association data in multiple sclerosis. We looked for statistical enrichment of associations among interactomes that, at the current state of knowledge, may be representative of gene-environment interactions of potential, uncertain or unlikely relevance for multiple sclerosis pathogenesis: Epstein-Barr virus, human immunodeficiency virus, hepatitis B virus, hepatitis C virus, cytomegalovirus, HHV8-Kaposi sarcoma, H1N1-influenza, JC virus, human innate immunity interactome for type I interferon, autoimmune regulator, vitamin D receptor, aryl hydrocarbon receptor and a panel of proteins targeted by 70 innate immune-modulating viral open reading frames from 30 viral species. Interactomes were either obtained from the literature or were manually curated. The P values of all single nucleotide polymorphisms mapping to a given interactome were obtained from the latest genome-wide association study of the International Multiple Sclerosis Genetics Consortium and the Wellcome Trust Case Control Consortium 2. The interaction between genotype and Epstein-Barr virus emerges as relevant for multiple sclerosis etiology. However, in line with recent data on the coexistence of common and unique strategies used by viruses to perturb the human molecular system, other viruses also have a similar potential, though they are probably less relevant in epidemiological terms.

UCL Discovery

The complete genome sequence of a Neandertal from the Altai Mountains

We present a high-quality genome sequence of a Neandertal woman from Siberia. We show that her parents were related at the level of half siblings and that mating among close relatives was common among her recent ancestors. We also sequenced the genome of a Neandertal from the Caucasus to low coverage. An analysis of the relationships and population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events occurred among Neandertals, Denisovans and early modern humans, possibly including gene flow into Denisovans from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups in the Late Pleistocene. In addition, the high-quality Neandertal genome allows us to establish a definitive list of substitutions that became fixed in modern humans after their separation from the ancestors of Neandertals and Denisovans.

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

MPG.PuRe

Next Generation Diagnostics in Inherited Arrhythmia Syndromes: A Comparison of Two Approaches

Author: Angharad M Roberts
Anneke Lucassen
David O Robinson
Elijah R Behr
James S Ware
Nicholas S Peters
Rachel Buchan
Shibu John
Stuart A Cook
Sungsam Gong
Publication date: 03/04/2020

Next-generation sequencing (NGS) provides an unprecedented opportunity to assess genetic variation underlying human disease. Here, we compared two NGS approaches for diagnostic sequencing in inherited arrhythmia syndromes. We compared PCR-based target enrichment and long-read sequencing (PCR-LR) with in-solution hybridization-based enrichment and short-read sequencing (Hyb-SR). The PCR-LR assay comprehensively assessed five long-QT genes routinely sequenced in diagnostic laboratories and "hot spots" in RYR2. The Hyb-SR assay targeted 49 genes, including those in the PCR-LR assay. The sensitivity for detection of control variants did not differ between approaches. In both assays, the major limitation was upstream target capture, particularly in regions of extreme GC content. These initial experiences with NGS cardiovascular diagnostics achieved up to 89% sensitivity at a fraction of current costs. In the next iteration of these assays we anticipate sensitivity above 97% for all LQT genes. NGS assays will soon replace conventional sequencing for LQT diagnostics and molecular pathology.

CiteSeerX

Ensembl variation resources

Author: Birney Ewan
Brent Simon
Chen Yuan
Cunningham Fiona
Flicek Paul
Kulesha Eugene
Marin-Garcia Pablo
McLaren William M
Pritchard Bethan
Rios Daniel
Smedley Damian
Smith James
Spudich Giulietta M
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central