Search CORE

63 research outputs found

BCFtools/csq: haplotype-aware variant consequences.

Author: Danecek Petr
McCarthy Shane A
Publication venue: Bioinformatics
Publication date: 01/07/2017

MOTIVATION: Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues. RESULTS: BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory. AVAILABILITY AND IMPLEMENTATION: The program is freely available for commercial and non-commercial use in the BCFtools package which is available for download from http://samtools.github.io/bcftools . CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
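The codon problem described in the abstract above can be illustrated with a minimal Python sketch. This is not the BCFtools/csq implementation: the function names, the `(offset, alt_base)` representation of a single-nucleotide variant, and the simplified consequence labels are all assumptions made for illustration, and both variants are assumed to lie on the same haplotype within one codon.

```python
# Illustrative sketch only; not BCFtools/csq code. All names are hypothetical.

# Standard genetic code, built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def classify(ref_codon, alt_codon):
    """Give a simplified consequence label for a codon change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def isolated_consequences(ref_codon, snvs):
    """Predict each variant as if it were the only change in the codon."""
    return [classify(ref_codon, apply_snvs(ref_codon, [snv])) for snv in snvs]


def haplotype_consequence(ref_codon, snvs):
    """Predict the combined effect of all variants on the same haplotype."""
    return classify(ref_codon, apply_snvs(ref_codon, snvs))


# Reference codon CAG (Gln) with two phased substitutions: C>T at offset 0
# and G>C at offset 2.
snvs = [(0, "T"), (2, "C")]
print(isolated_consequences("CAG", snvs))
print(haplotype_consequence("CAG", snvs))
```

In this example the first variant alone turns CAG into the stop codon TAG, so an isolated predictor would report a stop gain, whereas the two variants together give TAC (Tyr), a missense change. That is the kind of discrepancy a haplotype-aware predictor is meant to resolve.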

Apollo (Cambridge)

Twelve years of SAMtools and BCFtools.

Author: Bonfield James K
Danecek Petr
Davies Robert M
Keane Thomas
Li Heng
Liddle Jennifer
Marshall John
McCarthy Shane A
Ohan Valeriu
Pollard Martin O
Whitwham Andrew
Publication venue: Gigascience
Publication date: 01/02/2021

BACKGROUND: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. FINDINGS: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. CONCLUSION: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org

arXiv.org e-Print Archive

Enlighten

Apollo (Cambridge)

HTSlib: C library for reading/writing high-throughput sequencing data

Author: Bonfield James K.
Danecek Petr
Davies Robert M.
Keane Thomas
Li Heng
Marshall John
Ohan Valeriu
Whitwham Andrew
Publication venue: Oxford University Press (OUP)
Publication date: 01/02/2021

Background: Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. Findings: We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code, along with many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. Conclusion: Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded >1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.

Enlighten

Detecting cryptic clinically relevant structural variation in exome-sequencing data increases diagnostic yield for developmental disorders.

Author: Danecek Petr
Eberhardt Ruth Y
Firth Helen V
FitzPatrick David R
Gallone Giuseppe
Gardner Eugene J
Hurles Matthew E
Lindsay Sarah J
Martin Hilary C
Prigmore Elena
Rajan Diana
Sifrim Alejandro
Wright Caroline F
Publication venue: Am J Hum Genet
Publication date: 10/10/2021

Structural variation (SV) describes a broad class of genetic variation greater than 50 bp in size. SVs can cause a wide range of genetic diseases and are prevalent in rare developmental disorders (DDs). Individuals presenting with DDs are often referred for diagnostic testing with chromosomal microarrays (CMAs) to identify large copy-number variants (CNVs) and/or with single-gene, gene-panel, or exome sequencing (ES) to identify single-nucleotide variants, small insertions/deletions, and CNVs. However, individuals with pathogenic SVs undetectable by conventional analysis often remain undiagnosed. Consequently, we have developed the tool InDelible, which interrogates short-read sequencing data for split-read clusters characteristic of SV breakpoints. We applied InDelible to 13,438 probands with severe DDs recruited as part of the Deciphering Developmental Disorders (DDD) study and discovered 63 rare, damaging variants in genes previously associated with DDs missed by standard SNV, indel, or CNV discovery approaches. Clinical review of these 63 variants determined that about half (30/63) were plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21 and 500 bp in size and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.9%. Of particular interest were seven confirmed de novo variants in MECP2, which represent 35.0% of all de novo protein-truncating variants in MECP2 among DDD study participants. InDelible provides a framework for the discovery of pathogenic SVs that are most likely missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES across a broad range of genetic diseases

RD&E Research Repository

Apollo (Cambridge)

Tracing the Route of Modern Humans out of Africa by Using 225 Human Genome Sequences from Ethiopians and Egyptians

Author: Bekele Endashaw
Bradman Neil
Chen Yuan
Danecek Petr
Durbin Richard
Ekong Rosemary
Gurdasani Deepti
Haber Marc
Kivisild Toomas
Luiselli Donata
Mekonnen Ephrem
Oljira Tamiru
Pagani Luca
Scally Aylwyn
Schiffels Stephan
Tyler-Smith Chris
Xue Yali
Zalloua Pierre
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 01/01/2015

The predominantly African origin of all modern human populations is well established, but the route taken out of Africa is still unclear. Two alternative routes, via Egypt and Sinai or across the Bab el Mandeb strait into Arabia, have traditionally been proposed as feasible gateways in light of geographic, paleoclimatic, archaeological, and genetic evidence. Distinguishing among these alternatives has been difficult. We generated 225 whole-genome sequences (225 at 8× depth, of which 8 were increased to 30×; Illumina HiSeq 2000) from six modern Northeast African populations (100 Egyptians and five Ethiopian populations each represented by 25 individuals). West Eurasian components were masked out, and the remaining African haplotypes were compared with a panel of sub-Saharan African and non-African genomes. We showed that masked Northeast African haplotypes overall were more similar to non-African haplotypes and more frequently present outside Africa than were any sets of haplotypes derived from a West African population. Furthermore, the masked Egyptian haplotypes showed these properties more markedly than the masked Ethiopian haplotypes, pointing to Egypt as the more likely gateway in the exodus to the rest of the world. Using five Ethiopian and three Egyptian high-coverage masked genomes and the multiple sequentially Markovian coalescent (MSMC) approach, we estimated the genetic split times of Egyptians and Ethiopians from non-African populations at 55,000 and 65,000 years ago, respectively, whereas that of West Africans was estimated to be 75,000 years ago. Both the haplotype and MSMC analyses thus suggest a predominant northern route out of Africa via Egypt

Elsevier - Publisher Connector

Crossref

University of Birmingham Research Portal

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

MPG.PuRe

Archivio istituzionale della ricerca - Università di Padova

Insights into human genetic variation and population history from 929 diverse genomes.

Author: Almarri Mohamed A
Ayub Qasim
Bergström Anders
Blanché Hélène
Cann Howard
Chen Yuan
Danecek Petr
Deleuze Jean-François
Durbin Richard
Felkel Sabine
Hallast Pille
Hui Ruoyun
Kamm Jack
Mallick Swapan
McCarthy Shane A
Reich David
Sandhu Manjinder S
Scally Aylwyn
Skoglund Pontus
Tyler-Smith Chris
Xue Yali
Publication venue: Science
Publication date: 01/01/2020

Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.

Wellcome grants 098051 and 206194, and S.A.M. and R.D. also by Wellcome grant 207492. A.B. and P.S. were supported by the Francis Crick Institute (FC001595) which receives its core funding from Cancer Research UK, the UK Medical Research Council and the Wellcome Trust. P.S. was also supported by the European Research Council (grant no. 852558) and the Wellcome Trust (217223/Z/19/Z). R.H. was supported by a Gates Cambridge scholarship. P.H. was supported by Estonian Research Council Grant PUT1036. D.R. is an Investigator of the Howard Hughes Medical Institute.

Apollo (Cambridge)

HAL-CEA

University of East Anglia digital repository

Contribution of retrotransposition to developmental disorders.

Author: Chandler Kate E
Clement Emma
Danecek Petr
Firth Helen V
FitzPatrick David R
Gallone Giuseppe
Gardner Eugene J
Gerety Sebastian S
Handsaker Juliet
Hurles Matthew E
Ironfield Holly
Lachlan Katherine L
Prescott Katrina
Prigmore Elena
Rosser Elisabeth
Samocha Kaitlin E
Short Patrick J
Sifrim Alejandro
Singh Tarjinder
Publication venue: Nat Commun
Publication date: 01/12/2019

Mobile genetic elements (MEs) are segments of DNA that can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome-sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein-coding genes and a framework for future evolutionary and disease studies.

Southampton (e-Prints Soton)

Apollo (Cambridge)

Genomic Diagnosis of Rare Pediatric Disease in the United Kingdom and Ireland

Author: Aitken Stuart
Andrews Katrina A
Brent Simon
Campbell Patrick
Chundru V Kartik
Danecek Petr
Eberhardt Ruth Y.
Firth Helen V.
FitzPatrick David R
Foreman Julia
Gardner Eugene J.
Hampstead Juliet
Hobson Rachel J.
Hurles Matthew E
Kaplanis Joanna
Lindsay Sarah J
Martin Hilary C
Middleton Anna
Parker Michael
Perrett Daniel
Samocha Kaitlin E.
Wright Caroline F
Publication venue: Massachusetts Medical Society
Publication date: 12/04/2023

Edinburgh Research Explorer

Optimizing the Diagnosis of Rare Genomic Disease in the UK and Ireland

Author: Aitken Stuart
Andrews Katrina A
Brent Simon
Campbell Patrick
Chundru V Kartik
Danecek Petr
Eberhardt Ruth Y.
Firth Helen V.
FitzPatrick David R
Foreman Julia
Gardner Eugene J.
Hampstead Juliet
Hobson Rachel J.
Hurles Matthew E
Kaplanis Joanna
Lindsay Sarah J
Martin Hilary C
Middleton Anna
Parker Michael J.
Perrett Daniel
Samocha Kaitlin E.
Wright Caroline F
Publication date: 27/04/2023

Edinburgh Research Explorer

Common genetic variation drives molecular heterogeneity in human iPSCs.

Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells

Crossref

UCL Discovery

Apollo (Cambridge)

University of Dundee Online Publications

King's Research Portal

University of Melbourne Institutional Repository