2,882 research outputs found

The variant call format and VCFtools

Author: A. Auton
C. A. Albers
Durbin
E. Banks
G. Abecasis
G. Lunter
G. McVean
G. T. Marth
M. A. DePristo
P. Danecek
R. Durbin
R. E. Handsaker
S. T. Sherry
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
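
To make the record layout concrete, the following is a minimal Python sketch of reading the eight fixed, tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not part of VCFtools or its Perl API; the names `VcfRecord`, `parse_vcf_line` and `read_vcf` are hypothetical, and per-sample genotype columns are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Union


@dataclass
class VcfRecord:
    """One VCF data line, restricted to the eight fixed columns (illustrative)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alts: List[str]
    qual: Optional[float]
    filters: List[str]
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse a single tab-separated VCF data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alts=[] if alt == "." else alt.split(","),  # "." means no alternate allele
        qual=None if qual == "." else float(qual),  # "." means missing quality
        filters=[] if flt == "." else flt.split(";"),
        info=info_dict,
    )


def read_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records, skipping meta-information and header lines (start with '#')."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_vcf_line(line)


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.ref, record.alts, record.info)
```

For compressed and indexed files, an established library is a better choice than a hand-written parser like this one.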

Oxford University Research Archive

estMOI: estimating multiplicity of infection using parasite deep sequencing data.

Author: Amambua-Ngwa
Auburn
Babiker
Bowman
Colin J. Sutherland
Harold Ocholla
Manske
Mark D. Preston
Nkhoma
Ntoumi
Preston
Robinson
Ross
Samuel A. Assefa
Sepulveda
Susana Campino
Taane G. Clark
Publication venue: Oxford University Press (OUP)
Publication date: 17/01/2014

Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallelizable sequencing technologies means that MOI can be detected genome-wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission.
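
The idea of counting unique combinations of polymorphisms on sequence reads can be illustrated with a small Python sketch. This is not the estMOI implementation: the read representation, the `min_reads` noise threshold, and the choice to summarise windows by their most common haplotype count are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# A read is modelled as a mapping from SNP position to the allele it carries.
Read = Dict[int, str]


def count_haplotypes(reads: List[Read], positions: Sequence[int],
                     min_reads: int = 2) -> int:
    """Count distinct allele combinations seen on reads covering all positions.

    Combinations supported by fewer than `min_reads` reads are treated as
    sequencing noise and ignored (an assumed threshold).
    """
    combos: Counter = Counter()
    for read in reads:
        if all(p in read for p in positions):
            combos[tuple(read[p] for p in positions)] += 1
    return sum(1 for n in combos.values() if n >= min_reads)


def estimate_moi(reads: List[Read], snp_windows: List[Tuple[int, ...]],
                 min_reads: int = 2) -> Optional[int]:
    """Summarise per-window haplotype counts by their most common value."""
    counts = [count_haplotypes(reads, w, min_reads) for w in snp_windows]
    counts = [c for c in counts if c > 0]  # drop windows with no usable reads
    if not counts:
        return None
    return Counter(counts).most_common(1)[0][0]


reads = (
    [{100: "A", 150: "C"}] * 3    # first strain
    + [{100: "G", 150: "T"}] * 2  # second strain
    + [{100: "A", 150: "T"}]      # single read, treated as noise
)
print(estimate_moi(reads, [(100, 150)]))
```

Only reads that span every SNP in a window contribute, which is why nearby SNP pairs or triplets are the natural unit for this kind of count.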

Crossref

LSHTM Research Online

PubMed Central

An heuristic filtering tool to identify phenotype-associated genetic variants applied to human intellectual disability and canine coat colors

Author: Bosmans Tim
Broeckx Bart
Coopman Frank
Deforce Dieter
Dingemanse Walter
Gielen Ingrid
Saunders Jimmy
Van Nieuwerburgh Filip
Verhoeven Geert
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Background: Identification of one or several disease-causing variant(s) from the large collection of variants present in an individual is often achieved by the sequential use of heuristic filters. The recent development of whole exome sequencing enrichment designs for several non-model species created the need for a species-independent, fast and versatile analysis tool, capable of tackling a wide variety of standard and more complex inheritance models. With this aim, we developed "Mendelian", an R-package that can be used for heuristic variant filtering. Results: The R-package Mendelian offers fast and convenient filters to analyze putative variants for both recessive and dominant models of inheritance, with variable degrees of penetrance and detectance. Analysis of trios is supported. Filtering against variant databases and annotation of variants is also included. This package is not species specific and supports parallel computation. We validated this package by reanalyzing data from a whole exome sequencing experiment on intellectual disability in humans. In a second example, we identified the mutations responsible for coat color in the dog. This is the first example of whole exome sequencing without prior mapping in the dog. Conclusion: We developed an R-package that enables the identification of disease-causing variants from the long list of variants called in sequencing experiments. The software and a detailed manual are available at https://github.com/BartBroeckx/Mendelian
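
As an illustration of what a heuristic inheritance filter does, here is a minimal Python sketch of a fully penetrant recessive filter on a trio, followed by a filter against a set of known variants. It does not reproduce the Mendelian R-package or its API; the function name, the dictionary layout and the genotype coding (number of alternate alleles: 0, 1 or 2) are assumptions.

```python
from typing import Dict, FrozenSet, List, Union

Variant = Dict[str, Union[str, int]]


def recessive_trio_candidates(variants: List[Variant],
                              known_variants: FrozenSet[str] = frozenset()
                              ) -> List[str]:
    """Return IDs of variants consistent with a simple recessive model.

    Assumed model: the affected child is homozygous for the alternate allele
    and both unaffected parents are heterozygous carriers. Variants listed in
    `known_variants` (for example, common database entries) are discarded.
    """
    candidates = []
    for v in variants:
        fits_model = v["child"] == 2 and v["mother"] == 1 and v["father"] == 1
        if fits_model and v["id"] not in known_variants:
            candidates.append(str(v["id"]))
    return candidates


trio = [
    {"id": "var1", "child": 2, "mother": 1, "father": 1},  # fits the model
    {"id": "var2", "child": 1, "mother": 1, "father": 0},  # child is a carrier only
    {"id": "var3", "child": 2, "mother": 1, "father": 1},  # fits, but is a known variant
    {"id": "var4", "child": 2, "mother": 2, "father": 1},  # unaffected mother homozygous
]
print(recessive_trio_candidates(trio, known_variants=frozenset({"var3"})))
```

Reduced penetrance or detectance, which the abstract mentions, would relax these strict genotype conditions; that is not modelled here.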

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

FigShare


Long-term balancing selection drives evolution of immunity genes in Capsella.

Author: Bemm Felix
Hagmann Jörg
Koenig Daniel
Li Rachel
Neuffer Barbara
Slotte Tanja
Weigel Detlef
Wright Stephen I
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

Genetic drift is expected to remove polymorphism from populations over long periods of time, with the rate of polymorphism loss being accelerated when species experience strong reductions in population size. Adaptive forces that maintain genetic variation in populations, or balancing selection, might counteract this process. To understand the extent to which natural selection can drive the retention of genetic diversity, we document genomic variability after two parallel species-wide bottlenecks in the genus Capsella. We find that ancestral variation preferentially persists at immunity-related loci, and that the same collection of alleles has been maintained in different lineages that have been separated for several million years. By reconstructing the evolution of the disease-related locus MLO2b, we find that divergence between ancient haplotypes can be obscured by reference-based re-sequencing methods, and that trans-specific alleles can encode substantially diverged protein sequences. Our data point to long-term balancing selection as an important factor shaping the genetics of immune systems in plants and as the predominant driver of genomic variability after a population bottleneck.

eScholarship - University of California

MPG.PuRe

Second-generation PLINK: rising to the challenge of larger and richer datasets

Author: Chang Christopher C.
Chow Carson C.
Lee James J.
Purcell Shaun M.
Tellier Laurent C. A. M.
Vattikuti Shashaank
Publication venue: Springer Science and Business Media LLC
Publication date: 17/10/2014

PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
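
For readers unfamiliar with the Hardy-Weinberg exact test mentioned above, the sketch below shows the textbook form of the test in Python. It is deliberately the simple version that enumerates every possible heterozygote count, so it runs in O(n) time; it is not PLINK 1.9's O(sqrt(n)) algorithm and does not reflect PLINK's code. Function names are hypothetical, and exact rational arithmetic is used for clarity, which is only practical for small samples.

```python
from fractions import Fraction
from math import factorial


def hwe_probability(n_hom_a: int, n_het: int, n_hom_b: int) -> Fraction:
    """Probability of this genotype configuration given the allele counts,
    under Hardy-Weinberg equilibrium:

        P = 2^n_het * N! * n_A! * n_B! / (n_AA! * n_AB! * n_BB! * (2N)!)
    """
    n = n_hom_a + n_het + n_hom_b
    n_a = 2 * n_hom_a + n_het
    n_b = 2 * n_hom_b + n_het
    numerator = 2 ** n_het * factorial(n) * factorial(n_a) * factorial(n_b)
    denominator = (factorial(n_hom_a) * factorial(n_het)
                   * factorial(n_hom_b) * factorial(2 * n))
    return Fraction(numerator, denominator)


def hwe_exact_pvalue(n_hom_a: int, n_het: int, n_hom_b: int) -> Fraction:
    """Sum the probabilities of all configurations that are no more likely
    than the observed one, holding the allele counts fixed."""
    n_a = 2 * n_hom_a + n_het
    n_b = 2 * n_hom_b + n_het
    p_observed = hwe_probability(n_hom_a, n_het, n_hom_b)

    p_value = Fraction(0)
    # The heterozygote count has the same parity as n_a and cannot exceed
    # the count of the rarer allele.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        p = hwe_probability((n_a - het) // 2, het, (n_b - het) // 2)
        if p <= p_observed:
            p_value += p
    return p_value


# Four individuals with no heterozygotes: 2 AA, 0 AB, 2 BB.
print(hwe_exact_pvalue(2, 0, 2))
```

A production implementation would work in floating point and avoid recomputing factorials for each configuration; the exact fractions here are only meant to make the definition easy to check by hand.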

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Harvard University - DASH

Copenhagen University Research Information System

PubMed Central

Gene expression in Leishmania is regulated predominantly by gene dosage

Author: Berriman Matthew
Beverley Stephen M.
Cotton James A.
Durrant Caroline
Grigg Michael E.
Iantorno Stefano A.
Khan Asis
Myler Peter
Ouellette Marc
Sacks David L.
Sanders Mandy J.
Warren Wesley C.
Weiss Louis M.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

ABSTRACT: Leishmania tropica, a unicellular eukaryotic parasite present in North and East Africa, the Middle East, and the Indian subcontinent, has been linked to large outbreaks of cutaneous leishmaniasis in displaced populations in Iraq, Jordan, and Syria. Here, we report the genome sequence of this pathogen and 7,863 identified protein-coding genes, and we show that the majority of clinical isolates possess high levels of allelic diversity, genetic admixture, heterozygosity, and extensive aneuploidy. By utilizing paired genome-wide high-throughput DNA sequencing (DNA-seq) with RNA-seq, we found that gene dosage, at the level of individual genes or chromosomal “somy” (a general term covering disomy, trisomy, tetrasomy, etc.), accounted for greater than 85% of total gene expression variation in genes with a 2-fold or greater change in expression. High gene copy number variation (CNV) among membrane-bound transporters, a class of proteins previously implicated in drug resistance, was found for the most highly differentially expressed genes. Our results suggest that gene dosage is an adaptive trait that confers phenotypic plasticity among natural Leishmania populations by rapid down- or upregulation of transporter proteins to limit the effects of environmental stresses, such as drug selection.

IMPORTANCE: Leishmania is a genus of unicellular eukaryotic parasites that is responsible for a spectrum of human diseases that range from cutaneous leishmaniasis (CL) and mucocutaneous leishmaniasis (MCL) to life-threatening visceral leishmaniasis (VL). Developmental and strain-specific gene expression is largely thought to be due to mRNA message stability or posttranscriptional regulatory networks for this species, whose genome is organized into polycistronic gene clusters in the absence of promoter-mediated regulation of transcription initiation of nuclear genes. Genetic hybridization has been demonstrated to yield dramatic structural genomic variation, but whether such changes in gene dosage impact gene expression has not been formally investigated. Here we show that the predominant mechanism determining transcript abundance differences (>85%) in Leishmania tropica is that of gene dosage at the level of individual genes or chromosomal somy.

Crossref

Directory of Open Access Journals

Digital Commons@Becker

Enlighten

An exome-wide sequencing study of the GOLDN cohort reveals novel associations of coding variants and fasting plasma lipids

Author: An Ping
et al
Feitosa Mary F
Province Michael A
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

Digital Commons@Becker