FigShare

The variant call format and VCFtools

Author: A. Auton
C. A. Albers
E. Banks
G. Abecasis
G. Lunter
G. McVean
G. T. Marth
M. A. DePristo
P. Danecek
R. Durbin
R. E. Handsaker
S. T. Sherry
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
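As a rough illustration of the kind of record the summary describes, the sketch below parses one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is a minimal sketch, not VCFtools and not a validating parser: the helper names `parse_info` and `parse_vcf_line` and the example record are hypothetical, and header lines, genotype columns and compression are ignored.

```python
from typing import Dict, Union

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split the INFO column into key/value pairs; bare keys become True flags."""
    result: Dict[str, Union[str, bool]] = {}
    if info == ".":  # "." marks a missing value
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_vcf_line(line: str) -> dict:
    """Parse the fixed columns of one data line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based reference position
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated
    record["INFO"] = parse_info(record["INFO"])
    return record  # QUAL is left as a string because it may be "."


# Illustrative record, not taken from any real data set.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"])
```

For real files, an established parser is a safer choice than hand-rolled splitting, since the format also defines header metadata and per-sample genotype fields.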

Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden

Author: A Roth
AL Gartel
C Balbas-Martinez
C Yau
D Cappellen
D Sidransky
DA Solomon
G Gundem
G Guo
G Lunter
JB Cazier
L Lacombe
LB Alexandrov
ML Lu
MS Lawrence
P Lianes
PJ Goebell
S Denzinger
S Lise
T Abbas
Y Gui
Y Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Bladder cancers are a leading cause of death from malignancy. Molecular markers might predict disease progression and behaviour more accurately than the available prognostic factors. Here we use whole-genome sequencing to identify somatic mutations and chromosomal changes in 14 bladder cancers of different grades and stages. As well as detecting the known bladder cancer driver mutations, we report the identification of recurrent protein-inactivating mutations in CDKN1A and FAT1. The former are not mutually exclusive with TP53 mutations or MDM2 amplification, showing that CDKN1A dysfunction is not simply an alternative mechanism for p53 pathway inactivation. We find strong positive associations between higher tumour stage/grade and greater clonal diversity, the number of somatic mutations and the burden of copy number changes. In principle, the identification of sub-clones with greater diversity and/or mutation burden within early-stage or low-grade tumours could identify lesions with a high risk of invasive progression.

University of Birmingham Research Portal

White Rose Research Online

University of Melbourne Institutional Repository

Multi-level evidence of an allelic hierarchy of USH2A variants in hearing, auditory processing and speech/language outcomes.

Author: 1000 Genomes Project C.
A Adato
A Boyd
A Gialluisi
A Rimmer
A Van Aarem
A Vouloumanos
AA Benasich
AL Barabasi
BJ Keats
BR Shrestha
C Kilkenny
C Witton
CA Anderson
CC Brewer
CF Norbury
CF Reisser
CS Lai
D Szklarczyk
DF Newbury
DR Moore
DV Bishop
E Eising
E Lenassi
G Conti-Ramsden
G Dehaene-Lambertz
G Lunter
GR Abecasis
J Golding
J Heckman
J Hornickel
JA Boughman
JC Barrett
JC Taylor
JM Ellingford
K Walter
K Wang
K Watanabe
L Huang
M Kircher
M Lek
M Luciano
M Van Segbroeck
MC Liberman
MEK Niemi
MG Filipe
MJ Henry
MR Bowl
N Pearsall
P Cingolani
P Danecek
P Le Quesne Stabej
R Mora
R Nudel
RH Fitch
RM Rosenfeld
S Colella
S Lee
S Purcell
S Richards
S Shultz
SW Threlkeld
WJ Kent
WJ Kimberling
X Liu
X Zhan
XS Chen
Y Zhou
Y-M Tien
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Language development builds upon a complex network of interacting subservient systems. It therefore follows that variations in, and subclinical disruptions of, these systems may have secondary effects on emergent language. In this paper, we consider the relationship between genetic variants, hearing, auditory processing and language development. We employ whole genome sequencing in a discovery family to target association and gene x environment interaction analyses in two large population cohorts: the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK10K. These investigations indicate that USH2A variants are associated with altered low-frequency sound perception which, in turn, increases the risk of developmental language disorder. We further show that Ush2a heterozygote mice have low-level hearing impairments, persistent higher-order acoustic processing deficits and altered vocalizations. These findings provide new insights into the complexity of genetic mechanisms serving language development and disorders and the relationships between developmental auditory and neural systems.

Edinburgh Research Explorer

St George's Online Research Archive

University of Melbourne Institutional Repository

Oxford Brookes University: RADAR

Explore Bristol Research

Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution

Author: A Bais
A Halpern
A Lifanov
A Moses
A Siepel
B Berman
B Knudsen
C Bergman
C Dewey
D Halligan
D Karolchik
D Pollard
D Raijman
E Berezikov
E Birney
E Blackwood
E Davidson
E Dermitzakis
F Gao
G Lunter
G Stormo
G Wray
I Holmes
I Miklos
J Berg
J Stone
J Thorne
J Warner
K Wong
M Brudno
M Frith
M Hasegawa
M Ludwig
M Noyes
O Hallikas
P Andolfatto
P Keightley
P Kheradpour
P Ray
P Tomancak
R Cartwright
R Durrett
R Satija
R Siddharthan
R Waterston
S Aerts
S Doniger
S Gallo
S MacArthur
S Sinha
Saurabh Sinha
V Mustonen
W Huang
W Wasserman
W Wong
Wyeth W. Wasserman
X Li
Xin He
Xu Ling
Z Hu
Publication venue: Public Library of Science
Publication date: 01/03/2009

Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.

Gentle Masking of Low-Complexity Sequences Improves Homology Search

Author: A Biegert
A Schaffer
B Niu
B Suzek
C Camacho
E Gertz
E Hazkani-Covo
F Chiaromonte
G Lunter
J Qin
K Forslund
Leonardo Mariño-Ramírez
M Frith
Martin C. Frith
P Fujita
R Harris
S Altschul
S Kielbasa
S Schwartz
S Sheetlin
W Miller
W Pearson
Z Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
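The scoring rule for gentle masking can be sketched in a few lines. The example below assumes the masked score is min(0, S), with S the unmasked score, so a masked letter can never contribute a positive score but still incurs mismatch penalties. Marking masked letters in lowercase and the +1/-1 match/mismatch values are assumptions made for illustration, and the function names are hypothetical rather than taken from any alignment tool.

```python
def unmasked_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Toy substitution score that ignores letter case (assumed +1 / -1 scheme)."""
    return match if a.upper() == b.upper() else mismatch


def gentle_score(a: str, b: str) -> int:
    """Score a letter pair; lowercase letters are treated as masked (assumption)."""
    s = unmasked_score(a, b)
    if a.islower() or b.islower():
        return min(0, s)  # masked pairs cannot raise the alignment score
    return s


def ungapped_score(x: str, y: str) -> int:
    """Sum gentle scores over an ungapped alignment of two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(gentle_score(a, b) for a, b in zip(x, y))


# The masked low-complexity prefix "atat" adds nothing; only "AC" scores.
print(ungapped_score("atatAC", "ATATAC"))
```

Under this rule a masked tract cannot create a high-scoring hit on its own, yet an alignment that starts in unmasked sequence can still extend through it without being cut off.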

CiteSeerX

Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs

Author: A Dress
A Godzik
A Löytynoja
A Novák
A Sali
A Siepel
A Tramontano
Adrienn Szabó
AS Schwartz
B Dwivedi
B Knudsen
B Larget
B Misof
B Schwikowski
BD Redelings
BJM Webb
BP Blackburne
C Dessimoz
C Notredame
CB Do
CJ Challis
D Altschuh
D Chivian
D DeBlasio
D Lupyan
D Metzler
D Robinson
DA Morrison
DF Feng
E Levy Karin
G Jordan
G Landan
G Lunter
G Raghava
G Talavera
GA Churchill
GA Lunter
Hall B G
HT Mevissen
I Holmes
I Miklós
IL Dryden
IM Wallace
István Miklós
J Castresana
J Felsenstein
J Gatesy
J Hein
J Kim
J Zhu
JA Lake
JD Thompson
JL Thorne
Joseph L Herman
Jotun Hein
K Bucka-Lassen
K Liu
KM Wong
L Wang
L Yu
LE Carvalho
LS Wang
M Hamada
M Höhl
M Vingron
M Wu
M Zuker
MA Suchard
MJ Wise
MO Dayhoff
MP Simmons
MS Waterman
MSY Lee
O Gotoh
O Penn
P Ajawatanawong
P Arunapuram
P Collingridge
PJ Green
PP Gardner
R Durbin
R Satija
R Schwarzenbacher
RA Cartwright
RC Edgar
RJ Dickson
RK Bradley
Rune Lyngsø
S Capella-Gutiérrez
S Karlin
S Miyazawa
S Needleman
S Sinha
Silla-Martínez Capella-Gutiérrez S
SME Sahraeian
TA Hopf
TH Ogden
TL Blundell
U Roshan
V Ahola
W Fletcher
WC Wheeler
Y Liu
Y Ruffieux
Ádám Novák
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Background: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment.
Results: In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased.
Conclusions: The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference.
Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign.
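The idea of a DAG whose nodes are alignment columns can be illustrated with a small sketch. It is a simplified toy, not the WeaveAlign implementation: the column key (residues consumed so far per sequence, plus whether the column holds a residue or a gap), the function names and the two sampled alignments are all assumptions made for illustration.

```python
from collections import defaultdict
from functools import lru_cache

START, END = "START", "END"


def columns(alignment):
    """Yield one hashable key per column of a gapped alignment (list of rows).

    The key records, per sequence, how many residues precede the column and
    whether the column holds a residue ('-' marks a gap). This keying is an
    assumption of the sketch.
    """
    consumed = [0] * len(alignment)
    for chars in zip(*alignment):
        yield tuple((consumed[i], c != "-") for i, c in enumerate(chars))
        for i, c in enumerate(chars):
            if c != "-":
                consumed[i] += 1


def build_dag(samples):
    """Link consecutive columns of every sample and count column occurrences."""
    edges = defaultdict(set)
    counts = defaultdict(int)
    for alignment in samples:
        prev = START
        for col in columns(alignment):
            counts[col] += 1
            edges[prev].add(col)
            prev = col
        edges[prev].add(END)
    return edges, counts


def count_paths(edges):
    """Count START-to-END paths; each path spells out one alignment."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == END:
            return 1
        return sum(paths_from(nxt) for nxt in edges[node])
    return paths_from(START)


# Two hypothetical sampled alignments of the sequences ACG and TCA.
samples = [["ACG-", "TC-A"], ["A-CG", "-TCA"]]
edges, counts = build_dag(samples)
print(len(samples), "samples encode", count_paths(edges), "alignments")
```

Because both samples share the column that pairs the two C residues, the choices before and after it combine freely, so the graph holds more alignments than were sampled. Dividing each entry of `counts` by the number of samples gives the empirical column frequencies mentioned in the abstract.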

SZTAKI Publication Repository

Springer - Publisher Connector

NOX1 loss-of-function genetic variants in patients with inflammatory bowel disease.

Author: Ahmad T.
Allan C.
Anderson C. A.
Anderson Carl A.
Arancibia Carolina
Attar M.
Auth Marcus K. H.
Bailey Adam
Barakat Farah
Barnes Ellie
Barrett J. C.
Bell J.
Bentley D.
Bird-Lieberman Beth
Braden Barbara
Braegger Christian P.
Brain Oliver
Broxholme J.
Bryant R. V.
Buck D.
Capitani M.
Cazier J. -B.
Cho J.
Collier Jane
COLORS in IBD
Copley R.
Cornall R.
Croft Nick
Danesh J.
Denson L. A.
Donnelly P.
Duerr R. H.
East James
Edwards C.
Elawad Mamoun
Fiddy S.
Fiedler K.
Fyderek Krzysztof
Geremia Alessandra
Green A.
Gregory J.
Gregory L.
Grocock R.
Hart A.
Hatton E.
Hawkey C.
Henderson Paul
Heuschkel Rob
Holmes C.
Howarth Lucy
Hughes L.
Humburg P.
Humphray S.
INTERVAL Study
Jostins L.
Jung J.
Kammermeier Jochen
Kanapin A.
Kelsen J. R.
Kennedy N. A.
Keshav Satish
Kingsbury Z.
Klenerman Paul
Knaus U. G.
Kugathasan S.
Lamb C. A.
Lamble S.
Lee J. C.
Leedham Simon
Lees C. W.
Li V. S. W.
Lise S.
Lo Bernice
Lonie L.
Lunter G.
Mansfield J. C.
Martin H.
Mathew C. G.
McCarthy D.
McCarthy D. J.
McGovern D. P. B.
McVean G.
Meran L.
Mondal K.
Moore C.
Mowat C.
Muise A. M.
Murray L.
Newman W. G.
Ouwehand W. H.
Oxford IBD cohort study investigators
Pagnamenta A.
Palmer Rebecca
Pandey S.
Parkes M.
Parkes Miles
Piazza P.
Polanco G.
Posovszky Carsten
Powrie Fiona
Prescott N. J.
Ratcliffe P.
Rimmer A.
Roberts D. J.
Rodrigues A.
Rodrigues Astor
Russell R. K.
Russell Richard K.
Sahgal N.
Sambrook J.
Satsangi J.
Satsangi Jack
Schwerd T.
Serra E. G.
Shah Neil
Simmons A.
Simmons Alison
Strisciuglio Caterina
Sullivan P. B.
Sullivan Peter B
Taylor J.
Tomlinson I.
Travis S. P. L.
Travis Simon P L
Trebes A.
Tremelling M.
Uhlig H. H.
Uhlig Holm H
UK IBD Genetics Consortium
Wedrychowicz Andrzej
WGS500 Consortium
Wilkie A. O. M.
Wilson D. C.
Wilson David C.
Wright B.
Yau C.
Zilbauer Matthias
Zurek Marlen
Publication venue: Mucosal Immunol
Publication date: 01/11/2017

Genetic defects that affect intestinal epithelial barrier function can present with very early-onset inflammatory bowel disease (VEOIBD). Using whole-genome sequencing, a novel hemizygous defect in NOX1 encoding NADPH oxidase 1 was identified in a patient with ulcerative colitis-like VEOIBD. Exome screening of 1,878 pediatric patients identified a further seven male inflammatory bowel disease (IBD) patients with rare NOX1 mutations. Loss-of-function was validated in p.N122H and p.T497A, and to a lesser degree in p.Y470H, p.R287Q, p.I67M, p.Q293R as well as the previously described p.P330S, and the common NOX1 SNP p.D360N (rs34688635) variant. The missense mutation p.N122H abrogated reactive oxygen species (ROS) production in cell lines, ex vivo colonic explants, and patient-derived colonic organoid cultures. Within colonic crypts, NOX1 constitutively generates a high level of ROS in the crypt lumen. Analysis of 9,513 controls and 11,140 IBD patients of non-Jewish European ancestry did not reveal an association between p.D360N and IBD. Our data suggest that loss-of-function variants in NOX1 do not cause a Mendelian disorder of high penetrance but are a context-specific modifier. Our results imply that variants in NOX1 change brush border ROS within colonic crypts at the interface between the epithelium and luminal microbes.

Edinburgh Research Explorer

UCL Discovery

Queen Mary Research Online

ZORA

Apollo (Cambridge)

University of Melbourne Institutional Repository

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Probabilistic Phylogenetic Inference with Insertions and Deletions

Author: A Pang
A Siepel
A Stamatakis
AD Smith
B Boussau
B Knudsen
B Larget
B Mau
B Qian
B Rannala
C Kosiol
C Moler
D Metzler
D Simon
David Haussler
DF Robinson
DG Hwang
DL Swofford
E Rivas
Elena Rivas
F Ronquist
G Lunter
G McGuire
GA Churchill
GJ Mitchison
I Holmes
I Miklós
J Adachi
J Felsenstein
J Hein
J Kim
J Stoye
J Wang
JD McAuliffe
JJ Cannone
JL Thorne
JP Huelsenbeck
JS Pedersen
L Chindelevitch
L Coin
M Blanchette
M Dayhoff
M Gribskov
M Hasegawa
M Kimura
M Steel
MJ Bishop
MK Kuhner
MS Chang
N Goldman
P Liò
PD Keightley
R Durbin
R Fleissner
S Guindon
S Karlin
S Tavaré
S Whelan
Sean R. Eddy
SV Muse
TH Jukes
W Cai
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2008

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
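To make "a rate matrix augmented by the gap character" concrete, the toy sketch below adds a fifth gap state to a uniform four-letter DNA substitution matrix and computes transition probabilities as the matrix exponential exp(Qt). It is only an illustration under loud assumptions: the rates, the uniform substitution scheme and the function names are invented here, and this is not the parameterisation used in dnamlε.

```python
STATES = ["A", "C", "G", "T", "-"]  # four nucleotides plus the gap character


def rate_matrix(sub=1.0, ins=0.05, dele=0.1):
    """Build a toy 5x5 rate matrix Q (all rates are assumed values).

    Residue to residue: sub / 3; residue to gap: dele; gap to each residue: ins / 4.
    """
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if STATES[i] == "-":
                q[i][j] = ins / 4
            elif STATES[j] == "-":
                q[i][j] = dele
            else:
                q[i][j] = sub / 3
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, terms=30):
    """Approximate P(t) = exp(Q t) by a truncated Taylor series (fine for small t)."""
    n = len(q)
    qt = [[x * t for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


p = transition_probs(rate_matrix(), t=0.1)
print("P(A -> -) =", p[0][4], " P(- -> A) =", p[4][0])
```

With unequal insertion and deletion rates, the probability of a residue becoming a gap differs from that of a gap becoming a residue, which is the kind of asymmetry a non-reversible indel model has to accommodate.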

CiteSeerX