325 research outputs found

Multinomial belief networks for healthcare data

Author: de Jong J.
Donker H. C.
Lunter G. A.
Neijzen D.
Publication date: 06/04/2024

Healthcare data from patient or population cohorts are often characterized by sparsity, high missingness and relatively small sample sizes. In addition, being able to quantify uncertainty is often important in a medical context. To address these analytical requirements we propose a deep generative Bayesian model for multinomial count data. We develop a collapsed Gibbs sampling procedure that takes advantage of a series of augmentation relations, inspired by the Zhou–Cong–Chen model. We visualise the model's ability to identify coherent substructures in the data using a dataset of handwritten digits. We then apply it to a large experimental dataset of DNA mutations in cancer and show that we can identify biologically meaningful clusters of mutational signatures in a fully data-driven way. Comment: 18 pages, 4 figs; supplement: 22 pages
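Augmentation schemes for Poisson and multinomial count models commonly use the Poisson–multinomial relation: if x = x_1 + ... + x_K with independent x_k ~ Poisson(λ_k), then (x_1, ..., x_K) given x is Multinomial(x, λ_k / Σλ). A minimal sketch of that single allocation step follows, assuming the factor weights are already known; the function name `allocate_count` and the weights are hypothetical, and this is not the authors' collapsed Gibbs sampler.

```python
import random
from typing import List


def allocate_count(n: int, weights: List[float], rng: random.Random) -> List[int]:
    """Split an observed count n across latent factors.

    Draws (n_1, ..., n_K) ~ Multinomial(n, weights / sum(weights)) by assigning
    each of the n units independently to a factor. Illustrative only; the
    weights stand in for whatever rates a full sampler would have at this step.
    """
    if sum(weights) <= 0:
        raise ValueError("weights must have a positive sum")
    counts = [0] * len(weights)
    # random.choices normalises the (relative) weights internally
    for k in rng.choices(range(len(weights)), weights=weights, k=n):
        counts[k] += 1
    return counts


rng = random.Random(42)
allocation = allocate_count(12, [2.0, 1.0, 1.0], rng)  # three non-negative ints summing to 12
```

In a full sampler this step would be repeated for every observed count, with the weights recomputed from the current values of the other latent variables.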

arXiv.org e-Print Archive

The variant call format and VCFtools

Author: A. Auton
C. A. Albers
E. Banks
G. Abecasis
G. Lunter
G. McVean
G. T. Marth
M. A. DePristo
P. Danecek
R. Durbin
R. E. Handsaker
S. T. Sherry
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparison, and also provides a general Perl API.
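Each VCF data line is tab-separated, with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) optionally followed by FORMAT and per-sample columns. A minimal sketch of reading the fixed columns of one line is shown below; `parse_vcf_line` and `VcfRecord` are hypothetical names, this is not the VCFtools Perl API, and it skips header lines, sample columns and compressed or indexed input, which a dedicated library should handle in practice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int                # 1-based position on the reference
    id: str
    ref: str
    alt: List[str]          # ALT may list several comma-separated alleles
    qual: Optional[float]   # "." means missing
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a VCF data (non-header) line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            if "=" in entry:
                key, value = entry.split("=", 1)
                info_dict[key] = value
            else:
                info_dict[entry] = True  # flag-type INFO field carries no value
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=flt,
        info=info_dict,
    )
```

Values in `info` are kept as strings because their types are declared in the file's header, which this sketch does not read.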

Oxford University Research Archive

Reliability of panel-based mutational signatures for immune-checkpoint-inhibition efficacy prediction in non-small cell lung cancer

Author: Cuppens K.
Donker H. C.
Froyen G.
Groen H. J.M.
Hiltermann T. J.N.
Lunter G. A.
Maes B.
Schuuring E.
van Es B.
Volders P. J.
Publication date: 01/08/2023

OBJECTIVES: Mutational signatures (MS) are gaining traction for deriving therapeutic insights for immune checkpoint inhibition (ICI). We asked whether MS attributions from comprehensive targeted sequencing assays are reliable enough for predicting ICI efficacy in non-small cell lung cancer (NSCLC). METHODS: Somatic mutations of m = 126 patients were assayed using panel-based sequencing of 523 cancer-related genes. In silico simulations of MS attributions for various panels were performed on a separate dataset of m = 101 whole-genome sequenced patients. Non-synonymous mutations were deconvoluted using COSMIC v3.3 signatures and used to test a previously published machine learning classifier. RESULTS: The ICI efficacy predictor performed poorly, with an accuracy of 0.51 (-0.09/+0.09), an average precision of 0.52 (-0.11/+0.11), and an area under the receiver operating characteristic curve of 0.50 (-0.09/+0.10). Theoretical arguments, experimental data, and in silico simulations pointed to false negative rates (FNR) related to panel size. A secondary effect was observed, where deconvolution of small ensembles of point mutations led to reconstruction errors and misattributions. CONCLUSION: MS attributions from current targeted panel sequencing are not reliable enough to predict ICI efficacy. We suggest that, for downstream classification tasks in NSCLC, signature attributions be based on whole exome or genome sequencing instead.
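Signature attribution is often posed as non-negative least squares: find exposures h ≥ 0 such that W h approximates the observed mutation-count vector v, where the columns of W are signatures. A minimal sketch using Lee–Seung-style multiplicative updates with W held fixed follows; the two three-category signatures are toy values (COSMIC v3.3 SBS signatures are defined over 96 trinucleotide contexts), `fit_exposures` is a hypothetical name, and the tools used in the study may solve the problem differently. With a small panel, v contains only a handful of mutations, so the fitted h can shift noticeably from one sample to the next.

```python
from typing import List


def fit_exposures(signatures: List[List[float]], counts: List[float],
                  iterations: int = 5000, eps: float = 1e-12) -> List[float]:
    """Non-negative exposures h minimising ||W h - v||^2 (W fixed).

    signatures: K signatures, each a list of C non-negative probabilities
                (these are the columns of W)
    counts:     C observed mutation counts per category (v)
    """
    K, C = len(signatures), len(counts)
    h = [1.0] * K  # strictly positive start; multiplicative updates keep h >= 0
    # Precompute W^T v and W^T W once, since W does not change
    wtv = [sum(signatures[k][c] * counts[c] for c in range(C)) for k in range(K)]
    wtw = [[sum(signatures[i][c] * signatures[j][c] for c in range(C))
            for j in range(K)] for i in range(K)]
    for _ in range(iterations):
        wtwh = [sum(wtw[k][j] * h[j] for j in range(K)) for k in range(K)]
        # Update all exposures simultaneously from the previous h
        h = [h[k] * wtv[k] / (wtwh[k] + eps) for k in range(K)]
    return h


toy_signatures = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
exposures = fit_exposures(toy_signatures, [2.2, 0.9, 0.9])  # close to [3.0, 1.0]
```

The toy count vector was built as 3 × the first signature plus 1 × the second, so the exact solution is h = (3, 1).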

Proceedings - University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Resonances in a spring-pendulum: algorithms for equivariant singularity theory

Author: Arnol'd V I
Bridges T J
Broer H W
Bröcker Th
Callahan T K
Cowell R G
Cox D
Deprit A
Duistermaat J J
G A Lunter
G Vegter
Gatermann K
Golubitsky M
H W Broer
I Hoveijn
Kas A
Lichtenberg A J
Martinet J
Meyer K R
Montaldi J
Mora F
Poston T
Poènaru V
Ruijgrok M
Sanders J A
Sturmfels B
van der Meer J-C
Wassermann G
Publication date: 01/01/1998

A spring-pendulum in resonance is a time-independent Hamiltonian model system for formal reduction to one degree of freedom, where some symmetry (reversibility) is maintained. The reduction is handled by equivariant singularity theory with a distinguished parameter, yielding an integrable approximation of the Poincaré map. This makes a concise description of certain bifurcations possible. The computation of reparametrizations from normal form to the actual system is performed by Gröbner basis techniques.

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Oxford University Research Archive

University of Groningen Digital Archive

Dissertations of the University of Groningen

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package (http://biowiki.org/DART). Comment: 34 pages, 3 figures, glossary of XRate model terminology

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

Author: A Heger
A Löytynoja
A Siepel
AG Clark
AM Moses
Art F. Y. Poon
B Knudsen
B Paten
B Rannala
Benedict Paten
C Lee
C Strope
DG Higgins
EF Moore
FA Matsen
FR Kschischang
G Lunter
Gerton Lunter
I Holmes
I Miklós
Ian Holmes
J Felsenstein
JD Thompson
JL Thorne
JS Pedersen
K Katoh
K Liu
KM Wong
KS Pollard
L Gomez-Valero
L Zhu
M Larkin
M Mohri
MA Suchard
N de la Chaux
O Kamneva
O Westesson
Oscar Westesson
P Markova-Raina
R Mills
RA Cartwright
RC Edgar
RK Bradley
S Nelesen
S Saccone
S Sinha
T Beissbarth
X Qu
Z Wang
Z Yang
Z Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes. Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434
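For the finite-substitution case that the abstract generalizes, Felsenstein's pruning algorithm computes, bottom-up, the probability of the observed leaves below each node given that node's state. A minimal sketch for a single alignment column follows, assuming the Jukes–Cantor (JC69) model with branch lengths in expected substitutions per site; the tree encoding (a leaf is a base letter, an internal node is a list of `(child, branch_length)` pairs) and the function names are hypothetical. It models substitutions only, not the indel histories the paper reconstructs.

```python
import math
from typing import List

BASES = "ACGT"


def jc69(t: float) -> List[List[float]]:
    """Jukes-Cantor transition matrix P(t), t in expected substitutions per site."""
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihood(node) -> List[float]:
    """Return L[x] = P(observed leaves below node | node is in state x)."""
    if isinstance(node, str):
        # Leaf: indicator vector for the observed base
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        p = jc69(t)
        child_l = partial_likelihood(child)
        # Sum over the child's state, then multiply across children
        for x in range(4):
            result[x] *= sum(p[x][y] * child_l[y] for y in range(4))
    return result


def site_likelihood(tree) -> float:
    """Likelihood of one column, with uniform JC69 base frequencies at the root."""
    return sum(0.25 * value for value in partial_likelihood(tree))


# ((A:0.1, C:0.1):0.05, G:0.2) in this encoding
example_tree = [([("A", 0.1), ("C", 0.1)], 0.05), ("G", 0.2)]
column_likelihood = site_likelihood(example_tree)
```

Recursion keeps the sketch short; a real implementation would reuse partial likelihoods across columns and work in log space to avoid underflow on large trees.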

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

OpEx - a validated, automated pipeline optimised for clinical exome sequence analysis.

Author: Clarke M
Elliott A
Lunter G
Münz M
Rahman N
Ramsay E
Renwick A
Ruark E
Seal S
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

We present an easy-to-use, open-source Optimised Exome analysis tool, OpEx (http://icr.ac.uk/opex), that accurately detects small-scale variation, including indels, to clinical standards. We evaluated OpEx performance with an experimentally validated dataset (the ICR142 NGS validation series), a large 1000-exome dataset (the ICR1000 UK exome series), and a clinical proband-parent trio dataset. The performance of OpEx for high-quality base substitutions and short indels in both small and large datasets is excellent, with overall sensitivity of 95%, specificity of 97% and a low false detection rate (FDR) of 3%. Depending on the individual performance requirements, the OpEx output allows one to optimise the inevitable trade-offs between sensitivity and specificity. For example, in the clinical setting one could permit a higher FDR and lower specificity to maximise sensitivity. In contexts where experimental validation is not possible, minimising the FDR and improving specificity may be a preferable trade-off for slightly lower sensitivity. OpEx is simple to install and use; the whole pipeline is run from a single command. OpEx is therefore well suited to the increasing number of research and clinical laboratories undertaking exome sequencing, particularly those without dedicated in-house bioinformatics expertise.
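The sensitivity, specificity and FDR figures above are standard confusion-matrix summaries, and the trade-off the abstract describes comes from filtering calls at different quality thresholds. A minimal sketch of both follows; the function names, the `(quality, is_true_variant)` call representation and the counts in the test are hypothetical illustrations, not OpEx output or its evaluation code, and OpEx's own validation may define negative sites differently.

```python
from typing import Dict, List, Tuple


def variant_call_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Confusion-matrix summaries for a set of variant calls."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true variants that were called
        "specificity": tn / (tn + fp),  # fraction of non-variant sites left uncalled
        "fdr": fp / (tp + fp),          # fraction of calls that are false
    }


def metrics_at_threshold(calls: List[Tuple[float, bool]], n_true_variants: int,
                         threshold: float) -> Tuple[float, float]:
    """Sensitivity and FDR after keeping only calls with quality >= threshold.

    calls: (quality, is_true_variant) for every call made
    n_true_variants: all true variants, including any never called at all
    """
    kept = [is_true for quality, is_true in calls if quality >= threshold]
    tp = sum(kept)            # True counts as 1
    fp = len(kept) - tp
    sensitivity = tp / n_true_variants
    fdr = fp / len(kept) if kept else 0.0
    return sensitivity, fdr
```

Raising the threshold usually removes false calls faster than true ones, lowering the FDR at the cost of sensitivity, which matches the clinical versus research trade-off described in the abstract.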

PubMed Central

Oxford University Research Archive

Institute of Cancer Research Repository

Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

Author: A Spradling
D Bentley
G Lunter
G Van der Auwera
H Li
J Abbott
J Lack
J Robinson
M Adams
M DePristo
P Cingolani
P Innocenti
R Handsaker
R Hoskins
S Richards
W Gilks
W Rice
Publication venue: F1000 Research Ltd
Publication date: 01/11/2016

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline to map sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (accession number SRP058502). We used HaplotypeCaller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1-100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics.
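Because every hemiclonal female carries one copy of the reference line genome, a non-reference allele at a biallelic site can be assigned to the LHM haplotype without statistical phasing. A minimal sketch of that reasoning follows, assuming the reference line matches the dm6 assembly exactly and ignoring genotyping error; `lhm_allele` is a hypothetical helper, not part of the authors' released code.

```python
from typing import Optional


def lhm_allele(genotype: str, ref: str, alt: str) -> Optional[str]:
    """Infer the allele on the unknown LHM haplotype of a hemiclonal fly.

    Assumes the other haplotype is the reference line and is identical to
    dm6, so any non-reference allele must sit on the LHM haplotype.
    genotype: VCF GT string for a biallelic site, e.g. '0/0', '0/1', '1|0'.
    """
    alleles = genotype.replace("|", "/").split("/")
    if alleles == ["0", "0"]:
        return ref   # homozygous reference: LHM haplotype carries REF
    if sorted(alleles) == ["0", "1"]:
        return alt   # heterozygous: ALT must be on the LHM haplotype
    # '1/1' contradicts the reference-line background; './.' is missing data
    return None
```

Calls that return `None` under this assumption are natural candidates for closer inspection, since they suggest either a genotyping error or a site where the reference line differs from the assembly.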

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online

Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden

Author: A Roth
AL Gartel
C Balbas-Martinez
C Yau
D Cappellen
D Sidransky
DA Solomon
G Gundem
G Guo
G Lunter
JB Cazier
L Lacombe
LB Alexandrov
ML Lu
MS Lawrence
P Lianes
PJ Goebell
S Denzinger
S Lise
T Abbas
Y Gui
Y Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Bladder cancers are a leading cause of death from malignancy. Molecular markers might predict disease progression and behaviour more accurately than the available prognostic factors. Here we use whole-genome sequencing to identify somatic mutations and chromosomal changes in 14 bladder cancers of different grades and stages. As well as detecting the known bladder cancer driver mutations, we report the identification of recurrent protein-inactivating mutations in CDKN1A and FAT1. The former are not mutually exclusive with TP53 mutations or MDM2 amplification, showing that CDKN1A dysfunction is not simply an alternative mechanism for p53 pathway inactivation. We find strong positive associations between higher tumour stage/grade and greater clonal diversity, the number of somatic mutations and the burden of copy number changes. In principle, the identification of sub-clones with greater diversity and/or mutation burden within early-stage or low-grade tumours could identify lesions with a high risk of invasive progression.

Crossref

University of Birmingham Research Portal

PubMed Central

Oxford University Research Archive

White Rose Research Online

University of Melbourne Institutional Repository

Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution

Author: A Bais
A Halpern
A Lifanov
A Moses
A Siepel
B Berman
B Knudsen
C Bergman
C Dewey
D Halligan
D Karolchik
D Pollard
D Raijman
E Berezikov
E Birney
E Blackwood
E Davidson
E Dermitzakis
F Gao
G Lunter
G Stormo
G Wray
I Holmes
I Miklos
J Berg
J Stone
J Thorne
J Warner
K Wong
M Brudno
M Frith
M Hasegawa
M Ludwig
M Noyes
O Hallikas
P Andolfatto
P Keightley
P Kheradpour
P Ray
P Tomancak
R Cartwright
R Durrett
R Satija
R Siddharthan
R Waterston
S Aerts
S Doniger
S Gallo
S MacArthur
S Sinha
Saurabh Sinha
V Mustonen
W Huang
W Wasserman
W Wong
Wyeth W. Wasserman
X Li
Xin He
Xu Ling
Z Hu
Publication venue: Public Library of Science
Publication date: 01/03/2009

Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.
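The simplest probabilistic model of binding-site gain and loss treats each site as a two-state continuous-time Markov chain (absent or present) with a constant gain rate and loss rate, which has a closed-form transition matrix. A minimal sketch under that assumption follows; it is an illustration of the general idea, not EMMA's actual model, and `site_transition_probs` and the rates in the example are hypothetical.

```python
import math
from typing import List


def site_transition_probs(gain_rate: float, loss_rate: float,
                          t: float) -> List[List[float]]:
    """2x2 transition matrix P(t) for a binding site: state 0 = absent, 1 = present.

    Assumes a continuous-time Markov chain with constant gain rate (0 -> 1)
    and loss rate (1 -> 0). Row i gives P(state at time t | state i at time 0).
    """
    total = gain_rate + loss_rate
    decay = math.exp(-total * t)
    pi_present = gain_rate / total   # stationary probability that a site exists
    pi_absent = loss_rate / total
    return [
        [pi_absent + pi_present * decay, pi_present * (1.0 - decay)],
        [pi_absent * (1.0 - decay), pi_present + pi_absent * decay],
    ]


# Probability that a site present in the ancestor is still present after t = 0.5
p = site_transition_probs(gain_rate=0.2, loss_rate=0.8, t=0.5)
retained = p[1][1]
```

At t = 0 the matrix is the identity, and as t grows each row approaches the stationary distribution, so long branches carry little information about whether an ancestral site was present.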

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central