470 research outputs found

A whitening approach to probabilistic canonical correlation analysis for omics data integration

Author: Jendoubi Bedhiafi T
Strimmer K
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/12/2018

Background: Canonical correlation analysis (CCA) is a classic statistical tool for investigating complex multivariate data. Correspondingly, it has found many diverse applications, ranging from molecular biology and medicine to social science and finance. Intriguingly, despite the importance and pervasiveness of CCA, a probabilistic understanding of CCA has only recently begun to develop, moving from an algorithmic to a model-based perspective and enabling its application to large-scale settings. Results: Here, we revisit CCA from the perspective of statistical whitening of random variables and propose a simple yet flexible probabilistic model for CCA in the form of a two-layer latent variable generative model. The advantages of this variant of probabilistic CCA include non-ambiguity of the latent variables, provisions for negative canonical correlations, possibility of non-normal generative variables, as well as ease of interpretation on all levels of the model. In addition, we show that it lends itself to computationally efficient estimation in high-dimensional settings using regularized inference. We test our approach to CCA in simulations and apply it to two omics data sets illustrating the integration of gene expression data, lipid concentrations and methylation levels. Conclusions: Our whitening approach to CCA provides a unifying perspective on CCA, linking together sphering procedures, multivariate regression and corresponding probabilistic generative models. Furthermore, we offer an efficient computer implementation in the “whitening” R package available at https://CRAN.R-project.org/package=whitening

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

The University of Manchester - Institutional Repository

Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo

Author: Fahrmeir L.
Opgen-Rhein R.
Strimmer K.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Background: Coalescent theory is a general framework to model genetic variation in a population. Specifically, it allows inference about population parameters from sampled DNA sequences. However, most currently employed variants of coalescent theory only consider very simple demographic scenarios of population size changes, such as exponential growth. Results: Here we develop a coalescent approach that allows Bayesian non-parametric estimation of the demographic history using genealogies reconstructed from sampled DNA sequences. In this framework inference and model selection is done using reversible jump Markov chain Monte Carlo (MCMC). This method is computationally efficient and overcomes the limitations of related non-parametric approaches such as the skyline plot. We validate the approach using simulated data. Subsequently, we reanalyze HIV-1 sequence data from Central Africa and Hepatitis C virus (HCV) data from Egypt. Conclusions: The new method provides a Bayesian procedure for non-parametric estimation of the demographic history. By construction it additionally provides confidence limits and may be used jointly with other MCMC-based coalescent approaches

Springer - Publisher Connector

Open Access LMU

PubMed Central

The University of Manchester - Institutional Repository

Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets

Author: AV Aho
B Holland
C Choy
C Semple
Celine Scornavacca
D Gusfield
D Huson
DH Huson
E Bapteste
F Pardi
G Cardona
H Poormohammadi
J Jansson
J Jansson
J Jansson
K Strimmer
Katharina T. Huber
KT Huber
KT Huber
KT Huber
Leo van Iersel
LJJ Iersel van
P Gambette
Taoyang Wu
Vincent Moulton
Y Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/11/2014

Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set T of binary binets or trinets over a taxon set X, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an O(3^{|X|} poly(|X|)) time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks

arXiv.org e-Print Archive

CiteSeerX

Crossref

TU Delft Repository

Springer - Publisher Connector

INRIA a CCSD electronic archive server

HAL Descartes

HAL-IRD

University of East Anglia digital repository

HAL-CIRAD

Evolution of genes and repeats in the Nimrod superfamily

Author: Andrade
B. Sipos
Bork
Bork
Callebaut
Chen
D. Hultmark
Do
Doliana
E. Kurucz
Edgar
Evans
Finn
Guindon
Holt
Huelsenbeck
Hughes
I. Ando
J. Zsamboki
Ju
K. Somogyi
Kumar
Kumar
Kurucz
Liao
Mangahas
McAllister
Morgenstern
Nei
Nei
Nei
Nei
Nishikawa
Notredame
Ota
Parmley
Posada
Quesada
Redelings
Russo
Schuster-Böckler
Simmons
Stajich
Strimmer
Swanson
Swidan
Thompson
Xia
Z. Penzes
Zdobnov
Zhang
Zou
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2008

The recently identified Nimrod superfamily is characterized by the presence of a special type of EGF repeat, the NIM repeat, located right after a typical CCXGY/W amino acid motif. On the basis of structural features, nimrod genes can be divided into three types. The proteins encoded by Draper-type genes have an EMI domain at the N-terminal part and only one copy of the NIM motif, followed by a variable number of EGF-like repeats. The products of Nimrod B-type and Nimrod C-type genes (including the eater gene) have different kinds of N-terminal domains, and lack EGF-like repeats but contain a variable number of NIM repeats. Draper and Nimrod C-type (but not Nimrod B-type) proteins carry a transmembrane domain. Several members of the superfamily were claimed to function as receptors in phagocytosis and/or binding of bacteria, which indicates an important role in cellular immunity and the elimination of apoptotic cells. In this paper, the evolution of the Nimrod superfamily is studied with various methods on the level of genes and repeats. A hypothesis is presented in which the NIM repeat, along with the EMI domain, emerged by structural reorganizations at the end of an EGF-like repeat chain, suggesting a mechanism for the formation of novel types of repeats. The analyses revealed diverse evolutionary patterns in the sequences containing multiple NIM repeats. Although the Nimrod B and Nimrod C proteins show characteristics of independent evolution, many internal NIM repeats in Eater sequences seem to have undergone concerted evolution. An analysis of the nimrod genes has been performed using phylogenetic and other methods, and an evolutionary scenario of the origin and diversification of the Nimrod superfamily is proposed. Our study presents an intriguing example of how the evolution of multigene families may contribute to the complexity of the innate immune response

Crossref

Repository of the Academy's Library

Definition of the σW regulon of Bacillus subtilis in the absence of stress

Author: A Petersohn
A Saito
Adam Driks
AJ Jervis
AW Kingston
BG Butcher
BM Alba
BM Bolstad
C Eymann
CD Ellermeier
E Padan
Emma L. Denham
G Chen
H Hahne
I Wadenpohl
J Cheng
J Heinrich
J Heinrich
J Heinrich
J Heinrich
Jan Maarten van Dijl
JC Zweers
JD Helmann
Jessica C. Zweers
K Asai
K Kanehara
K Strimmer
KT Hughes
L Steil
M Cao
M Cao
M Cao
M Ogura
M Pietiainen
M Yoshimura
MS Turner
P Bisicchia
P Nicolas
Pierre Nicolas
RE Dalbey
S Dubrac
S Jordan
S Leskela
S Rasmussen
S Schobel
S Sterberg
S Tojo
T Mascher
T Wiegert
TD Ho
Thomas Wiegert
W Eiamphungporn
X Huang
X Huang
Y Luo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Bacteria employ extracytoplasmic function (ECF) sigma factors for their responses to environmental stresses. Despite intensive research, the molecular dissection of ECF sigma factor regulons has remained a major challenge due to overlaps in the ECF sigma factor-regulated genes and the stimuli that activate the different ECF sigma factors. Here we have employed tiling arrays to single out the ECF σW regulon of the Gram-positive bacterium Bacillus subtilis from the overlapping ECF σX, σY, and σM regulons. For this purpose, we profiled the transcriptome of a B. subtilis sigW mutant under non-stress conditions to select candidate genes that are strictly σW-regulated. Under these conditions, σW exhibits a basal level of activity. Subsequently, we verified the σW-dependency of candidate genes by comparing their transcript profiles to transcriptome data obtained with the parental B. subtilis strain 168 grown under 104 different conditions, including relevant stress conditions, such as salt shock. In addition, we investigated the transcriptomes of rasP or prsW mutant strains that lack the proteases involved in the degradation of the σW anti-sigma factor RsiW and subsequent activation of the σW-regulon. Taken together, our studies identify 89 genes as being strictly σW-regulated, including several genes for non-coding RNAs. The effects of rasP or prsW mutations on the expression of σW-dependent genes were relatively mild, which implies that σW-dependent transcription under non-stress conditions is not strictly related to RasP and PrsW. Lastly, we show that the pleiotropic phenotype of rasP mutant cells, which have defects in competence development, protein secretion and membrane protein production, is not mirrored in the transcript profile of these cells. This implies that RasP is not only important for transcriptional regulation via σW, but that this membrane protease also exerts other important post-transcriptional regulatory functions

University of Groningen

Directory of Open Access Journals

HAL Descartes

Warwick Research Archives Portal Repository

ProdInra

Hal-Diderot

FigShare

Public Library of Science (PLOS)

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

PubMed Central

Dissertations of the University of Groningen

Exploiting high-throughput cell line drug screening studies to identify candidate therapeutic agents in head and neck cancer

Author: A Argiris
AC Nichols
AF Gazdar
AJ Folkes
AL Tang
B Burtness
B Efron
CO Ndubaku
D Juric
D Juric
G Rabinowits
GL Shaw
I Brana
J Aubert
J Barretina
J Rodon
JA Bonner
JC Brenner
JG Paez
JP Gillet
K Strimmer
K Strimmer
K Takeuchi
M Lacroix
M Zhao
MJ Garnett
MS van der Heijden
N Agrawal
N Stransky
PT Hennessey
S Domcke
S Nylander
S Papillon-Cavanagh
SG Baker
SV Sharma
VW Lui
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

NetDiff – Bayesian model selection for differential gene regulatory network inference

Author: AA Margolin
J Cooper-Knock
J Grau
J West
K Strimmer
M DeJesus-Hernandez
M Kanehisa
M Kanehisa
N D'Ambrosi
N Krämer
P Langfelder
PJ Green
R Opgen-Rhein
S Bandyopadhyay
S Okawa
S Sathasivam
S Vukosavic
T Thorne
T Wang
Z Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/11/2016

Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, that enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov Chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation

Central Archive at the University of Reading

Crossref

PubMed Central

Spiral - Imperial College Digital Repository

Surrey Research Insight

The highly rearranged mitochondrial genomes of the crabs Maja crispata and Maja squinado (Majidae) and gene order evolution in Brachyura

Author: AD Miller
AE Smith
AH Sahyoun
AT Beckenbach
BQ Minh
C Hahn
C Moritz
DR Wolstenholme
DV Lavrov
E Negrisolo
F Jühling
F Kilpert
FJ Lin
G Shi
G Shi
G Sotelo
G Tan
G Tang
H Ma
H Shen
H Shimodaira
H Sun
I Marcadé
IM Fearnley
J Yang
JD Thompson
JL Boore
JL Boore
JL Boore
JL Boore
JM Sung
JM Sung
JS Ki
K Strimmer
K Tamura
LM Tsang
LS Quang
LT Nguyen
M Babbucci
M Bernt
M Bernt
M Bernt
M Dowton
M Hui
MH Tan
MH Tan
MM Yamauchi
N Lartillot
NT Perna
O Rota-Stabelli
P Cantatore
P Salvato
R Lanfear
R Raimond
RD Segawa
S Grave De
S Montelli
S Saito
SF Altschul
SJ Kim
SL Cameron
TA Rawlings
TM Lowe
X Wang
Y Xing
YK Ji
YQ Yu
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

We sequenced the mitochondrial genomes of the spider crabs Maja crispata and Maja squinado (Majidae, Brachyura). Both genomes contain the whole set of 37 genes characteristic of Bilaterian genomes, encoded on both α- and β-strands. Both species exhibit the same gene order, which is unique among known animal genomes. In particular, all the genes located on the β-strand form a single block. This gene order was analysed together with the other nine gene orders known for the Brachyura. Our study confirms that the most widespread gene order (BraGO) represents the plesiomorphic condition for Brachyura and was established at the onset of this clade. All other gene orders are the result of transformational pathways originating from BraGO. The different gene orders exhibit variable levels of gene rearrangements, which involve only tRNAs or all types of genes. Local homoplastic arrangements were identified, while complete gene orders remain unique and represent signatures that can have a diagnostic value. Brachyura appear to be a hot-spot of gene order diversity within the phylum Arthropoda. Our analysis allowed us to track, for the first time, the full evolutionary pathways producing the Brachyuran gene orders. This goal was achieved by coupling sophisticated bioinformatic tools with phylogenetic analysis

Crossref

Archivio istituzionale della ricerca - Università di Padova

Cumulants and the moment algebra: tools for analysing weak measurements

Author: A.N. Kolmogorov
A.N. Kolmogorov
C.H. Bennett
C.J.C. Burges
D. Benedetto
D.B. Lenat
G. Tzanetakis
K. Strimmer
M. Li
M. Li
M. Li
M. Li
M. Li
M. Li
M.E. Lesk
P. Cimiano
R. Cilibrasi
R. Cilibrasi
R. Duda
T. Landauer
X. Chen
Publication venue
Publication date: 01/01/2006

Recently it has been shown that cumulants significantly simplify the analysis of multipartite weak measurements. Here we consider the mathematical structure that underlies this, and find that it can be formulated in terms of what we call the moment algebra. Apart from resulting in simpler proofs, the flexibility of this structure allows generalizations of the original results to a number of weak measurement scenarios, including one where the weakly interacting pointers reach thermal equilibrium with the probed system.

arXiv.org e-Print Archive

CiteSeerX

Crossref

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences

Author: A.-M. Vandamme
Alcantara
B. Galvao-Castro
Bracho
Gessain
Gifford
Hahn
Hemelaar
John-Stewart
K. Deforche
L. C. J. Alcantara
M. Van Ranst
Mahieux
O. G. Pybus
P. Libin
Rambaut
Ronquist
S. Cassol
Salemi
Schiffman
Simmonds
Strimmer
T. de Oliveira
Verdonck
Walter
Publication venue: Oxford University Press
Publication date: 01/01/2009

Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput-genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

PubMed Central

Oxford University Research Archive

RCAAP - Repositório Científico de Acesso Aberto de Portugal

UPSpace at the University of Pretoria