Accelerating Haplotype-Based Genome-Wide Association Study Using Perfect Phylogeny and Phase-Known Reference Data

Author: BL Browning, C Durrant, Christopher I. Amos, CI Amos, Cong Li, D Brinza, D Curtis, D Ge, D Gordon, D Gusfield, E Halperin, GK Chen, GS Hageman, Hua Ling, I Pe'er, J Akey, J Beuten, J Marchini, K Kim, Li Jin, M Stephens, MJ Minichiello, Momiao Xiong, N Wang, NJ Bray, P Scheet, S Purcell, S Su, SR Browning, T Mailund, T Niu, TA Manolio, Thomas Mailund, Y Li, Yungang He
Publication venue: Public Library of Science
Publication date: 15/07/2011

The genome-wide association study (GWAS) has become a routine approach for mapping disease risk loci with the advent of large-scale genotyping technologies. Multi-allelic haplotype markers can provide superior power compared with single-SNP markers in mapping disease loci. However, the application of haplotype-based analysis to GWAS is usually bottlenecked by the prohibitive time cost of haplotype inference, also known as phasing. In this study, we developed an efficient approach to haplotype-based analysis in GWAS. By using a reference panel, our method accelerated the phasing process and reduced the potential bias generated by unrealistic assumptions in the phasing process. The haplotype-based approach delivers high power and no type I error inflation for association studies. With only a medium-size reference panel, phasing error in our method is comparable to the genotyping error afforded by commercial genotyping solutions.

Combining μXANES and μXRD mapping to analyse the heterogeneity in calcium carbonate granules excreted by the earthworm Lumbricus terrestris

Author: Aizenberg, Berry, Brinza, Etschmann, Fahrni, Fayard, Fittschen, Fraser, Gago-Duport, Graf, Hitchcock, Ilavsky, Isaure, J. Frederick W. Mosselmans, Jacobsen, Kalotina Geraki, Kirkham, Konstantin Ignatyev, Lambkin, Lee, Lerotic, Leung, Levi-Kalisman, Lombi, Loredana Brinza, Manceau, Marcus, Mark E. Hodson, Meyer, Mosselmans, Muñoz, Paterson, Paul D. Quinn, Paul F. Schofield, Paunesku, Pickering, Piearce, Politi, Ravel, Reddy, Ries, Rindby, Rodriguez-Blanco, Sarret, Scharf, Solé, Sophie Weller, Sutton, Tamura, Villiers, Wan, Zhang
Publication venue: International Union of Crystallography (IUCr)
Publication date: 12/12/2013

The use of fluorescence full spectral micro-X-ray absorption near-edge structure (μXANES) mapping is becoming more widespread in the hard energy regime. This experimental method, using the Ca K-edge and combined with micro-X-ray diffraction (μXRD) mapping of the same sample, has been enabled on beamline I18 at Diamond Light Source. This combined approach has been used to probe both long- and short-range order in calcium carbonate granules produced by the earthworm Lumbricus terrestris. Granules produced by earthworms cultured in a control artificial soil contain calcite and vaterite. However, granules produced by earthworms cultivated in the same artificial soil amended with 500 p.p.m. Mg also contain aragonite. The two techniques, μXRD and μXANES, probe different sample volumes, but there is good agreement in the phase maps produced.

White Rose Research Online

Biomineralisation by earthworms: an investigation into the stability and distribution of amorphous calcium carbonate

Author: A Al-Sawalmih, A Fraser, A Gal, A Kaestner, AA Coelho, B Demarchi, B Gotliv, B Guillement, Bea Demarchi, C Darwin, C Li, CL Freeman, CR Blue, DC Lambkin, DJ Tobler, DS Kaufman, EAA Versteegh, EJ Hendy, Emma A A Versteegh, F Stalport, G Cobourne, G Lambiv-Dzemua, G Luquet, H Setoguchi, HA Lowenstam, J Aizenberg, J Bolze, J Su, J Tao, JC Kühle, JD Rodriguez-Blanco, JR Clarkson, Juan D Rodriguez-Blanco, K Aoki, KEH Penkman, Kirsty E H Penkman, KM Poduska, L Addadi, L Brinza, L Gago-Duport, LB Gower, LG Benning, Liane G Benning, M Crisp, M Faatz, Mark E Hodson, MG Canti, MJI Briones, MR Lee, NA Davies, O Voigt, P Bots, Paul F Schofield, R Beck, R Chester, RE Arnold, RL Hill, RSK Lam, S Akhtar, S Bentov, S Raz, S Weiner, SS Wang, T Ogino, Y Politi, YUT Gong
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Background: Many biominerals form from amorphous calcium carbonate (ACC), but this phase is highly unstable when synthesised in its pure form inorganically. Several species of earthworm secrete calcium carbonate granules which contain highly stable ACC. We analysed the milky fluid from which granules form and solid granules for amino acid (by liquid chromatography) and functional group (by Fourier transform infrared (FTIR) spectroscopy) compositions. Granule elemental composition was determined using inductively coupled plasma-optical emission spectroscopy (ICP-OES) and electron microprobe analysis (EMPA). Mass of ACC present in solid granules was quantified using FTIR and compared to granule elemental and amino acid compositions. Bulk analysis of granules was of powdered bulk material. Spatially resolved analysis was of thin sections of granules using synchrotron-based μ-FTIR and EMPA.
Results: The milky fluid from which granules form is amino acid-rich (≤ 136 ± 3 nmol mg−1 (n = 3; ± std dev) per individual amino acid); the CaCO3 phase present is ACC. Even four years after production, granules contain ACC. No correlation exists between mass of ACC present and granule elemental composition. Granule amino acid concentrations correlate well with ACC content (r ≥ 0.7, p ≤ 0.05), consistent with a role for amino acids (or the proteins they make up) in ACC stabilisation. Intra-granule variation in ACC (RSD = 16%) and amino acid concentration (RSD = 22–35%) was high for granules produced by the same earthworm. Maps of ACC distribution produced using synchrotron-based μ-FTIR mapping of granule thin sections and the relative intensity of the ν2:ν4 peak ratio, cluster analysis and component regression using ACC and calcite standards showed similar spatial distributions of likely ACC-rich and calcite-rich areas. We could not identify organic peaks in the μ-FTIR spectra and thus could not determine whether ACC-rich domains also had relatively high amino acid concentrations. No correlation exists between ACC distribution and elemental concentrations determined by EMPA.
Conclusions: ACC present in earthworm CaCO3 granules is highly stable. Our results suggest a role for amino acids (or proteins) in this stability. We see no evidence for stabilisation of ACC by incorporation of inorganic components.

Central Archive at the University of Reading

Springer - Publisher Connector

Irish Universities

Copenhagen University Research Information System

White Rose Research Online

Institutional Research Information System University of Turin

Learning genetic epistasis using Bayesian network scoring criteria

Author: A Heidema, A Herbert, AJ Brookes, B Han, BA Logsdon, BM Armes, CJ Verzilli, D Brinza, D Heckerman, D Thomas, DR Velez, E Castillo, E Perrier, E Segal, EM Reiman, FV Jensen, GF Cooper, HJ Cordell, J Pearl, J Rissanen, J Suzuki, J Wu, JC Lambert, JH Moore, K Korb, KD Coon, LW Hahn, M Chickering, M Fishelson, M Michael Barmada, M Spinola, MD Ritchie, N Friedman, P Sebastiani, P Spirtes, RE Neapolitan, RI Nagel, Richard E Neapolitan, RW Robinson, S Visweswaran, Shyam Visweswaran, T Silander, TT Wu, W Bateson, W Wongseree, X Jiang, X Wan, X Zhang, Xia Jiang, Y Meng, YM Cho
Publication venue: BioMed Central
Publication date: 01/03/2011

Background: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is Multifactor Dimensionality Reduction (MDR). Jiang et al. created a combinatorial epistasis learning method called BNMBL to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL. Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best or even well when we try learning the correct model without knowledge of the number of SNPs in that model.
Results: We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer's GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called α performed best. This score performed better than other BN scoring criteria and MDR at recall using simulated data sets, at detecting the hardest-to-detect models using simulated data sets, and at substantiating previous results using the real Alzheimer's data set.
Conclusions: We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of a hyperparameter α appears more promising than a number of alternatives.
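The abstract above does not spell out the scoring formula, so the following is only an illustrative sketch. It assumes the commonly used BDeu form of the Bayesian score for a single disease node whose parents are the candidate SNPs, with α as the equivalent sample size. The function name `bdeu_log_score`, the synthetic genotype data and the choice α = 9 are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import lgamma


def bdeu_log_score(parent_configs, child_states, alpha, q, r):
    """Log BDeu score of one child node given its parents (illustrative).

    parent_configs: one hashable parent configuration per sample
    child_states:   one child (disease) state per sample
    alpha:          equivalent sample size (the hyperparameter alpha)
    q:              number of possible parent configurations
    r:              number of possible child states
    """
    n_j = Counter(parent_configs)                      # counts per parent config
    n_jk = Counter(zip(parent_configs, child_states))  # counts per (config, state)
    a_j = alpha / q         # prior mass per parent configuration
    a_jk = alpha / (q * r)  # prior mass per (configuration, state) cell
    score = 0.0
    # Unobserved configurations and cells contribute exactly zero,
    # so iterating over observed counts only is sufficient.
    for count in n_j.values():
        score += lgamma(a_j) - lgamma(a_j + count)
    for count in n_jk.values():
        score += lgamma(a_jk + count) - lgamma(a_jk)
    return score


# Synthetic data: two SNPs with genotypes 0/1/2, 20 samples per genotype pair,
# and a purely interactive (parity) disease model chosen only for illustration.
genotypes = [(a, b) for a in range(3) for b in range(3) for _ in range(20)]
disease = [(a + b) % 2 for a, b in genotypes]
alpha = 9.0

two_snp_score = bdeu_log_score(genotypes, disease, alpha, q=9, r=2)
one_snp_score = bdeu_log_score([a for a, _ in genotypes], disease, alpha, q=3, r=2)
null_score = bdeu_log_score([()] * len(disease), disease, alpha, q=1, r=2)

print("2-SNP model:", two_snp_score)
print("1-SNP model:", one_snp_score)
print("null model: ", null_score)
```

On this toy data the 2-SNP model scores higher than both the 1-SNP and the null model, which is the behaviour one would want from a criterion meant to pick out interacting SNPs. How the score behaves for different values of α on realistic data is what the study itself evaluates.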

D-Scholarship@Pitt

Inferring viral quasispecies spectra from 454 pyrosequencing reads

Author: A Sundquist, Alex Zelikovsky, AR Quinlan, B Gaschen, Bassam Tork, D Brinza, DC Douek, E Domingo, E Martinez-Salas, EA Duarte, G Myers, H Fakhrai-Rad, Ion Măndoiu, Irina Astrovskaya, JC de la Torre, JC Venter, JI Esteban, JJ Holland, JW Drake, K Westbrooks, Kelly Westbrooks, M Eigen, M Margulies, MC Prosperi, MJ Chaisson, N Beerenwinkel, N Eriksson, NM Laird, O Zagordi, Peter Balfe, R Lippert, S Balser, S Hoffmann, S-Y Rhee, Serghei Mangul, SL Fishman, ST O’Neil, T von Hahn, V Bansal, W Brockman
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.
Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches, whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html.
Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.

ScholarWorks @ Georgia State University

Springer - Publisher Connector

Clique-Finding for Heterogeneity and Multidimensionality in Biomarker Epidemiology Research: The CHAMBER Algorithm

Author: Aaron Kershenbaum, AS Foulkes, B Strom, D Brinza, D Erlenkotter, D Michie, DV Conti, E Lander, ER Hauser, H Thomas, I Kononenko, I Ruczinski, I Witten, J Chen, J Friedman, J Hoh, J Huang, J Lepre, J Moore, JG Liehr, Jonatan R. Ruiz, JR Quinlan, K Kira, L Breiman, MD Ritchie, MR Nelson, MY Park, N Tahri-Daizadeh, NJ Schork, P Jaccard, R Mushlin, R Schapire, Richard A. Mushlin, Stephen Gallagher, TA Thornton-Wells, Timothy R. Rebbeck, TR Rebbeck, V Cortessis, V Vapnik, WD Shannon, Y Benjamini, Y Pavlov
Publication venue: Public Library of Science
Publication date: 16/03/2009

Commonly-occurring disease etiology may involve complex combinations of genes and exposures resulting in etiologic heterogeneity. We present a computational algorithm that employs clique-finding for heterogeneity and multidimensionality in biomedical and epidemiological research (the "CHAMBER" algorithm). This algorithm uses graph-building to (1) identify genetic variants that influence disease risk and (2) predict individuals at risk for disease based on inherited genotype. We use a set-covering algorithm to identify optimal cliques and a Boolean function that identifies etiologically heterogeneous groups of individuals. We evaluated this approach using simulated case-control genotype-disease associations involving two- and four-gene patterns. The CHAMBER algorithm correctly identified these simulated etiologies. We also used two population-based case-control studies of breast and endometrial cancer in African American and Caucasian women, considering data on genotypes involved in steroid hormone metabolism. We identified novel patterns in both cancer sites that involved genes that sulfate or glucuronidate estrogens or catecholestrogens. These associations were consistent with the hypothesized biological functions of these genes. We also identified cliques representing the joint effect of multiple candidate genes in all groups, suggesting the existence of biologically plausible combinations of hormone metabolism genes in both breast and endometrial cancer in both races. The CHAMBER algorithm may have utility in exploring the multifactorial etiology and etiologic heterogeneity in complex disease.

High-Order SNP Combinations Associated with Complex Diseases: Efficient Discovery, Statistical Power and Functional Interactions

Author: A Gao, A Motsinger-Reif, A Subramanian, B Maher, B Van Ness, Brian Van Ness, C Greene, C Herold, C Huttenhower, D Anastassiou, D Brinza, D Evans, D Goldstein, D Rabinowitz, D Stram, E Bey, E Eichler, E Schadt, G Dong, G Fang, G Grahne, G Thorisson, Gang Fang, H Cordell, H He, H Wang, Haoyu Yu, J Hirschhorn, J Huang, J Lehár, J Marchini, J Moore, J Storey, K Christensen, K Pattin, K Small, K Van Steen, K Wang, L Cardon, L Ma, L Tentori, M Ashburner, M Carrasquillo, M Costanzo, M Nelson, M Norris, M Ritchie, M Steinbach, M Van Der Deen, Majda Haznadar, Michael Steinbach, N Yosef, P Kraft, R Agrawal, R Bayardo, R Cantor, R Dowell, R Gupta, S Baranzini, S Bay, S Purcell, S Vicent, T Church, T Howard, T Kam-Thong, T Manolio, Timothy R. Church, V Varadan, Vipin Kumar, W Zhang, Wen Wang, William S. Oetting, X Hua, X Lou, X Wan, X Zhang, Y Oji, Y Zhang, Yu Zhang, Z Wang
Publication venue: Public Library of Science
Publication date: 19/04/2012

There has been increased interest in discovering combinations of single-nucleotide polymorphisms (SNPs) that are strongly associated with a phenotype even if each SNP has little individual effect. Efficient approaches have been proposed for searching two-locus combinations from genome-wide datasets. However, for high-order combinations, existing methods either adopt a brute-force search, which only handles a small number of SNPs (up to a few hundred), or use a heuristic search that may miss informative combinations. In addition, existing approaches lack statistical power because of the use of statistics with high degrees of freedom and the huge number of hypotheses tested during combinatorial search. Due to these challenges, functional interactions in high-order combinations have not been systematically explored. We leverage discriminative-pattern-mining algorithms from the data-mining community to search for high-order combinations in case-control datasets. The substantially improved efficiency and scalability demonstrated on synthetic and real datasets with several thousands of SNPs allow the study of several important mathematical and statistical properties of SNP combinations with order as high as eleven. We further explore functional interactions in high-order combinations and reveal a general connection between the increase in discriminative power of a combination over its subsets and the functional coherence among the genes comprising the combination, supported by multiple datasets. Finally, we study several significant high-order combinations discovered from a lung-cancer dataset and a kidney-transplant-rejection dataset in detail to provide novel insights on the complex diseases. Interestingly, many of these associations involve combinations of common variations that occur in small fractions of the population. Thus, our approach is an alternative methodology for exploring the genetics of rare diseases for which the current focus is on individually rare variations.