Linkage Disequilibrium Mapping via Cladistic Analysis of Single-Nucleotide Polymorphism Haplotypes

Author: Cardon Lon R.
Deloukas Panos
Durrant Caroline
Hunt Sarah
Morris Andrew P.
Zondervan Krina T.
Publication venue: The American Society of Human Genetics. Published by Elsevier Inc.
Publication date: 01/01/2004

We present a novel approach to disease-gene mapping via cladistic analysis of single-nucleotide polymorphism (SNP) haplotypes obtained from large-scale, population-based association studies, applicable to whole-genome screens, candidate-gene studies, or fine-scale mapping. Clades of haplotypes are tested for association with disease, exploiting the expected similarity of chromosomes with recent shared ancestry in the region flanking the disease gene. The method is developed in a logistic-regression framework and can easily incorporate covariates such as environmental risk factors or additional unlinked loci to allow for population structure. To evaluate the power of this approach to detect disease-marker association, we have developed a simulation algorithm to generate high-density SNP data with short-range linkage disequilibrium based on empirical patterns of haplotype diversity. The results of the simulation study highlight substantial gains in power over single-locus tests for a wide range of disease models, despite overcorrection for multiple testing.

Elsevier - Publisher Connector

PubMed Central

Oxford University Research Archive

Routes for breaching and protecting genetic privacy

Author: A Acquisti
A Cavoukian
A Kong
A Machanavajjhala
A Narayanan
AD Johnson
AJ Pakstis
AK Manning
AL McGuire
Arvind Narayanan
B Fons
B Malin
B Malin
BA Malin
BM Henn
C Dwork
C Shannon
CD Huff
D Clayton
D He
D Zubakov
DJ Solve
DR Nyholt
DW Craig
EA Zerhouni
EE Schadt
EM Ramos
F Liu
G Church
H Lango Allen
H Li
HK Im
HS Venter
J Burn
J Gitschier
J Kaiser
J Kaye
J Kaye
J Lee
J Marchini
JE Lunshof
JH Park
JM Oliver
JP Roberts
K Benitez
K El Emam
K El Emam
K Silventoinen
KA Tryka
KB Jacobs
KS Kendler
L Kamm
L Sweeney
L Sweeney
LA Sweeney
LA Sweeney
LAP Kohn
LL Rodriguez
M Canim
M Gymrek
M Gymrek
M Kantarcioglu
M Kayser
MD Mailman
N Chatterjee
N Homer
NN Taleb
P Bohannon
P Kwok
P Ohm
P Paillier
PM Visscher
R Braun
R Drmanac
R Khan
R Noumeir
RL Bennett
S Byers
S McClure
S Sankararaman
S Walsh
SE Brenner
SF Terry
SH Friend
T Lumley
TE King
TE King
V Bafna
W Fu
W Hartzog
WG Hill
WW Lowrance
XL Ou
Yaniv Erlich
Z Lin
Publication date: 01/12/2013

We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

PubMed Central

Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies

Author: Bientinesi Paolo
Fabregat-Traver Diego
Publication date: 01/01/2012

In many scientific and engineering applications, one has to solve not one but a sequence of instances of the same problem. Often, the problems in the sequence are linked in a way that allows intermediate results to be reused. A characteristic example of this class of applications is given by Genome-Wide Association Studies (GWAS), a widely used tool in computational biology. GWAS entails the solution of up to trillions (10^{12}) of correlated generalized least-squares problems, posing a daunting challenge: the performance of petaflops (10^{15} floating-point operations) over terabytes of data. In this paper, we design an algorithm for performing GWAS on multi-core architectures. This is accomplished in three steps. First, we show how to exploit the relation among successive problems, thus reducing the overall computational complexity. Then, through an analysis of the required data transfers, we identify how to eliminate any overhead due to input/output operations. Finally, we study how to decompose computation into tasks to be distributed among the available cores, to attain high performance and scalability. With our algorithm, a GWAS that currently requires the use of a supercomputer may now be performed in a matter of hours on a single multi-core node. The discussion centers around the methodology to develop the algorithm rather than the specific application. We believe the paper contributes valuable guidelines of general applicability for computational scientists on how to develop and optimize numerical algorithms.
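
The idea of reusing intermediate results across a sequence of generalized least-squares (GLS) problems can be illustrated with a minimal sketch. This is not the algorithm from the paper. It assumes a single shared covariance matrix M, one fixed covariate column (for example an intercept), and one SNP column that changes per problem. All function names are hypothetical. The Cholesky factor of M and the whitened response and covariate are computed once, and only the SNP-dependent terms are recomputed for each problem.

```python
import math

def cholesky(M):
    """Return lower-triangular L with L L^T = M (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gls_sequence(M, y, covariate, snps):
    """Solve one two-parameter GLS problem per SNP column.

    Everything that does not depend on the SNP is computed a single time:
    the factorization of M, the whitened y, and the whitened covariate.
    Returns a list of (covariate_coefficient, snp_coefficient) pairs.
    """
    L = cholesky(M)                     # shared by every problem
    y_w = forward_solve(L, y)           # whitened response, computed once
    c_w = forward_solve(L, covariate)   # whitened covariate, computed once
    cc, cy = dot(c_w, c_w), dot(c_w, y_w)
    results = []
    for x in snps:
        x_w = forward_solve(L, x)       # the only per-SNP triangular solve
        cx, xx, xy = dot(c_w, x_w), dot(x_w, x_w), dot(x_w, y_w)
        det = cc * xx - cx * cx         # determinant of the 2x2 normal matrix
        beta_c = (xx * cy - cx * xy) / det
        beta_x = (cc * xy - cx * cy) / det
        results.append((beta_c, beta_x))
    return results
```

Calling `gls_sequence(M, y, ones, list_of_snp_columns)` performs one factorization regardless of how many SNP columns are passed, which is the kind of saving the abstract refers to when it mentions exploiting the relation among successive problems.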

arXiv.org e-Print Archive

Publikationsserver der RWTH Aachen University

A preliminary study of genetic factors that influence susceptibility to bovine tuberculosis in the British cattle herd

Author: A Singhal
AL Price
AM Ramírez-Villescusa
Anon
AR Allen
BM Buddle
BT Garnett
C Estrada-Chávez
EJ Lyons
Erin E. Driscoll
G Ferwerda
Graham F. Medley
GS Cooke
H Li
I Messaoudi
J Bourne
J Ohashi
J Ohashi
J Slate
JI Hoffman
JJ Carrique-Mas
JK Pritchard
JK Pritchard
Joseph I. Hoffman
K Acevedo-Whitehouse
K Imai
KG Meade
Laura E. Green
LR Cardon
M-H Li
MA Novoa
ML Bermingham
MP Epstein
N Patterson
NF Schulman
P Armitage
R Bellamy
R de la Rua-Domenech
RC Galindo
S Bennett
SL Fernando
TG Schulze
Tjeerd Kimman
W Amos
W Amos
W Amos
W Strober
William Amos
WR Waters
Y Suekawa
YJ Yoo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Associations between specific host genes and susceptibility to Mycobacterial infections such as tuberculosis have been reported in several species. Bovine tuberculosis (bTB) greatly impacts the UK cattle industry, yet genetic predispositions have yet to be identified. We therefore used a candidate gene approach to study 384 cattle of which 160 had reacted positively to an antigenic skin test (‘reactors’). Our approach was unusual in that it used microsatellite markers, embraced high breed diversity and focused particularly on detecting genes showing heterozygote advantage, a mode of action often overlooked in SNP-based studies. A panel of neutral markers was used to control for population substructure and, using a general linear model-based approach, we were also able to control for age. We found that substructure was surprisingly weak and identified two genomic regions that were strongly associated with reactor status, identified by markers INRA111 and BMS2753. In general the strength of association detected tended to vary depending on whether age was included in the model. At INRA111 a single genotype appears strongly protective with an overall odds ratio of 2.2, the effect being consistent across nine diverse breeds. Our results suggest that breeding strategies could be devised that would appreciably increase genetic resistance of cattle to bTB (strictly, reduce the frequency of incidence of reactors) with implications for the current debate concerning badger-culling.

CiteSeerX

Public Library of Science (PLOS)

Crossref

University of Birmingham Research Portal

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Publications at Bielefeld University

Methodological Issues in Multistage Genome-Wide Association Studies

Author: Casey Graham
Conti David V.
Haile Robert W.
Lewinger Juan Pablo
Stram Daniel O.
Thomas Duncan C.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of "promising" SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design to obtain comparable levels of overall type I error and power by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage, the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a "replication" panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly-significant associations has been discovered, a truly independent "exact replication" study is needed in a similar population of the same promising SNPs using similar methods. Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/09-STS288 by the Institute of Mathematical Statistics (http://www.imstat.org
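
The cost arithmetic behind the roughly 50% saving can be sketched as follows. This is an illustrative calculation, not the authors' optimization. It assumes genotyping cost is simply subjects times SNPs times cost per genotype, and the stage II to stage I per-genotype cost ratio of 20 used in the usage note is a hypothetical value chosen for illustration. The function name is also hypothetical.

```python
def two_stage_cost_ratio(stage1_fraction, carried_fraction, cost_ratio):
    """Cost of a two-stage design relative to genotyping everyone on the full panel.

    stage1_fraction:  share of subjects genotyped on the full panel in stage I
    carried_fraction: share of SNPs carried forward to stage II
    cost_ratio:       per-genotype cost in stage II divided by that in stage I
                      (custom genotyping is assumed to cost more per genotype)
    """
    stage1_cost = stage1_fraction  # full panel on the stage I subjects
    stage2_cost = (1.0 - stage1_fraction) * carried_fraction * cost_ratio
    return stage1_cost + stage2_cost

def saving(stage1_fraction, carried_fraction, cost_ratio):
    """Fractional cost saving relative to the one-stage design."""
    return 1.0 - two_stage_cost_ratio(stage1_fraction, carried_fraction, cost_ratio)
```

With half the sample in stage I, 0.1% of SNPs carried forward, and the assumed cost ratio of 20, `two_stage_cost_ratio(0.5, 0.001, 20)` gives 0.5 + 0.5 × 0.001 × 20 = 0.51, a saving of about 49%. The result is dominated by the stage I fraction unless the cost ratio is very large, which is consistent with the abstract's remark that the optimal design depends primarily on that ratio.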

arXiv.org e-Print Archive

CiteSeerX

Crossref

Solving Sequences of Generalized Least-Squares Problems on Multi-threaded Architectures

Author: Aulchenko Yurii
Bientinesi Paolo
Fabregat-Traver Diego
Publication date: 01/01/2012

Generalized linear mixed-effects models in the context of genome-wide association studies (GWAS) represent a formidable computational challenge: the solution of millions of correlated generalized least-squares problems, and the processing of terabytes of data. We present high-performance in-core and out-of-core shared-memory algorithms for GWAS: by taking advantage of domain-specific knowledge, exploiting multi-core parallelism, and handling data efficiently, our algorithms attain unequalled performance. When compared to GenABEL, one of the most widely used libraries for GWAS, on a 12-core processor we obtain 50-fold speedups. As a consequence, our routines enable genome studies of unprecedented size.

arXiv.org e-Print Archive

Publikationsserver der RWTH Aachen University

Network-based approaches to explore complex biological systems towards network medicine

Author: Conte Federica
Farina Lorenzo
Fiscon Giulia
Paci Paola
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory network and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbation within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting the connectivity significance instead of connection density. The past few years have witnessed the increasing interest in understanding the molecular mechanism of post-transcriptional regulation with a special emphasis on non-coding RNAs since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs—including long non-coding RNAs (lncRNAs) —competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of the co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes—called switch genes—critically associated with drastic changes in cell phenotype. 
Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
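
The contrast between connectivity significance and connection density can be made concrete with a small sketch. It assumes the significance is scored as a hypergeometric tail probability: given a network of `n_nodes` nodes of which `n_seeds` are known disease genes, it is the chance that a node of a given degree has at least the observed number of links to seeds if seed labels were placed at random. The function names and the toy numbers in the usage note are hypothetical, and the sketch does not reproduce the iterative procedure of the published tool.

```python
from math import comb

def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """Hypergeometric tail: P(at least seed_links of the node's degree links hit seeds)."""
    total = comb(n_nodes, degree)
    upper = min(degree, n_seeds)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, upper + 1)
    )
    return tail / total

def rank_candidates(n_nodes, n_seeds, candidates):
    """Order candidate genes from most to least significant.

    candidates maps a gene name to a (degree, seed_links) pair.
    """
    return sorted(
        candidates,
        key=lambda gene: connectivity_pvalue(n_nodes, n_seeds, *candidates[gene]),
    )
```

For example, in a toy network of 10 nodes with 3 seeds, a gene with 2 links that both reach seeds scores 3/45, while a hub with 8 links of which 2 reach seeds scores 42/45. Both have the same number of seed links, but `rank_candidates(10, 3, {"hub": (8, 2), "focused": (2, 2)})` places the low-degree gene first, because the hub's seed links are unsurprising given its degree.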

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza