83 research outputs found

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: AA Schäffer
AL Delcher
Alejandro A Schäffer
B Brejová
B Hao
BG Barrell
DJ States
E Birney
E Birney
E Boy-Marcotte
E Boy-Marcotte
E Halperin
E Michael Gertz
EM Gertz
F Damak
F Zinoni
G Macino
H Peltola
IG Young
J Hein
J Hein
JC Wootton
L Knecht
M Gribskov
MS Boguski
MS Boguski
MS Gelfand
O Gotoh
P Steneberg
P Steneberg
R Durbin
Richa Agarwala
S Henikoff
S Kurtz
SA Chervitz
SC Low
SF Altschul
SF Altschul
SF Altschul
SF Altschul
Stephen F Altschul
TF Smith
W Gish
WJ Kent
WR Pearson
WR Pearson
WR Pearson
X Guan
X Huang
Yi-Kuo Yu
YK Yu
YK Yu
Z Zhang
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

KB-Rank: efficient protein structure and functional annotation identification via text query

Author: A Bairoch
A Frolkis
A Hamosh
A Prlic
AA Adjei
AC Anderson
AG Murzin
AR Kinjo
AS Pachev
BB Araujo
C Chen
C Iverson
CA Orengo
CF Schaefer
D Devos
D Wang
DS Wishart
EG Cerami
EL Ulrich
Elchin S. Julfayev
ES Julfayev
EW Sayers
H Berman
HT Arkenau
I Halperin
K Arnold
K Degtyarenko
KT Flaherty
L Page
LC Borish
M Ashburner
M Magrane
MJ Gabanyi
O Abehsira-Amar
P Bucher
P Yue
PD Karp
PW Rose
RA Laskowski
RD Finn
Ryan J. McLaughlin
S Velankar
SF Altschul
T Eisen
T Liu
TA Chatila
TR Bai
U Consortium
VA McKusick
William A. McLaughlin
WR Pitt
Y Wang
Y Ye
Yi-Ping Tao
Publication venue: Springer Netherlands
Publication date: 01/01/2012

The KB-Rank tool was developed to help determine the functions of proteins. A user provides a text query, and protein structures are retrieved together with their functional annotation categories. Structures and annotation categories are ranked according to their estimated relevance to the queried text. The ranking algorithm first retrieves matches between the query text and the text fields associated with the structures. The structures are then ordered by their relative content of annotations that are found to be prevalent across all the structures retrieved. An interactive web interface was implemented to navigate and interpret the relevance of the structures and annotation categories retrieved by a given search. The aim of the KB-Rank tool is to provide a means to quickly identify protein structures of interest and the annotations most relevant to the queries posed by a user. Informational and navigational searches regarding disease topics are described to illustrate the tool’s utilities. The tool is available at the URL http://protein.tcmedc.org/KB-Rank

Crossref

Springer - Publisher Connector

PubMed Central

Rapid Evolution of Pandemic Noroviruses of the GII.4 Lineage

Author: AJ Hay
BV Prasad
CD Ward
DJ Allen
DP Zheng
DT Pride
E Domingo
E Domingo
E Domingo
E Domingo
E Nobusawa
Esteban Domingo
ET Tu
GS Hansman
JA Bruenn
JJ Siebenga
JJ Siebenga
JK Pfeiffer
JM Choi
John-Sebastian Eden
JW Drake
K Rispeter
K Tamura
KY Green
L Lindesmith
LA Jones
LC Lindesmith
LH Blanton
M Andreoni
M Nei
M Tan
M Tan
M Tan
M Vignuzzi
MA Pletneva
MF Boni
MK Estes
MU Mondelli
NM Ferguson
O Lund
Peter A. White
PJ Glass
R Chen
R Montville
R Sallie
RA Bull
RL Atmar
Rowena A. Bull
S Cao
S Kumar
SC Manrubia
SF Elena
T Halperin
TA Kunkel
TG Phan
TJ Doyle
William D. Rawlinson
WL DeLano
Publication venue: Public Library of Science
Publication date: 01/01/2010

Over the last fifteen years there have been five pandemics of norovirus (NoV) associated gastroenteritis, and the period of stasis between each pandemic has been progressively shortening. NoV is classified into five genogroups, which can be further classified into 25 or more different human NoV genotypes; however, only one, genogroup II genotype 4 (GII.4), is associated with pandemics. Hence, GII.4 viruses have both a higher frequency in the host population and greater epidemiological fitness. The aim of this study was to investigate if the accuracy and rate of replication are contributing to the increased epidemiological fitness of the GII.4 strains. The replication and mutation rates were determined using in vitro RNA dependent RNA polymerase (RdRp) assays, and rates of evolution were determined by bioinformatics. GII.4 strains were compared to the second most reported genotype, recombinant GII.b/GII.3, the rarely detected GII.3 and GII.7 and as a control, hepatitis C virus (HCV). The predominant GII.4 strains had a higher mutation rate and rate of evolution compared to the less frequently detected GII.b, GII.3 and GII.7 strains. Furthermore, the GII.4 lineage had on average a 1.7-fold higher rate of evolution within the capsid sequence and a greater number of non-synonymous changes compared to other NoVs, supporting the theory that it is undergoing antigenic drift at a faster rate. Interestingly, the non-synonymous mutations for all three NoV genotypes were localised to common structural residues in the capsid, indicating that these sites are likely to be under immune selection. This study supports the hypothesis that the ability of the virus to generate genetic diversity is vital for viral fitness

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UNSWorks

The LabelHash algorithm for substructure matching

Background: There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Results: We present LabelHash, a novel algorithm for matching substructural motifs to large collections of protein structures. The algorithm consists of two phases. In the first phase the proteins are preprocessed in a fashion that allows for instant lookup of partial matches to any motif. In the second phase, partial matches for a given motif are expanded to complete matches. The general applicability of the algorithm is demonstrated with three different case studies. First, we show that we can accurately identify members of the enolase superfamily with a single motif. Next, we demonstrate how LabelHash can complement SOIPPA, an algorithm for motif identification and pairwise substructure alignment. Finally, a large collection of Catalytic Site Atlas motifs is used to benchmark the performance of the algorithm. LabelHash runs very efficiently in parallel; matching a motif against all proteins in the 95 % sequence identity filtered non-redundant Protein Data Bank typically takes no more than a few minutes. The LabelHash algorithm is available through a web server and as a suite of standalone programs a

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Application of the PM6 semi-empirical method to modeling proteins enhances docking accuracy of AutoDock

Background: Molecular docking methods are commonly used for predicting binding modes and energies of ligands to proteins. For accurate complex geometry and binding energy estimation, an appropriate method for calculating partial charges is essential. AutoDockTools software, the interface for preparing input files for one of the most widely used docking programs, AutoDock 4, utilizes the Gasteiger partial charge calculation method for both protein and ligand charge calculation. However, it has already been shown that more accurate partial charge calculation, and as a consequence more accurate docking, can be achieved by using quantum chemical methods. For docking calculations, quantum chemical partial charge calculation has so far been used routinely only for ligands. The newly developed Mozyme function of MOPAC2009 allows fast partial charge calculation of proteins by quantum mechanical semi-empirical methods. Thus, in the current study, the effect of semi-empirical quantum-mechanical partial charge calculation on docking accuracy could be investigated. Results: The docking accuracy of AutoDock 4 using the original AutoDock scoring function was investigated on a set of 53 protein-ligand complexes using Gasteiger and PM6 partial charge calculation methods. This has enabled us to compare the effect of the partial charge calculation method on docking accuracy utilizing AutoDock 4 software. Our results showed that the docking accuracy in regard to complex geometry (docking result defined as accurate when the RMSD of the first rank docking result complex is within 2 Å of the experimentally determined X-ray structure) significantly increased when partial charges of the ligands and proteins were calculated with the semi-empirical PM6 method. Out of the 53 complexes analyzed in the course of our study, the geometries of 42 complexes were accurately calculated using PM6 partial charges, while the use of Gasteiger charges resulted in only 28 accurate geometries. The binding affinity estimation was not influenced by the partial charge calculation method; for more accurate binding affinity prediction, development of a new scoring function for AutoDock is needed. Conclusion: Our results demonstrate that the accuracy of determination of complex geometry using AutoDock 4 for docking calculation greatly increases with the use of quantum chemical partial charge calculation on both the ligands and proteins.

Crossref

Directory of Open Access Journals

PubMed Central

Scoring docking conformations using predicted protein interfaces

Author: A Brückner
A Porollo
ADJ Van Dijk
AMJJ Bonvin
AW Ghoorah
B Huang
B Li
B Pierce
C Dominguez
C Yan
D Douguet
D Kozakov
D Kozakov
D La
DW Ritchie
DW Ritchie
E Mashiach
G Kuzu
GR Smith
GR Smith
H Chen
H Hwang
H Hwang
H Neuvirth
HM Berman
HX Zhou
HX Zhou
I Halperin
IA Vakser
J Fernández-Recio
J Fernández-Recio
J Janin
J Janin
J Mintseris
J Pande
JD Thompson
Jean-Christophe Nebel
JJ Gray
JL Chung
JR Bradford
K Ethan
K Henrick
K Krawczyk
K Venkatesan
KE Gottschalk
L Li
LC Xue
LC Xue
M Ohh
M Tyagi
M Šikić
MF Lensink
MF Lensink
MG Kann
N Andrusier
N Zhao
OG Othersen
P Baldi
P Chen
P Kuo
PH Patel
PJ Kundrotas
QC Zhang
QC Zhang
QC Zhang
R Chen
R Esmaielbeiki
R Esmaielbeiki
R Grünberg
R Khashan
RA Jordan
Reyhaneh Esmaielbeiki
S Jones
S Liang
S Qin
S Qin
SF Altschul
SJ De Vries
SJ De Vries
SJ Fleishman
SR Comeau
T Fawcett
T Vreven
Y Ofran
Y Ofran
Y Ofran
Y Ofran
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

BACKGROUND: Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations from 3D models that docking software produced by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). RESULTS: First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. CONCLUSION: Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit from template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software is still not able to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing quality of complex conformations.

Crossref

Springer - Publisher Connector

PubMed Central

Kingston University Research Repository

Drug Off-Target Effects Predicted Using Structural Analysis in the Context of a Metabolic Network Model

Author: AB Zinn
AL Hopkins
AM Feist
AR Tall
AS Brickman
Bernhard Ø. Palsson
C Gellera
CJ Dickinson
D Markovich
D Weininger
DA McCarron
DB Searls
DC Hatton
DS Lee
E Kristal-Boneh
F Ferrari
F Lang
G Eshel
GJ de Grooth
GV Paolini
H Jacquet
H Shen
HL Teijema
HS Tenenhouse
IM Frey
J Sadowski
JA Kuivenhoven
JC Frölich
JD Durant
JM Jones
JT Wang
K Hyland
K Sangkuhl
KH Neumann
L Xie
L Xie
Lei Xie
Li Xie
LJ Appel
LJ Elsas
LP van den Heuvel
M Haas
M Hermann
M Miyamoto
M Ruiz
M Zeviani
MA Abdul-Ghani
MA Kamal
MA Oberhardt
MF Albertoni Borghese
MF Holick
MF Holick
MF Sanner
MJ Forrest
MK Hellerstein
ML Halperin
N Jamshidi
NC Duarte
O Shmueli
O Trott
P de Lonlay
P San-Cristobal
Philip E. Bourne
PJ Barter
PK Leong
R Berkow
R Krishna
RC Gentleman
RJ Bindels
RM Carey
RO Banks
Roger L. Chang
Roland L. Dunbrack
RW Moreadith
S de Seigneux
S Kitanaka
SA Becker
SA Becker
SF Altschul
SJ Lee
SL Kinnings
T Ito
T Nakayama
T Shlomi
TL Perry
TY Kim
V Humbertclaude
V Vitart
WF Boron
WS Cleveland
Y Konno
Y Sasaki
Z Wang
ZJ Twardowski
Publication venue: Public Library of Science
Publication date: 01/09/2010

Recent advances in structural bioinformatics have enabled the prediction of protein-drug off-targets based on their ligand binding sites. Concurrent developments in systems biology allow for prediction of the functional effects of system perturbations using large-scale network models. Integration of these two capabilities provides a framework for evaluating metabolic drug response phenotypes in silico. This combined approach was applied to investigate the hypertensive side effect of the cholesteryl ester transfer protein inhibitor torcetrapib in the context of human renal function. A metabolic kidney model was generated in which to simulate drug treatment. Causal drug off-targets were predicted that have previously been observed to impact renal function in gene-deficient patients and may play a role in the adverse side effects observed in clinical trials. Genetic risk factors for drug treatment were also predicted that correspond to both characterized and unknown renal metabolic disorders as well as cryptic genetic deficiencies that are not expected to exhibit a renal disorder phenotype except under drug treatment. This study represents a novel integration of structural and systems biology and a first step towards computational systems medicine. The methodology introduced herein has important implications for drug development and personalized medicine

City University of New York

Crossref

Directory of Open Access Journals

PubMed Central

An interaction-motif-based scoring function for protein-ligand docking

Author: A Gutteridge
A Petitjean
A Shulman-Peleg
A Shulman-Peleg
AK Soper
AM Ruvinsky
AR Leach
B Kramer
BA Grzybowski
BR Zeeberg
C Brenner
C Brenner
DB Kitchen
E Mashiach
G Jones
G Jones
G Schneider
GJ Bartlett
GL Warren
GM Morris
H Gohlke
HF Velec
HM Berman
I Halperin
I Muegge
J Tirado-Rives
JM Yang
KD Koehntop
M Koyuturk
M Rarey
MD Cummings
MD Eldridge
Ming-Jing Hwang
MM Babu
O Roche
P Ferrara
PK Weiner
R Dobrin
R Milo
R Taylor
R Wang
RD Taylor
S Mintz
SF Sousa
SS Shen-Orr
WC Liu
Zhong-Ru Xie
Publication venue: Springer Science and Business Media LLC

Crossref

A global reference for human genetic variation

Author: Abecasis GR
Abecasis GR
Abecasis GR
Abecasis GR
Abecasis GR
Abecasis GR
Abecasis GR
Abyzov A
Abyzov A
Albers CA
Albrecht MW
Albrecht MW
Alkan C
Alkan C
Altshuler DM
Altshuler DM
Altshuler DM
Altshuler DM
Altshuler DM
Amstislavskiy VS
Amstislavskiy VS
Ananiev V
Antaki D
Antaki D
Antunes L
Asogun D
Auton A
Auton A
Auton A
Awadalla P
Ayub Q
Ayub Q
Bafna V
Bainbridge M
Bainbridge M
Balasubramaniam S
Balasubramaniam S
Balasubramaniam S
Balasubramanian S
Balasubramanian S
Balasubramanian S
Ball EV
Banerjee R
Banks E
Banks E
Baran Y
Barker J
Barnes B
Barnes B
Barnes KC
Barnes KC
Batzer MA
Batzer MA
Bauer M
Beal K
Bedoya G
Beiswanger C
Belaia Z
Beloslyudtsev D
Bentley DR
Bentley DR
Bentley DR
Bentley DR
Bentley DR
Bhatia G
Blackburne B
Blackwell T
Bodmer W
Boerwinkle E
Borodina TA
Bouk N
Brook LD
Brooks LD
Browning BL
Browning SR
Burchard EG
Burchard EG
Burton J
Bustamante CD
Bustamante CD
Bustamante CD
Bustamante CD
Byrnes JK
Cai H
Cai Z
Campbell CL
Cao H
Caron S
Carroll AW
Casale FP
Cerezo M
Cerveira E
Cerveira E
Cerveira E
Chaisson MJ
Chakravarti A
Chakravarti A
Challis D
Chang Y
Cheetham RK
Cheetham RK
Chen C
Chen J
Chen J
Chen K
Chen T
Chen W
Chen Y
Chen Y
Chen Y
Chew E
Chong Z
Christoforides A
Christoforides A
Chu J
Church D
Church D
Churchhouse C
Clark AG
Clark AG
Clark AG
Clarke D
Clarke D
Clarke L
Clarke L
Clarke L
Clarke L
Clarke L
Clarke L
Clarke L
Cohen R
Coin L
Coin LJM
Colonna V
Colonna V
Cook C
Cooper DN
Corrah T
Cox A
Cox A
Craig DW
Craig DW
Craig DW
Craig DW
Cunningham F
Dal E
Dal E
Daly MJ
Dan X
Danecek P
Danecek P
Datta A
Davies CJ
Dayama G
De La Vega FM
Degenhardt J
DeGorter MK
DeGorter MK
del Angel G
Del Angel G
del Angel G
Delaneau O
Deng X
DePristo MA
DePristo MA
Dermitzakis ET
Dermitzakis ET
Desalle R
Devine SE
Devine SE
Ding L
Ding L
Doddapaneni H
Donnelly P
Duncanson A
Dunham I
Dunn M
Dunstan SJ
Durbin RM
Durbin RM
Durbin RM
Durbin RM
Durbin RM
Durbin RM
Dutil J
Eberle M
Eberle M
Eichler EE
Eichler EE
Eichler EE
Emery S
Emery S
Erlich Y
Erlich Y
Evani US
Fan X
Fang L
Fang X
Felsenfeld A
Feng Q
Fitzgerald T
Flicek P
Flicek P
Flicek P
Flicek P
Flicek P
Flicek P
Flicek P
Flicek P
Folarin O
Fonnie R
Fritsche L
Fritz MH
Fritz MH
Fu Y
Fu Y
Fu Y
Fuchsberger C
Fulton L
Fulton R
Fulton R
Gabriel SB
Gabriel SB
Gabriel SB
Gabriel SB
Gallo C
Gao Y
Garcia-Montero A
Gardner EJ
Garner J
Garrison EP
Garrison EP
Garrison EP
Garrison EP
Garrison EP
Garry R
Genovese G
Genovese G
Gerry NP
Gerry NP
Gerstein MB
Gerstein MB
Gerstein MB
Gerstein MB
Gharani N
Gharani N
Gibbs RA
Gibbs RA
Gibbs RA
Gibbs RA
Gibbs RA
Gibbs RA
Gignoux CR
Gignoux CR
Gil L
Gollub J
Goncalo RA
Gottipati S
Grant DS
Gravel S
Gravel S
Gravel S
Green ED
Green ED
Grocock R
Gujral M
Guo X
Guo X
Guo X
Guo X
Gupta N
Gupta N
Gupta N
Gupta-Hinch A
Gymrek M
Gymrek M
Habegger L
Hale W
Hall I
Halperin E
Han Y
Handsaker RE
Handsaker RE
Handsaker RE
Happi C
Harmanci AO
Hartl C
Hartl C
Haussler D
Haussler D
Hefferon T
Henn B
Hennis A
Hernandez RD
Herrero J
Herwig R
Hodgkinson A
Homer N
Homer N
Homer N
Hormozdiari F
Hormozdiari F
Horn H
Howie B
Huang Z
Huddleston J
Humphray S
Humphray S
Humphray S
Humphray S
Hunt SE
Hurles ME
Hurles ME
Hurles ME
Hwang J
Hwang J
Hyland FCL
Iqbal Z
Izatt T
Izatt T
Izatt T
Jallow M
James T
Jespersen JB
Jian M
Jiang H
Jin M
Jin X
Jin X
Jones D
Joof FS
Jorde L
Jorde L
Jostins L
Jun G
Jun G
Kahn S
Kahn S
Kahn S
Kahveci F
Kahveci F
Kalra D
Kang HM
Kang HM
Kang HM
Kanneh L
Kashin S
Kashin S
Kashin S
Katzman SJ
Kaye JS
Keane TM
Keane TM
Keane TM
Keinan A
Keinan A
Kelman G
Kenny EE
Kent A
Kent WJ
Kerasidou A
Khurana E
Khurana E
Khurana E
Kidd JM
Kidd JM
Kim D
Kimelman M
Kingsbury Z
Knoppers BM
Knoppers BM
Koboldt DC
Koboldt DC
Kolb-Kokocinski A
Kong Y
Konkel MK
Konkel MK
Kooner J
Korbel JO
Korbel JO
Korbel JO
Korbel JO
Korchina V
Kovar C
Kovar C
Kovar C
Kretzschmar W
Kulesha E
Kural D
Kural D
Kurdoglu AA
Kurdoglu AA
Kwiatkowski D
Lacroute P
Lacroute P
Lage K
Lam H
Lameijer E-W
Lan T
Lander ES
Lander ES
Lander ES
Lappalainen T
LaRocque R
Larson D
Larson D
Lee C
Lee C
Lee C
Lee C
Lee D
Lee S
Lee W-P
Lee W-P
Lehrach H
Lehrach H
Leinonen R
Lek M
Leong WF
Leong WF
Li B
Li G
Li G
Li G
Li H
Li H
Li J
Li Q
Li W
Li Y
Li Y
Li Y
Li Y
Li Y
Li Y
Li Y
Li Z
Lienhard M
Lienhard M
Lihm J
Lihm J
Lin H
Lindsay SJ
Liu B
Liu C
Liu J
Liu S
Liu X
Liu X
Lopez J
Louzada S
Lu J
Lu Y
Lunter G
Lunter G
Luo R
Luo R
Lyons R
Ma X
MacArthur DG
MacArthur DG
Makarov V
Makarov V
Malhotra A
Malhotra A
Malhotra A
Malig M
Maples BK
Marchini JL
Marchini JL
Marcketta A
Marcketta A
Mardis ER
Mardis ER
Mardis ER
Mardis ER
Marth GT
Marth GT
Marth GT
Marth GT
Martin AR
Martinez-Cruzado JC
Massaia A
Mathias R
Mathias RA
Mathieson I
McCarroll SA
McCarroll SA
McCarroll SA
McCarthy S
McCarthy S
McCarthy S
McCarthy S
McCarthy S
McEwen JE
McKenzie C
McLaren WM
McLaren WM
McVean GA
McVean GA
McVean GA
McVean GA
McVean GA
McVean GA
McVean GA
McVean GA
Meiers S
Mendez FL
Menelaou A
Meric P
Mertes F
Michaelson J
Mills RE
Mittelman D
Montgomery SB
Montgomery SB
Moreno-Estrada A
Moreno-Estrada A
Moses L
Mu XJ
Mu XJ
Murray L
Murray L
Muzny D
Muzny D
Muzny D
Muzny D
Myers S
Nagaswamy U
Narechania A
Nelson BJ
Nemesh JC
Nemesh JC
Nguyen TH
Nickerson DA
Ning Z
Noor A
O'Sullivan C
Oleksyk TK
Oleksyk TK
Omoniwa O
Orfao A
Ossorio PN
Ostapchuk Y
Parker M
Parrish NF
Peden J
Peltonen L
Phan L
Plewczynski D
Plewczynski D
Poletti G
Ponomarov S
Poplin RE
Poplin RE
Poznik GD
Qadri F
Quail M
Quitadamo A
Quitadamo A
Radew K
Radew K
Radhakrishnan R
Raeder B
Rasheed A
Rausch T
Rausch T
Reid JG
Reid JG
Reid JG
Resch AM
Resch AM
Rimmer A
Ritchie GR
Ritchie GRS
Roa A
Rockett K
Rodriguez-Flores JL
Rodriguez-Flores JL
Rodriguez-Flores JL
Romanovitch M
Romanovitch M
Romanovitch M
Rosenfeld JA
Rotimi CN
Royal CD
Ruiz-Linares A
Ruiz-Linares A
Sabeti PC
Sabeti PC
Sabeti PC
Sabeti PC
Sabo A
Sabo A
Saleheen D
Sandoval K
Sayres MAW
Schaffner SF
Scheller C
Schieffelin J
Schloss JA
Schmidt JP
Schmidt JP
Schneider V
Sebat J
Sebat J
Shakir K
Shao H
Shao H
Shaw R
Shaw R
Shekhtman E
Sherry ST
Sherry ST
Sherry ST
Sherry ST
Sherry ST
Shi X
Shi X
Shlyakhter I
Shringarpure SS
Shriver MD
Sidore C
Simpson JT
Sinari SA
Sirotkin K
Sisu C
Sliwerska E
Slotta D
Smirnov D
Smith RE
Smith RE
Smith RE
Smith RE
Song S
Squire K
Stalker J
Stalker J
Stegle O
Stenson PD
Streeter I
Stremlau M
Stromberg M
Stuetz AM
Stuetz AM
Su Y
Sudbrak R
Sudbrak R
Sudbrak R
Sudbrak R
Sudmant PH
Sudmant PH
Sultan M
Swaroop A
Taliun D
Tan A
Tang M
Tariyal R
Thormann A
Tian Z
Timmermann B
Tishkoff S
Toji LH
Toji LH
Toneva I
Tran TH
Tyler-Smith C
Tyler-Smith C
Tyler-Smith C
Tyler-Smith C
Underhill PA
Vaughan B
Vaydylevich Y
Via M
Vitti J
Walker JA
Walker JA
Walter K
Walter K
Wang B
Wang G
Wang J
Wang J
Wang J
Wang J
Wang J
Wang Y
Ward AN
Ward AN
Ward AN
Watson H
Webster T
Welch R
Willems TF
Willems TF
Wilson RK
Wilson RK
Wing MK
Witherspoon D
Witherspoon D
Wong B
Wu H
Wu J
Wu J
Wu R
Wu R
Xiao C
Xiao C
Xiao C
Xiao C
Xie Y
Xifara DK
Xing J
Xing J
Xiong M
Xu X
Xue Y
Xue Y
Xue Y
Yang F
Yang H
Yang H
Yang L
Yaspo M-L
Ye C
Ye C
Ye K
Ye K
Ye K
Ye K
Yin Y
Yoon SC
Yoon SC
Yu C
Yu F
Yu F
Yu H
Yu J
Zakharia F
Zerbino D
Zerbino D
Zhan X
Zhan Y
Zhang C
Zhang C
Zhang C
Zhang D
Zhang F
Zhang H
Zhang J
Zhang J
Zhang M
Zhang M
Zhang W
Zhang Y
Zhang Y
Zhang Y
Zhao J
Zhao M
Zheng H
Zheng X
Zheng X
Zheng-Bradley X
Zheng-Bradley X
Zheng-Bradley X
Zheng-Bradley X
Zheng-Bradley X
Zheng-Bradley X
Zheng-Bradley X
Zhu H
Zhu H
Zhu J
Zhu J
Zhu Y
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. We thank the many people who were generous with contributing their samples to the project: the African Caribbean in Barbados; Bengali in Bangladesh; British in England and Scotland; Chinese Dai in Xishuangbanna, China; Colombians in Medellin, Colombia; Esan in Nigeria; Finnish in Finland; Gambian in Western Division – Mandinka; Gujarati Indians in Houston, Texas, USA; Han Chinese in Beijing, China; Iberian populations in Spain; Indian Telugu in the UK; Japanese in Tokyo, Japan; Kinh in Ho Chi Minh City, Vietnam; Luhya in Webuye, Kenya; Mende in Sierra Leone; people with African ancestry in the southwest USA; people with Mexican ancestry in Los Angeles, California, USA; Peruvians in Lima, Peru; Puerto Ricans in Puerto Rico; Punjabi in Lahore, Pakistan; southern Han Chinese; Sri Lankan Tamil in the UK; Toscani in Italia; Utah residents (CEPH) with northern and western European ancestry; and Yoruba in Ibadan, Nigeria. Many thanks to the people who contributed to this project: P. Maul, T. Maul, and C. Foster; Z. Chong, X. Fan, W. Zhou, and T. Chen; N.
Sengamalay, S. Ott, L. Sadzewicz, J. Liu, and L. Tallon; L. Merson; O. Folarin, D. Asogun, O. Ikpwonmosa, E. Philomena, G. Akpede, S. Okhobgenin, and O. Omoniwa; the staff of the Institute of Lassa Fever Research and Control (ILFRC), Irrua Specialist Teaching Hospital, Irrua, Edo State, Nigeria; A. Schlattl and T. Zichner; S. Lewis, E. Appelbaum, and L. Fulton; A. Yurovsky and I. Padioleau; N. Kaelin and F. Laplace; E. Drury and H. Arbery; A. Naranjo, M. Victoria Parra, and C. Duque; S. Däkel, B. Lenz, and S. Schrinner; S. Bumpstead; and C. Fletcher-Hoppe. Funding for this work was from the Wellcome Trust Core Award 090532/Z/09/Z and Senior Investigator Award 095552/Z/11/Z (P.D.), and grants WT098051 (R.D.), WT095908 and WT109497 (P.F.), WT086084/Z/08/Z and WT100956/Z/13/Z (G.M.), WT097307 (W.K.), WT0855322/Z/08/Z (R.L.), WT090770/Z/09/Z (D.K.), the Wellcome Trust Major Overseas program in Vietnam grant 089276/Z.09/Z (S.D.), the Medical Research Council UK grant G0801823 (J.L.M.), the UK Biotechnology and Biological Sciences Research Council grants BB/I02593X/1 (G.M.) and BB/I021213/1 (A.R.L.), the British Heart Foundation (C.A.A.), the Monument Trust (J.H.), the European Molecular Biology Laboratory (P.F.), the European Research Council grant 617306 (J.L.M.), the Chinese 863 Program 2012AA02A201, the National Basic Research program of China 973 program no. 
2011CB809201, 2011CB809202 and 2011CB809203, Natural Science Foundation of China 31161130357, the Shenzhen Municipal Government of China grant ZYC201105170397A (J.W.), the Canadian Institutes of Health Research Operating grant 136855 and Canada Research Chair (S.G.), Banting Postdoctoral Fellowship from the Canadian Institutes of Health Research (M.K.D.), a Le Fonds de Recherche duQuébec-Santé (FRQS) research fellowship (A.H.), Genome Quebec (P.A.), the Ontario Ministry of Research and Innovation – Ontario Institute for Cancer Research Investigator Award (P.A., J.S.), the Quebec Ministry of Economic Development, Innovation, and Exports grant PSR-SIIRI-195 (P.A.), the German Federal Ministry of Education and Research (BMBF) grants 0315428A and 01GS08201 (R.H.), the Max Planck Society (H.L., G.M., R.S.), BMBF-EPITREAT grant 0316190A (R.H., M.L.), the German Research Foundation (Deutsche Forschungsgemeinschaft) Emmy Noether Grant KO4037/1-1 (J.O.K.), the Beatriu de Pinos Program grants 2006 BP-A 10144 and 2009 BP-B 00274 (M.V.), the Spanish National Institute for Health Research grant PRB2 IPT13/0001-ISCIII-SGEFI/FEDER (A.O.), Ewha Womans University (C.L.), the Japan Society for the Promotion of Science Fellowship number PE13075 (N.P.), the Louis Jeantet Foundation (E.T.D.), the Marie Curie Actions Career Integration grant 303772 (C.A.), the Swiss National Science Foundation 31003A_130342 and NCCR “Frontiers in Genetics” (E.T.D.), the University of Geneva (E.T.D., T.L., G.M.), the US National Institutes of Health National Center for Biotechnology Information (S.S.) 
and grants U54HG3067 (E.S.L.), U54HG3273 and U01HG5211 (R.A.G.), U54HG3079 (R.K.W., E.R.M.), R01HG2898 (S.E.D.), R01HG2385 (E.E.E.), RC2HG5552 and U01HG6513 (G.T.M., G.R.A.), U01HG5214 (A.C.), U01HG5715 (C.D.B.), U01HG5718 (M.G.), U01HG5728 (Y.X.F.), U41HG7635 (R.K.W., E.E.E., P.H.S.), U41HG7497 (C.L., M.A.B., K.C., L.D., E.E.E., M.G., J.O.K., G.T.M., S.A.M., R.E.M., J.L.S., K.Y.), R01HG4960 and R01HG5701 (B.L.B.), R01HG5214 (G.A.), R01HG6855 (S.M.), R01HG7068 (R.E.M.), R01HG7644 (R.D.H.), DP2OD6514 (P.S.), DP5OD9154 (J.K.), R01CA166661 (S.E.D.), R01CA172652 (K.C.), P01GM99568 (S.R.B.), R01GM59290 (L.B.J., M.A.B.), R01GM104390 (L.B.J., M.Y.Y.), T32GM7790 (C.D.B., A.R.M.), P01GM99568 (S.R.B.), R01HL87699 and R01HL104608 (K.C.B.), T32HL94284 (J.L.R.F.), and contracts HHSN268201100040C (A.M.R.) and HHSN272201000025C (P.S.), Harvard Medical School Eleanor and Miles Shore Fellowship (K.L.), Lundbeck Foundation Grant R170-2014-1039 (K.L.), NIJ Grant 2014-DN-BX-K089 (Y.E.), the Mary Beryl Patch Turnbull Scholar Program (K.C.B.), NSF Graduate Research Fellowship DGE-1147470 (G.D.P.), the Simons Foundation SFARI award SF51 (M.W.), and a Sloan Foundation Fellowship (R.D.H.). E.E.E. is an investigator of the Howard Hughes Medical Institute

Cold Spring Harbor Laboratory Institutional Repository

Bilkent University Institutional Repository

Serveur académique lausannois

LSU Scholarly Repository (Louisiana State Univ.)

Carolina Digital Repository

Spiral - Imperial College Digital Repository

Online Research Database In Technology

MPG.PuRe

Brunel University Research Archive

HKU Scholars Hub

University of Queensland eSpace

Crossref