Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

CERN Document Server

PileLine: a toolbox to handle genome position information in next-generation sequencing studies

Author: D Blankenberg, Daniel Glez-Peña, David G Pisano, Florentino Fdez-Riverola, Gonzalo Gómez-López, H Li, L Ding, Miguel Reboiro-Jato, ML Metzker, O Harismendy, P Kumar, V Ramensky
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

eScholarship - University of California

Recommended from our members

Exome sequencing of Finnish isolates enhances rare-variant association power.

Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.

Deriving a mutation index of carcinogenicity using protein structure and protein interfaces

Author: A Custodio, A David, A Dixit, A Hamosh, A Pal, AJ Bass, Anna Tramontano, B Reva, B Vogelstein, CJ Richardson, CM Croce, D Chasman, D Sims, D Talavera, D Xu, E Krissinel, EC Chao, ER Mardis, F Damm, Frances Pearl, G Birrane, G De Baets, H Boutselakis, H Carter, H Makishima, IA Adzhubei, IS Moreira, J Carlsson, Jarle Hakas, JM Hurst, JM Izarzugaza, JR Morris, K Wang, Konstantinos Mitsopoulos, L Breiman, L Ding, M Li, M Magrane, Marketa Zvelebil, MR Stratton, MS Greenblatt, MW MacArthur, MY Frederic, Octavio Espinosa, P Flicek, P Kumar, P Srivastava, PA Chan, PA Futreal, PB Crowley, PC Ng, PD Stenson, PH Lee, PT Wan, PV Hornbeck, PY Chou, R Ferla, R Rajasekaran, RJ Kinsella, S Jones, S Sunyaev, S Velankar, SA Forbes, TM Anne, V Ramensky, W Huang da, W Kabsch, X Wang, Y Bromberg
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/

CiteSeerX

Institute of Cancer Research Repository

Sussex Research Online

FigShare

DNA Sequence Analysis of SLC26A5, Encoding Prestin, in a Patient-Control Cohort: Identification of Fourteen Novel DNA Sequence Variations

Author: Amanda Ewart Toland, BW Conner, D Navaratnam, D Oliver, DB Mount, Fred A. Pereira, G Van Camp, GE Green, Hsiao-Yuan Tang, HY Tang, J Zheng, Jacob S. Minor, JC Barrett, JT den Dunnen, L Deak, L Rajagopalan, MC Liberman, P Dallos, PC Ng, Raye Lynn Alford, S Ogino, T Toth, TJ Schaechinger, V Ramensky, XZ Liu
Publication venue: Public Library of Science
Publication date: 01/06/2009

Prestin, encoded by the gene SLC26A5, is a transmembrane protein of the cochlear outer hair cell (OHC). Prestin is required for the somatic electromotile activity of OHCs; in mice lacking prestin, this activity is absent and hearing is severely impaired. In humans, the role of sequence variations in SLC26A5 in hearing loss is less clear. Although prestin is expected to be required for functional human OHCs, the clinical significance of reported putative mutant alleles in humans is uncertain. To explore the hypothesis that SLC26A5 may act as a modifier gene, affecting the severity of hearing loss caused by an independent etiology, a patient-control cohort was screened for DNA sequence variations in SLC26A5 using sequencing and allele-specific methods. Patients in this study carried known pathogenic or controversial sequence variations in GJB2, encoding Connexin 26, or confirmed or suspected sequence variations in SLC26A5; controls included four ethnic populations. Twenty-three different DNA sequence variations in SLC26A5, 14 of which are novel, were observed: 4 novel sequence variations were found exclusively among patients; 7 novel sequence variations were found exclusively among controls; and 12 sequence variations, 3 of which are novel, were found in both patients and controls. Twenty-one of the 23 DNA sequence variations were located in non-coding regions of SLC26A5. Two coding sequence variations, both novel, were observed only in patients and predict a silent change, p.S434S, and an amino acid substitution, p.I663V. In silico analysis of the p.I663V amino acid variation suggested this variant might be benign. Using Fisher's exact test, no statistically significant difference was observed between patients and controls in the frequency of the identified DNA sequence variations. Haplotype analysis using HaploView 4.0 software revealed the same predominant haplotype in patients and controls and derived haplotype blocks in the patient-control cohort similar to those generated from the International HapMap Project. Although these data fail to support a hypothesis that SLC26A5 acts as a modifier gene of GJB2-related hearing loss, the sample size is small and investigation of a larger population might be more informative. The 14 novel DNA sequence variations in SLC26A5 reported here will serve as useful research tools for future studies of prestin.

Genome-Wide Analysis of Human Disease Alleles Reveals That Their Locations Are Correlated in Paralogous Proteins

Author: A Hamosh, A Raas-Rothschild, Andrew MacBride, Barry Moore, BR Bettencourt, C Le Caignec, Charles White, Chris Mungall, D Mount, David Eisenberg, DC Thomas, EW Weisstein, Fidel Salas, G Jimenez-Sanchez, I Kobayashi, I Korf, ID Krantz, JH Menkes, JM Chen, JS Taylor, L Li, L Zhang, M Yandell, Mark Yandell, Martin G. Reese, MH Shokeir, OR Brown, PC Ng, PD Sponseller, PD Stenson, S Henikoff, S Sunyaev, SF Altschul, ST Sherry, TJ Goka, TR Rebbeck, V Ramensky
Publication venue: Public Library of Science
Publication date: 01/01/2008

The millions of mutations and polymorphisms that occur in human populations are potential predictors of disease, of our reactions to drugs, of predisposition to microbial infections, and of age-related conditions such as impaired brain and cardiovascular functions. However, predicting the phenotypic consequences and eventual clinical significance of a sequence variant is not an easy task. Computational approaches have found perturbation of conserved amino acids to be a useful criterion for identifying variants likely to have phenotypic consequences. To our knowledge, however, no study to date has explored the potential of variants that occur at homologous positions within paralogous human proteins as a means of identifying polymorphisms with likely phenotypic consequences. In order to investigate the potential of this approach, we have assembled a unique collection of known disease-causing variants from OMIM and the Human Genome Mutation Database (HGMD) and used them to identify and characterize pairs of sequence variants that occur at homologous positions within paralogous human proteins. Our analyses demonstrate that the locations of variants are correlated in paralogous proteins. Moreover, if one member of a variant-pair is disease-causing, its partner is likely to be disease-causing as well. Thus, information about variant-pairs can be used to identify potentially disease-causing variants, extend existing procedures for polymorphism prioritization, and provide a suite of candidates for further diagnostic and therapeutic purposes.

CiteSeerX

Improving the prediction of disease-related variants using protein three-dimensional structure

Author: B Li, CC Chang, E Capriotti, EI Boyle, Emidio Capriotti, G Wainreb, H Berman, H Zhou, HapMap Consortium, International Human Genome Sequencing Consortium, J Pei, JS Kaminker, L Bao, M Cargill, MA Care, ML Waters, P Baldi, P Yue, PC Ng, PD Thomas, R Calabrese, R Guerois, R Karchin, RG Cotton, RJ Dobson, Russ B Altman, SF Altschul, SF Betz, ST Sherry, V Parthiban, V Ramensky, VG Krishnan, W Kabsch, Y Bromberg, YL Yip, Z Wang, ZQ Ye
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs occurring in coding regions result in single amino acid polymorphisms (SAPs) that may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, the quality of these predictions can be further improved by introducing new features derived from three-dimensional protein structures. Results: In this paper, we present a structure-based machine learning approach for predicting disease-related SAPs. We have trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains. We use SVM input features derived from the protein's sequence, structure, and function. After dataset balancing, the structure-based method (SVM-3D) reaches an overall accuracy of 85%, a correlation coefficient of 0.70, and an area under the receiver operating characteristic curve (AUC) of 0.92. Compared with a similar sequence-based predictor, SVM-3D increases overall accuracy and AUC by 3% and the correlation coefficient by 0.06. The robustness of this improvement has been tested on different datasets, and in all cases SVM-3D performs better than previously developed methods, even when compared with PolyPhen2, which explicitly takes protein structure information as input. Conclusion: This work demonstrates that structural information can increase the accuracy of identifying disease-related SAPs. Our results also quantify the magnitude of improvement on a large dataset. This improvement is in agreement with previously observed results, where structure information enhanced the prediction of protein stability changes upon mutation. Although the structural information contained in the Protein Data Bank limits the application and the performance of our structure-based method, we expect that SVM-3D will achieve higher accuracy when more structural data become available. © 2011 Capriotti; licensee BioMed Central Ltd

CiteSeerX

Springer - Publisher Connector

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Predicting disease-associated substitution of a single amino acid by analyzing residue interactions

Author: A del Sol, A Liaw, AM Fernandez-Escamilla, B Li, C Ferrer-Costa, C Kosiol, CT Saunders, DJ Watts, G Amitai, G Bagler, H Carter, Hui Yin, J Reumers, Jiamin Xiao, JS Kaminker, KV Brinda, L Bao, L Breman, Lezheng Yu, LH Greene, Li Yang, M Mort, M Vendruscolo, MEJ Newman, Menglong Li, NV Dokholyan, P Yue, PA Alexander, PC Ng, PD Thomas, R Karchin, RA Gibbs, RJ Dobson, S Miyazawa, S Sunyaev, SF Altschul, ST Sherry, V Ramensky, W Kabsch, W Lee, Y Bromberg, Yizhou Li, YL Yip, Z Wang, Zhining Wen, ZQ Ye
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues. Results: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and an MCC of 0.59. Conclusions: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.

Springer - Publisher Connector

A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease-related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Whereas GWAS analysis identifies 14 of the 83 drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 of the 83 genes known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2, and RA were verified by either the OMIM database or by literature retrieved from the NCBI PubMed module. With whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network-oriented analysis and prior knowledge from functional properties of a SNP.

Sabanci University Research Database

Cardiac Alpha-Myosin (MYH6) Is the Predominant Sarcomeric Disease Gene for Familial Atrial Septal Defects

Secundum-type atrial septal defects (ASDII) account for approximately 10% of all congenital heart defects (CHD) and are associated with a familial risk. Mutations in transcription factors represent a genetic source for ASDII. Yet, little is known about the role of mutations in sarcomeric genes in ASDII etiology. To assess the role of sarcomeric genes in patients with inherited ASDII, we analyzed 13 sarcomeric genes (MYH7, MYBPC3, TNNT2, TCAP, TNNI3, MYH6, TPM1, MYL2, CSRP3, ACTC1, MYL3, TNNC1, and TTN kinase region) in 31 patients with familial ASDII using array-based resequencing. Genotyping of family relatives and control subjects as well as structural and homology analyses were used to evaluate the pathogenic impact of novel non-synonymous gene variants. Three novel missense mutations were found in the MYH6 gene encoding alpha-myosin heavy chain (R17H, C539R, and K543R). These mutations co-segregated with CHD in the families and were absent in 370 control alleles. Interestingly, all three MYH6 mutations are located in a highly conserved region of the alpha-myosin motor domain, which is involved in myosin-actin interaction. In addition, the cardiomyopathy-related MYH6-A1004S and MYBPC3-A833T mutations were also found in one and two unrelated subjects with ASDII, respectively. No mutations were found in the 11 other sarcomeric genes analyzed. The study indicates that sarcomeric gene mutations may represent a so far underestimated genetic source for familial recurrence of ASDII. In particular, perturbations in the MYH6 head domain seem to play a major role in the genetic origin of familial ASDII.