Scholar Commons - University of South Florida

FigShare

Genome-wide mapping of IBD segments in an Ashkenazi PD cohort identifies associated haplotypes

Author: Bar-Shira A.
Clark L. N.
Gana-Weisz M.
Guha S.
Gurevich T.
Gusev A.
Lencz T.
Orr-Urtreger A.
Ozelius L. J.
Vacic V.
+10 additional authors
Publication venue: Donald and Barbara Zucker School of Medicine Academic Works
Publication date: 01/01/2014

The recent series of large genome-wide association studies in European and Japanese cohorts established that Parkinson disease (PD) has a substantial genetic component. To further investigate the genetic landscape of PD, we performed a genome-wide scan in the largest Ashkenazi Jewish cohort to date, comprising 1130 Parkinson patients and 2611 pooled controls. Motivated by the reduced disease allele heterogeneity and the high degree of identical-by-descent (IBD) haplotype sharing in this founder population, we conducted a haplotype association study based on mapping of shared IBD segments. We observed significant haplotype association signals at three previously implicated Parkinson loci: LRRK2 (OR = 12.05, P = 1.23 × 10^-56), MAPT (OR = 0.62, P = 1.78 × 10^-11) and GBA (multiple distinct haplotypes, OR > 8.28, P = 1.13 × 10^-11 and OR = 2.50, P = 1.22 × 10^-9). In addition, we identified a novel association signal on chr2q14.3 coming from a rare haplotype (OR = 22.58, P = 1.21 × 10^-10) and replicated it in a secondary cohort of 306 Ashkenazi PD cases and 2583 controls. Our results highlight the power of our haplotype association method, which is particularly useful in studies of founder populations, and reaffirm the benefits of studying complex diseases in Ashkenazi Jewish cohorts.
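The odds ratios (OR) quoted in this abstract compare the odds of carrying a haplotype among cases with the odds among controls. A minimal sketch of that arithmetic follows; the function name and the carrier counts are hypothetical, and only the cohort sizes (1130 cases, 2611 controls) come from the abstract. It does not reproduce the authors' IBD-based association test.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio for carrying a haplotype, from a 2x2 carrier table."""
    a = case_carriers                      # cases carrying the haplotype
    b = case_total - case_carriers         # cases not carrying it
    c = control_carriers                   # controls carrying the haplotype
    d = control_total - control_carriers   # controls not carrying it
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid
        # division by zero when a cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical carrier counts (100 and 25), real cohort sizes from the abstract.
print(round(odds_ratio(100, 1130, 25, 2611), 2))
```

An OR above 1 indicates a haplotype enriched in cases (as reported for LRRK2 and GBA), while an OR below 1 indicates enrichment in controls (as reported for MAPT).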

Hofstra Northwell Academic Works (Hofstra Northwell School of Medicine)

FAAST: Flow-space Assisted Alignment Search Tool

Author: Bengt Persson
Björn Andersson
DJ Lipman
Fredrik Lysholm
J Jerlström-Hultqvist
M Droege
M Margulies
MO Dayhoff
O Gotoh
R Kofler
S Balzer
SB Needleman
SF Altschul
SF Altschul
TF Smith
V Vacic
WR Pearson
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: High-throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long-read, high-throughput data. While most other sequencing techniques produce read errors mainly comparable to substitutions, pyrosequencing produces errors mainly comparable to gaps. These errors are detected less efficiently by most conventional alignment programs and may produce inaccurate alignments. Results: We suggest a novel algorithm for calculating the optimal local alignment which utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool). Conclusions: We present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread aware implementation, FAAST is able to perform these high quality alignments at high speed. The tool is available at http://www.ifm.liu.se/bioinfo/.
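The baseline named in this abstract, Smith-Waterman-Gotoh, is local alignment with affine gap penalties. The sketch below computes only the best local score and is not FAAST's flow-space algorithm; the function name and all scoring parameters are assumptions chosen for illustration.

```python
def sw_gotoh_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gaps (Gotoh's recurrences).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # F: best score ending with a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # E: best score ending with a horizontal gap
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


# A homopolymer undercall (TTT read as TT) appears as a single-base gap.
print(sw_gotoh_score("ACGTTTGA", "ACGTTGA"))
```

With these illustrative parameters the homopolymer-length error costs one gap opening, which is the kind of gap-like error the abstract attributes to pyrosequencing.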

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Aston Publications Explorer

Effects of HMGN variants on the cellular transcription profile

Author: Ashburner
Berman
Bianchi
Birger
Birger
Bolstad
Bradbury
Bustin
Bustin
Bustin
Bustin
D. Landsman
Ding
Dunker
Fan
Fan
Garner
Gautier
Gentleman
Hock
I. Ovcharenko
Kim
L. Taher
Lee
Li
Lu
M. Bustin
M. Rochman
Paranjape
Postnikov
Postnikov
Rochman
Romero
Romero
S. Cherukuri
Sancho
Shirakawa
T. Kurahashi
Tompa
Tompa
V. N. Uversky
Vacic
Vavouri
Woodcock
Publication venue: Oxford University Press
Publication date: 01/01/2011

High mobility group N (HMGN) is a family of intrinsically disordered nuclear proteins that bind to nucleosomes, alter the structure of chromatin and affect transcription. A major unresolved question is the extent of functional specificity, or redundancy, between the various members of the HMGN protein family. Here, we analyze the transcriptional profile of cells in which the expression of various HMGN proteins has been either deleted or doubled. We find that both up- and downregulation of HMGN expression altered the cellular transcription profile. Most, but not all, of the changes were variant specific, suggesting limited redundancy in transcriptional regulation. Analysis of point and swap HMGN mutants revealed that the transcriptional specificity is determined by a unique combination of a functional nucleosome-binding domain and C-terminal domain. Doubling the amount of HMGN had a significantly larger effect on the transcription profile than total deletion, suggesting that the intrinsically disordered structure of HMGN proteins plays an important role in their function. The results reveal an HMGN-variant-specific effect on the fidelity of the cellular transcription profile, indicating that functionally the various HMGN subtypes are not fully redundant.

USFSP Digital Archive

Scholar Commons - University of South Florida

Prediction of prognostic biomarkers for Interferon-based therapy to Hepatitis C Virus patients: a metaanalysis of the NS5A protein in subtypes 1a, 1b, and 3a

Author: A El-Shamy
A Macdonald
A Wohnsland
B Korber
B Liu
C Kuiken
C Sarrazin
D Wang
E Baralis
ea El-Hefnawi Mahmoud
GR Reyes
Iman A El-Azab
J Cohen
J Felsenstein
J Nousbaum
J Pei
J Song
JM Pawlotsky
K Tamura
M Clamp
M Torres-Puente
M Wistrand
Mahmoud M ElHefnawi
MM El Hefnawi
N Pavio
P Farci
RD Finn
SR Eddy
Suher Zada
TA Hall
U Mihm
V Vacic
V Wagner
WLaP Jiawei
Publication venue: BioMed Central
Publication date: 01/01/2010

Public Library of Science (PLOS)

Inherent Structural Disorder and Dimerisation of Murine Norovirus NS1-2 Protein

Author: A Campen
A Shevchenko
AK Dunker
AK Dunker
AK Dunker
AK Dunker
B Lopman
B Mészáros
B Mészáros
Bl Mészáros
C Zintz
CD Kirkwood
CE Wobus
CE Wobus
CJ Brown
CM Fauquet
D Bailey
DP Zheng
DT Jones
DT Jones
E Gasteiger
E Hébrard
Estelle S. Baker
Eugene A. Permyakov
G Belliot
H Xie
H Xie
HJ Dyson
HP Erickson
Ian N. Clarke
J Habchi
J Prilusky
JL Hyde
JL Hyde
JW Chen
JW Drake
K Bryson
K Ettayebi
Kurt L. Krause
L Whitmore
LB Thackray
LM Siegel
M Hogbom
M Tan
MJ Kim
MM Patel
N Sreerama
N Sreerema
N Tokuriki
P Lieutaud
P Radivojac
P Romero
P Romero
P Tompa
Paul R. Lambden
PE Wright
PJ Hughes
R Linding
R Linding
RE Nettles
RI Glass
RL Atmar
RL Fankhauser
SM Karst
SV Sosnovtsev
SV Sosnovtsev
SW Provencher
Sylvia R. Luckner
T Mittag
T Nugent
T Pfister
TM Sharp
V Fernandez-Vega
V Receveur-Bréchot
V Vacic
V Vacic
Vernon K. Ward
VN Uversky
VN Uversky
VN Uversky
VN Uversky
VN Uversky
VN Uversky
VN Uversky
X Li
Y He
Z Dosztányi
ZR Zang
Publication venue: Public Library of Science
Publication date: 07/02/2012

Human noroviruses are highly infectious viruses that cause the majority of acute, non-bacterial epidemic gastroenteritis cases worldwide. The first open reading frame of the norovirus RNA genome encodes a polyprotein that is cleaved by the viral protease into six non-structural proteins. The first non-structural protein, NS1-2, lacks any significant sequence similarity to other viral or cellular proteins, and limited information is available about the function and biophysical characteristics of this protein. Bioinformatic analyses identified an inherently disordered region (residues 1–142) in the highly divergent N-terminal region of the norovirus NS1-2 protein. Expression and purification of the NS1-2 protein of Murine norovirus confirmed these predictions by identifying several features typical of an inherently disordered protein. These were a biased amino acid composition with enrichment in the disorder-promoting residues serine and proline, a lack of predicted secondary structure, a hydrophilic nature, an aberrant electrophoretic migration, an increased Stokes radius similar to that predicted for a protein from the pre-molten globule family, a high sensitivity to thermolysin proteolysis and a circular dichroism spectrum typical of an inherently disordered protein. The purification of the NS1-2 protein also identified the presence of an NS1-2 dimer in Escherichia coli and transfected HEK293T cells. Inherent disorder provides significant advantages, including structural flexibility and the ability to bind to numerous targets, allowing a single protein to have multiple functions. These advantages, combined with the potential functional advantages of multimerisation, suggest a multi-functional role for the NS1-2 protein.

CiteSeerX

Southampton (e-Prints Soton)

PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity

Author: C Burge
Cheng-Tsung Lu
DM Shien
E Huala
F Diella
F Gnad
FF Zhou
GE Crooks
H Steen
HD Huang
HD Huang
J Gao
J Gao
JC Obenauer
JL Heazlewood
JM Stone
KC Chou
LM Iakoucheva
M Schneider
M Steffen
MJ Hubbard
N Blom
N Blom
Neil Arvin Bretaña
P Diolez
PV Hornbeck
R Aebersold
S Luan
SC Huber
SR Eddy
TD Schneider
TY Lee
TY Lee
TY Lee
Tzong-Yi Lee
V Vacic
Y Xue
Y Xue
YH Wong
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Due to the difficulty of performing high-throughput mass spectrometry-based experiments, there is a desire to predict phosphorylation sites using computational methods. However, previous studies of in silico prediction of plant phosphorylation sites do not consider kinase-specific phosphorylation data. Thus, we are motivated to propose a new method that investigates different substrate specificities in plant phosphorylation sites. Results: Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data from the plant species Arabidopsis thaliana. To investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Cross-validation evaluation of the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent test results using Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models are able to correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Interestingly, several MDD-clustered subgroups are observed to have amino acid conservation similar to the substrate motifs of well-known kinases from Phospho.ELM, a database containing kinase-specific phosphorylation data from multiple organisms. Conclusions: This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs. Based on cross-validation and independent testing, results show that the MDD-clustered models outperform models trained without using MDD. The proposed method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos (http://csb.cse.yzu.edu.tw/PlantPhos/). Additionally, two case studies have been demonstrated to further evaluate the effectiveness of PlantPhos.

Public Library of Science (PLOS)

Incorporating Distant Sequence Features and Radial Basis Function Networks to Identify Ubiquitin Conjugation Sites

Author: A Catic
A Hershko
A Zanzoni
AL Chernorudskiy
B Boeckmann
C Chothia
C-J Lin
CN Pang
CT Su
CW Tung
CY Ou
D Xie
DM Shien
DT Jones
GE Crooks
GZ Zhang
HM Berman
Hsin-Yi Hung
J Peng
JL Fauchere
K Bryson
K Ron
L Hicke
LJ McGuffin
M Charton
P Radivojac
R Grantham
S Ahmad
SA Chen
SF Altschul
SF Altschul
Shu-An Chen
T Gilon
TA Tatusova
TD Schneider
TL Bailey
Tzong-Yi Lee
V Vacic
Vladimir Uversky
Y-Y Ou
Yu-Yen Ou
YY Ou
Z Hu
ZR Yang
Publication venue: Public Library of Science
Publication date: 09/03/2011

Ubiquitin (Ub) is a small protein of 76 amino acids (about 8.5 kDa). In ubiquitin conjugation, ubiquitin is mainly conjugated to a lysine residue of the target protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism for the proteasome-mediated degradation of proteins and for regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window (−20 to +20) revealed statistically significant amino acid composition and position-specific scoring matrix (evolutionary information) features, which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively. Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence features of Ub sites can improve predictive performance. Additionally, the independent test demonstrates that the proposed method can outperform other ubiquitylation prediction tools.
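The three performance figures in this abstract are related through the confusion matrix: accuracy is a weighted average of sensitivity and specificity, weighted by class size. The sketch below illustrates this with hypothetical counts chosen only so that the rates match the reported 65.5%, 74.8% and 74.5%; the abstract does not give the actual dataset composition, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # fraction of true sites recovered
    specificity = tn / (tn + fp)                # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all calls correct
    return sensitivity, specificity, accuracy


# Hypothetical counts (200 positives, 6000 negatives), not taken from the paper.
sens, spec, acc = classification_metrics(tp=131, fn=69, tn=4488, fp=1512)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because accuracy sits so close to specificity in the reported figures, a dataset with many more non-Ub sites than Ub sites would be consistent with them, which is why sensitivity and specificity are worth reading alongside accuracy.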

Supervised multivariate analysis of sequence groups to identify specificity determining residues

Author: A Carro
A del Sol Mesa
AC Culhane
AC Culhane
AR Fersht
CD Livingstone
CL Tucker
D Charif
Desmond G Higgins
DG Higgins
DH Morgan
E Beitz
F Pazos
G Casari
G Zhang
H Yao
HM Wilks
Iain M Wallace
J Thioulouse
JC Gower
JD Thompson
JG Henikoff
KM Mayer
L Yuan
LA Mirny
M Clamp
N Saitou
O Lichtarge
OV Kalinina
OV Kalinina
RC Gentleman
RD Finn
RJ Edwards
S Dolédec
S Henikoff
SJ Hubbard
SS Hannenhalli
TD Schneider
V Vacic
W Pirovano
WR Atchley
X Gu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Proteins that evolve from a common ancestor can change functionality over time, and it is important to be able to identify the residues that cause this change. In this paper we show how a supervised multivariate statistical method, Between Group Analysis (BGA), can be used to identify these residues from families of proteins with different substrate specificities using multiple sequence alignments. Results: We demonstrate the usefulness of this method on three different test cases. Two of these test cases, the Lactate/Malate dehydrogenase family and Nucleotidyl Cyclases, consist of two functional groups. The other family, Serine Proteases, consists of three groups. BGA was used to analyse and visualise these three families using two different encoding schemes for the amino acids. Conclusion: The overall combination of methods in this paper is powerful and flexible while being computationally very fast and simple. BGA is especially useful because it can be used to analyse any number of functional classes. In the examples in this paper, we have used only 2 or 3 classes for demonstration purposes, but any number can be used and visualised.