Early Disease Detection Through Computational Pathology
This thesis presents computational pathology algorithms that enable early cancer detection in Barrett’s Esophagus (BE) and early subtype diagnosis in Interstitial Lung Diseases (ILD). BE affects 10% of heartburn sufferers, and 0.1% of BE patients develop esophageal adenocarcinoma each year. For most of the 130-200 diseases in the ILD class a full recovery is expected, but for a few of them survival is less than three years. In both disease classes, treatment for the malignant forms would harm patients with other forms, so a diagnosis is required before treatment begins, and early treatment is the most effective way to eradicate disease. Early diagnosis in both classes is complicated because the subtypes share subtle phenotypes, which leads to high rates of disagreement between pathologists. Computational pathology methods can aid early diagnosis of these diseases through unbiased, data-driven algorithms.
To detect precancerous changes in patients with BE, we develop an automated algorithm that identifies epithelial nuclei in biopsy samples, on which nanoscale optical biomarkers related to cancer risk can be quantified. The automated nuclei detector produces a higher-quality selection of epithelial nuclei than manual detection, which improves the characterization of precancerous phenotype perturbations. To stratify ILD patients, we develop a novel quantitative representation of pathohistology samples that models lung architecture from computed image features and insights from pathologists, and we establish its utility as part of a diagnostic classifier. Applied in a clinical setting, algorithms such as these can save pathologists time by filtering out obvious cases and providing unbiased reasoning to assist diagnosis.
Computational studies of Glucocerebrosidase in complex with its facilitator protein Saposin-C
Gaucher’s Disease (GD) is a rare recessive disorder caused by dysfunction of the lysosomal enzyme Glucocerebrosidase (GCase). GCase catalyses the cleavage of the glycolipid Glucosylceramide, and the lack of functional GCase leads to accumulation of this lipid substrate in lysosomes, causing GD. GD shows great phenotypic variation, ranging from asymptomatic adults to death in early childhood from neurological damage. More than 250 mutations in GCase have been found to cause GD, and linking the structural effects of each mutation to this phenotypic variation would improve understanding of the disease. The aim of this work is to understand the structural dynamics of wild-type and mutant GCase. A model of the complex of GCase with its facilitator protein, Saposin-C (Sap-C), was generated by protein-protein docking (PPD) using a knowledge-based docking protocol that takes experimental protein-protein binding data into account. The resulting model is reliable and consistent with the experimental data. To understand the structural mechanism of GCase function, it was essential to study its structural dynamics and the conformational changes induced by its interactions with other components, including the lipid bilayer, the facilitator protein and the substrate. Coarse-grained MD (CG-MD) was employed to study lipid self-assembly and membrane insertion of the complex, and classical atomistic MD (AT-MD) was used to study the dynamics of the interactions between the components of the simulation. The results of ten AT-MD simulations sampling 9 μs have been analysed. An activation mechanism of GCase by Sap-C is proposed: in the presence of its facilitator protein, GCase changes conformation through stabilization of the loops at the entrance of the binding site.
The differences in protein-protein binding when GCase is mutated have also been emphasised. Finally, Anharmonic Conformational Analysis and Markov State Models have been used to build a kinetic model of the system, which supports our hypothesized activation mechanism.
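The Markov State Model step can be illustrated with a minimal sketch. It assumes the trajectory has already been clustered into integer state labels; the function names are hypothetical and not taken from the thesis, and symmetrizing the count matrix is a crude stand-in for a proper reversible maximum-likelihood estimator.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-stochastic transition matrix from a discrete (clustered) trajectory.

    Counts i -> j transitions separated by `lag` frames, then symmetrizes the
    count matrix as a crude way to enforce detailed balance. Assumes every
    state is visited at least once (otherwise a row sum is zero).
    """
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    counts = counts + counts.T
    return counts / counts.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return pi / pi.sum()

def implied_timescales(T, lag):
    """t_k = -lag / ln|lambda_k| for every eigenvalue except the stationary one."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])

# Toy two-state trajectory (hypothetical data)
dtraj = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
pi = stationary_distribution(T)
its = implied_timescales(T, lag=1)
```

Implied timescales that stay roughly constant as `lag` grows are the usual check that the model is Markovian at that lag time.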
Transient Unfolding and Long-Range Interactions in Viral BCL2 M11 Enable Binding to the BECN1 BH3 Domain
Viral BCL2 proteins (vBCL2s) help to sustain chronic infection by binding the BH3 domains (BH3Ds) of host proteins to inhibit apoptosis and autophagy. However, the conformational changes in vBCL2s that enable binding to BH3Ds remain unknown. Using multiple microsecond-long all-atom molecular dynamics simulations (17 μs in total) of the murine γ-herpesvirus 68 vBCL2 (M11), together with statistical inference techniques, we show that regions of M11 transiently unfold and refold upon BH3D binding. We further show that this partial unfolding and refolding within M11 is mediated by a network of hydrophobic interactions that includes residues 10 Å away from the BH3D binding cleft. We experimentally validate the role of these hydrophobic interactions by quantifying the effect of mutating these residues on binding to the Beclin1/BECN1 BH3D, and show that these mutations impair both protein stability and binding. To our knowledge, this is the first study detailing the binding-associated conformational changes and the presence of long-range interactions within vBCL2s.
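The idea of a hydrophobic network reaching residues far from the binding cleft can be made concrete with a small sketch. It assumes per-residue side-chain centroids and a hydrophobicity mask are already available; the 6.5 Å contact cutoff, the function names and the breadth-first search are illustrative assumptions rather than the analysis used in the study. Only the 10 Å distance from the cleft comes from the text.

```python
from collections import deque

import numpy as np

def hydrophobic_network(centroids, is_hydrophobic, cutoff=6.5):
    """Link hydrophobic residues whose side-chain centroids lie within `cutoff` (Å).

    centroids: (n_residues, 3) array; is_hydrophobic: (n_residues,) boolean mask.
    Returns {residue_index: set(neighbour_indices)} for hydrophobic residues only.
    The default cutoff is a hypothetical choice.
    """
    idx = np.flatnonzero(is_hydrophobic)
    pts = centroids[idx]
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = {int(i): set() for i in idx}
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            if dist[a, b] <= cutoff:
                adj[int(idx[a])].add(int(idx[b]))
                adj[int(idx[b])].add(int(idx[a]))
    return adj

def long_range_members(adj, centroids, cleft, min_dist=10.0):
    """Network residues connected to the cleft but >= min_dist (Å) from every cleft residue."""
    seen = {c for c in cleft if c in adj}
    queue = deque(seen)
    while queue:  # breadth-first search over hydrophobic contacts
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    cleft_pts = centroids[list(cleft)]
    return sorted(
        r for r in seen
        if np.min(np.linalg.norm(cleft_pts - centroids[r], axis=1)) >= min_dist
    )
```

Given centroids from one simulation frame, repeating this per frame would show how often such long-range contacts persist.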
Anharmonic Conformational Analysis of Biomolecular Simulations
Anharmonicity in time-dependent conformational fluctuations is recognized as a key feature of the functional dynamics of biomolecules. Although anharmonic events are rare, long-timescale (μs and beyond) simulations make it possible to probe them. However, automated analysis and visualization of anharmonic events in these long-timescale simulations is proving to be a significant bottleneck. Traditional analysis tools for biomolecular simulations have focused on second-order spatial statistics. Previous work resolved higher-order spatial correlations through quasi-anharmonic analysis (QAA). In this thesis, we extend this analysis to the spatiotemporal domain in the form of anharmonic conformational analysis (ANCA).
We demonstrate ANCA on a publicly available millisecond-long trajectory of bovine pancreatic trypsin inhibitor (BPTI), using the Cartesian coordinates of the atoms selected for analysis. Because Cartesian analysis depends on finding a good reference structure for trajectory alignment, we also propose ANCA in dihedral space, which uses the internal angles of the backbone of the fluctuating biomolecule. We test this dihedral-angle extension of ANCA on a microsecond-long simulation of the Drew-Dickerson dodecamer B-DNA. Our results indicate that ANCA provides a biophysically meaningful organizational framework for long-timescale biomolecular simulations.
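Dihedral-space analysis needs backbone torsion angles and a representation that is not broken by the wrap-around at -pi/pi. A minimal sketch follows, assuming the common sine/cosine embedding of each angle (an assumption; the thesis may handle periodicity differently). Function names are hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four atom positions, e.g. backbone atoms."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def dihedral_features(angles):
    """Map (n_frames, n_angles) angles to (n_frames, 2 * n_angles) cos/sin features.

    Angles near +pi and -pi become neighbours, so covariances or higher-order
    statistics computed on these features are not distorted by wrap-around.
    """
    angles = np.asarray(angles, dtype=float)
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=1)
```

Because internal angles do not change under rigid-body motion, no trajectory alignment is needed before computing these features.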
We have also built a scalable Python package for ANCA, pyANCA, with modules that can: (1) measure anharmonicity in the form of higher-order statistics and show its variation as a function of time, (2) output a storyboard representation of the simulations to identify key anharmonic conformational events, and (3) identify putative anharmonic conformational substates and visualize transitions between these substates. ANCA is available as an open-source Python package under the BSD 3-Clause license. Python tutorial notebooks, documentation and examples can be downloaded from http://csb.pitt.edu/anca
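Module (1) measures anharmonicity through higher-order statistics as a function of time. The sketch below is not the pyANCA API: it assumes excess kurtosis (the fourth standardized moment minus 3, which is zero for a Gaussian, i.e. harmonic, coordinate) as the higher-order statistic and averages its magnitude over coordinates in sliding windows. Function names are hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Per-column excess kurtosis: 0 for a Gaussian (harmonic) coordinate."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean(axis=0)
    m2 = (d ** 2).mean(axis=0)
    m4 = (d ** 4).mean(axis=0)
    return m4 / m2 ** 2 - 3.0

def windowed_anharmonicity(traj, window, step):
    """Mean |excess kurtosis| across coordinates in sliding windows of `window` frames.

    traj: (n_frames, n_coords) array of aligned Cartesian coordinates or
    dihedral features. Returns (window start frames, scores).
    """
    traj = np.asarray(traj, dtype=float)
    starts = np.arange(0, traj.shape[0] - window + 1, step)
    scores = np.array(
        [np.abs(excess_kurtosis(traj[s:s + window])).mean() for s in starts]
    )
    return starts, scores
```

A window whose score stands out from its neighbours is a candidate anharmonic event, which is the kind of time-resolved signal a storyboard view would then display.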
Spectral approaches for identifying kinetic features in molecular dynamics simulations of globular proteins
Proteins live in an environment of random thermal vibrations, yet they convert this constant disorder into selective biological function. As data acquisition methods for resolving protein motions improve, more of this randomness is also captured; there is thus a parallel need for analysis methods that filter out the disorder and clarify functionally relevant protein behavior. Few behaviors are more relevant than folding in the first place, and this thesis opens by asking which conformational states are kinetically relevant for promoting or inhibiting attainment of the folded native state. Our modeling approach discretizes simulation data into a network whose nodes and edges represent, respectively, distinct protein conformations and observed conformational transitions. A perturbative strategy is then used to quantify the importance of each node (i.e., conformational substate) with regard to theoretical folding rates. Tested on 10 proteins, this framework identifies unique ‘kinetic traps’ and ‘facilitator substates’ that sometimes evade detection by traditional RMSD-based analysis. We then apply spectral approaches and autoregressive models to (1) address efficiency concerns for more general networks and (2) mimic protein flexibility with compact linear models.
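The perturbative node-importance idea can be sketched on a transition-matrix network. This is an assumed formulation, not necessarily the thesis's own: the folding rate is approximated as the inverse mean first passage time (MFPT) from an unfolded start node to the native node, and each node is scored by the relative change in that rate when the node is deleted and the remaining outgoing probabilities are renormalized. Function names and the toy matrix are hypothetical.

```python
import numpy as np

def mfpt_to_target(T, target):
    """Mean first passage time (in lag-time units) from every state into `target`.

    Solves m_i = 1 + sum_j T_ij m_j for i != target, with m_target = 0.
    """
    n = T.shape[0]
    others = [i for i in range(n) if i != target]
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.ones(len(others)))
    return m

def node_importance(T, start, target):
    """Relative change in folding rate (1 / MFPT start -> target) when each node is removed.

    Positive: removal speeds folding (trap-like). Negative: removal slows it
    (facilitator-like). Assumes the network stays connected after each removal.
    """
    base_rate = 1.0 / mfpt_to_target(T, target)[start]
    scores = {}
    for k in range(T.shape[0]):
        if k in (start, target):
            continue
        keep = [i for i in range(T.shape[0]) if i != k]
        Tk = T[np.ix_(keep, keep)]
        Tk = Tk / Tk.sum(axis=1, keepdims=True)  # renormalize outgoing probabilities
        rate = 1.0 / mfpt_to_target(Tk, keep.index(target))[keep.index(start)]
        scores[k] = (rate - base_rate) / base_rate
    return scores

# Hypothetical 4-state network: 0 unfolded, 1 dead end, 2 on-pathway, 3 native
T = np.array([
    [0.5, 0.2, 0.25, 0.05],
    [0.1, 0.9, 0.0, 0.0],
    [0.1, 0.0, 0.5, 0.4],
    [0.0, 0.0, 0.1, 0.9],
])
scores = node_importance(T, start=0, target=3)  # node 1 > 0 (trap), node 2 < 0 (facilitator)
```

Each removal requires a fresh linear solve, which is one reason spectral shortcuts become attractive for larger networks.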
Protein Hydrogen Exchange, Dynamics, and Function
Models derived from X-ray crystallography can give the impression that proteins
are rigid structures with little mobility. NMR ensembles may suggest a more dynamic
picture, but even these represent a rather narrow range of possibilities close to the lowest
energy state. In reality proteins participate in a wide range of dynamics from the subtle
and rapid sidechain dynamics that occur in nanoseconds in the PDZ signaling domain to
the large and slow rearrangement of secondary structure that takes days in the mitotic
checkpoint protein Mad2. Between these extremes are motions on time scales typically
associated with protein function, such as those in SNase monitored by hydrogen
exchange. The dynamic character of several protein systems, including the PDZ domain,
calmodulin, SNase, and Mad2, was explored using a variety of biophysical techniques.
This broad investigation demonstrates the dynamic variability between and within
proteins. The study of PDZ and calmodulin illustrates how a computational technique
can recapitulate experimental results and provide additional insight into signal
transduction. The case of SNase shows that hydrogen exchange (HX) NMR data can be
exploited to reveal protein dynamics in unprecedented detail. The Mad2 system highlights
some of the pitfalls associated with this technique and some alternative strategies for
investigating protein dynamics.
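Hydrogen exchange rates are commonly converted into stability information through the protection factor. A minimal sketch, assuming the EX2 limit, where the observed exchange rate equals the intrinsic (unprotected) rate scaled by the opening equilibrium constant, so that P = k_int / k_obs and the apparent opening free energy is RT ln P. The function names and rates are hypothetical, and the dissertation's analysis of SNase may go well beyond this.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def protection_factor(k_int, k_obs):
    """P = k_int / k_obs: how much slower an amide exchanges than in an unstructured peptide."""
    return k_int / k_obs

def delta_g_hx(k_int, k_obs, temperature=298.15):
    """Apparent free energy of the opening reaction (kJ/mol); valid only in the EX2 limit."""
    return R_KJ * temperature * math.log(protection_factor(k_int, k_obs))

# Hypothetical rates (s^-1): an amide exchanging 1000 times slower than its intrinsic rate
dg = delta_g_hx(k_int=10.0, k_obs=0.01)  # about 17.1 kJ/mol at 298.15 K
```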
Molecular dynamics study of the allosteric control mechanisms of the glycolytic pathway
There is growing interest in understanding the regulation of allosteric proteins.
Allostery is a phenomenon of protein regulation whereby binding of an effector molecule at a
remote site affects binding and activity at the protein's active site. Over the years, these sites
have become popular drug targets as they provide advantages in terms of selectivity and
saturability. Both experimental and computational methods are being used to study and
identify allosteric sites. Although experimental methods provide us with detailed structures
and have been relatively successful in identifying these sites, they are subject to time and cost
limitations.
In the present dissertation, Molecular Dynamics Simulations (MDS) and Principal
Component Analysis (PCA) have been employed to enhance our understanding of allostery
and protein dynamics. MD simulations generated trajectories which were then qualitatively
assessed using PCA. Both of these techniques were applied to two important trypanosomatid
drug targets and controlling enzymes of the glycolytic pathway - pyruvate kinase (PYK) and
phosphofructokinase (PFK).
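For readers unfamiliar with PCA of MD trajectories, a minimal NumPy sketch follows. It assumes frames have already been superimposed on a reference structure and uses Cartesian coordinates; the function name is hypothetical, and the dissertation's atom selection and software are not stated in the text.

```python
import numpy as np

def trajectory_pca(coords, n_components=2):
    """PCA of an MD trajectory.

    coords: (n_frames, n_atoms, 3) array, assumed already superimposed on a
    reference structure so overall rotation and translation are removed.
    Returns (variances, components, projections) for the top n_components.
    """
    n_frames = coords.shape[0]
    X = coords.reshape(n_frames, -1)
    X = X - X.mean(axis=0)  # fluctuations about the average structure
    # Right singular vectors of the centred data = eigenvectors of its covariance
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    variances = s ** 2 / (n_frames - 1)
    components = vt[:n_components]
    projections = X @ components.T  # each frame's coordinate along each PC
    return variances[:n_components], components, projections
```

Projecting effector-bound and unbound trajectories onto a common set of components is one way to compare their dominant motions directly.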
Molecular Dynamics simulations were first carried out on both the effector bound and
unbound forms of the proteins. This provided a framework for direct comparison and
inspection of the conformational changes at the atomic level. Following MD simulations,
PCA was run to further analyse the motions. The principal components thus captured are in
quantitative agreement with the previously published experimental data, which increased our
confidence in the reliability of our simulations. The binding of FBP (fructose 1,6-bisphosphate)
also has a notable effect on the allosteric mechanism of PYK: inspection of the vibrational modes
reveals patterns in the movement of the subunits that differ from the conventional
symmetrical pattern. In addition, the lowering of B-factors on effector binding provides evidence
that the effector is not only locking the R-state but is also acting as a general heat sink that
cools down the whole tetramer. This observation suggests that protein rigidity and intrinsic heat
capacity are important factors in stabilizing allosteric proteins. Thus, this work also provides
new and promising insights into the classical Monod-Wyman-Changeux model of allostery.
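The link between B-factors and dynamics rests on the standard isotropic relation between the crystallographic B-factor of atom i and its mean-square positional fluctuation, which also lets B-factors be estimated from simulated root-mean-square fluctuations (RMSF). Whether the dissertation computed B-factors exactly this way is an assumption.

```latex
B_i = \frac{8\pi^2}{3}\,\big\langle \lvert \mathbf{r}_i - \langle \mathbf{r}_i \rangle \rvert^2 \big\rangle
    = \frac{8\pi^2}{3}\,\mathrm{RMSF}_i^2
```

Lower B-factors therefore correspond to smaller atomic fluctuations, which is the basis for reading effector-induced B-factor drops as increased rigidity.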
Interaction of ZIKV NS5 and STAT2 examined by molecular modeling, docking and simulations studies
The Zika virus (ZIKV) encodes the NS5 protein, which is known as a potent and specific antagonist of interferon (IFN) signaling. ZIKV NS5 has been associated with proteasomal degradation of the signal transducer and activator of transcription 2 (STAT2), although the complete mechanism is still unknown, as experimental studies suggest that domains other than the MTase domain contribute to degradation.