60 research outputs found

    Directive 02-14: Tax Obligations of Persons Purchasing Cigarettes in Interstate Commerce for which the Massachusetts Cigarette Excise Has Not Been Paid

    The development of accurate clinical biomarkers has been challenging in part due to the diversity between patients and diseases. One approach to account for the diversity is to use multiple markers to classify patients, based on the concept that each individual marker contributes information from its respective subclass of patients. Here we present a new strategy for developing biomarker panels that accounts for completely distinct patient subclasses. Marker State Space (MSS) defines "marker states" based on all possible patterns of high and low values among a panel of markers. Each marker state is defined as either a case state or a control state, and a sample is classified as case or control based on the state it occupies. MSS was used to define multi-marker panels that were robust in cross validation and training-set/test-set analyses and that yielded similar classification accuracy to several other classification algorithms. A three-marker panel for discriminating pancreatic cancer patients from control subjects revealed subclasses of patients based on distinct marker states. MSS provides a straightforward approach for modeling highly divergent subclasses of patients, which may be adaptable for diverse applications.
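
The marker-state idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-marker thresholds, the majority-vote rule for labeling a state, the `default` label for unseen states, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

State = Tuple[int, ...]


def marker_state(values: Sequence[float], thresholds: Sequence[float]) -> State:
    """Map a sample to its pattern of high (1) and low (0) marker values."""
    return tuple(int(v > t) for v, t in zip(values, thresholds))


def fit_state_labels(
    samples: Sequence[Sequence[float]],
    labels: Sequence[str],
    thresholds: Sequence[float],
) -> Dict[State, str]:
    """Label each observed marker state as 'case' or 'control'.

    Assumption: a state takes the majority label of the training samples
    that occupy it. The abstract does not say how states are assigned.
    """
    counts: Dict[State, Counter] = defaultdict(Counter)
    for values, label in zip(samples, labels):
        counts[marker_state(values, thresholds)][label] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def classify(
    values: Sequence[float],
    thresholds: Sequence[float],
    state_labels: Dict[State, str],
    default: str = "control",
) -> str:
    """Classify a sample by the state it occupies (unseen states get `default`)."""
    return state_labels.get(marker_state(values, thresholds), default)


# Hypothetical three-marker panel with made-up values and thresholds.
thresholds = [1.0, 1.0, 1.0]
train = [(2.0, 0.5, 0.2), (0.3, 1.8, 0.4), (0.2, 0.1, 0.3), (0.4, 0.6, 0.9)]
train_labels = ["case", "case", "control", "control"]
state_labels = fit_state_labels(train, train_labels, thresholds)
```

In this toy data, two different states, (1, 0, 0) and (0, 1, 0), are both case states, which mirrors the point that distinct patient subclasses can occupy distinct marker states.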

    Towards Inference and Learning in Dynamic Bayesian Networks using Generalized Evidence

    This report introduces a novel approach to performing inference and learning in Dynamic Bayesian Networks (DBN). The traditional approach to inference and learning in DBNs involves conditioning on one or more finite-length observation sequences. In this report, we consider conditioning on what we will call generalized evidence, which consists of a possibly infinite set of behaviors compactly encoded in the form of a formula, Φ, in temporal logic. We then introduce exact algorithms for solving inference problems (i.e., computing P(X | Φ)) and learning problems (i.e., computing P(Θ | Φ)) using techniques from the field of Model Checking. The advantage of our approach is that it enables scientists to pose and solve inference and learning problems that cannot be expressed using traditional approaches. The contributions of this report include: (1) the introduction of the inference and learning problems over generalized evidence, (2) exact algorithms for solving these problems for a restricted class of DBNs, and (3) a series of case studies demonstrating the scalability of our approach. We conclude by discussing directions for future research.

    Generalized Queries and Bayesian Statistical Model Checking in Dynamic Bayesian Networks: Application to Personalized Medicine

    We introduce the concept of generalized probabilistic queries in Dynamic Bayesian Networks (DBNs): computing P(φ₁ | φ₂), where φᵢ is a formula in temporal logic encoding an equivalence class of trajectories through the variables of the model. Generalized queries include as special cases traditional query types for DBNs (i.e., filtering, smoothing, prediction, and classification), but can also be used to express inference problems that are either impossible or impractical to answer using traditional algorithms for inference in DBNs. We then discuss the relationship between answering generalized queries and the Probabilistic Model Checking Problem and introduce two novel algorithms for efficiently estimating P(φ₁ | φ₂) in a Bayesian fashion. Finally, we demonstrate our method by answering generalized queries that arise in the context of critical care medicine. Specifically, we show that our approach can be used to make treatment decisions for a cohort of 1,000 simulated sepsis patients, and that it outperforms Support Vector Machines, Neural Networks, and Random Forests on the same task.
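
To make the query P(φ₁ | φ₂) concrete, the sketch below estimates it by sampling trajectories and placing a Beta posterior on the conditional probability. This is plain rejection sampling with a uniform Beta(1, 1) prior, not either of the two algorithms the abstract refers to. The two-state Markov chain standing in for a DBN, the parameter `p_stay`, the example formulas, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Trajectory = List[int]
Formula = Callable[[Trajectory], bool]


def sample_trajectory(rng: random.Random, length: int, p_stay: float = 0.8) -> Trajectory:
    """Sample from a toy two-state Markov chain (a stand-in for a DBN)."""
    state = 0
    traj = [state]
    for _ in range(length - 1):
        if rng.random() > p_stay:
            state = 1 - state  # switch state with probability 1 - p_stay
        traj.append(state)
    return traj


def beta_posterior(successes: int, trials: int, a: float = 1.0, b: float = 1.0) -> Tuple[float, float]:
    """Beta(a, b) prior updated with Bernoulli outcomes."""
    return a + successes, b + (trials - successes)


def estimate_conditional(
    phi1: Formula, phi2: Formula, n_samples: int, length: int, seed: int = 0
) -> Tuple[float, int]:
    """Posterior mean of P(phi1 | phi2) and the number of samples satisfying phi2."""
    rng = random.Random(seed)
    hits = 0
    conditioned = 0
    for _ in range(n_samples):
        traj = sample_trajectory(rng, length)
        if phi2(traj):  # keep only trajectories consistent with the evidence
            conditioned += 1
            hits += int(phi1(traj))
    a, b = beta_posterior(hits, conditioned)
    return a / (a + b), conditioned


# Example formulas over trajectories, written as ordinary predicates.
def eventually_one(traj: Trajectory) -> bool:
    return any(s == 1 for s in traj)


def starts_quiet(traj: Trajectory) -> bool:
    return all(s == 0 for s in traj[:3])


mean, n_kept = estimate_conditional(eventually_one, starts_quiet, n_samples=2000, length=10)
```

Rejection sampling becomes wasteful when φ₂ is rarely satisfied, since most trajectories are discarded; the posterior then stays close to the prior.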

    Generative models of conformational dynamics.

    Atomistic simulations of the conformational dynamics of proteins can be performed using either Molecular Dynamics or Monte Carlo procedures. The ensembles of three-dimensional structures produced during simulation can be analyzed in a number of ways to elucidate the thermodynamic and kinetic properties of the system. The goal of this chapter is to review both traditional and emerging methods for learning generative models from atomistic simulation data. Here, the term 'generative' refers to a model of the joint probability distribution over the behaviors of the constituent atoms. In the context of molecular modeling, generative models reveal the correlation structure between the atoms, and may be used to predict how the system will respond to structural perturbations. We begin by discussing traditional methods, which produce multivariate Gaussian models. We then discuss GAMELAN (GRAPHICAL MODELS OF ENERGY LANDSCAPES), which produces generative models of complex, non-Gaussian conformational dynamics (e.g., allostery, binding, folding, etc.) from long timescale simulation data.

    Dynamic Invariants in Protein Folding Pathways Revealed by Tensor Analysis

    Recent advances in molecular dynamics simulation technologies (e.g., Folding@Home, NAMD, Desmond/Anton) have, for the first time, enabled scientists to perform all-atom simulations over timescales relevant to protein folding. Unfortunately, the concomitant increase in the size of the resulting data sets presents a barrier to understanding the molecular basis of folding. In particular, long simulations make it harder to identify and characterize important microstates, and the collective conformational dynamics that influence and enable the transitions between them. We address these problems by introducing a novel tensor-based method for performing a spatio-temporal analysis of protein folding pathways. We applied our method to folding simulations of the villin head-piece generated by the Pande group using Folding@Home. Using our method, we were able to identify three regions in this protein that exhibit similar collective behaviors across multiple simulations. We were also able to identify cross-over points in these simulations leading to different conformational subspaces. Our results indicate that these three regions may act as folding units, and that the observed collective motions may represent important dynamic invariants in the folding process. Thus, our spatio-temporal analysis method shows promise as a means for obtaining novel insights into protein folding pathways.

    Classifying Protein Structural Dynamics via Residual Dipolar Couplings

    Recent advances in Nuclear Magnetic Resonance (NMR) spectroscopy present new opportunities for investigating the conformational dynamics of proteins in solution. In particular, tensors for motions relevant to biological function can be obtained via experimental measurement of residual dipolar couplings (RDCs) between nuclei. These motion tensors have been used by others to characterize the magnitude and anisotropy of the dynamics of individual bond vectors. Here, we extend these results and demonstrate that RDCs can also be used to characterize the global nature of the protein’s motion (e.g., hinge motions, shear motions, etc.). In particular, we introduce the first method for classifying protein motions from RDC data. Our classifier consists of a discriminative model trained on 2,454 different molecular dynamics trajectories spanning seven categories of motion. The classifier achieves precision and recall of 90.6% and 90.9%, respectively, using 10-fold cross-validation over these seven categories.

    Detecting Protein-Protein Interaction Decoys using Fast Free Energy Calculations

    We present a physics-based method for identifying native configurations of protein-protein interactions amongst a set of nearly native decoys (< 2.0 Å Cα RMSD to the native structure) using a fast new method for performing free energy calculations. The method uses Markov Random Fields to encode the Boltzmann distribution for a given complex, and Generalized Belief Propagation to perform the free energy calculation. Our method is fast, running in a few minutes on a single-processor workstation, making it an attractive alternative to free-energy calculations based on molecular dynamics and Monte Carlo simulations, which can require hours or days on multiprocessor machines. The method is also accurate; in an experiment involving 9 targets with an average of 8 nearly native decoys, our method ranks the native structure number one 67% of the time, and in the top three for the remaining cases.

    Structure based chemical shift prediction using Random Forests non-linear regression

    Protein nuclear magnetic resonance (NMR) chemical shifts are among the most accurately measurable spectroscopic parameters and are closely correlated to protein structure because of their dependence on the local electronic environment. The precise nature of this correlation remains largely unknown. Accurate prediction of chemical shifts from existing structures’ atomic co-ordinates will permit close study of this relationship. This paper presents a novel non-linear regression based approach to chemical shift prediction from protein structure. The regression model employed combines quantum, classical and empirical variables and provides a statistically significant improvement in prediction accuracy over existing chemical shift predictors, across protein backbone atom types. The results presented here were obtained using the Random Forest regression algorithm on a protein entry data set derived from the RefDB re-referenced chemical shift database.

    A Bayesian Approach to Protein Model Quality Assessment

    Given multiple possible models b₁, b₂, …, bₙ for a protein structure, a common sub-task in in-silico Protein Structure Prediction is ranking these models according to their quality. Extant approaches use MLE estimates of parameters rᵢ to obtain point estimates of the Model Quality. We describe a Bayesian alternative to assessing the quality of these models that builds an MRF over the parameters of each model and performs approximate inference to integrate over them. Hyper-parameters w are learnt by optimizing a list-wise loss function over training data. Our results indicate that our Bayesian approach can significantly outperform MLE estimates and that optimizing the hyper-parameters can further improve results.

    Using Bit Vector Decision Procedures for Analysis of Protein Folding Pathways

    We explore the use of bit-vector decision procedures for the analysis of protein folding pathways. We argue that the protein folding problem is not identical to the classical probabilistic model checking problem in verification. Motivated by the different nature of the protein folding problem, we present a translation of the protein folding pathways analysis problem into a bounded model checking framework with bit-vector decision procedures. We also present initial results of our experiments using the UCLID bit-vector decision procedure.