1,114 research outputs found

    CLP-based protein fragment assembly

    Full text link
    The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predicting the 3D conformation of a protein via fragment assembly. The fragments are extracted by a preprocessor (also developed for this work) from a database of known protein structures, which clusters and classifies the fragments according to similarity and frequency. The problem of assembling fragments into a complete conformation is mapped to a constraint solving problem and solved using CLP. The constraint-based model uses a Cα plus side-chain-centroid protein model with a medium degree of discretization, which offers efficiency and a good approximation for space filling. The approach adapts existing energy models to the protein representation used and applies a large neighborhood search strategy. The results show the feasibility and efficiency of the method. The declarative nature of the solution makes future extensions easy to include, e.g., fragments of different sizes for better accuracy.
    Comment: special issue dedicated to ICLP 201
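    To make the constraint-search idea concrete, the sketch below recasts fragment assembly as a backtracking search in Python (the paper itself uses CLP, not Python). The fragment library and the compatible() check are hypothetical stand-ins for the paper's clustered fragment database and its geometric constraints.

        # A minimal, hypothetical sketch of fragment assembly as a
        # backtracking constraint search. The paper maps this to CLP;
        # here plain Python recursion plays the role of the solver.

        # Hypothetical fragment library: for each sequence window, the
        # candidate fragments, each summarized as (shape_label, radius).
        library = {
            0: [("extended", 3.8), ("helical", 1.5)],
            1: [("extended", 3.8), ("turn", 2.0)],
            2: [("helical", 1.5), ("turn", 2.0)],
        }

        def compatible(prev, nxt):
            # Placeholder geometric constraint standing in for real
            # distance/angle checks between consecutive fragments.
            return not (prev[0] == "turn" and nxt[0] == "turn")

        def assemble(pos=0, partial=()):
            # Extend the partial assembly one window at a time, pruning
            # as soon as a constraint is violated (the CLP analogue is
            # constraint propagation plus labeling).
            if pos == len(library):
                yield partial
                return
            for frag in library[pos]:
                if pos == 0 or compatible(partial[-1], frag):
                    yield from assemble(pos + 1, partial + (frag,))

        for conformation in assemble():
            print(conformation)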

    Information recovery in the biological sciences: protein structure determination by constraint satisfaction, simulation and automated image processing

    Get PDF
    Regardless of the field of study or particular problem, any experimental science always poses the same question: "What object or phenomena generated the data that we see, given what is known?" In the field of 2D electron crystallography, data is collected from a series of two-dimensional images, formed either as a result of diffraction-mode imaging or TEM-mode real imaging. The resulting dataset is acquired strictly in the Fourier domain, as either coupled amplitudes and phases (in TEM mode) or amplitudes alone (in diffraction mode). In either case, data is received from the microscope as a series of CCD images or scanned negatives, which generally require a significant amount of pre-processing in order to be useful.

    Traditionally, processing the large volume of data collected from the microscope was the time-limiting factor in protein structure determination by electron microscopy. Data must be initially collected from the microscope either on film negatives, which in turn must be developed and scanned, or from CCDs of sizes typically no larger than 2096x2096 (though larger models are in operation). In either case, data are finally ready for processing as 8-bit, 16-bit or (in principle) 32-bit grey-scale images.

    Regardless of data source, the foundation of all crystallographic methods is the presence of a regular Fourier lattice. Two-dimensional cryo-electron microscopy of proteins introduces special challenges, as multiple crystals may be present in the same image, producing in some cases several independent lattices. Additionally, scanned negatives typically have a rectangular region marking the film number and other details of image acquisition that must be removed prior to processing. If the edges of the images are not down-tapered, vertical and horizontal "streaks" will be present in the Fourier transform of the image, arising from the high-resolution discontinuities between the opposite edges of the image. These streaks can overlap with lattice points that fall close to the vertical and horizontal axes, disrupting both the information they contain and the ability to detect them. Lastly, SpotScanning (Downing, 1991) is a commonly used process whereby circular discs are individually scanned in an image. The large-scale regularity of the scanning pattern produces a low-frequency lattice which can interfere and overlap with any protein crystal lattices.

    We introduce a series of methods, packaged into 2dx (Gipson, et al., 2007), which simultaneously address these problems, automatically detecting accurate crystal lattice parameters for a majority of images. Further, a template is described for the automation of all subsequent image-processing steps on the road to a fully processed dataset.

    The broader picture of image processing is one of reproducibility. The lattice parameters, for instance, are only one of hundreds of parameters which must be determined or provided, and subsequently stored and accessed in a regular way during image processing. Numerous steps, from correct CTF and tilt-geometry determination to the final stages of symmetrization and optimal image recovery, must be performed sequentially and repeatedly for hundreds of images. The goal in such a project is then to automatically process as significant a portion of the data as possible and to reduce unnecessary, repetitive data entry by the user. Here also, 2dx (Gipson, et al., 2007), the image-processing package designed to automatically process individual 2D TEM images, is introduced.
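    As a side note on the down-tapering point above, here is a small numpy sketch (not 2dx code) showing why untapered edges put a cross of streaks into the Fourier transform, and how a cosine ramp at the borders suppresses it. The image size, gradient, and ramp width are all illustrative.

        import numpy as np

        # Toy image with mismatched opposite edges: a left-to-right
        # gradient plus noise (illustrative, not microscope data).
        rng = np.random.default_rng(0)
        img = rng.normal(size=(256, 256)) + np.linspace(0.0, 5.0, 256)

        def taper(image, width=16):
            # Down-taper the borders with a cosine ramp so opposite
            # edges meet smoothly under the FFT's periodic wrap-around.
            ramp = 0.5 * (1 - np.cos(np.pi * np.arange(width) / width))
            window = np.ones(image.shape[0])
            window[:width], window[-width:] = ramp, ramp[::-1]
            return image * window[:, None] * window[None, :]

        raw = np.abs(np.fft.fftshift(np.fft.fft2(img)))
        smooth = np.abs(np.fft.fftshift(np.fft.fft2(taper(img))))

        # Energy along the central horizontal axis (the "streak" row)
        # drops sharply once the edges are tapered.
        cx = img.shape[0] // 2
        print(raw[cx, :].sum(), smooth[cx, :].sum())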
    The 2dx package focuses on reliability, ease of use and automation to produce the finished results necessary for full three-dimensional reconstruction of the protein in question. Once individual 2D images have been processed, they contribute to a larger, project-wide 3-dimensional dataset. Several challenges exist in processing this dataset, besides simply the organization of results and project-wide parameters. In particular, though tilt-geometry, relative amplitude scaling and absolute orientation are in principle known (or obtainable from an individual image), errors, uncertainties and heterogeneous data types make for a 3D dataset with many parameters to be optimized. 2dx_merge (Gipson, et al., 2007) is the follow-up to the first release of 2dx, which had originally processed only individual images. Based on the guiding principles of the earlier release, 2dx_merge focuses on ease of use and automation. The result is a fully qualified 3D structure-determination package capable of turning hundreds of electron micrograph images, nearly completely automatically, into a full 3D structure.

    Most of the processing performed in the 2dx package is based on the excellent suite of programs known collectively as the MRC package (Crowther, et al., 1996). Extensions to this suite and alternative algorithms continue to play an essential role in image processing as computers become faster and as advancements are made in the mathematics of signal processing. In this capacity, an alternative procedure to generate a 3D structure from processed 2D images is presented. This algorithm, entitled "Projective Constraint Optimization" (PCO), leverages prior known information, such as symmetry and the fact that the protein is bound in a membrane, to extend the normal boundaries of resolution. In particular, traditional methods (Agard, 1983) make no attempt to account for the "missing cone", a vast, un-sampled region in 3D Fourier space arising from specimen tilt limitations in the microscope. Provided sufficient data, PCO simultaneously refines the dataset, accounting for error, as well as attempting to fill this missing cone.

    Though PCO provides a near-optimal 3D reconstruction based on the data, depending on initial data quality and the amount of prior knowledge there may be a host of solutions, and more importantly pseudo-solutions, which are more or less consistent with the provided dataset. Trying to find a global best fit for known information and data can be a daunting challenge mathematically; to this end, the use of meta-heuristics is addressed. Specifically, in the case of many pseudo-solutions, so long as a suitably defined error metric can be found, quasi-evolutionary swarm algorithms can be used that search solution space, sharing data as they go. Given sufficient computational power, such algorithms can dramatically reduce the search time for the global optimum of a given dataset.

    Once the structure of a protein has been determined, many questions often remain about its function. Questions about the dynamics of a protein, for instance, are often not readily interpretable from structure alone. To this end, an investigation into computationally optimized structural dynamics is described. Here, in order to find the most likely path a protein might take through "conformation space" between two conformations, a graphics processing unit (GPU) optimized program and set of libraries was written that speeds up the calculation of this process 30x.
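    The swarm-search idea described above can be illustrated with a toy particle swarm in Python; the multi-modal error function (the Rastrigin function) is a generic stand-in with many local minima playing the role of pseudo-solutions, not PCO's actual metric over 3D Fourier data.

        import numpy as np

        # Toy particle swarm: candidate solutions explore a rugged error
        # landscape while sharing the best position found so far.
        rng = np.random.default_rng(1)

        def error(x):
            # Hypothetical multi-modal metric (Rastrigin), standing in
            # for a real data-consistency error.
            return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10, axis=-1)

        dim, n = 4, 30
        pos = rng.uniform(-5, 5, (n, dim))
        vel = np.zeros_like(pos)
        best_pos, best_err = pos.copy(), error(pos)
        g_best = best_pos[best_err.argmin()].copy()

        for _ in range(200):
            r1, r2 = rng.random((2, n, dim))
            # Pull toward each particle's own best and the shared best.
            vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g_best - pos)
            pos = pos + vel
            err = error(pos)
            better = err < best_err
            best_pos[better], best_err[better] = pos[better], err[better]
            g_best = best_pos[best_err.argmin()].copy()

        print("best error found:", best_err.min())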
    The GPU tools and methods developed for this path-finding work serve as a conceptual template for how GPU coding was applied to other aspects of the work presented here, and to GPU programming generally. The final portion of the thesis takes an apparent step in reverse, presenting a dramatic, yet highly predictive, simplification of a complex biological process. Kinetic Monte Carlo simulations idealize thousands of proteins as interacting agents governed by a set of simple rules (i.e. react/dissociate), offering highly accurate insights into the large-scale cooperative behavior of proteins. This work demonstrates that, for many applications, structure, dynamics or even general knowledge of a protein may not be necessary for a meaningful biological story to emerge. Additionally, even in cases where structure and function are known, such simulations can help to answer the biological question in its entirety, from structure, to dynamics, to ultimate function.
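    As a minimal illustration of the kinetic Monte Carlo idea, the following Gillespie-style sketch reduces a binding/unbinding system (A + B <-> AB) to agent counts and two rules; the rate constants are invented for illustration and are not taken from the thesis.

        import numpy as np

        # Gillespie-style kinetic Monte Carlo for A + B <-> AB with
        # invented rate constants; proteins are just counts and rules.
        rng = np.random.default_rng(2)
        k_on, k_off = 1e-3, 0.1
        A, B, AB = 500, 500, 0
        t, t_end = 0.0, 100.0

        while t < t_end:
            rates = np.array([k_on * A * B, k_off * AB])  # propensities
            total = rates.sum()
            if total == 0:
                break
            t += rng.exponential(1.0 / total)    # waiting time to next event
            if rng.random() < rates[0] / total:  # binding fires
                A, B, AB = A - 1, B - 1, AB + 1
            else:                                # dissociation fires
                A, B, AB = A + 1, B + 1, AB - 1

        print(f"t={t:.1f}: A={A} B={B} AB={AB}")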

    Doctor of Philosophy

    Get PDF
    Crystal structure prediction is an important field of study, both for the development of new compounds and materials, and for advancing our understanding of crystallization processes. The Modified Genetic Algorithm for Crystal Structure Prediction, MGAC, is a software package for structure prediction that has had varying success in predicting the structures of many molecules. However, several advancements in the field of structure prediction have prompted a revision of the software, from both a scientific and a technical standpoint. In this dissertation, the evaluation of a new method for energy calculation and structural optimization, dispersion-corrected density functional theory, is presented, along with practical parameterizations for using density functional theory in crystal structure prediction. Next, a preliminary implementation of MGAC using density functional theory is outlined, including some key changes to the construction of unit cells, along with successful prediction results for the molecules glycine and histamine. Finally, a new implementation of MGAC, called MGAC2, is proposed to handle multiple space-group prediction effectively, with accompanying preliminary prediction results for histamine using the new implementation.
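    The genetic-algorithm loop at the heart of this kind of structure prediction can be sketched as follows. This is a schematic in the spirit of MGAC, not its actual code; the energy() function is a cheap placeholder where the real software would invoke dispersion-corrected DFT.

        import numpy as np

        # Schematic GA over unit-cell parameters; energy() is a stand-in
        # for a dispersion-corrected DFT evaluation.
        rng = np.random.default_rng(3)

        def energy(cell):
            # Hypothetical landscape with a fictitious optimal cell.
            return np.sum((cell - np.array([5.0, 7.0, 9.0])) ** 2)

        pop = rng.uniform(3, 12, (20, 3))   # population of (a, b, c) cells

        for _ in range(50):
            fitness = np.array([energy(c) for c in pop])
            parents = pop[np.argsort(fitness)[:10]]            # selection
            children = []
            for _ in range(10):
                p1, p2 = parents[rng.choice(10, 2, replace=False)]
                child = np.where(rng.random(3) < 0.5, p1, p2)  # crossover
                child += rng.normal(0, 0.1, 3)                 # mutation
                children.append(child)
            pop = np.vstack([parents, children])

        best = pop[np.array([energy(c) for c in pop]).argmin()]
        print("best cell:", best)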

    A Constraint Solver for Flexible Protein Models

    Get PDF
    This paper proposes the formalization and implementation of a novel class of constraints aimed at modeling problems related to the placement of multi-body systems in 3-dimensional space. Each multi-body is a system composed of body elements, connected by joint relationships and constrained by geometric properties. The emphasis of this investigation is the use of multi-body systems to model native conformations of protein structures, where each body represents an entity of the protein (e.g., an amino acid, a small peptide) and the geometric constraints are related to the spatial properties of the composing atoms. The paper explores the use of the proposed class of constraints to support a variety of structural analyses of proteins, such as loop modeling and structure prediction. The declarative nature of a constraint-based encoding provides elaboration tolerance and the ability to make use of any additional knowledge in the analysis studies. The filtering capabilities of the proposed constraints also allow control over the number of representative solutions drawn from the conformational space of the protein, by means of criteria driven by uniform-distribution sampling principles. In this scenario it is possible to select the desired degree of precision and/or number of solutions. The filtering component automatically excludes configurations that violate the spatial and geometric properties of the composing multi-body system. The paper illustrates the implementation of a constraint solver based on the multi-body perspective and its empirical evaluation on protein structure analysis problems.
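    A toy version of the sample-and-filter idea reads as follows; the chain growth, distances, and clash test are invented placeholders for the paper's joint relationships and geometric constraints, and the thinning a real solver would apply is omitted.

        import numpy as np

        # Sample chain conformations of rigid "bodies" and reject any
        # that violate a simple clash constraint; the surviving set is
        # what a real solver would then thin by sampling criteria.
        rng = np.random.default_rng(4)
        BOND, CLASH, N = 3.8, 3.0, 6   # illustrative distances (angstrom)

        def sample_chain():
            # Grow body centroids with a fixed inter-body distance.
            pts = [np.zeros(3)]
            for _ in range(N - 1):
                step = rng.normal(size=3)
                pts.append(pts[-1] + BOND * step / np.linalg.norm(step))
            return np.array(pts)

        def feasible(pts):
            # No two non-adjacent bodies may come closer than CLASH.
            d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
            return np.all(d[np.triu_indices(len(pts), k=2)] > CLASH)

        kept = [c for c in (sample_chain() for _ in range(1000)) if feasible(c)]
        print(f"{len(kept)} of 1000 sampled conformations pass the filter")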

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Full text link
    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in the natural sciences. Today, AI has started to advance the natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area; a unified and technical treatment of this field is therefore needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science, namely AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density) and atomic (molecules, proteins, materials, and interactions) scales to the macroscopic (fluids, climate, and subsurface) scale, and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified, and hope this initial effort may trigger more community interest and effort to further advance AI4Science.
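    As a small taste of the symmetry discussion, the snippet below numerically checks the simplest special case of equivariance, namely invariance: features built from sorted pairwise distances are unchanged by any rigid rotation of an atomic system. The toy coordinates are arbitrary.

        import numpy as np

        # Sorted pairwise distances are invariant under rotations and
        # translations: the simplest route to symmetry-respecting models.
        rng = np.random.default_rng(5)
        coords = rng.normal(size=(8, 3))   # arbitrary toy "molecule"

        def invariant_features(x):
            d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
            return np.sort(d[np.triu_indices(len(x), k=1)])

        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
        rotated = coords @ q.T

        print(np.allclose(invariant_features(coords), invariant_features(rotated)))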

    Seventh Biennial Report: June 2003 - March 2005

    No full text

    Bayesian signal reconstruction, Markov random fields, and x-ray crystallography

    Get PDF
    Cover title. Includes bibliographical references (p. 35-37). Research supported by the Army Research Office (DAAL03-86-K-0171), the Air Force Office of Scientific Research (AFOSR-89-0276), and the National Science Foundation (ECS-8700903). Peter C. Doerschuk

    Nature’s Optics and Our Understanding of Light

    Get PDF
    Optical phenomena visible to everyone abundantly illustrate important ideas in science and mathematics. The phenomena considered include rainbows, sparkling reflections on water, green flashes, earthlight on the moon, glories, daylight, crystals, and the squint moon. The concepts include refraction, wave interference, numerical experiments, asymptotics, Regge poles, polarisation singularities, conical intersections, and visual illusions.
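    One of the numerical experiments the paper celebrates is easy to reproduce: minimizing the deviation of a ray refracted into and internally reflected once inside a water drop recovers the familiar 42-degree primary rainbow angle.

        import numpy as np

        # Snell's law for a ray entering a spherical water drop, with
        # one internal reflection; minimizing the total deviation over
        # all incidence angles gives the primary rainbow direction.
        n = 1.333                                  # refractive index of water
        i = np.radians(np.linspace(0.01, 89.99, 100000))
        r = np.arcsin(np.sin(i) / n)               # refraction angle
        deviation = 2 * (i - r) + (np.pi - 2 * r)  # total ray deviation

        rainbow = np.pi - deviation.min()          # seen from the antisolar point
        print(f"primary rainbow at {np.degrees(rainbow):.1f} degrees")  # ~42.0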

    Machine Learning on Neutron and X-Ray Scattering

    Get PDF
    Neutron and X-ray scattering represent two state-of-the-art materials characterization techniques that measure materials' structural and dynamical properties with high precision. These techniques play critical roles in understanding a wide variety of materials systems, from catalysis to polymers, nanomaterials to macromolecules, and energy materials to quantum materials. In recent years, neutron and X-ray scattering have received a significant boost due to the development and increased application of machine learning to materials problems. This article reviews the recent progress in applying machine learning techniques to augment various neutron and X-ray scattering techniques. We highlight the integration of machine learning methods into the typical workflow of scattering experiments. We focus on scattering problems that were challenging for traditional methods but are addressable using machine learning, such as leveraging the knowledge of simple materials to model more complicated systems, learning with limited data or incomplete labels, identifying meaningful spectra and materials representations for learning tasks, mitigating spectral noise, and many others. We present an outlook on a few emerging roles machine learning may play in broad types of scattering and spectroscopic problems in the foreseeable future.
    Comment: 56 pages, 12 figures. Feedback most welcome
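    To illustrate one of the themes above, learning with limited noisy data, the sketch below fits a regularized linear model mapping simulated one-peak "scattering" curves to a width parameter; everything in it is synthetic and not drawn from the review.

        import numpy as np
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(6)
        q = np.linspace(0.1, 5.0, 200)          # momentum-transfer grid

        def spectrum(width):
            # Fake one-peak scattering curve whose width is the label.
            return np.exp(-((q - 2.0) ** 2) / (2 * width**2))

        widths = rng.uniform(0.2, 1.0, 40)      # deliberately few samples
        X = np.array([spectrum(w) for w in widths])
        X += rng.normal(0, 0.05, X.shape)       # simulated spectral noise

        # Regularization compensates for the small, noisy training set.
        model = Ridge(alpha=1.0).fit(X, widths)

        test = spectrum(0.6) + rng.normal(0, 0.05, q.size)
        print(f"true width 0.60, predicted {model.predict([test])[0]:.2f}")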