Search CORE

1,114 research outputs found

CLP-based protein fragment assembly

Author: AGOSTINO DOVIER
ALESSANDRO DAL PALÙ
Dal Palù
Dotu
ENRICO PONTELLI
FEDERICO FOGOLARI
Levinthal
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2010
Field of study

The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predict the 3D conformation of a protein via fragments assembly. The fragments are extracted by a preprocessor-also developed for this work- from a database of known protein structures that clusters and classifies the fragments according to similarity and frequency. The problem of assembling fragments into a complete conformation is mapped to a constraint solving problem and solved using CLP. The constraint-based model uses a medium discretization degree Ca-side chain centroid protein model that offers efficiency and a good approximation for space filling. The approach adapts existing energy models to the protein representation used and applies a large neighboring search strategy. The results shows the feasibility and efficiency of the method. The declarative nature of the solution allows to include future extensions, e.g., different size fragments for better accuracy.Comment: special issue dedicated to ICLP 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Archivio istituzionale della ricerca - Università degli Studi di Udine

Information recovery in the biological sciences : protein structure determination by constraint satisfaction, simulation and automated image processing

Author: Gipson Bryant
Publication venue
Publication date: 01/01/2010
Field of study

Regardless of the field of study or particular problem, any experimental science always poses the same question: ÒWhat object or phenomena generated the data that we see, given what is known?Ó In the field of 2D electron crystallography, data is collected from a series of two-dimensional images, formed either as a result of diffraction mode imaging or TEM mode real imaging. The resulting dataset is acquired strictly in the Fourier domain as either coupled Amplitudes and Phases (as in TEM mode) or Amplitudes alone (in diffraction mode). In either case, data is received from the microscope in a series of CCD or scanned negatives of images which generally require a significant amount of pre-processing in order to be useful. Traditionally, processing of the large volume of data collected from the microscope was the time limiting factor in protein structure determination by electron microscopy. Data must be initially collected from the microscope either on film-negatives, which in turn must be developed and scanned, or from CCDs of sizes typically no larger than 2096x2096 (though larger models are in operation). In either case, data are finally ready for processing as 8-bit, 16-bit or (in principle) 32-bit grey-scale images. Regardless of data source, the foundation of all crystallographic methods is the presence of a regular Fourier lattice. Two dimensional cryo-electron microscopy of proteins introduces special challenges as multiple crystals may be present in the same image, producing in some cases several independent lattices. Additionally, scanned negatives typically have a rectangular region marking the film number and other details of image acquisition that must be removed prior to processing. If the edges of the images are not down-tapered, vertical and horizontal ÒstreaksÓ will be present in the Fourier transform of the image --arising from the high-resolution discontinuities between the opposite edges of the image. These streaks can overlap with lattice points which fall close to the vertical and horizontal axes and disrupt both the information they contain and the ability to detect them. Lastly, SpotScanning (Downing, 1991) is a commonly used process where-by circular discs are individually scanned in an image. The large-scale regularity of the scanning patter produces a low frequency lattice which can interfere and overlap with any protein crystal lattices. We introduce a series of methods packaged into 2dx (Gipson, et al., 2007) which simultaneously addresses these problems, automatically detecting accurate crystal lattice parameters for a majority of images. Further a template is described for the automation of all subsequent image processing steps on the road to a fully processed dataset. The broader picture of image processing is one of reproducibility. The lattice parameters, for instance, are only one of hundreds of parameters which must be determined or provided and subsequently stored and accessed in a regular way during image processing. Numerous steps, from correct CTF and tilt-geometry determination to the final stages of symmetrization and optimal image recovery must be performed sequentially and repeatedly for hundreds of images. The goal in such a project is then to automatically process as significant a portion of the data as possible and to reduce unnecessary, repetitive data entry by the user. Here also, 2dx (Gipson, et al., 2007), the image processing package designed to automatically process individual 2D TEM images is introduced. This package focuses on reliability, ease of use and automation to produce finished results necessary for full three-dimensional reconstruction of the protein in question. Once individual 2D images have been processed, they contribute to a larger project-wide 3-dimensional dataset. Several challenges exist in processing this dataset, besides simply the organization of results and project-wide parameters. In particular, though tilt-geometry, relative amplitude scaling and absolute orientation are in principle known (or obtainable from an individual image) errors, uncertainties and heterogeneous data-types provide for a 3D-dataset with many parameters to be optimized. 2dx_merge (Gipson, et al., 2007) is the follow-up to the first release of 2dx which had originally processed only individual images. Based on the guiding principles of the earlier release, 2dx_merge focuses on ease of use and automation. The result is a fully qualified 3D structure determination package capable of turning hundreds of electron micrograph images, nearly completely automatically, into a full 3D structure. Most of the processing performed in the 2dx package is based on the excellent suite of programs termed collectively as the MRC package (Crowther, et al., 1996). Extensions to this suite and alternative algorithms continue to play an essential role in image processing as computers become faster and as advancements are made in the mathematics of signal processing. In this capacity, an alternative procedure to generate a 3D structure from processed 2D images is presented. This algorithm, entitled ÒProjective Constraint OptimizationÓ (PCO), leverages prior known information, such as symmetry and the fact that the protein is bound in a membrane, to extend the normal boundaries of resolution. In particular, traditional methods (Agard, 1983) make no attempt to account for the Òmissing coneÓ a vast, un-sampled, region in 3D Fourier space arising from specimen tilt limitations in the microscope. Provided sufficient data, PCO simultaneously refines the dataset, accounting for error, as well as attempting to fill this missing cone. Though PCO provides a near-optimal 3D reconstruction based on data, depending on initial data quality and amount of prior knowledge, there may be a host of solutions, and more importantly pseudo-solutions, which are more-or-less consistent with the provided dataset. Trying to find a global best-fit for known information and data can be a daunting challenge mathematically, to this end the use of meta-heuristics is addressed. Specifically, in the case of many pseudo-solutions, so long as a suitably defined error metric can be found, quasi-evolutionary swarm algorithms can be used that search solution space, sharing data as they go. Given sufficient computational power, such algorithms can dramatically reduce the search time for global optimums for a given dataset. Once the structure of a protein has been determined, many questions often remain about its function. Questions about the dynamics of a protein, for instance, are not often readily interpretable from structure alone. To this end an investigation into computationally optimized structural dynamics is described. Here, in order to find the most likely path a protein might take through Òconformation spaceÓ between two conformations, a graphics processing unit (GPU) optimized program and set of libraries is written to speed of the calculation of this process 30x. The tools and methods developed here serve as a conceptual template as to how GPU coding was applied to other aspects of the work presented here as well as GPU programming generally. The final portion of the thesis takes an apparent step in reverse, presenting a dramatic, yet highly predictive, simplification of a complex biological process. Kinetic Monte Carlo simulations idealize thousands of proteins as interacting agents by a set of simple rules (i.e. react/dissociate), offering highly-accurate insights into the large-scale cooperative behavior of proteins. This work demonstrates that, for many applications, structure, dynamics or even general knowledge of a protein may not be necessary for a meaningful biological story to emerge. Additionally, even in cases where structure and function is known, such simulations can help to answer the biological question in its entirety from structure, to dynamics, to ultimate function

edoc

Doctor of Philosophy

Author: Lund Albert Merrill
Publication venue: University of Utah
Publication date: 01/01/2016
Field of study

dissertationCrystal structure prediction is an important field of study, both for the development of new compounds and materials, and for the advancement of understanding crystallization processes. The Modified Genetic Algorithm for Crystal Structure Prediction, MGAC, is a software package for structure prediction that has had varying success in predicting the structures of many molecules. However, several advancements in the field of structure prediction have prompted a revision to the software, both from a scientific and technical standpoint. In this dissertation, the evaluation of a new method for energy calculation and structural optimization, dispersion corrected density functional theory, is presented, along with practical parameterizations for using density functional theory in crystal structure prediction. Next, a preliminary implementation of MGAC using density functional theory is outlined, including some key changes to the construction of unit cells, along with successful prediction results for the molecules glycine and histamine. Finally, a new implementation of MGAC is proposed to handle multiple space group prediction effectively, with accompanying preliminary prediction results for histamine using the new implementation of MGAC, called MGAC2

The University of Utah: J. Willard Marriott Digital Library

A Constraint Solver for Flexible Protein Models

Author: Campeotto Federico
Dal Pal\uf9 A.
Dovier Agostino
Fioretto Ferdinando
Pontelli E.
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2013
Field of study

This paper proposes the formalization and implementation of a novel class of constraints aimed at modeling problems related to placement of multi-body systems in the 3-dimensional space. Each multi-body is a system composed of body elements, connected by joint relationships and constrained by geometric properties. The emphasis of this investigation is the use of multi-body systems to model native conformations of protein structures---where each body represents an entity of the protein (e.g., an amino acid, a small peptide) and the geometric constraints are related to the spatial properties of the composing atoms. The paper explores the use of the proposed class of constraints to support a variety of different structural analysis of proteins, such as loop modeling and structure prediction. The declarative nature of a constraint-based encoding provides elaboration tolerance and the ability to make use of any additional knowledge in the analysis studies. The filtering capabilities of the proposed constraints also allow to control the number of representative solutions that are withdrawn from the conformational space of the protein, by means of criteria driven by uniform distribution sampling principles. In this scenario it is possible to select the desired degree of precision and/or number of solutions. The filtering component automatically excludes configurations that violate the spatial and geometric properties of the composing multi-body system. The paper illustrates the implementation of a constraint solver based on the multi-body perspective and its empirical evaluation on protein structure analysis problems

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Archivio istituzionale della ricerca - Università degli Studi di Udine

Open Access Repository

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Author: Adams Keir
Anandkumar Anima
Aspuru-Guzik Alán
Azizzadenesheli Kamyar
Barzilay Regina
Bekkers Erik
Bohde Montgomery
Bronstein Michael
Coley Connor W.
Daigavane Ameya
Du Yuanqi
Edwards Carl
Ermon Stefano
Fang Ada
Fu Cong
Fu Tianfan
Fu Xiang
Gao Nicholas
Gui Shurui
Günnemann Stephan
Helwig Jacob
Hofgard Elyssa F.
Huang Qian
Jaakkola Tommi
Ji Heng
Ji Shuiwang
Joshi Chaitanya K.
Kurtin Jerry
Ladera Adriana
Lawrence Hannah
Leskovec Jure
Li Xiner
Lin Yuchao
Ling Hongyi
Liu Meng
Liu Yi
Liò Pietro
Luo Youzhi
Mathis Simon V.
Phung Tuong
Qian Xiaofeng
Qian Xiaoning
Saxton Alexandra
Smidt Tess
Strasser Alex
Stärk Hannes
Sun Jimeng
Tehrani Aria Mansouri
Wang Limei
Wang Rui
Wang Yucheng
Weiler Maurice
Wu Tailin
Xie Yaochen
Xie YuQing
Xu Minkai
Xu Shenglong
Xu Zhao
Yan Keqiang
Yu Haiyang
Yu Rose
Zhang Xuan
Zitnik Marinka
Publication venue
Publication date: 15/11/2023
Field of study

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science

arXiv.org e-Print Archive

Recommended from our members

Predicting multibody assembly of proteins

Author: Rasheed Md. Muhibur
Publication venue
Publication date: 25/09/2014
Field of study

textThis thesis addresses the multi-body assembly (MBA) problem in the context of protein assemblies. [...] In this thesis, we chose the protein assembly domain because accurate and reliable computational modeling, simulation and prediction of such assemblies would clearly accelerate discoveries in understanding of the complexities of metabolic pathways, identifying the molecular basis for normal health and diseases, and in the designing of new drugs and other therapeutics. [...] [We developed] F²Dock (Fast Fourier Docking) which includes a multi-term function which includes both a statistical thermodynamic approximation of molecular free energy as well as several of knowledge-based terms. Parameters of the scoring model were learned based on a large set of positive/negative examples, and when tested on 176 protein complexes of various types, showed excellent accuracy in ranking correct configurations higher (F² Dock ranks the correcti solution as the top ranked one in 22/176 cases, which is better than other unsupervised prediction software on the same benchmark). Most of the protein-protein interaction scoring terms can be expressed as integrals over the occupied volume, boundary, or a set of discrete points (atom locations), of distance dependent decaying kernels. We developed a dynamic adaptive grid (DAG) data structure which computes smooth surface and volumetric representations of a protein complex in O(m log m) time, where m is the number of atoms assuming that the smallest feature size h is [theta](r[subscript max]) where r[subscript max] is the radius of the largest atom; updates in O(log m) time; and uses O(m)memory. We also developed the dynamic packing grids (DPG) data structure which supports quasi-constant time updates (O(log w)) and spherical neighborhood queries (O(log log w)), where w is the word-size in the RAM. DPG and DAG together results in O(k) time approximation of scoring terms where k << m is the size of the contact region between proteins. [...] [W]e consider the symmetric spherical shell assembly case, where multiple copies of identical proteins tile the surface of a sphere. Though this is a restricted subclass of MBA, it is an important one since it would accelerate development of drugs and antibodies to prevent viruses from forming capsids, which have such spherical symmetry in nature. We proved that it is possible to characterize the space of possible symmetric spherical layouts using a small number of representative local arrangements (called tiles), and their global configurations (tiling). We further show that the tilings, and the mapping of proteins to tilings on arbitrary sized shells is parameterized by 3 discrete parameters and 6 continuous degrees of freedom; and the 3 discrete DOF can be restricted to a constant number of cases if the size of the shell is known (in terms of the number of protein n). We also consider the case where a coarse model of the whole complex of proteins are available. We show that even when such coarse models do not show atomic positions, they can be sufficient to identify a general location for each protein and its neighbors, and thereby restricts the configurational space. We developed an iterative refinement search protocol that leverages such multi-resolution structural data to predict accurate high resolution model of protein complexes, and successfully applied the protocol to model gp120, a protein on the spike of HIV and currently the most feasible target for anti-HIV drug design.Computer Science

Texas ScholarWorks

Seventh Biennial Report : June 2003 - March 2005

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2005
Field of study

MPG.PuRe

Bayesian signal reconstruction, Markov random fields, and x-ray crystallography

Author
Publication venue: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems
Publication date: 01/01/1990
Field of study

Cover title.Includes bibliographical references (p. 35-37).Research supported by the Army Research Office. DAAL03-86-K-0171 Research supported by the Air Force Office of Scientific Research. AFOSR-89-0276 Research supported by the National Science Foundation. ECS-8700903Peter C. Doerschuk

DSpace@MIT

Nature’s Optics and Our Understanding of Light

Author: Berry Michael
Publication venue: Arab Journals Platform
Publication date: 15/03/2022
Field of study

Optical phenomena visible to everyone abundantly illustrate important ideas in science and mathematics. The phenomena considered include rainbows, sparkling reflections on water, green flashes, earthlight on the moon, glories, daylight, crystals, and the squint moon. The concepts include refraction, wave interference, numerical experiments, asymptotics, Regge poles, polarisation singularities, conical intersections, and visual illusions

Arab Journals Platform

Machine Learning on Neutron and X-Ray Scattering

Author: Andrejevic Nina
Chan Maria
Chen Zhantao
Drucker Nathan
Ernstorfer Ralph
Li Mingda
Nguyen Thanh
Smidt Tess
Tennant Alan
Wang Yao
Xian R Patrick
Publication venue: 'AIP Publishing'
Publication date: 05/02/2021
Field of study

Neutron and X-ray scattering represent two state-of-the-art materials characterization techniques that measure materials' structural and dynamical properties with high precision. These techniques play critical roles in understanding a wide variety of materials systems, from catalysis to polymers, nanomaterials to macromolecules, and energy materials to quantum materials. In recent years, neutron and X-ray scattering have received a significant boost due to the development and increased application of machine learning to materials problems. This article reviews the recent progress in applying machine learning techniques to augment various neutron and X-ray scattering techniques. We highlight the integration of machine learning methods into the typical workflow of scattering experiments. We focus on scattering problems that faced challenge with traditional methods but addressable using machine learning, such as leveraging the knowledge of simple materials to model more complicated systems, learning with limited data or incomplete labels, identifying meaningful spectra and materials' representations for learning tasks, mitigating spectral noise, and many others. We present an outlook on a few emerging roles machine learning may play in broad types of scattering and spectroscopic problems in the foreseeable future.Comment: 56 pages, 12 figures. Feedback most welcom

arXiv.org e-Print Archive

MPG.PuRe