191 research outputs found
Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany
This proceedings volume compiles contributed articles that cover applications from different research fields in equal measure, ranging from capacity to capability computing. Besides classical computing aspects such as parallelization, the focus of these proceedings is on multi-scale approaches and methods for tackling algorithm and data complexity. Practical aspects regarding the usage of the HPC infrastructure and the tools and software available at the SCC are also presented.
Complexity, Emergent Systems and Complex Biological Systems: Complex Systems Theory and Biodynamics. [Edited book by I.C. Baianu, with listed contributors (2011)]
An overview is presented of system dynamics (the study of the behaviour of complex systems), dynamical systems in mathematics, dynamic programming in computer science and control theory, complex systems biology, neurodynamics, and psychodynamics.
Metal Cations in Protein Force Fields: From Data Set Creation and Benchmarks to Polarizable Force Field Implementation and Adjustment
Metal cations are essential to life. About one-third of all proteins require metal cofactors to accurately fold or to function. Computer simulations using empirical parameters and classical molecular mechanics models (force fields) are the standard tool to investigate proteins’ structural dynamics and functions in silico. Despite many successes, the accuracy of force fields is limited when cations are involved. The focus of this thesis is the development of tools and strategies to create system-specific force field parameters to accurately describe cation-protein interactions. The accuracy of a force field mainly relies on (i) the parameters derived from increasingly large quantum chemistry or experimental data and (ii) the physics behind the energy formula.
The first part of this thesis presents a large and comprehensive quantum chemistry data set on a consistent computational footing that can be used for force field parameterization and benchmarking. The data set covers dipeptides of the 20 proteinogenic amino acids with different possible side chain protonation states, 3 divalent cations (Ca2+, Mg2+, and Ba2+), and a wide relative energy range. Crucial properties related to force field development, such as partial charges, interaction energies, etc., are also provided. To make the data available, the data set was uploaded to the NOMAD repository and its data structure was formalized in an ontology.
Besides a proper data basis for parameterization, the physics covered by the terms of the additive force field formulation model impacts its applicability. The second part of this thesis benchmarks three popular non-polarizable force fields and the polarizable Drude model against a quantum chemistry data set. After some adjustments, the Drude model was found to reproduce the reference interaction energy substantially better than the non-polarizable force fields, which showed the importance of explicitly addressing polarization effects. Tweaking of the Drude model involved Boltzmann-weighted fitting to optimize Thole factors and Lennard-Jones parameters. The obtained parameters were validated by (i) their ability to reproduce reference interaction energies and (ii) molecular dynamics simulations of the N-lobe of calmodulin. This work facilitates the improvement of polarizable force fields for cation-protein interactions by quantum chemistry-driven parameterization combined with molecular dynamics simulations in the condensed phase.
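The Boltzmann-weighted fitting mentioned above can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis: the temperature, the energy units, the choice of weighting by reference relative energy, and the function names `boltzmann_weights` and `weighted_rmse` are all assumptions made for illustration. The idea is only that low-energy reference structures contribute more to the fitting error than high-energy ones.

```python
import math

# Assumption: kT at roughly 300 K in kcal/mol. The abstract states neither
# the temperature nor the energy units used for the weighting.
KT_KCAL_PER_MOL = 0.596


def boltzmann_weights(rel_energies, kt=KT_KCAL_PER_MOL):
    """Return normalized weights proportional to exp(-E / kT)."""
    e_min = min(rel_energies)  # shift by the minimum for numerical stability
    raw = [math.exp(-(e - e_min) / kt) for e in rel_energies]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_rmse(reference, predicted, weights):
    """Weighted root-mean-square error between reference and model energies."""
    return math.sqrt(
        sum(w * (p - r) ** 2 for r, p, w in zip(reference, predicted, weights))
    )
```

In a fitting loop, a quantity such as `weighted_rmse` would serve as the objective that an optimizer minimizes while varying the force field parameters (here, hypothetically, Thole factors and Lennard-Jones parameters).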
While the Drude model shows its potential for simulating cation-protein interactions, it lacks a description of charge transfer effects, which are significant between cation and protein. The CTPOL model extends the classical force field formulation by charge transfer (CT) and polarization (POL). Since the CTPOL model is not readily available in any of the popular molecular dynamics packages, it was implemented in OpenMM. Furthermore, an open-source parameterization tool called FFAFFURR was implemented that enables the (system-specific) parameterization of OPLS-AA and CTPOL models. Following the method established in the previous part, the performance of FFAFFURR was evaluated by its ability to reproduce quantum chemistry energies and by molecular dynamics simulations of the zinc finger protein.
In conclusion, this thesis takes a step towards the development of next-generation force fields that accurately describe cation-protein interactions by providing (i) reference data, (ii) a force field model that includes charge transfer and polarization, and (iii) a freely available parameterization tool.
At the crossroads of big science, open science, and technology transfer
Big science infrastructures are confronting increasing demands for public accountability, not only for their contribution to scientific discovery but also for their capacity to generate secondary economic value. To build and operate their sophisticated infrastructures, big science often generates frontier technologies by designing and building technical solutions to complex and unprecedented engineering problems. In parallel, the previous decade has seen rapid, disruptive technological changes in the way science is done and shared, which has led to the coining of the concept of Open Science (OS). Governments are quickly moving towards the OS paradigm and asking big science centres to "open up" the scientific process. Yet these two forces run in opposition, as the commercialization of scientific outputs usually requires significant financial investments and companies are willing to bear this cost only if they can protect the innovation from imitation or unfair competition. This PhD dissertation aims at understanding how new applications of ICT are affecting primary research outcomes and the resultant technology transfer in the context of big science and OS. It attempts to uncover the tensions between these two normative forces and identify the mechanisms that are employed to overcome them.
The dissertation comprises four separate studies: 1) a mixed-method study combining two large-scale global online surveys of research scientists (2016, 2018) with two case studies in the high energy physics and molecular biology scientific communities, which assess explanatory factors behind scientific data-sharing practices; 2) a case study of Open Targets, an information infrastructure based upon data commons, where the European Molecular Biology Laboratory-EBI and pharmaceutical companies collaborate and share scientific data and technological tools to accelerate drug discovery; 3) a study of a unique dataset of 170 projects funded under ATTRACT (a novel policy instrument of the European Commission led by European big science infrastructures), which aims to understand the nature of the serendipitous process behind transitioning big science technologies to previously unanticipated commercial applications; and 4) a case study of White Rabbit technology, a sophisticated open-source hardware developed at the European Council for Nuclear Research (CERN) in collaboration with an extensive ecosystem of companies.
Development of X-ray Methods for the Investigation of Protein Dynamics
Throughout history, methodological innovations have resulted in breakthroughs in our understanding of biology. Methods for determining static protein structures, as well as those for probing protein dynamics, are well-established. Nonetheless, visualizing molecules as dynamic entities that respond to their environment is still an outstanding challenge. Specifically, it is challenging to measure the spatial position of all the atoms within a molecule as a function of time. That challenge is the broad focus of this dissertation.
In chapter one, I begin by diving into modern crystallographic techniques that enable one to solve protein structures from sub-micron-sized crystals. I compare and contrast two methods, serial crystallography and electron crystallography, asking how each technique affects the protein's structure. A primary factor differentiating these two methods is the temperature of the sample during the experiment. Despite this difference, both methods enable one to solve high-resolution structures from small crystals. This is advantageous for time-resolved experiments. Since there are fewer molecules in a small crystal, the perturbation is more uniform, which provides a clearer time-resolved signal.
In chapter two, I investigate temperature-jumps as a generalized perturbation for resolving the energy landscape of proteins. In this work, I focus on solution scattering experiments, which allow one to examine large-scale perturbations to a protein, as well as changes in the solvent shell surrounding the molecule. By mutating selected residues, we inhibited specific protein motions. Comparing these mutants to the wild-type protein allowed us to resolve the motions driven by an infrared laser. Nonetheless, we wished to gain all-atom spatial resolution, which required us to perform a temperature-jump within the context of crystallography rather than solution scattering.
In chapter three, I expand upon the temperature-jump detection method described in chapter two. By adapting this method to accommodate X-ray diffraction images, I demonstrate that we can detect temperature-jumps within a crystalline context. This is a crucial step in the development of a generalized perturbation for time-resolved crystallography. Given the timescale of the measurements, reading out the temperature directly from the X-ray data is the only effective way to track the sample's response. Thus, our method offers proof-of-principle that IR laser-based temperature-jumps are feasible for time-resolved crystallography. While measuring the diffuse scattering signal is useful for temperature-jump detection, the diffuse signal also holds the potential to inform our understanding of protein dynamics.
In chapter four, I review the field of macromolecular diffuse scattering, as of late 2017. I begin by considering data collection practices, which require extremely careful and controlled measurements. Then I examine different groups' approaches to processing the data, as well as their models of the disorder that drives it. Finally, I consider the broader impact of diffuse analysis upon the field, ranging from the improvement of molecular dynamics force fields to improved phasing and resolution extension. While these impacts hold exciting implications, it is clear that collecting high-quality data is the first challenge to solve.
In chapter five, I examine the challenges of collecting high-quality diffuse scattering from protein crystals. I describe how parasitic scattering can confound our ability to develop rigorous models of the crystalline disorder that gives rise to the diffuse signal. Then I work through experimental measures that we took to minimize parasitic scattering while maximizing diffuse scatter driven by protein motions.
Computational Design of Protein Structure and Prediction of Ligand Binding
Proteins perform a tremendous array of finely tuned functions which are not only critical in living organisms, but can be used for industrial and medical purposes. The ability to rationally design these molecular machines could provide a wealth of opportunities, for example to improve human health and to expand the range and reduce the cost of many industrial chemical processes. The modularity of a protein sequence combined with many degrees of structural freedom yields a problem that can frequently be best tackled using computational methods. These computational methods, which include bioinformatics analysis, molecular dynamics, empirical force fields, statistical potentials, and machine learning approaches, amongst others, are collectively known as Computational Protein Design (CPD). Here CPD is examined from the perspective of four different goals: successful design of an intended structure, the prediction of folding and unfolding kinetics from structure (kinetic stability in particular), engineering of improved stability, and prediction of binding sites and energetics.
A considerable proportion of protein folds, and the majority of the most common folds ("superfolds"), are internally symmetric, suggesting emergence from an ancient repetition event. CPD, an increasingly popular and successful method for generating de novo folded sequences and topologies, suffers from exponential scaling of complexity with protein size. Thus, the overwhelming majority of successful designs are of relatively small proteins (< 100 amino acids). Designing proteins comprised of repeated modular elements allows the design space to be partitioned into more manageable portions. Here, a bioinformatics analysis of a "superfold", the beta-trefoil, demonstrated that formation of a globular fold via repetition was not only an ancient event, but an ongoing means of generating diverse and functional sequences. Modular repetition also promotes rapid evolution for binding multivalent targets in the "evolutionary arms race" between host and pathogen. Finally, modular repetition was used to successfully design, on the first attempt, a well-folded and functional beta-trefoil, called ThreeFoil.
Improving protein design requires understanding the outcomes of design and not simply the 3D structure. To this end, I undertook an extensive biophysical characterization of ThreeFoil, with the key finding that its unfolding is extraordinarily slow, with a half-life of almost a decade. This kinetic stability grants ThreeFoil near-immunity to common denaturants as well as high resistance to proteolysis. A large-scale analysis of hundreds of proteins, and coarse-grained modelling of ThreeFoil and other beta-trefoils, indicate that high kinetic stability results from a folded structure rich in contacts between residues distant in sequence (long-range contacts). Furthermore, an analysis of unrelated proteins known to have similar protease resistance demonstrates that the topological complexity resulting from these long-range contacts may be a general mechanism by which proteins remain folded in harsh environments.
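To put the reported half-life in kinetic terms, the short Python sketch below converts a half-life into a first-order rate constant via k = ln 2 / t½. It assumes single-exponential (first-order) unfolding, which the abstract does not state, and it uses 10 years only as an illustrative upper bound for "almost a decade". The function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year in seconds


def unfolding_rate_constant(half_life_years):
    """First-order rate constant in s^-1 for a given half-life in years."""
    return math.log(2) / (half_life_years * SECONDS_PER_YEAR)


def fraction_folded(t_years, half_life_years):
    """Fraction still folded after t_years, assuming first-order decay."""
    return 0.5 ** (t_years / half_life_years)


# Illustrative value only: 10 years stands in for "almost a decade".
k_unfold = unfolding_rate_constant(10.0)
```

Under these assumptions the rate constant is on the order of 10^-9 s^-1, which gives a sense of how slow the unfolding described above is.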
Despite the remarkable kinetic stability of ThreeFoil, it has only moderate thermodynamic stability. I sought to improve this in order to provide a stability buffer for future functional engineering and mutagenesis. Numerous computational tools that predict stability change upon point mutation were used, and 10 mutations were made based on their recommendations. Despite claims of >80% accuracy for these predictions, only 2 of the 10 mutations were stabilizing. An in-depth analysis of more than 20 such tools shows that, to a large extent, while they are capable of recognizing highly destabilizing mutations, they are unable to distinguish between moderately destabilizing and stabilizing mutations.
Designing protein structure tests our understanding of the determinants of protein folding, but useful function is often the final goal of protein engineering. I explored protein-ligand binding using molecular dynamics for several protein-ligand systems involving both flexible ligand binding to deep pockets and more rigid ligand binding to shallow grooves. I also used various levels of simulation complexity, from gas-phase, to implicit solvent, to fully explicit solvent, and simulation types ranging from simple equilibrium simulations that interrogate known interactions to more complex energetically biased simulations that explore diverse configurations and yield novel information.