
    A Kinetic Model of Trp-Cage Folding from Multiple Biased Molecular Dynamics Simulations

    Trp-cage is a designed 20-residue polypeptide that, in spite of its size, shares several features with larger globular proteins. Although the system has been intensively investigated experimentally and theoretically, its folding mechanism is not yet fully understood. Indeed, some experiments suggest a two-state behavior, while others point to the presence of intermediates. In this work we show that the results of a bias-exchange metadynamics simulation can be used to construct a detailed thermodynamic and kinetic model of the system. The model, although constructed from a biased simulation, has a quality similar to that of models extracted from the analysis of long unbiased molecular dynamics trajectories. This is demonstrated by a careful benchmark of the approach on a smaller system, the solvated Ace-Ala3-Nme peptide. For Trp-cage folding, the model predicts that the relaxation time of 3100 ns observed experimentally is due to the presence of a compact molten globule-like conformation. This state has an occupancy of only 3% at 300 K, but acts as a kinetic trap. Instead, non-compact structures relax to the folded state on the sub-microsecond timescale. The model also predicts the presence of a state at 4.4 Å from the NMR structure in which the Trp residue strongly interacts with Pro12. This state can explain the anomalous temperature dependence of certain chemical shifts. The structures of the two most stable misfolded intermediates are in agreement with NMR experiments on the unfolded protein. Our work shows that, using biased molecular dynamics trajectories, it is possible to construct a model describing in detail the Trp-cage folding kinetics and thermodynamics in agreement with experimental data.
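The relaxation times of such a kinetic model follow from the eigenvalues of its rate matrix. The toy three-state sketch below (unfolded, folded, and a weakly populated trap) is not the model of the paper and uses invented rate constants; it only illustrates how a scarcely populated kinetic trap can dominate the slowest relaxation time while the direct folding channel remains fast.

    # Toy three-state rate model: U (unfolded), F (folded), T (molten-globule-like trap).
    # All rate constants are invented for illustration; units are 1/microsecond.
    import numpy as np

    states = ["U", "F", "T"]
    rates = {                # (from, to): rate
        ("U", "F"): 5.0,     # non-compact states fold on the sub-microsecond timescale
        ("F", "U"): 0.2,
        ("U", "T"): 0.15,    # occasional collapse into the trap
        ("T", "U"): 0.3,     # slow escape from the trap
    }
    K = np.zeros((3, 3))
    for (src, dst), rate in rates.items():
        K[states.index(dst), states.index(src)] = rate
    K -= np.diag(K.sum(axis=0))      # columns sum to zero: probability is conserved

    w, V = np.linalg.eig(K)
    taus = sorted(-1.0 / ev.real for ev in w if ev.real < -1e-12)
    pi = np.real(V[:, np.argmax(w.real)])
    pi /= pi.sum()                   # stationary distribution (eigenvalue 0)
    print("relaxation times (us):", [round(t, 2) for t in taus])
    print("equilibrium populations:", dict(zip(states, np.round(pi, 3))))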

    Characterizing Structure and Free Energy Landscape of Proteins by NMR-guided Metadynamics

    In the last two decades, a series of experimental and theoretical advances has made it possible to obtain a detailed understanding of the molecular mechanisms underlying the folding process of proteins. With the increasing power of computer technology, as well as with the improvements in force fields, atomistic simulations are also becoming increasingly important because they can generate highly detailed descriptions of the motions of proteins. A supercomputer specifically designed to integrate Newton's equations of motion for proteins, Anton, has recently been able to break the millisecond time barrier. This achievement has allowed the direct calculation of repeated folding events for several fast-folding proteins and the characterization of the molecular mechanisms underlying protein dynamics and function. However, these exceptional resources are available to only a few research groups in the world, and the observation of a few events of a specific process is usually not enough to provide a statistically significant picture of the phenomenon. In parallel, it has also been realized that by bringing together experimental measurements and computational methods it is possible to expand the range of problems that can be addressed. For example, by incorporating structural information as restraints in molecular dynamics simulations it is possible to obtain structural models of transiently populated states, as well as of native and non-native intermediates explored during the folding process. By applying this strategy to structural parameters measured by nuclear magnetic resonance (NMR) spectroscopy, one can determine atomic-level structures and characterize the dynamics of proteins. In these approaches the experimental information is exploited to create an additional term in the force field that penalizes deviations from the measured values, thus restraining the sampling of the conformational space to regions close to those observed experimentally.
In this thesis we propose an alternative strategy for exploiting experimental information in molecular dynamics simulations. In this approach the measured parameters are not used as structural restraints in the simulations, but rather to build collective variables within metadynamics calculations. In metadynamics, the conformational sampling is enhanced by constructing a time-dependent potential that discourages the exploration of regions already visited, expressed in terms of specific functions of the atomic coordinates called collective variables. In this work we show that NMR chemical shifts can be used as collective variables to guide the sampling of conformational space in molecular dynamics simulations. Since the method discussed here enhances the conformational sampling without modifying the force field through the introduction of structural restraints, it allows the statistical weights corresponding to the force field used in the molecular dynamics simulations to be estimated reliably. In the present implementation we used the bias-exchange metadynamics method, an enhanced sampling technique that allows the free energy to be reconstructed as a simultaneous function of several variables. By using this approach, we have been able to compute the free energy landscape of two different proteins by explicit-solvent molecular dynamics simulations.
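The metadynamics bias takes the form of a sum of Gaussians deposited along the collective variables as the simulation proceeds. The sketch below is a minimal one-dimensional Langevin toy model on a double-well potential with a single generic collective variable; it does not reproduce the bias-exchange setup or the chemical-shift collective variables used in this thesis, but illustrates how the accumulated bias flattens the underlying free-energy profile.

    # Minimal 1D metadynamics on a double-well potential (toy model, not the thesis
    # implementation): Gaussian hills are deposited along one collective variable s,
    # and the accumulated bias V_bias(s) approximates -F(s) up to a constant.
    import numpy as np

    rng = np.random.default_rng(1)
    KT, DT, GAMMA = 1.0, 1e-3, 1.0                 # reduced units, assumed values
    HILL_W, HILL_SIGMA, DEPOSIT_EVERY = 0.3, 0.2, 200

    def force(s):                                  # -dU/ds for U(s) = (s^2 - 1)^2
        return -4.0 * s * (s**2 - 1.0)

    hills = []                                     # centres of the deposited Gaussians

    def bias(s):
        c = np.asarray(hills)
        return HILL_W * np.sum(np.exp(-0.5 * ((s - c) / HILL_SIGMA)**2))

    def bias_force(s):
        c = np.asarray(hills)
        g = np.exp(-0.5 * ((s - c) / HILL_SIGMA)**2)
        return HILL_W * np.sum(g * (s - c)) / HILL_SIGMA**2

    s = -1.0
    for step in range(100_000):                    # overdamped Langevin dynamics plus bias
        noise = np.sqrt(2.0 * KT * DT / GAMMA) * rng.normal()
        s += DT / GAMMA * (force(s) + bias_force(s)) + noise
        if step % DEPOSIT_EVERY == 0:
            hills.append(s)

    grid = np.linspace(-1.5, 1.5, 61)
    fes = -np.array([bias(x) for x in grid])       # F(s) ~ -V_bias(s) + const
    # The exact barrier of U(s) = (s^2 - 1)^2 at s = 0 is 1.0 in these units.
    print("rough barrier estimate:", round(float(-bias(0.0) - fes.min()), 2))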
In the application to a well-structured globular protein, the third immunoglobulin-binding domain of streptococcal protein G (GB3), our calculation predicts the native fold as the lowest free energy minimum, also identifying the presence of an on-pathway compact intermediate with non-native topological elements. In addition, we provide a detailed atomistic picture of the structure at the folding barrier, which shares with the native state only a fraction of the secondary structure elements. The further application to the 40-residue form of amyloid beta allows another remarkable achievement: the quantitative description of the free energy landscape of an intrinsically disordered protein. Proteins of this kind are characterized by the absence of a well-defined three-dimensional structure under native conditions and are therefore hard to investigate experimentally. We found that the free energy landscape of this peptide has approximately inverted features with respect to normal globular proteins: the global minimum consists of highly disordered structures, while higher free energy regions correspond to partially folded conformations. These structures are kinetically committed to the disordered state, but they are transiently explored even at room temperature. This makes our findings particularly relevant, since this protein is involved in Alzheimer's disease: it is prone to aggregate into oligomers, toxic for cells, in which the monomers interact in an extended beta-strand organization. Our structural and energetic characterization allows us to define a library of possible metastable states involved in the aggregation process. These results have been obtained using relatively limited computational resources. The total simulation time required to reconstruct the thermodynamics of GB3, for example, is about three orders of magnitude shorter than the typical folding timescale of similar proteins, such as those simulated on Anton. We thus anticipate that the technique introduced in this thesis will allow the determination of the free energy landscapes of a wide range of proteins for which NMR chemical shifts are available. Finally, since chemical shifts are the only external information used to guide the folding of the proteins, our method can also be successfully applied to the challenging task of NMR structure determination, as we have demonstrated in a blind prediction test on the last CASD-NMR target.

    Experimental design methods for nano-fabrication processes

    Most design-of-experiments methodology assumes a predetermined design region. Design regions with uncertainty are the focus of the first chapter, which proposes optimal designs under a two-part model to handle the uncertainty in the design region. In particular, the logit component of the two-part model is used to assess the uncertainty on the boundary of the design region. The second chapter proposes an efficient and effective multi-layer data collection scheme, Layers of Experiments (LOE), for building accurate statistical models that meet the tight tolerance requirements commonly encountered in nano-fabrication. LOE identifies sub-regions of interest (layers) where the process optimum is expected to lie and collects more data in those sub-regions with concentrated focus. The third chapter contributes a new design criterion combining model-based optimal design and model-free space-filling design in a constrained manner. The proposed design is useful when the fitted statistical model is required to provide both accurate statistical inference and design-space exploration. The fourth chapter proposes adaptive combined designs in the layers of experiments and develops methods to improve model quality by combining information from various layers and from engineering models. Combined designs are modified to improve their efficiency by incorporating field data collected from several layers of experiments, and updated engineering models are used to build more accurate statistical models.
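The idea of constraining a model-based criterion with a space-filling requirement can be illustrated with a small sketch. The code below is hypothetical and is not the criterion developed in the thesis: it scores random candidate designs by the D-criterion (log-determinant of the information matrix for a first-order model) and only accepts designs whose minimum inter-point distance exceeds a space-filling threshold.

    # Hypothetical sketch: constrained combination of a model-based D-criterion with a
    # model-free space-filling (maximin distance) requirement. Illustration only.
    import numpy as np

    rng = np.random.default_rng(42)

    def d_criterion(X):
        """log|F'F| for a first-order model with intercept (model-based measure)."""
        F = np.column_stack([np.ones(len(X)), X])
        sign, logdet = np.linalg.slogdet(F.T @ F)
        return logdet if sign > 0 else -np.inf

    def min_distance(X):
        """Smallest pairwise distance between design points (space-filling measure)."""
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return d[np.triu_indices(len(X), k=1)].min()

    def pick_design(n_runs=10, n_factors=2, n_candidates=5000, min_dist=0.25):
        """Best D-criterion among random designs satisfying the space-filling constraint."""
        best, best_score = None, -np.inf
        for _ in range(n_candidates):
            X = rng.uniform(-1.0, 1.0, size=(n_runs, n_factors))
            if min_distance(X) < min_dist:
                continue                 # reject designs with clustered points
            score = d_criterion(X)
            if score > best_score:
                best, best_score = X, score
        return best, best_score

    design, score = pick_design()
    print("selected 10-run design, log|F'F| =", round(score, 3))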

    On the electrical and structural properties of boron delta layers in silicon

    This thesis describes the first successful growth of boron δ layers using silicon MBE. SIMS has been used to demonstrate that the layer widths are ~2 nm, as confirmed by TEM. This is probably an overestimate, an average value of (0.3 ± 0.5) nm being obtained from XRD, suggesting that these are the thinnest δ layers produced to date. Hall and XRD measurements indicate that the boron dopant is fully activated up to sheet coverages of 1/2 monolayer, i.e. ~3.5x10^14 cm^-2. The CV profile obtained for a B δ layer of sheet density 2.5x10^12 cm^-2 has a FWHM of ~3 nm, a result which is shown to be consistent with δ doping in the light of recent theoretical work. Resistivity, magnetoresistance and the Hall effect have been measured at temperatures down to 0.3 K, using magnetic fields of up to 12 T, on samples of sheet density in the range 4x10^12 cm^-2 to 8x10^13 cm^-2. 2D weak localisation and associated electron-electron interaction effects have been observed in samples of sheet density above 1.8x10^13 cm^-2, with evidence of spin-orbit scattering. These samples are shown to undergo a "metal-insulator" transition in high magnetic fields, with variable-range hopping at 12 T. Samples of sheet density ≤ 1x10^13 cm^-2 show activated transport, from which it is concluded that the critical acceptor separation for the metal-insulator transition in this system is significantly less than the value found in bulk, uniformly doped Si:B. It is suggested that this may be due to the splitting of the valence-band degeneracy by quantum confinement.
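For context, the sheet carrier densities and mobilities quoted above are the standard single-carrier quantities extracted from Hall measurements. The short sketch below uses the textbook relations n_s = IB/(e|V_H|) and mu = 1/(e n_s R_sheet); the numerical values are made up for illustration and are not taken from the thesis.

    # Textbook single-carrier Hall analysis; example numbers are illustrative only.
    E_CHARGE = 1.602176634e-19          # elementary charge in coulombs

    def sheet_density(current_a, field_t, hall_voltage_v):
        """Sheet carrier density n_s = I*B / (e*|V_H|), returned in cm^-2."""
        n_s_per_m2 = current_a * field_t / (E_CHARGE * abs(hall_voltage_v))
        return n_s_per_m2 * 1e-4        # convert m^-2 to cm^-2

    def hall_mobility(n_s_cm2, sheet_resistance_ohm):
        """Hall mobility mu = 1 / (e * n_s * R_sheet), in cm^2 V^-1 s^-1."""
        return 1.0 / (E_CHARGE * n_s_cm2 * sheet_resistance_ohm)

    n_s = sheet_density(current_a=10e-6, field_t=0.5, hall_voltage_v=1.0e-3)
    mu = hall_mobility(n_s, sheet_resistance_ohm=2.0e4)
    print(f"n_s = {n_s:.2e} cm^-2, mu = {mu:.0f} cm^2/Vs")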

    Architecture, techniques and models to enable Data Science in the Gaia Mission Archive

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 26/05/2017.
The massive amounts of data that the world produces every day pose new challenges to modern societies in terms of how to leverage their inherent value. Social networks, instant messaging, video, smart devices and scientific missions are just a few examples of the vast number of sources generating data every second. As the world becomes more and more digitalized, new needs arise for organizing, archiving, sharing, analyzing, visualizing and protecting the ever-increasing data sets, so that we can truly develop into a data-driven economy that reduces inefficiencies and increases sustainability, creating new business opportunities on the way. Traditional approaches for harnessing data are no longer suitable, as they lack the means to scale to larger volumes in a timely and cost-efficient manner. This has changed somewhat with the advent of Internet companies like Google and Facebook, which have devised new ways of tackling this issue. However, the variety and complexity of value chains in the private sector, as well as the increasing demands and constraints under which the public sector operates, call for ongoing research that can yield new strategies for dealing with data, facilitate the integration of providers and consumers of information, and guarantee a smooth and prompt transition when adopting these cutting-edge technological advances. This thesis aims at providing novel architectures and techniques that will help perform this transition towards Big Data in massive scientific archives. It highlights the common pitfalls that must be faced when embracing Big Data and how to overcome them, especially when the data sets, their transformation pipelines and the tools used for the analysis are already present in the organizations. Furthermore, a new perspective for facilitating a smoother transition is laid out. It involves the usage of higher-level, use-case-specific frameworks and models, which naturally bridge the gap between the technological and scientific domains. This alternative will effectively widen the possibilities of scientific archives and therefore contribute to reducing the time to science. The research will be applied to the European Space Agency cornerstone mission Gaia, whose final data archive will represent a tremendous discovery potential. It will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), providing unprecedented position, parallax and proper motion measurements for about one billion stars. The successful exploitation of this data archive will depend to a large degree on the ability to offer the proper architecture, i.e. infrastructure and middleware, upon which scientists will be able to do exploration and modeling with this huge data set. Consequently, the approach taken needs to enable data fusion with other scientific archives, as this will produce synergies leading to an increase in scientific output, both in volume and in quality. The set of novel techniques and frameworks presented in this work addresses these issues by contextualizing them with the data products that will be generated in the Gaia mission. All these considerations have led to the foundations of the architecture that will be leveraged by the Science Enabling Applications Work Package.
Last but not least, the effectiveness of the proposed solution will be demonstrated through the implementation of some ambitious statistical problems that require significant computational capabilities and use Gaia-like simulated data (the first Gaia data release took place on September 14th, 2016). These problems will be referred to as the Grand Challenge, a somewhat grandiloquent name for the task of inferring, from a probabilistic point of view, a set of parameters of the Initial Mass Function (IMF) and Star Formation Rate (SFR) of a given (very large) set of stars, from noisy estimates of their masses and ages respectively. This will be achieved by using Hierarchical Bayesian Modeling (HBM). In principle, the HBM can incorporate stellar evolution models to infer the IMF and SFR directly, but as a first step this thesis starts with a somewhat less ambitious goal: inferring the present-day mass function (PDMF) and present-day age distribution (PDAD). Moreover, the performance and scalability analyses carried out will also prove the suitability of the models for the large amounts of data that will be available in the Gaia data archive.
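As an illustration of the hierarchical Bayesian idea (and not of the models developed in the thesis), the sketch below infers the slope of a toy truncated power-law present-day mass function from noisy log-mass estimates, sampling both the latent true masses and the population-level slope with a simple Metropolis-within-Gibbs scheme; all bounds, noise levels and priors are assumed values.

    # Toy hierarchical Bayesian model (illustration only): true log-masses z_i follow a
    # truncated power law p(m) ~ m^-alpha on [M_MIN, M_MAX]; observed log-masses y_i are
    # z_i plus Gaussian noise of known width. Alpha is inferred by Metropolis-within-Gibbs.
    import numpy as np

    rng = np.random.default_rng(0)
    M_MIN, M_MAX = 0.1, 100.0            # mass bounds in solar masses (assumed)
    SIGMA_OBS = 0.15                     # known log-mass measurement noise (assumed)
    ALPHA_TRUE, N_STARS = 2.35, 2000     # slope used only to simulate the data

    def log_pop(z, alpha):
        """Log-density of log-mass z under the truncated power law (Jacobian included)."""
        b = 1.0 - alpha
        log_norm = np.log(abs(b)) - np.log(abs(M_MAX**b - M_MIN**b))
        return log_norm + b * z

    # Simulate data by inverse-CDF sampling of the power law, then add observational noise.
    u = rng.uniform(size=N_STARS)
    b = 1.0 - ALPHA_TRUE
    z_true = np.log((M_MIN**b + u * (M_MAX**b - M_MIN**b)) ** (1.0 / b))
    y = z_true + rng.normal(0.0, SIGMA_OBS, size=N_STARS)

    alpha = 1.5
    z = np.clip(y, np.log(M_MIN) + 1e-3, np.log(M_MAX) - 1e-3)
    chain = []
    for it in range(4000):
        # Update the latent log-masses with a symmetric random-walk Metropolis step.
        z_prop = z + rng.normal(0.0, 0.1, size=N_STARS)
        inside = (z_prop > np.log(M_MIN)) & (z_prop < np.log(M_MAX))
        log_acc = (log_pop(z_prop, alpha) - log_pop(z, alpha)
                   - 0.5 * ((y - z_prop)**2 - (y - z)**2) / SIGMA_OBS**2)
        accept = inside & (np.log(rng.uniform(size=N_STARS)) < log_acc)
        z = np.where(accept, z_prop, z)

        # Update the population slope alpha (uniform prior on [0.5, 4] via the bounds check).
        a_prop = alpha + rng.normal(0.0, 0.05)
        if 0.5 < a_prop < 4.0 and abs(a_prop - 1.0) > 1e-6:
            if np.log(rng.uniform()) < np.sum(log_pop(z, a_prop) - log_pop(z, alpha)):
                alpha = a_prop
        chain.append(alpha)

    print("posterior mean slope:", round(float(np.mean(chain[2000:])), 3))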

    Acta Cybernetica: Volume 17, Number 3.
