Folding@home: achievements from over twenty years of citizen science herald the exascale era
Simulations of biomolecules have enormous potential to inform our
understanding of biology but require extremely demanding calculations. For over
twenty years, the Folding@home distributed computing project has pioneered a
massively parallel approach to biomolecular simulation, harnessing the
resources of citizen scientists across the globe. Here, we summarize the
scientific and technical advances this perspective has enabled. As the
project's name implies, the early years of Folding@home focused on driving
advances in our understanding of protein folding by developing statistical
methods for capturing long-timescale processes and facilitating insight into
complex dynamical processes. Success laid a foundation for broadening the scope
of Folding@home to address other functionally relevant conformational changes,
such as receptor signaling, enzyme dynamics, and ligand binding. Continued
algorithmic advances, hardware developments such as GPU-based computing, and
the growing scale of Folding@home have enabled the project to focus on new
areas where massively parallel sampling can be impactful. While previous work
sought to expand toward larger proteins with slower conformational changes, new
work focuses on large-scale comparative studies of different protein sequences
and chemical compounds to better understand biology and inform the development
of small molecule drugs. Progress on these fronts enabled the community to
pivot quickly in response to the COVID-19 pandemic, expanding to become the
world's first exascale computer and deploying this massive resource to provide
insight into the inner workings of the SARS-CoV-2 virus and aid the development
of new antivirals. This success provides a glimpse of what's to come as
exascale supercomputers come online, and Folding@home continues its work.
Comment: 24 pages, 6 figures.
Accelerating Materials Discovery with Machine Learning
As we enter the data age, ever-increasing amounts of human knowledge are being recorded in machine-readable formats.
This has opened up new opportunities to leverage data to accelerate scientific discovery.
This thesis focuses on how we can use historical and computational data to aid the discovery and development of new materials.
We begin by looking at a traditional materials informatics task -- elucidating the structure-function relationships of high-temperature cuprate superconductors.
One of the most significant challenges for materials informatics is the limited availability of relevant data.
We propose a simple calibration-based approach to estimate the apical and in-plane copper-oxygen distances from more readily available lattice parameter data to address this challenge for cuprate superconductors.
Our investigation uncovers a large, unexplored region of materials space that may yield cuprates with higher critical temperatures.
We propose two experimental avenues that may enable this region to be accessed.
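The calibration idea described above can be sketched in a few lines: fit a simple linear map from a readily available lattice parameter to the harder-to-measure apical Cu-O distance, then apply it to new compounds. This is a toy illustration under our own assumptions; the numbers are synthetic placeholders, not real cuprate data or the thesis's actual calibration.

```python
import numpy as np

# Synthetic "known" structures: c lattice parameter (Angstrom) paired with
# the apical Cu-O distance (Angstrom). Real calibration would use measured
# crystallographic data for cuprates with fully refined structures.
c_known = np.array([13.2, 13.3, 13.4, 13.5])
d_apical_known = np.array([2.38, 2.41, 2.43, 2.46])

# Least-squares linear calibration: d_apical ~ slope * c + intercept.
slope, intercept = np.polyfit(c_known, d_apical_known, 1)

def estimate_apical_distance(c):
    """Estimate the apical Cu-O distance (Angstrom) from the c lattice
    parameter using the fitted linear calibration."""
    return slope * c + intercept

# Apply to a compound where only the lattice parameter is known.
d_estimated = estimate_apical_distance(13.35)
```

The same recipe would apply to the in-plane Cu-O distance with the in-plane lattice parameters as input.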
Computational materials exploration is bottlenecked by our ability to provide input structures to feed our workflows.
Whilst ab initio structure identification is possible, it is computationally burdensome, and we lack design rules for deciding where to target searches in high-throughput setups.
To address this, there is a need to develop tools that suggest promising candidates, enabling automated deployment and increased efficiency.
Machine learning models are well suited to this task; however, current approaches typically use hand-engineered inputs.
This means that their performance is circumscribed by the intuitions reflected in the chosen inputs.
We propose a novel way to formulate the machine learning task as a set regression problem over the elements in a material.
We show that our approach leads to higher sample efficiency than other well-established composition-based approaches.
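The essence of treating a material as a set of elements can be sketched as follows. This is our own minimal Deep-Sets-style illustration, not the thesis's actual architecture: element embeddings and readout weights are random stand-ins for learned parameters, and the key property shown is that the pooled representation is permutation-invariant, so the prediction cannot depend on how the composition is written down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy element embeddings (learned in a real model); 4-dimensional vectors.
EMBED = {el: rng.normal(size=4) for el in ["Ba", "Cu", "O", "Y"]}
W = rng.normal(size=4)  # linear readout weights (also learned in practice)

def predict(composition):
    """Set regression over a material's elements: pool element embeddings
    weighted by fractional amounts, then apply a linear readout. The pooling
    is a sum, hence invariant to the order of the elements."""
    total = sum(composition.values())
    pooled = sum((amt / total) * EMBED[el] for el, amt in composition.items())
    return float(W @ pooled)

# YBa2Cu3O7 expressed as a set of (element, amount) pairs; order is irrelevant.
y1 = predict({"Y": 1, "Ba": 2, "Cu": 3, "O": 7})
y2 = predict({"O": 7, "Cu": 3, "Ba": 2, "Y": 1})
assert abs(y1 - y2) < 1e-9
```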
Having demonstrated the ability of machine learning to aid in the selection of promising compound compositions, we next explore how useful machine learning might be for identifying fabrication routes.
Using a recently released data-mined data set of solid-state synthesis reactions, we design a two-stage model to predict the products of inorganic reactions.
We critically explore the performance of this model, showing that whilst the predictions fall short of the accuracy required to be chemically discriminative, the model provides valuable insights into understanding inorganic reactions.
Through careful investigation of the model's failure modes, we explore the challenges that remain in the construction of forward inorganic reaction prediction models and suggest some pathways to tackle the identified issues.
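The two-stage structure described above can be made concrete with a schematic pipeline. This is a deliberately simplistic stand-in for illustration only, not the thesis's model: stage one proposes the product's element set (here a naive union of precursor elements), and stage two scores candidate stoichiometries with a placeholder scoring function where a trained model would sit.

```python
from itertools import product as cartesian

def stage1_elements(precursors):
    """Stage 1: propose the product's element set. Here simply the union of
    precursor elements; a learned model would also handle volatile species."""
    elems = set()
    for formula in precursors:  # each formula is a dict {element: amount}
        elems |= set(formula)
    return elems

def stage2_rank(elements, max_coeff=3):
    """Stage 2: enumerate candidate stoichiometries over the proposed
    elements and return the highest-scoring one."""
    elements = sorted(elements)

    def score(stoich):
        # Placeholder heuristic favouring small, balanced coefficients;
        # a trained product-prediction model would replace this.
        return -sum(stoich) - (max(stoich) - min(stoich))

    candidates = [dict(zip(elements, coeffs))
                  for coeffs in cartesian(range(1, max_coeff + 1),
                                          repeat=len(elements))]
    return max(candidates, key=lambda c: score(tuple(c.values())))

# Hypothetical solid-state precursors: BaCO3 and TiO2.
precursors = [{"Ba": 1, "C": 1, "O": 3}, {"Ti": 1, "O": 2}]
best = stage2_rank(stage1_elements(precursors))
```

The failure modes discussed above live mostly in these two choices: which elements survive into the product, and how candidate phases are scored against each other.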
One of the principal ways that materials scientists understand and categorise materials is in terms of their symmetries.
Crystal structure prototypes are assigned based on the presence of symmetrically equivalent sites known as Wyckoff positions.
We show that a powerful coarse-grained representation of materials structures can be constructed from the Wyckoff positions by discarding information about their coordinates within crystal structures.
One of the strengths of this representation is that it maintains the ability of structure-based methods to distinguish polymorphs whilst also allowing combinatorial enumeration akin to composition-based approaches.
We construct an end-to-end differentiable model that takes our proposed Wyckoff representation as input.
The performance of this approach is examined on a suite of materials discovery experiments showing that it leads to strong levels of enrichment in materials discovery tasks.
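The coarse-graining step can be illustrated in a few lines: keep each site's element, Wyckoff letter, and multiplicity, and discard the coordinates. The Wyckoff assignments below are made up for illustration; the point is that two polymorphs can share a composition yet differ in their Wyckoff multisets, so the representation retains the ability to distinguish them.

```python
from collections import Counter

def wyckoff_rep(sites):
    """Coarse-grained representation: a multiset of
    (element, wyckoff_letter, multiplicity) tuples, coordinates discarded."""
    return Counter((el, letter, mult) for el, letter, mult in sites)

def composition(sites):
    """Plain composition obtained by summing multiplicities per element."""
    comp = Counter()
    for el, _letter, mult in sites:
        comp[el] += mult
    return comp

# Hypothetical polymorphs with a TiO2-like stoichiometry (labels illustrative).
polymorph_a = [("Ti", "a", 2), ("O", "f", 4)]
polymorph_b = [("Ti", "b", 2), ("O", "e", 4)]

assert composition(polymorph_a) == composition(polymorph_b)   # same formula
assert wyckoff_rep(polymorph_a) != wyckoff_rep(polymorph_b)   # distinguishable
```

Because the representation is a discrete multiset rather than a set of coordinates, candidate structures can be enumerated combinatorially, much as composition-based approaches enumerate formulas.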
The research presented in this thesis highlights the promise of applying data-driven workflows and machine learning in materials discovery and development.
This thesis concludes by speculating about promising research directions for applying machine learning within materials discovery.
Mechanotransduction and Ion Transport of the Endothelial Glycocalyx: A Large-Scale Molecular Dynamics Study
In our vessels, the endothelial glycocalyx is the first and foremost barrier directly exposed to the blood in the lumen. Under physiological conditions, the normal endothelial glycocalyx is widely accepted to function as a physical barrier that prevents abnormal transport of blood components (e.g., ions and proteins such as albumin), and as a mechanosensor and mechanotransducer that senses and transmits mechanical signals from the blood flow to the cytoplasm. In this study, a series of large-scale molecular dynamics simulations was undertaken to study atomic-level events of the endothelial glycocalyx layers interacting with flow. This research is a pioneering study in which flow in the physiologically relevant range is realized in an atomistic model of the glycocalyx built with the most up-to-date and detailed structural information. The coupled dynamics of flow and endothelial glycocalyx show that the glycocalyx constituents swing and swirl as the flow passes by. The active motion of the glycocalyx, in turn, disturbs the flow by modifying the velocity distributions. The glycocalyx also controls the emergence of strong shear stresses. Moreover, a flow regime map for complex surfaces was proposed based on results from a series of cases with varying surface configurations and flow velocities. Based on the dynamics of subdomains of the glycocalyx core protein, a mechanism for mechanotransduction of the endothelial glycocalyx was established: the force from blood-flow shear stress is transmitted via a scissor-like motion alongside the bending of the core protein, with magnitudes on the order of 10-100 pN. Finally, the mechanism of the flow's impact on ion transport was investigated and an improved Starling principle was proposed. The flow modifies sugar-chain conformations and transfers momentum to ions. The conformational changes of the sugar chains then affect the Na+/sugar-chain interactions. The effects of flow velocity on these interactions are non-linear.
An estimate based on the improved Starling principle suggests that a physiological flow changes the osmotic component of Na+ transport by 8% compared with stationary transport.
Advances in Artificial Intelligence: Models, Optimization, and Machine Learning
The present book contains all the articles accepted and published in the Special Issue "Advances in Artificial Intelligence: Models, Optimization, and Machine Learning" of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, and intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone wishing to pursue research in artificial intelligence, machine learning and their widespread applications.
Understanding space weather to shield society: A global road map for 2015-2025 commissioned by COSPAR and ILWS
There is a growing appreciation that the environmental conditions that we
call space weather impact the technological infrastructure that powers the
coupled economies around the world. With that comes the need to better shield
society against space weather by improving forecasts, environmental
specifications, and infrastructure design. [...] advanced understanding of
space weather requires a coordinated international approach to effectively
provide awareness of the processes within the Sun-Earth system through
observation-driven models. This roadmap prioritizes the scientific focus areas
and research infrastructure that are needed to significantly advance our
understanding of space weather of all intensities and of its implications for
society. Advancement of the existing system observatory through the addition of
small to moderate state-of-the-art capabilities designed to fill observational
gaps will enable significant advances. Such a strategy requires urgent action:
key instrumentation needs to be sustained, and action needs to be taken before
core capabilities are lost in the aging ensemble. We recommend advances through
priority focus (1) on observation-based modeling throughout the Sun-Earth
system, (2) on forecasts more than 12 hrs ahead of the magnetic structure of
incoming coronal mass ejections, (3) on understanding the geospace response to
variable solar-wind stresses that lead to intense geomagnetically-induced
currents and ionospheric and radiation storms, and (4) on developing a
comprehensive specification of space climate, including the characterization of
extreme space storms to guide resilient and robust engineering of technological
infrastructures. The roadmap clusters its implementation recommendations by
formulating three action pathways, and outlines needed instrumentation and
research programs and infrastructure for each of these. [...]
Comment: In press for Advances in Space Research: an international roadmap on the science of space weather, commissioned by COSPAR and ILWS (63 pages and 4 figures).
The blessings of explainable AI in operations & maintenance of wind turbines
Wind turbines play an integral role in generating clean energy, but regularly suffer from operational inconsistencies and failures, leading to unexpected downtimes and significant Operations & Maintenance (O&M) costs. Condition-Based Monitoring (CBM) has been utilised in the past to monitor operational inconsistencies in turbines by applying signal-processing techniques to vibration data. The last decade has witnessed growing interest in leveraging Supervisory Control and Data Acquisition (SCADA) data from turbine sensors for CBM. Machine Learning (ML) techniques have been utilised to predict incipient faults in turbines and to forecast vital operational parameters with high accuracy by leveraging SCADA data and alarm logs. More recently, Deep Learning (DL) methods have outperformed conventional ML techniques, particularly for anomaly prediction. Despite demonstrating immense promise for the transition to Artificial Intelligence (AI), such models are generally black boxes that cannot provide rationales behind their predictions, hampering the ability of turbine operators to rely on automated decision making. We aim to help combat this challenge by providing a novel perspective on Explainable AI (XAI) for trustworthy decision support. This thesis revolves around three key strands of XAI – DL, Natural Language Generation (NLG) and Knowledge Graphs (KGs) – which are investigated using data from an operational turbine. We leverage DL and NLG to predict incipient faults and alarm events in the turbine in natural language, as well as to generate human-intelligible O&M strategies to assist engineers in fixing or averting the faults. We also propose specialised DL models which can predict causal relationships in SCADA features as well as quantify the importance of vital parameters leading to failures.
The thesis finally culminates in an interactive Question-Answering (QA) system for automated reasoning that leverages multimodal domain-specific information from a KG, enabling engineers to retrieve O&M strategies with natural-language questions. By helping make turbines more reliable, we envisage wider adoption of wind energy sources towards tackling climate change.
End-to-end network slices: from network function profiling to granular SLAs
Advisor: Christian Rodolfo Esteve Rothenberg. Doctoral thesis (doutorado), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
Abstract: In the last ten years, network softwarisation processes have been continuously diversified and gradually incorporated into production, mainly through the paradigms of Software-Defined Networking (e.g., programmable network flow rules) and Network Functions Virtualization (e.g., orchestration of virtualized network functions). Building on this process, the concept of a network slice emerges as a way of defining programmable end-to-end network paths, possibly over shared network infrastructures, with strict performance requirements associated with a particular business case.
This thesis investigates the hypothesis that the disaggregation of network function performance metrics impacts and composes a network slice footprint, yielding diverse slicing feature options which, when realized, should have their Service Level Agreement (SLA) life-cycle management transparently implemented in correspondence with their end-to-end communication business case. The validation of this assertion covers three aspects: the degrees of freedom by which the performance of virtualized network functions can be expressed; methods for rationalizing the footprint of network slices; and transparent ways to track and manage network assets among multiple administrative domains. To achieve these goals, this thesis makes a series of contributions, among them: the construction of a platform for automating performance-testing methodologies for virtualized network functions; the elaboration of a methodology for analyzing the footprint features of network slices based on a machine-learning classifier and a multi-criteria analysis algorithm; and the construction of a prototype using blockchain to carry out smart contracts covering service-level agreements between administrative domains. Through experiments and analysis we suggest that: performance metrics of virtualized network functions depend on resource allocation, internal configurations, and test-traffic stimulus; network slices can have their resource allocations consistently analyzed and classified by different criteria; and agreements between administrative domains can be executed transparently, at various granularities, through blockchain smart contracts.
At the end of this thesis, a broad discussion answers the research questions associated with the investigated hypothesis, so that the hypothesis is evaluated against a wide view of the thesis's contributions and future work. Doutorado. Engenharia de Computação. Doutor em Engenharia Elétrica. FUNCAM
A Molecular View of Self-Assembly and Liquid-Liquid Phase Separation of Intrinsically Disordered Proteins
Intracellular compartmentalization of biomolecules into non-membrane-bound compartments, commonly referred to as membraneless organelles (MLOs), has been observed for over a century. The past decade has seen a massive surge of research interest in this topic due to evidence that a liquid-liquid phase separation (LLPS) process is responsible for the assembly of biomolecules into the liquid-like compartments constituting MLOs. Since the initial discovery, dozens of cellular systems have been explored with this in mind and shown to have liquid-like properties, owing several unique functions to their liquid-like nature. MLOs such as stress granules may spontaneously form in response to cellular stress, while others such as the nucleolus may form multi-layer architectures that accelerate multi-step assembly processes, similar to an assembly line. The ability of biomolecules to undergo LLPS has been largely attributed to the presence of disordered proteins and nucleic acids. Intrinsically disordered proteins (IDPs), which lack a native folded structure while remaining physiologically functional, are able to promote LLPS due to their polymeric nature, which allows transient multivalent interactions between many amino acids. In this thesis, I work toward a greater understanding of the relationship between an IDP's sequence and its ability to undergo LLPS. Using a combination of all-atom and coarse-grained simulations, I make several contributions of significant and general interest to the field of IDP-driven LLPS. I start by developing a coarse-grained modelling framework which explicitly represents amino acid sequences, and which is the first of its kind to directly simulate phase coexistence of IDPs at sequence resolution.
I then leverage this model to demonstrate the relationship between a single IDP chain and a condensed phase of the same IDP, showing that one can predict conditions where LLPS will be possible simply by observing the single-chain behavior and the infinitely dilute two-chain binding affinity. I then provide a rationalization of the thermoresponsive behavior of some proteins which undergo lower critical solution temperature (LCST) phase transitions, and of how amino acid composition can lead to different thermoresponsive behaviors. Finally, I present atomic-resolution simulations showing the different interaction modes responsible for driving LLPS of two particular IDPs.
HESS Opinions: Functional units: a novel framework to explore the link between spatial organization and hydrological functioning of intermediate scale catchments
This opinion paper proposes a novel framework for exploring how spatial organization, alongside
spatial heterogeneity, controls the functioning of intermediate-scale catchments of organized
complexity. The key idea is that spatial organization in landscapes implies that the functioning of
intermediate-scale catchments is controlled by a hierarchy of functional units: hillslope-scale
lead topologies and embedded elementary functional units (EFUs). We argue that similar soils and
vegetation communities and thus also soil structures "co-developed" within EFUs in an adaptive,
self-organizing manner as they have been exposed to similar flows of energy, water and nutrients
from the past to the present. Class members of the same EFU (class) are thus deemed to belong to
the same ensemble with respect to controls on the energy balance and related vertical flows of
capillary-bound soil water and heat. Class members of superordinate lead topologies are
characterized by the same spatially organized arrangement of EFUs along the gradient driving
lateral flows of free water as well as a similar surface and bedrock topography. We hence
postulate that they belong to the same ensemble with respect to controls on rainfall runoff
transformation and related vertical and lateral fluxes of free water. We expect class members of
these functional units to have a distinct way in which their architecture controls the interplay of
state dynamics and integral flows, which is typical for all members of one class but dissimilar
among the classes. This implies that we might infer the typical dynamic behavior of the most
important classes of EFU and lead topologies in a catchment, by thoroughly characterizing a few
members of each class. A major asset of the proposed framework, which steps beyond the concept of
hydrological response units, is that it can be tested experimentally. In this respect, we reflect
on suitable strategies based on stratified observations drawing from process hydrology, soil
physics, geophysics, ecology and remote sensing which are currently conducted in replicates of
candidate functional units in the Attert basin (Luxembourg), to search for typical and similar
functional and structural characteristics. A second asset of this framework is that it blueprints
a way towards a structurally more adequate model concept for water and energy cycles in
intermediate scale catchments, which balances necessary complexity with falsifiability. This is
because EFU and lead topologies are deemed to mark a hierarchy of "scale breaks" where
simplicity with respect to the energy balance and stream flow generation emerges from spatially organized
process-structure interactions. This offers the opportunity for simplified descriptions of these
processes that are nevertheless physically and thermodynamically consistent. In this respect we
reflect on a candidate model structure that (a) may accommodate distributed observations of states
and especially terrestrial controls on driving gradients to constrain the space of feasible model
structures and (b) allows testing the possible added value of organizing principles to understand
the role of spatial organization from an optimality perspective.
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand on the 2025 timescale is at least two orders of magnitude greater --
and in some cases more -- than what is currently available. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) to transition
codes to the next-generation HPC platforms that will be available at ASCR
facilities, e) to build up and train a workforce capable of developing and
using simulations and analysis to support HEP scientific research on
next-generation systems.
Comment: 77 pages, 13 figures; draft report, subject to further revision.