451 research outputs found
Toward Accessible Multilevel Modeling in Systems Biology: A Rule-based Language Concept
Driven by advanced experimental techniques for obtaining high-quality data and by steadily accumulating knowledge about the complexity of life, modeling biological systems at multiple interrelated levels of organization has recently attracted growing attention. Current approaches to modeling multilevel systems typically lack an accessible formal modeling language or are severely limited in expressiveness. The aim of this thesis is to provide a comprehensive discussion of the associated problems and needs and to propose a concrete solution that addresses them.
Spatio-temporal Dynamics of the Wnt/beta-catenin Signaling Pathway: A Computational Systems Biology Approach
The Wnt/β-catenin signaling pathway is involved in human neural progenitor cell differentiation. This dissertation employs the cyclic workflow of computational systems biology to investigate the pathway's spatio-temporal dynamics during differentiation. Quantitative in vitro analyses show biphasic kinetics of the pathway proteins. A computational model is developed to investigate these kinetics in silico in correlation with the cell cycle and self-induced signaling. We show the importance of a stochastic approach and suggest further experiments, thereby closing the computational systems biology loop.
Domain-specific languages for modeling and simulation
Simulation models and simulation experiments are increasingly complex. One way to handle this complexity is to develop software languages tailored to specific application domains, so-called domain-specific languages (DSLs). This thesis explores the potential of employing DSLs in modeling and simulation. We study different DSL design and implementation techniques and illustrate, with several examples, their benefits for expressing simulation models as well as simulation experiments.
The Attributed Pi Calculus
The attributed pi calculus (pi(L)) forms an extension of the pi calculus with attributed processes and attribute-dependent synchronization. To ensure flexibility, the calculus is parametrized with the language L, which defines the possible values of attributes. pi(L) can express polyadic synchronization as in pi@ and thus diverse compartment organizations. A non-deterministic and a stochastic semantics, where rates may depend on attribute values, are introduced. The stochastic semantics is based on continuous-time Markov chains. A simulation algorithm is developed which is firmly rooted in this stochastic semantics. Two examples, the movement processes in the phototaxis of Euglena and the cooperative binding in the gene regulation of the lambda phage, underline the applicability of pi(L) to systems biology.
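To make the idea of a stochastic semantics over a continuous-time Markov chain more concrete, the following is a minimal sketch of the standard direct-method stochastic simulation (Gillespie) loop, with a reaction whose rate depends on an attribute value. It is not the simulation algorithm of the paper: the function names `gillespie` and `make_conversion`, the species `A` and `B`, and the `volume` attribute are all hypothetical and chosen only for illustration.

```python
import random


def gillespie(state, reactions, t_end, rng):
    """Direct-method simulation of a continuous-time Markov chain.

    state:     dict mapping species name -> count
    reactions: list of (propensity_fn, update_fn) pairs; propensity_fn(state)
               returns a non-negative rate, update_fn(state) returns a new state
    t_end:     simulation stops before the first event later than t_end
    rng:       a random.Random instance (passed in for reproducibility)
    """
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        rates = [propensity(state) for propensity, _ in reactions]
        total = sum(rates)
        if total <= 0.0:
            break  # no reaction is enabled: absorbing state
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its rate.
        _, update = rng.choices(reactions, weights=rates, k=1)[0]
        state = update(state)
        trajectory.append((t, dict(state)))
    return trajectory


def make_conversion(k, volume):
    """Hypothetical reaction A -> B whose rate depends on an attribute.

    The attribute here is the volume of an assumed compartment: the larger
    the volume, the slower the conversion.
    """
    def propensity(state):
        return k * state["A"] / volume

    def update(state):
        new_state = dict(state)
        new_state["A"] -= 1
        new_state["B"] += 1
        return new_state

    return propensity, update


if __name__ == "__main__":
    rng = random.Random(0)
    run = gillespie({"A": 20, "B": 0}, [make_conversion(1.0, 2.0)], 5.0, rng)
    print(f"{len(run) - 1} events, final state {run[-1][1]}")
```

Passing the attribute into the propensity function is one simple way to let rates depend on attribute values; a full implementation of the calculus would additionally have to handle channels and attribute-dependent synchronization, which this sketch leaves out.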
Mathematical models of cellular signaling and supramolecular self-assembly
Synthetic biologists endeavor to predict how the increasing complexity of multi-step signaling cascades impacts the fidelity of molecular signaling, whereby cellular state information is often transmitted with proteins diffusing by a pseudo-one-dimensional stochastic process. We address this problem by using a one-dimensional drift-diffusion model to derive an approximate lower bound on the degree of facilitation needed to achieve single-bit informational efficiency in signaling cascades as a function of their length. We find that a universal curve of the Shannon-Hartley form describes the information transmitted by a signaling chain of arbitrary length and depends upon only a small number of physically measurable parameters. This enables our model to be used in conjunction with experimental measurements to aid in the selective design of biomolecular systems.
Another important concept in the cellular world is molecular self-assembly. As manipulating the self-assembly of supramolecular and nanoscale constructs at the single-molecule level increasingly becomes the norm, new theoretical scaffolds must be erected to replace the classical thermodynamic and kinetics-based models. The models we propose use state probabilities as their fundamental objects and directly model the transition probabilities between the initial and final states of a trajectory. We leverage these probabilities in the context of molecular self-assembly to compute the overall likelihood that a specified experimental condition leads to a desired structural outcome. We also investigated a larger, complex self-assembly system, the heterotypic interactions between amyloid-beta and fatty acids, by an independent ensemble kinetic simulation using an underlying differential equation-based system, which was validated by biophysical experiments.
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
In recent years, groundbreaking advancements in natural language processing
have culminated in the emergence of powerful large language models (LLMs),
which have showcased remarkable capabilities across a vast array of domains,
including the understanding, generation, and translation of natural language,
and even tasks that extend beyond language processing. In this report, we delve
into the performance of LLMs within the context of scientific discovery,
focusing on GPT-4, the state-of-the-art language model. Our investigation spans
a diverse range of scientific areas encompassing drug discovery, biology,
computational chemistry (density functional theory (DFT) and molecular dynamics
(MD)), materials design, and partial differential equations (PDE). Evaluating
GPT-4 on scientific tasks is crucial for uncovering its potential across
various research domains, validating its domain-specific expertise,
accelerating scientific progress, optimizing resource allocation, guiding
future model development, and fostering interdisciplinary research. Our
exploration methodology primarily consists of expert-driven case assessments,
which offer qualitative insights into the model's comprehension of intricate
scientific concepts and relationships, and occasionally benchmark testing,
which quantitatively evaluates the model's capacity to solve well-defined
domain-specific problems. Our preliminary exploration indicates that GPT-4
exhibits promising potential for a variety of scientific applications,
demonstrating its aptitude for handling complex problem-solving and knowledge
integration tasks. Broadly speaking, we evaluate GPT-4's knowledge base,
scientific understanding, scientific numerical calculation abilities, and
various scientific prediction capabilities.
Comment: 230-page report; 181 pages of main content
Simulation Intelligence: Towards a New Generation of Scientific Methods
The original "Seven Motifs" set forth a roadmap of essential methods for the
field of scientific computing, where a motif is an algorithmic method that
captures a pattern of computation and data movement. We present the "Nine
Motifs of Simulation Intelligence", a roadmap for the development and
integration of the essential algorithms necessary for a merger of scientific
computing, scientific simulation, and artificial intelligence. We call this
merger simulation intelligence (SI), for short. We argue the motifs of
simulation intelligence are interconnected and interdependent, much like the
components within the layers of an operating system. Using this metaphor, we
explore the nature of each layer of the simulation intelligence operating
system stack (SI-stack) and the motifs therein: (1) Multi-physics and
multi-scale modeling; (2) Surrogate modeling and emulation; (3)
Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based
modeling; (6) Probabilistic programming; (7) Differentiable programming; (8)
Open-ended optimization; (9) Machine programming. We believe coordinated
efforts between motifs offer immense opportunity to accelerate scientific
discovery, from solving inverse problems in synthetic biology and climate
science, to directing nuclear energy experiments and predicting emergent
behavior in socioeconomic settings. We elaborate on each layer of the SI-stack,
detailing the state-of-the-art methods, presenting examples to highlight challenges
and opportunities, and advocating for specific ways to advance the motifs and
the synergies from their combinations. Advancing and integrating these
technologies can enable a robust and efficient hypothesis-simulation-analysis
type of scientific method, which we introduce with several use-cases for
human-machine teaming and automated science.
Text Mining for Pathway Curation
Biological knowledge often involves understanding the interactions between molecules, such as proteins and genes, that form functional networks called pathways. New knowledge about pathways is typically communicated through publications and later condensed into structured formats such as textbooks, pathway databases or mathematical models. However, curating updated pathway models can be labour-intensive due to the growing volume of publications. This thesis investigates text mining methods to support pathway curation. We present PEDL (Protein-Protein-Association Extraction with Deep Language Models), a machine learning model designed to extract protein-protein associations (PPAs) from biomedical text. PEDL uses distant supervision and pre-trained language models to achieve higher accuracy than the state of the art. An expert evaluation confirms its usefulness for pathway curators. We also present PEDL+, a command-line tool that allows non-expert users to efficiently extract PPAs. When applied to pathway curation tasks, 55.6% to 79.6% of PEDL+ extractions were found useful by curators. The large number of PPAs identified by text mining can be overwhelming for researchers. To help, we present PathComplete, a model that suggests potential extensions to a pathway. It is the first method based on supervised machine learning for this task, using transfer learning from pathway databases. Our evaluations show that PathComplete significantly outperforms existing methods. Finally, we generalise pathway extension from PPAs to more realistic complex events. Here, our novel method for conditional graph modification outperforms the current best by 13-24% accuracy on three benchmarks. We also present a new dataset for event-based pathway extension.
Overall, our results show that deep learning-based information extraction is a promising basis for supporting pathway curators.