
    D-CIPHER: Discovery of Closed-form Partial Differential Equations

    Closed-form differential equations, including partial differential equations and higher-order ordinary differential equations, are among the most important tools scientists use to model and understand natural phenomena. Discovering these equations directly from data is challenging for two reasons: it requires modeling relationships between derivatives that are not observed in the data (the equation-data mismatch), and it involves searching a huge space of possible equations. Current approaches make strong assumptions about the form of the equation and thus fail to discover many well-known systems. Moreover, many of them resolve the equation-data mismatch by estimating the derivatives, which makes them inadequate for noisy and infrequently sampled systems. To this end, we propose D-CIPHER, which is robust to measurement artifacts and can uncover a new and very general class of differential equations. We further design a novel optimization procedure, CoLLie, to help D-CIPHER search through this class efficiently. Finally, we demonstrate empirically that D-CIPHER can discover many well-known equations that are beyond the capabilities of current methods. Comment: To appear in the Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
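
The claim that estimating derivatives makes methods inadequate for noisy data can be checked in a few lines. The sketch below is not D-CIPHER, which the abstract contrasts with derivative-estimating approaches; it is a hypothetical experiment that differentiates noisy samples of u(t) = sin(t) with a central difference. For independent Gaussian noise with standard deviation sigma, the noise in each estimate has standard deviation sigma / (sqrt(2) h), so refining the grid spacing h makes the derivative worse rather than better. All function names are illustrative.

```python
import math
import random

def central_difference(values, h):
    """Second-order central-difference estimate of du/dt at the interior grid points."""
    return [(values[i + 1] - values[i - 1]) / (2 * h) for i in range(1, len(values) - 1)]

def rms(xs):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def derivative_error(h, sigma, seed=0):
    """RMS error of the finite-difference derivative of sin(t) on [0, 2*pi] with Gaussian noise."""
    rng = random.Random(seed)
    n = int(2 * math.pi / h)
    t = [i * h for i in range(n + 1)]
    u = [math.sin(ti) + rng.gauss(0.0, sigma) for ti in t]  # noisy observations
    estimate = central_difference(u, h)
    exact = [math.cos(ti) for ti in t[1:-1]]  # true derivative at interior points
    return rms([e - c for e, c in zip(estimate, exact)])

for h in (0.1, 0.01):
    clean = derivative_error(h, sigma=0.0)
    noisy = derivative_error(h, sigma=0.01)
    print(f"h={h}: RMS error without noise {clean:.1e}, with sigma=0.01 {noisy:.1e}")
```

With sigma = 0.01 and h = 0.01, the derivative error comes out at roughly 0.7, about seventy times the noise level in u itself, while the noise-free error is tiny.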

    Mining Explicit and Implicit Relationships in Data Using Symbolic Regression

    Identifying implicit and explicit relations within observed data is a generic problem encountered in many domains, including science, engineering, and finance. It forms the core of data analytics: the process of discovering useful information in data sets that are potentially huge and otherwise incomprehensible. In industry, such information is often instrumental for profitable decision making, whereas in science and engineering it is used to build empirical models, propose new theories or verify existing ones, and explain natural phenomena. In recent times, digital and internet-based technologies have proliferated, making it viable to generate and collect large amounts of data at low cost. This in turn has created an ever-growing need for methods that analyse and interpret such data quickly and reliably. With this overarching goal, this thesis contributes towards accurate and efficient methods for discovering such relations through evolutionary search, an approach commonly referred to as Symbolic Regression (SR). Given a data set of input variables x and a corresponding observed response y, the aim is to find an explicit function y = f(x) or an implicit function f(x, y) = 0 that represents the data set. While seemingly simple, the problem is challenging for several reasons. Some conventional regression methods “guess” a functional form, such as linear, quadratic, or polynomial, and fit the data to that equation, which may prevent the discovery of more complex relations if they exist. At the other extreme, meta-modelling techniques such as the response surface method and Kriging model the given data accurately but provide a “black-box” predictor instead of an expression. Such approximations convey little or no insight into how the responses depend on the variables or into each variable's relative contribution to the output. SR sits between these two extremes by evolving mathematical expressions instead of assuming them. It is thus flexible enough to represent the data while providing useful insights rather than a black-box predictor. SR can be categorized as part of Explainable Artificial Intelligence and can contribute to Trustworthy Artificial Intelligence. The work in this thesis aims to integrate the concept of “semantics” more deeply into Genetic Programming (GP) and Evolutionary Feature Synthesis, the two algorithms usually employed for SR. Semantics is integrated into well-known components of these algorithms, such as compactness, diversity, recombination, and constant optimization. The main contribution of this thesis is two novel operators, based on Linear Programming and Mixed Integer Programming, that generate expressions while controlling their length without compromising accuracy. In the experiments, these operators are shown to discover expressions with better accuracy and interpretability on many explicit and implicit benchmarks. Moreover, applications of SR to real-world data sets demonstrate the practicality of the proposed approaches. Finally, on the practical side, the thesis also shows how GP can be applied to effectively solve Resource Constrained Scheduling Problems.
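
To make the explicit search for y = f(x) concrete, the sketch below enumerates expression trees in order of increasing size and returns the first one that fits the data, which builds in a preference for short expressions. This is a swapped-in technique, plain exhaustive enumeration rather than the thesis's GP or its LP/MIP-based operators, and the operator set, the single constant 1.0, the synthetic data from y = x^2 + x, and all function names are illustrative assumptions.

```python
import operator
from functools import lru_cache

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
LEAVES = ("x", 1.0)  # the input variable and one constant

@lru_cache(maxsize=None)
def expressions(size):
    """All expression trees with exactly `size` nodes (size must be odd)."""
    if size == 1:
        return LEAVES
    trees = []
    for left_size in range(1, size - 1, 2):
        right_size = size - 1 - left_size
        for op in OPS:
            for lhs in expressions(left_size):
                for rhs in expressions(right_size):
                    trees.append((op, lhs, rhs))
    return tuple(trees)

def evaluate(expr, x):
    """Evaluate an expression tree at the scalar input x."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return OPS[op](evaluate(lhs, x), evaluate(rhs, x))
    return x if expr == "x" else expr

def to_str(expr):
    """Render an expression tree as a fully parenthesized string."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return f"({to_str(lhs)} {op} {to_str(rhs)})"
    return str(expr)

def shortest_fit(xs, ys, max_size=7, tol=1e-9):
    """Return the first expression, in order of increasing size, whose MSE is below tol."""
    for size in range(1, max_size + 1, 2):
        for expr in expressions(size):
            mse = sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
            if mse < tol:
                return expr
    return None

XS = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
YS = [x * x + x for x in XS]  # synthetic data from y = x^2 + x
best = shortest_fit(XS, YS)
print(to_str(best))  # (x + (x * x))
```

Enumeration grows exponentially with expression size, which is why practical SR relies on guided search such as GP instead.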

    Discovering Causal Relations and Equations from Data

    Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws, and principles that are invariant, robust, and causal explanations of the world has been fundamental in the physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies on the system under study. With the advent of big data and the use of data-driven methods, the fields of causal and equation discovery have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised by the efficient exploitation of observational data, modern machine learning algorithms, and interaction with domain knowledge. Exciting times are ahead, with many challenges and opportunities to improve our understanding of complex systems. Comment: 137 pages

    AI Hilbert: A New Paradigm for Scientific Discovery by Unifying Data and Background Knowledge

    The discovery of scientific formulae that parsimoniously explain natural phenomena and align with existing background theory is a key goal in science. Historically, scientists have derived natural laws by manipulating equations based on existing knowledge, forming new equations, and verifying them experimentally. In recent years, data-driven scientific discovery has emerged as a viable competitor in settings with large amounts of experimental data. Unfortunately, data-driven methods often fail to discover valid laws when data is noisy or scarce. Accordingly, recent works combine regression and reasoning to eliminate formulae inconsistent with background theory. However, the problem of searching over the space of formulae consistent with background theory to find one that fits the data best is not well-solved. We propose a solution to this problem when all axioms and scientific laws are expressible via polynomial equalities and inequalities and argue that our approach is widely applicable. We further model notions of minimal complexity using binary variables and logical constraints, solve polynomial optimization problems via mixed-integer linear or semidefinite optimization, and prove the validity of our scientific discoveries in a principled manner using Positivstellensatz certificates. Remarkably, the optimization techniques leveraged in this paper allow our approach to run in polynomial time with fully correct background theory, or non-deterministic polynomial (NP) time with partially correct background theory. We demonstrate that some famous scientific laws, including Kepler's Third Law of Planetary Motion, the Hagen-Poiseuille Equation, and the Radiated Gravitational Wave Power equation, can be derived in a principled manner from background axioms and experimental data. Comment: Slightly revised from version 1, in particular polished the figure
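
The idea of modeling minimal complexity with binary "term is active" variables can be illustrated on Kepler's Third Law. The sketch below is a hedged, data-only stand-in: instead of a mixed-integer solver it brute-forces every support of size k = 1, 2, ... over a small monomial library and returns the smallest support whose fit is within a relative tolerance. It uses no background axioms and produces no Positivstellensatz certificate, so it is not the AI Hilbert method; the planetary values are approximate textbook figures (AU, years), and the library, tolerance, and function names are assumptions.

```python
import math
from itertools import combinations

# Approximate semi-major axis a (AU) and orbital period T (years).
PLANETS = [
    (0.387, 0.241),  # Mercury
    (0.723, 0.615),  # Venus
    (1.000, 1.000),  # Earth
    (1.524, 1.881),  # Mars
    (5.203, 11.86),  # Jupiter
    (9.537, 29.46),  # Saturn
]

def solve(A, b):
    """Solve the small square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def least_squares(columns, y):
    """Fit y ~ sum_j w_j * columns[j] via the normal equations; return (weights, residual norm)."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    w = solve(gram, rhs)
    resid = [yi - sum(wj * col[i] for wj, col in zip(w, columns)) for i, yi in enumerate(y)]
    return w, math.sqrt(sum(r * r for r in resid))

def sparsest_fit(library, y, max_terms=3, rel_tol=1e-2):
    """Return (support, weights, error) for the fewest active terms with relative error < rel_tol.

    Enumerating supports of size k plays the role of the binary variables and the
    cardinality constraint that a mixed-integer formulation would encode.
    """
    y_norm = math.sqrt(sum(v * v for v in y))
    for k in range(1, max_terms + 1):
        best = None
        for support in combinations(sorted(library), k):
            w, err = least_squares([library[name] for name in support], y)
            if best is None or err < best[2]:
                best = (support, w, err)
        if best is not None and best[2] / y_norm < rel_tol:
            return best
    return None

a = [p[0] for p in PLANETS]
T = [p[1] for p in PLANETS]
library = {f"a^{k}": [ai ** k for ai in a] for k in (1, 2, 3, 4)}
support, weights, err = sparsest_fit(library, [Ti ** 2 for Ti in T])
print(support, [round(w, 3) for w in weights])  # ('a^3',) [1.0]
```

The single surviving term recovers T^2 = a^3 in these units. Brute force is only feasible for tiny libraries, which is where an exact mixed-integer formulation becomes necessary.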

    Interactive Feature Extraction using Implicit Knowledge Elicitation: Application to Power System Expertise

    Industrial systems such as power networks are continuously monitored by human experts who draw on experience to quickly identify potentially dangerous situations. As current energy trends increase the complexity of day-to-day grid operations, it becomes necessary to assist experts in their monitoring tasks. This paper proposes an interactive approach to create human-readable analytical expressions that describe physical phenomena in terms of their most impactful quantities. We present an interactive platform that brings experts into the training loop to guide the expression search with their expertise. It uses an evolutionary approach based on Probabilistic Grammar Guided Genetic Programming with grammars that experts create and update. Interactivity is multi-level: users can distill their knowledge both within and between evolutionary runs. We propose two usage scenarios on a real-world dataset, in which the non-interactive algorithm either provides (case 1) or does not provide (case 2) satisfactory solutions. We show improvements in the solution's precision (case 1) and complexity (case 2).
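
The role of an expert-edited probabilistic grammar can be sketched as weighted sampling of derivations. The snippet below shows only that piece: productions are drawn in proportion to their weights, and an expert edit that scales a weight changes which expressions appear. It does not evolve a population as the paper's algorithm does; the grammar, the quantities P, Q, and U (hypothetical stand-ins for, e.g., active power, reactive power, and voltage), the depth limit, and all function names are assumptions.

```python
import random

def make_grammar():
    """Hypothetical weighted grammar: nonterminal -> list of (production, weight)."""
    return {
        "E": [(["E", "+", "E"], 1.0), (["E", "*", "E"], 1.0), (["V"], 2.0)],
        "V": [(["P"], 1.0), (["Q"], 1.0), (["U"], 1.0)],
    }

def sample(grammar, symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into terminal tokens by drawing productions in proportion to their weights."""
    if symbol not in grammar:
        return [symbol]  # terminal token
    options = grammar[symbol]
    if depth >= max_depth:
        # At the depth limit, keep only productions that do not recurse on `symbol`.
        options = [(prod, w) for prod, w in options if symbol not in prod]
    productions = [prod for prod, _ in options]
    weights = [w for _, w in options]
    chosen = rng.choices(productions, weights=weights, k=1)[0]
    tokens = []
    for s in chosen:
        tokens.extend(sample(grammar, s, rng, depth + 1, max_depth))
    return tokens

def reweight(grammar, symbol, production, factor):
    """Expert feedback: scale the weight of one production of `symbol` (factor 0 disables it)."""
    grammar[symbol] = [(p, w * factor if p == production else w) for p, w in grammar[symbol]]

grammar = make_grammar()
reweight(grammar, "V", ["U"], 0.0)  # e.g. an expert rules out U for this phenomenon
rng = random.Random(42)
for _ in range(3):
    print(" ".join(sample(grammar, "E", rng)))
```

Raising a weight instead of zeroing it biases the search toward a quantity without excluding the alternatives, which is one way knowledge could be distilled between runs.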

    Simulation Intelligence: Towards a New Generation of Scientific Methods

    The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue that the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-the-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.

    A Framework Based on Symbolic Regression Coupled with eXtended Physics-Informed Neural Networks for Gray-Box Learning of Equations of Motion from Data

    We propose a framework and an algorithm to uncover the unknown parts of nonlinear equations directly from data. The framework is based on eXtended Physics-Informed Neural Networks (X-PINNs), which use domain decomposition in space-time; we augment the original X-PINN method by imposing flux continuity across the domain interfaces. The well-known Allen-Cahn equation is used to demonstrate the approach. The Frobenius matrix norm is used to evaluate the accuracy of the X-PINN predictions, and the results show excellent performance. In addition, symbolic regression is employed to determine the closed form of the unknown part of the equation from the data, and the results confirm the accuracy of the X-PINN-based approach. To test the framework in a situation resembling real-world data, random noise is added to the datasets to mimic scenarios such as thermal noise or instrument errors. The results show that the framework is robust to a significant amount of noise. Finally, we determine the minimal amount of data required for training the neural network. The framework is able to predict the correct form and coefficients of the underlying dynamical equation when at least 50% of the data is used for training.
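
The two quantities named in this abstract, a Frobenius-norm accuracy measure and continuity of the flux across subdomain interfaces, can be sketched as follows. This assumes predictions on a space-time grid stored as nested lists, uses the relative Frobenius error (an assumption, since the abstract only names the norm), and exposes each subdomain surrogate as a callable u(x, t). In an actual X-PINN the spatial derivative would come from automatic differentiation of each subnetwork rather than the one-sided finite differences used here, and how the paper weights these terms in its loss is not stated in the abstract. Function names are hypothetical.

```python
import math

def relative_frobenius_error(pred, exact):
    """||pred - exact||_F / ||exact||_F for two equally shaped 2-D grids (lists of rows)."""
    num = sum((p - e) ** 2 for prow, erow in zip(pred, exact) for p, e in zip(prow, erow))
    den = sum(e ** 2 for erow in exact for e in erow)
    return math.sqrt(num / den)

def interface_penalty(u_left, u_right, x_i, times, h=1e-4):
    """Mean squared mismatch of the solution and of its spatial flux u_x at the interface x = x_i.

    u_left and u_right are the two subdomain surrogates, called as u(x, t). Each side's
    u_x is approximated by a one-sided finite difference taken from inside its subdomain.
    """
    total = 0.0
    for t in times:
        jump = u_left(x_i, t) - u_right(x_i, t)  # solution continuity
        flux_left = (u_left(x_i, t) - u_left(x_i - h, t)) / h
        flux_right = (u_right(x_i + h, t) - u_right(x_i, t)) / h
        total += jump ** 2 + (flux_left - flux_right) ** 2  # flux continuity
    return total / len(times)

# Two surrogates that agree in value at x = 0 but have opposite slopes there.
print(interface_penalty(lambda x, t: x, lambda x, t: -x, 0.0, [0.0, 1.0]))
```

The example pair matches at the interface yet would be penalized, which is the situation a value-only continuity condition misses and a flux term catches.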

    GALAXY: A new hybrid MOEA for the Optimal Design of Water Distribution Systems

    This is the final version of the article. Available from American Geophysical Union via the DOI in this record. The first author gratefully acknowledges the financial support given by both the University of Exeter and the China Scholarship Council (CSC) toward the PhD research. We also thank the three anonymous reviewers, who helped improve the quality of this paper substantially. The source code of the latest versions of NSGA-II and ε-MOEA can be downloaded from the official website of the Kanpur Genetic Algorithms Laboratory via http://www.iitk.ac.in/kangal/codes.shtml. The description of each benchmark problem used in this paper, including the EPANET input file and the associated best-known Pareto front, can be accessed via the following link to the Centre for Water Systems (http://tinyurl.com/cwsbenchmarks/). GALAXY can be accessed via http://tinyurl.com/cws-galaxy

    Automated Telescience: Active Machine Learning Of Remote Dynamical Systems

    Automated science is an emerging field of research and technology that aims to extend the role of computers in science from a tool that stores and analyzes data to one that generates hypotheses and designs experiments. Despite the tremendous discoveries and advancements brought forth by the scientific method, it remains a process fundamentally driven by human insight and ingenuity. Automated science aims to develop algorithms, protocols, and design philosophies capable of automating the scientific process. This work advances the field of automated science, and its specific contributions fall into three categories: coevolutionary search methods and applications, inference of the underlying structure of dynamical systems, and remote-controlled automated science. First, a collection of coevolutionary search methods and applications is presented. These include a method to reduce the computational overhead of evolutionary algorithms via trainer selection strategies in a rank predictor framework, an approach to optimal experiment design for nonparametric models using Shannon information, and an application of coevolutionary algorithms to infer kinematic poses from RGBD images. Second, three algorithms are presented that infer the underlying structure of dynamical systems: a method to infer discrete-continuous hybrid dynamical systems from unlabeled data, an approach to discovering ordinary differential equations of arbitrary order, and a principle to uncover the existence and dynamics of hidden state variables that correspond to physical quantities from nonlinear differential equations. All of these algorithms uncover structure in an unsupervised manner without any prior domain knowledge. Third, a remote-controlled, distributed system is demonstrated that autonomously generates scientific models by perturbing and observing a system in an intelligent fashion. By automating the components of physical experimentation, scientific modeling, and experimental design, models of luminescent chemical reactions and multi-compartmental pharmacokinetic systems were discovered without any human intervention, which illustrates how a set of distributed machines can contribute scientific knowledge while scaling beyond geographic constraints.
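
Experiment design driven by Shannon information can be illustrated with a simple disagreement rule: perform the next experiment where the current candidate models' predictions are most uncertain. The sketch below is a hedged illustration, not the thesis's algorithm. It assumes the spread of the candidates' predictions at an input is approximately Gaussian, in which case the entropy 0.5 ln(2 pi e sigma^2) increases monotonically with the variance. The candidate models and function names are hypothetical.

```python
import math
import statistics

def predictive_entropy(predictions):
    """Entropy (nats) of a Gaussian fitted to the candidate models' predictions at one input."""
    var = statistics.pvariance(predictions)
    return 0.5 * math.log(2 * math.pi * math.e * max(var, 1e-12))  # clamp avoids log(0)

def choose_experiment(models, candidates):
    """Return the candidate input at which the current candidate models disagree the most."""
    return max(candidates, key=lambda x: predictive_entropy([m(x) for m in models]))

# Hypothetical competing models of an unknown system's response y(x).
models = [lambda x: x, lambda x: x ** 2, lambda x: x ** 3]
print(choose_experiment(models, [0.0, 0.5, 1.0, 2.0]))  # 2.0
```

At x = 0 and x = 1 all three models agree, so observing the system there could not tell them apart. The rule therefore perturbs the system at x = 2, where a single measurement discriminates among them.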

    Physics-informed learning of governing equations from scarce data

    Harnessing data to discover the underlying governing laws or equations that describe the behavior of complex physical systems can significantly advance our modeling, simulation, and understanding of such systems in various science and engineering disciplines. This work introduces a novel physics-informed deep learning framework to discover governing partial differential equations (PDEs) from scarce and noisy data for nonlinear spatiotemporal systems. In particular, the approach integrates the strengths of deep neural networks for rich representation learning, physics embedding, automatic differentiation, and sparse regression to (1) approximate the solution of system variables, (2) compute essential derivatives, and (3) identify the key derivative terms and parameters that form the structure and explicit expression of the PDEs. The efficacy and robustness of this method are demonstrated, both numerically and experimentally, by discovering a variety of PDE systems under different levels of data scarcity and noise and different initial/boundary conditions. The resulting computational framework shows the potential for closed-form model discovery in practical applications where large and accurate datasets are infeasible to capture. Comment: 46 pages; 1 table, 6 figures and 3 extended data figures in main text; 2 tables and 12 figures in supplementary information
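
Step (3), identifying the key derivative terms by sparse regression, can be sketched with sequentially thresholded least squares (STLSQ), one common sparse-regression scheme; the paper's own procedure may differ. The assumptions here are loud ones: the field u = e^{-t} sin x + e^{-4t} sin 2x is synthetic and satisfies the heat equation u_t = u_xx, its derivatives are computed analytically as a stand-in for step (2), the candidate library and threshold are illustrative, and all function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve the small square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def lstsq(columns, y):
    """Least-squares weights for y ~ sum_j w_j * columns[j], via the normal equations."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(gram, rhs)

def stlsq(library, target, threshold=0.1):
    """Sequentially thresholded least squares: refit on the active terms, drop small ones, repeat."""
    active = list(library)
    while True:  # terminates: `active` shrinks on every pass that does not return
        w = dict(zip(active, lstsq([library[name] for name in active], target)))
        keep = [name for name in active if abs(w[name]) >= threshold]
        if keep == active:
            return w
        active = keep

# Synthetic field u = e^{-t} sin x + e^{-4t} sin 2x, which satisfies u_t = u_xx.
# Analytic derivatives stand in for derivatives of a trained network.
pts = [(2 * math.pi * i / 32, 0.1 * j) for i in range(32) for j in range(11)]
u = [math.exp(-t) * math.sin(x) + math.exp(-4 * t) * math.sin(2 * x) for x, t in pts]
u_x = [math.exp(-t) * math.cos(x) + 2 * math.exp(-4 * t) * math.cos(2 * x) for x, t in pts]
u_xx = [-math.exp(-t) * math.sin(x) - 4 * math.exp(-4 * t) * math.sin(2 * x) for x, t in pts]
u_t = [-math.exp(-t) * math.sin(x) - 4 * math.exp(-4 * t) * math.sin(2 * x) for x, t in pts]

library = {"u": u, "u_x": u_x, "u_xx": u_xx, "u*u_x": [p * q for p, q in zip(u, u_x)]}
print(stlsq(library, u_t))  # expected to keep only u_xx, with weight close to 1
```

Two decaying modes are used because a single mode e^{-t} sin x would make u and u_xx identical up to sign, leaving the regression unable to tell u_t = u_xx from u_t = -u.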