3,178 research outputs found
D-CIPHER: Discovery of Closed-form Partial Differential Equations
Closed-form differential equations, including partial differential equations
and higher-order ordinary differential equations, are one of the most important
tools used by scientists to model and better understand natural phenomena.
Discovering these equations directly from data is challenging because it
requires modeling relationships between various derivatives that are not
observed in the data (equation-data mismatch) and it involves searching across
a huge space of possible equations. Current approaches make strong assumptions
about the form of the equation and thus fail to discover many well-known
systems. Moreover, many of them resolve the equation-data mismatch by
estimating the derivatives, which makes them inadequate for noisy and
infrequently sampled systems. To this end, we propose D-CIPHER, which is robust
to measurement artifacts and can uncover a new and very general class of
differential equations. We further design a novel optimization procedure,
CoLLie, to help D-CIPHER search through this class efficiently. Finally, we
demonstrate empirically that it can discover many well-known equations that are
beyond the capabilities of current methods.
Comment: To appear in the Proceedings of the 37th Conference on Neural
Information Processing Systems (NeurIPS 2023)
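The abstract's point that derivative estimation is inadequate for noisy data can be illustrated with a minimal sketch. It is not part of D-CIPHER; the test signal, noise level, and grid size are assumptions chosen only for illustration. It shows how central finite differences amplify small measurement noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test signal: u(t) = sin(t), whose true derivative is cos(t).
t = np.linspace(0.0, 2.0 * np.pi, 200)
noise_std = 0.01  # assumed measurement noise level
u_noisy = np.sin(t) + rng.normal(0.0, noise_std, size=t.shape)

# Central finite differences on the noisy samples.
du_est = np.gradient(u_noisy, t)

# Compare the error in the data with the error in the estimated derivative.
data_rmse = np.sqrt(np.mean((u_noisy - np.sin(t)) ** 2))
deriv_rmse = np.sqrt(np.mean((du_est - np.cos(t)) ** 2))

print(f"RMSE of noisy data:           {data_rmse:.4f}")
print(f"RMSE of estimated derivative: {deriv_rmse:.4f}")
```

Because the difference quotient divides the noise by the small grid spacing, the derivative error is much larger than the data error, and it grows further as the grid is refined.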
Mining Explicit and Implicit Relationships in Data Using Symbolic Regression
Identification of implicit and explicit relations within observed data is a generic problem commonly encountered in several domains including science, engineering, finance, and more. It forms the core component of data analytics, a process of discovering useful information from data sets that are potentially huge and otherwise incomprehensible. In industry, such information is often instrumental for profitable decision making, whereas in science and engineering it is used to build empirical models, propose new or verify existing theories and explain natural phenomena. In recent times, digital and internet-based technologies have proliferated, making it viable to generate and collect large amounts of data at low cost. This in turn has resulted in an ever-growing need for methods to analyse and draw interpretations from such data quickly and reliably. With this overarching goal, this thesis attempts to make contributions towards developing accurate and efficient methods for discovering such relations through evolutionary search, a method commonly referred to as Symbolic Regression (SR).
A data set of input variables x and a corresponding observed response y is given. The aim is to find an explicit function y = f(x) or an implicit function f(x, y) = 0 that represents the data set. While seemingly simple, the problem is challenging for several reasons. Some of the conventional regression methods try to “guess” a functional form such as linear/quadratic/polynomial, and attempt to fit the data to that form, which may limit the possibility of discovering more complex relations, if they exist. On the other hand, there are meta-modelling techniques such as response surface method, Kriging, etc., that model the given data accurately, but provide a “black-box” predictor instead of an expression. Such approximations convey little or no insight into how the variables and responses depend on each other, or their relative contribution to the output. SR attempts to avoid the above two extremes by providing a structure which evolves mathematical expressions instead of assuming them. Thus, it is flexible enough to represent the data, but at the same time provides useful insights instead of a black-box predictor. SR can be categorized as part of Explainable Artificial Intelligence and can contribute to Trustworthy Artificial Intelligence.
The work proposed in this thesis aims to integrate the concept of “semantics” deeper into Genetic Programming (GP) and Evolutionary Feature Synthesis, which are the two algorithms usually employed for conducting SR. The semantics will be integrated into well-known components of the algorithms such as compactness, diversity, recombination, constant optimization, etc. The main contribution of this thesis is the proposal of two novel operators to generate expressions based on Linear Programming and Mixed Integer Programming with the aim of controlling the length of the discovered expressions without compromising on the accuracy. In the experiments, these operators are found to discover expressions with better accuracy and interpretability on many explicit and implicit benchmarks. Moreover, some applications of SR on real-world data sets are shown to demonstrate the practicality of the proposed approaches. In addition, in relation to practical problems, the thesis presents how GP can be applied to effectively solve Resource Constrained Scheduling Problems.
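The explicit problem y = f(x) described above can be illustrated with a toy sketch. It replaces evolutionary search with exhaustive enumeration over a tiny, hand-picked library and fits no constants, so it is not the thesis's method. The data-generating function and the library of terms are assumptions made for illustration.

```python
import itertools
import numpy as np

# Hypothetical data generated from y = x**2 + sin(x).
x = np.linspace(-3.0, 3.0, 100)
y = x**2 + np.sin(x)

# Small, assumed library of building blocks (name -> function).
terms = {
    "x": lambda v: v,
    "x**2": lambda v: v**2,
    "sin(x)": np.sin,
    "cos(x)": np.cos,
    "exp(x)": np.exp,
}

best = None
# Enumerate every sum of one or two distinct terms and keep the best fit.
for size in (1, 2):
    for combo in itertools.combinations(terms, size):
        pred = sum(terms[name](x) for name in combo)
        mse = float(np.mean((pred - y) ** 2))
        # Strict "<" means ties are resolved in favor of the expression
        # found first, and shorter expressions are enumerated first.
        if best is None or mse < best[0]:
            best = (mse, " + ".join(combo))

print(best[1])
```

Unlike fitting a fixed polynomial form, the search returns a readable expression. An evolutionary method explores a far larger space of such expressions than this enumeration can.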
Discovering Causal Relations and Equations from Data
Physics is a field of science that has traditionally used the scientific
method to answer questions about why natural phenomena occur and to make
testable models that explain the phenomena. Discovering equations, laws and
principles that are invariant, robust and causal explanations of the world has
been fundamental in physical sciences throughout the centuries. Discoveries
emerge from observing the world and, when possible, performing interventional
studies in the system under study. With the advent of big data and the use of
data-driven methods, causal and equation discovery fields have grown and made
progress in computer science, physics, statistics, philosophy, and many applied
fields. All these domains are intertwined and can be used to discover causal
relations, physical laws, and equations from observational data. This paper
reviews the concepts, methods, and relevant works on causal and equation
discovery in the broad field of Physics and outlines the most important
challenges and promising future lines of research. We also provide a taxonomy
for observational causal and equation discovery, point out connections, and
showcase a complete set of case studies in Earth and climate sciences, fluid
dynamics and mechanics, and the neurosciences. This review demonstrates that
discovering fundamental laws and causal relations by observing natural
phenomena is being revolutionised with the efficient exploitation of
observational data, modern machine learning algorithms and the interaction with
domain knowledge. Exciting times are ahead with many challenges and
opportunities to improve our understanding of complex systems.
Comment: 137 pages
AI Hilbert: A New Paradigm for Scientific Discovery by Unifying Data and Background Knowledge
The discovery of scientific formulae that parsimoniously explain natural
phenomena and align with existing background theory is a key goal in science.
Historically, scientists have derived natural laws by manipulating equations
based on existing knowledge, forming new equations, and verifying them
experimentally. In recent years, data-driven scientific discovery has emerged
as a viable competitor in settings with large amounts of experimental data.
Unfortunately, data-driven methods often fail to discover valid laws when data
is noisy or scarce. Accordingly, recent works combine regression and reasoning
to eliminate formulae inconsistent with background theory. However, the problem
of searching over the space of formulae consistent with background theory to
find one that fits the data best is not well-solved. We propose a solution to
this problem when all axioms and scientific laws are expressible via polynomial
equalities and inequalities and argue that our approach is widely applicable.
We further model notions of minimal complexity using binary variables and
logical constraints, solve polynomial optimization problems via mixed-integer
linear or semidefinite optimization, and prove the validity of our scientific
discoveries in a principled manner using Positivstellensatz certificates.
Remarkably, the optimization techniques leveraged in this paper allow our
approach to run in polynomial time with fully correct background theory, or
non-deterministic polynomial (NP) time with partially correct background
theory. We demonstrate that some famous scientific laws, including Kepler's
Third Law of Planetary Motion, the Hagen-Poiseuille Equation, and the Radiated
Gravitational Wave Power equation, can be derived in a principled manner from
background axioms and experimental data.
Comment: Slightly revised from version 1, in particular polished the figure
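As a point of comparison for the Kepler example, here is a minimal sketch of the purely data-driven side only. It fits a power law to approximate planetary data by least squares in log-log space. It does not use background axioms, polynomial optimization, or certificates, so it is not the paper's approach. The rounded orbital values are standard approximate figures.

```python
import numpy as np

# Approximate semi-major axes (AU) and orbital periods (years) for six planets:
# Mercury, Venus, Earth, Mars, Jupiter, Saturn.
a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])

# Fit log T = k * log a + c by ordinary least squares.
k, c = np.polyfit(np.log(a), np.log(T), 1)

# Kepler's Third Law corresponds to k = 3/2 (T**2 proportional to a**3).
print(f"fitted exponent k = {k:.3f}")
```

With clean data the exponent comes out close to 3/2. The abstract's concern is the setting where data is noisy or scarce and such a fit alone is unreliable.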
Interactive Feature Extraction using Implicit Knowledge Elicitation : Application to Power System Expertise
Industrial systems such as power networks are continuously monitored by human experts who draw on their experience to quickly identify potentially dangerous situations. As current energy trends increase the complexity of day-to-day grid operations, it becomes necessary to assist experts in their monitoring tasks. This paper proposes an interactive approach to create human-readable analytical expressions that describe physical phenomena by their most impactful quantities. We present an interactive platform that brings experts into the training loop to guide the expression search using their expertise. It uses an evolutionary approach based on Probabilistic Grammar Guided Genetic Programming with expertly created and updated grammars. Interactivity is multi-level: users can distill their knowledge both within and between evolutionary runs. We propose two usage scenarios on a real-world dataset where the non-interactive algorithm either provides (case 1) or does not provide (case 2) satisfactory solutions. We show improvements regarding the solution's precision (case 1) and complexity (case 2).
Simulation Intelligence: Towards a New Generation of Scientific Methods
The original "Seven Motifs" set forth a roadmap of essential methods for the
field of scientific computing, where a motif is an algorithmic method that
captures a pattern of computation and data movement. We present the "Nine
Motifs of Simulation Intelligence", a roadmap for the development and
integration of the essential algorithms necessary for a merger of scientific
computing, scientific simulation, and artificial intelligence. We call this
merger simulation intelligence (SI), for short. We argue the motifs of
simulation intelligence are interconnected and interdependent, much like the
components within the layers of an operating system. Using this metaphor, we
explore the nature of each layer of the simulation intelligence operating
system stack (SI-stack) and the motifs therein: (1) Multi-physics and
multi-scale modeling; (2) Surrogate modeling and emulation; (3)
Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based
modeling; (6) Probabilistic programming; (7) Differentiable programming; (8)
Open-ended optimization; (9) Machine programming. We believe coordinated
efforts between motifs offer immense opportunity to accelerate scientific
discovery, from solving inverse problems in synthetic biology and climate
science, to directing nuclear energy experiments and predicting emergent
behavior in socioeconomic settings. We elaborate on each layer of the SI-stack,
detailing the state-of-the-art methods, presenting examples to highlight challenges
and opportunities, and advocating for specific ways to advance the motifs and
the synergies from their combinations. Advancing and integrating these
technologies can enable a robust and efficient hypothesis-simulation-analysis
type of scientific method, which we introduce with several use-cases for
human-machine teaming and automated science.
A Framework Based on Symbolic Regression Coupled with eXtended Physics-Informed Neural Networks for Gray-Box Learning of Equations of Motion from Data
We propose a framework and an algorithm to uncover the unknown parts of
nonlinear equations directly from data. The framework is based on eXtended
Physics-Informed Neural Networks (X-PINNs) with domain decomposition in
space-time, and we augment the original X-PINN method by imposing flux
continuity across
the domain interfaces. The well-known Allen-Cahn equation is used to
demonstrate the approach. The Frobenius matrix norm is used to evaluate the
accuracy of the X-PINN predictions and the results show excellent performance.
In addition, symbolic regression is employed to determine the closed form of
the unknown part of the equation from the data, and the results confirm the
accuracy of the X-PINNs based approach. To test the framework in a situation
resembling real-world data, random noise is added to the datasets to mimic
scenarios such as the presence of thermal noise or instrument errors. The
results show that the framework is stable against a significant amount of noise.
As the final part, we determine the minimal amount of data required for
training the neural network. The framework is able to predict the correct form
and coefficients of the underlying dynamical equation when at least 50% of the
data is used for training.
GALAXY: A new hybrid MOEA for the Optimal Design of Water Distribution Systems
This is the final version of the article. Available from American Geophysical Union via the DOI in this record. The first author would like to acknowledge the financial support given by both the University of Exeter and the China Scholarship Council (CSC) toward the PhD research. We also thank the three anonymous reviewers, who helped improve the quality of this paper substantially. The source code of the latest versions of NSGA-II and ε-MOEA can be downloaded from the official website of Kanpur Genetic Algorithms Laboratory via http://www.iitk.ac.in/kangal/codes.shtml. The description of each benchmark problem used in this paper, including the input file of EPANET and the associated best-known Pareto front, can be accessed from the following link to the Centre for Water Systems (http://tinyurl.com/cwsbenchmarks/). GALAXY can be accessed via http://tinyurl.com/cws-galaxy
Automated Telescience: Active Machine Learning Of Remote Dynamical Systems
Automated science is an emerging field of research and technology that aims to extend the role of computers in science from a tool that stores and analyzes data to one that generates hypotheses and designs experiments. Despite the tremendous discoveries and advancements brought forth by the scientific method, it is a process that is fundamentally driven by human insight and ingenuity. Automated science aims to develop algorithms, protocols and design philosophies that are capable of automating the scientific process. This work advances the field of automated science, and its specific contributions fall into three categories: coevolutionary search methods and applications, inferring the underlying structure of dynamical systems, and remote controlled automated science. First, a collection of coevolutionary search methods and applications is presented. These approaches include: a method to reduce the computational overhead of evolutionary algorithms via trainer selection strategies in a rank predictor framework, an approach for optimal experiment design for nonparametric models using Shannon information, and an application of coevolutionary algorithms to infer kinematic poses from RGBD images. Second, three algorithms are presented that infer the underlying structure of dynamical systems: a method to infer discrete-continuous hybrid dynamical systems from unlabeled data, an approach to discovering ordinary differential equations of arbitrary order, and a principle to uncover the existence and dynamics of hidden state variables that correspond to physical quantities from nonlinear differential equations. All of these algorithms are able to uncover structure in an unsupervised manner without any prior domain knowledge. Third, a remote controlled, distributed system is demonstrated to autonomously generate scientific models by perturbing and observing a system in an intelligent fashion.
By automating the components of physical experimentation, scientific modeling and experimental design, models of luminescent chemical reactions and multi-compartmental pharmacokinetic systems were discovered without any human intervention, which illustrates how a set of distributed machines can contribute scientific knowledge while scaling beyond geographic constraints.
Physics-informed learning of governing equations from scarce data
Harnessing data to discover the underlying governing laws or equations that
describe the behavior of complex physical systems can significantly advance our
modeling, simulation and understanding of such systems in various science and
engineering disciplines. This work introduces a novel physics-informed deep
learning framework to discover governing partial differential equations (PDEs)
from scarce and noisy data for nonlinear spatiotemporal systems. In particular,
this approach seamlessly integrates the strengths of deep neural networks for
rich representation learning, physics embedding, automatic differentiation and
sparse regression to (1) approximate the solution of system variables, (2)
compute essential derivatives, as well as (3) identify the key derivative terms
and parameters that form the structure and explicit expression of the PDEs. The
efficacy and robustness of this method are demonstrated, both numerically and
experimentally, on discovering a variety of PDE systems with different levels
of data scarcity and noise accounting for different initial/boundary
conditions. The resulting computational framework shows the potential for
closed-form model discovery in practical applications where large and accurate
datasets are intractable to capture.
Comment: 46 pages; 1 table, 6 figures and 3 extended data figures in main
text; 2 tables and 12 figures in supplementary information
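The sparse-regression step (3) named in the abstract can be sketched on its own under strong simplifying assumptions. The example below uses sequentially thresholded least squares, a common sparse-regression heuristic that the abstract does not name. It takes derivatives from a known analytic solution instead of a neural network with automatic differentiation. The test equation, candidate library, and threshold are all hypothetical choices.

```python
import numpy as np

# Hypothetical test problem: u solves the heat equation u_t = 0.1 * u_xx.
x, t = np.meshgrid(np.linspace(0.0, 2.0 * np.pi, 64), np.linspace(0.0, 1.0, 32))
A = np.exp(-0.1 * t)  # decay of the sin(x) mode
B = np.exp(-0.4 * t)  # decay of the sin(2x) mode
u = A * np.sin(x) + 0.5 * B * np.sin(2 * x)

# Derivatives written out analytically to keep the sketch short.
u_t = -0.1 * A * np.sin(x) - 0.2 * B * np.sin(2 * x)
u_x = A * np.cos(x) + B * np.cos(2 * x)
u_xx = -A * np.sin(x) - 2.0 * B * np.sin(2 * x)

# Assumed candidate library of right-hand-side terms.
names = ["u", "u_x", "u_xx", "u*u_x"]
theta = np.column_stack([u.ravel(), u_x.ravel(), u_xx.ravel(), (u * u_x).ravel()])
target = u_t.ravel()

# Sequentially thresholded least squares: fit, zero small coefficients, refit.
coef = np.linalg.lstsq(theta, target, rcond=None)[0]
for _ in range(10):
    small = np.abs(coef) < 0.05  # assumed sparsity threshold
    coef[small] = 0.0
    keep = ~small
    if keep.any():
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]

equation = " + ".join(f"{c:.3f}*{n}" for c, n in zip(coef, names) if c != 0.0)
print("u_t =", equation)
```

On this noise-free data the regression keeps only the `u_xx` term. With scarce or noisy measurements the derivative columns themselves become unreliable, which is the gap the physics-informed network in the abstract is meant to address.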