8 research outputs found

Efficient hybrid modeling and sorption model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach

Author: Costa Erbet
Nogueira Idelfonso B. R.
Rackauckas Christopher
Rebello Carine M.
Ribeiro Ana Mafalda
Santana Vinicius V.
Publication date: 22/03/2023

This study presents a systematic machine learning approach for creating efficient hybrid models and discovering sorption uptake models in non-linear advection-diffusion-sorption systems. It demonstrates an effective method to train these complex systems using gradient-based optimizers, adjoint sensitivity analysis, and JIT-compiled vector-Jacobian products, combined with spatial discretization and adaptive integrators. Sparse and symbolic regression were employed to identify missing functions in the artificial neural network. The robustness of the proposed method was tested on an in-silico data set of noisy breakthrough curve observations of fixed-bed adsorption, resulting in a well-fitted hybrid model. The study successfully reconstructed sorption uptake kinetics using sparse and symbolic regression, and accurately predicted breakthrough curves using identified polynomials, highlighting the potential of the proposed framework for discovering sorption kinetic law structures.
Comment: Preprint paper to be submitted soon to an Elsevier journal
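
As a rough illustration of the sparse-regression step mentioned in the abstract above, the sketch below runs sequential thresholded least squares on a polynomial library in pure Python. It is not the authors' code: the linear-driving-force rate law, the coefficient 0.5, the noise-free synthetic data, the threshold, and every function name are assumptions chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(features, targets, active):
    """Least-squares fit restricted to the active columns (normal equations)."""
    gram = [[sum(row[i] * row[j] for row in features) for j in active]
            for i in active]
    rhs = [sum(row[i] * t for row, t in zip(features, targets)) for i in active]
    return solve_linear(gram, rhs)


def stlsq(features, targets, threshold=0.05, iterations=10):
    """Sequential thresholded least squares: fit, drop small terms, refit."""
    n_terms = len(features[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(iterations):
        solution = least_squares(features, targets, active)
        coeffs = [0.0] * n_terms
        for idx, val in zip(active, solution):
            coeffs[idx] = val
        kept = [i for i in active if abs(coeffs[i]) >= threshold]
        if kept == active or not kept:
            break
        active = kept
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


# Hypothetical uptake data: dq/dt = 0.5 * (1 - q), sampled without noise.
qs = [i / 20 for i in range(21)]
features = [[1.0, q, q**2, q**3] for q in qs]   # library: 1, q, q^2, q^3
rates = [0.5 * (1.0 - q) for q in qs]

coeffs = stlsq(features, rates)
print(coeffs)  # the q^2 and q^3 terms are pruned to exactly 0.0
```

With noisy data the threshold would need tuning, and a dedicated linear-algebra library would be a better choice than the hand-written solver used here to keep the sketch dependency-free.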

arXiv.org e-Print Archive

A systematic grey-box modeling methodology via data reconciliation and SOS constrained regression

Author: Pitarch Pérez José Luis
Prada Moraga César de
Sala Antonio
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Developing the so-called grey-box or hybrid models of limited complexity for process systems is the cornerstone of advanced control and real-time optimization routines. These models must be based on fundamental principles and customized with sub-models obtained from process experimental data. This allows the engineer to transfer the available process knowledge into a model. However, there is still a lack of a flexible but systematic methodology for grey-box modeling which ensures certain coherence of the experimental sub-models with the process physics. This paper proposes such a methodology based on data reconciliation (DR) and polynomial constrained regression. A nonlinear optimization of limited complexity is to be solved in the DR stage, whereas the proposed constrained regression is based on sum-of-squares (SOS) convex programming. It is shown how several desirable features of the polynomial regressors can be naturally enforced in this optimization framework. The merits of the proposed methodology are illustrated through: (1) an academic example and (2) an industrial evaporation plant with real experimental data.
Ministerio de Economía, Industria y Competitividad (grant DPI2016-81002-R)

Repositorio Documental de la Universidad de Valladolid

Directory of Open Access Journals

Symbolic regression is NP-hard

Author: Pissis S. (Solon)
Virgolin M. (Marco)
Publication date: 25/10/2022

Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This raises the question: Is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.
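
To give a feel for why exhaustive search over expressions becomes expensive, the short calculation below counts syntactically distinct binary expression trees. It is only a back-of-the-envelope illustration, not the paper's hardness proof: the choice of four binary operators and three leaf symbols is an assumption, and the count ignores algebraic equivalences such as commutativity.

```python
from math import comb


def catalan(n):
    """Number of full binary tree shapes with n internal (operator) nodes."""
    return comb(2 * n, n) // (n + 1)


def count_expressions(n_ops, n_operators=4, n_leaves=3):
    """Count labelled trees: shapes x operator choices x leaf choices."""
    return catalan(n_ops) * n_operators**n_ops * n_leaves**(n_ops + 1)


for n_ops in range(4):
    print(n_ops, count_expressions(n_ops))
# 0 3
# 1 36
# 2 864
# 3 25920
```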

CWI's Institutional Repository

A new formulation for symbolic regression to identify physico-chemical laws from experimental data

Author: Cao L
Lapkin AA
Neumann P
Russo D
Vassiliadis VS
Publication venue: Chemical Engineering Journal
Publication date: 01/01/2020

A modification to the mixed-integer nonlinear programming (MINLP) formulation for symbolic regression was proposed with the aim of identification of physical models from noisy experimental data. In the proposed formulation, a binary tree in which equations are represented as directed, acyclic graphs, is fully constructed for a pre-defined number of layers. The introduced modification results in the reduction in the number of required binary variables and removal of redundancy due to possible symmetry of the tree formulation. The formulation was tested using numerical models and was found to be more efficient than the previous literature example with respect to the numbers of predictor variables and training data points. The globally optimal search was extended to identify physical models and to cope with noise in the experimental data predictor variable. The methodology was proven to be successful in identifying the correct physical models describing the relationship between shear stress and shear rate for both Newtonian and non-Newtonian fluids, and simple kinetic laws of chemical reactions. Future work will focus on addressing the limitations of the present formulation and solver to enable extension of target problems to larger, more complex physical models.
EPSRC EP/R009902/
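
The binary-tree representation of equations described in the abstract above can be sketched as follows. This shows only how an expression is stored and evaluated as a tree; it does not reproduce the MINLP formulation or its binary variables. The `Node` class, the operator set, and the variable names `mu`, `gamma_dot`, `K`, and `n` are hypothetical and introduced purely for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Union

# Assumed operator set; the paper's actual operator list is not given here.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


@dataclass
class Node:
    """Tree node: an operator with two children, or a leaf (variable/constant)."""
    value: Union[str, float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def evaluate(node, env):
    """Recursively evaluate the tree for the variable values in env."""
    if node.left is None and node.right is None:
        if isinstance(node.value, str):
            return env[node.value]
        return float(node.value)
    return OPS[node.value](evaluate(node.left, env), evaluate(node.right, env))


def to_string(node):
    """Render the tree as a fully parenthesised infix expression."""
    if node.left is None and node.right is None:
        return str(node.value)
    return f"({to_string(node.left)} {node.value} {to_string(node.right)})"


# Newtonian fluid: shear stress = mu * gamma_dot
newtonian = Node("*", Node("mu"), Node("gamma_dot"))
# Power-law (non-Newtonian) fluid: shear stress = K * gamma_dot ^ n
power_law = Node("*", Node("K"), Node("^", Node("gamma_dot"), Node("n")))

print(to_string(newtonian), "=", evaluate(newtonian, {"mu": 2.0, "gamma_dot": 3.0}))
print(to_string(power_law), "=",
      evaluate(power_law, {"K": 2.0, "gamma_dot": 4.0, "n": 0.5}))
```

In a search-based formulation, the content of each node would be a decision variable rather than a fixed label as it is here.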

Publikationsserver der RWTH Aachen University

Apollo (Cambridge)

AI Hilbert: A New Paradigm for Scientific Discovery by Unifying Data and Background Knowledge

Author: Cornelio Cristina
Cory-Wright Ryan
Dash Sanjeeb
Horesh Lior
Khadir Bachir El
Publication date: 23/09/2023

The discovery of scientific formulae that parsimoniously explain natural phenomena and align with existing background theory is a key goal in science. Historically, scientists have derived natural laws by manipulating equations based on existing knowledge, forming new equations, and verifying them experimentally. In recent years, data-driven scientific discovery has emerged as a viable competitor in settings with large amounts of experimental data. Unfortunately, data-driven methods often fail to discover valid laws when data is noisy or scarce. Accordingly, recent works combine regression and reasoning to eliminate formulae inconsistent with background theory. However, the problem of searching over the space of formulae consistent with background theory to find one that fits the data best is not well-solved. We propose a solution to this problem when all axioms and scientific laws are expressible via polynomial equalities and inequalities and argue that our approach is widely applicable. We further model notions of minimal complexity using binary variables and logical constraints, solve polynomial optimization problems via mixed-integer linear or semidefinite optimization, and prove the validity of our scientific discoveries in a principled manner using Positivstellensatz certificates. Remarkably, the optimization techniques leveraged in this paper allow our approach to run in polynomial time with fully correct background theory, or non-deterministic polynomial (NP) time with partially correct background theory. We demonstrate that some famous scientific laws, including Kepler's Third Law of Planetary Motion, the Hagen-Poiseuille Equation, and the Radiated Gravitational Wave Power equation, can be derived in a principled manner from background axioms and experimental data.
Comment: Slightly revised from version 1, in particular polished the figure

arXiv.org e-Print Archive

Coordinating industrial production and cogeneration systems to exploit electricity price fluctuations

Author: Pablos de la Fuente Cristian
Publication venue: 'Universidad de Valladolid'
Publication date: 01/01/2021

Fluctuations in electricity prices arising from demand-response programs are an opportunity for industries with cogeneration systems to reduce their production costs while making the power grid as a whole more stable and secure. Given the number of factors involved and the difficulty this poses for decision making, this thesis presents a methodology based on dynamic optimization that enables optimal management of both systems, and applies it in simulation to the case study of a sugar factory. The main results show that, with the proposed methodology, variable production costs can be reduced by up to 2.55% under a typical time-of-use tariff, and by around 5.41% when the prices given by the electricity market are used directly.
Departamento de Ingeniería de Sistemas y Automática
Doctorado en Ingeniería Industria

Repositorio Documental de la Universidad de Valladolid

Automated Knowledge Discovery using Neural Networks

Author: Panju Maysum
Publication venue: 'University of Waterloo'
Publication date: 20/05/2021

The natural world is known to consistently abide by scientific laws that can be expressed concisely in mathematical terms, including differential equations. To understand the patterns that define these scientific laws, it is necessary to discover and solve these mathematical problems after making observations and collecting data on natural phenomena. While artificial neural networks are powerful black-box tools for automating tasks related to intelligence, the solutions we seek are related to the concise and interpretable form of symbolic mathematics. In this work, we focus on the idea of a symbolic function learner, or SFL. A symbolic function learner can be any algorithm that is able to produce a symbolic mathematical expression that aims to optimize a given objective function. By choosing different objective functions, the SFL can be tuned to handle different learning tasks. We present a model for an SFL that is based on neural networks and can be trained using deep learning. We then use this SFL to approach the computational task of automating discovery of scientific knowledge in three ways. We first apply our symbolic function learner as a tool for symbolic regression, a curve-fitting problem that has traditionally been approached using genetic evolution algorithms. We show that our SFL performs competitively in comparison to genetic algorithms and neural network regressors on a sample collection of regression instances. We also reframe the problem of learning differential equations as a task in symbolic regression, and use our SFL to rediscover some equations from classical physics from data. We next present a machine-learning based method for solving differential equations symbolically. When neural networks are used to solve differential equations, they usually produce solutions in the form of black-box functions that are not directly mathematically interpretable. 
We introduce a method for generating symbolic expressions to solve differential equations while leveraging deep learning training methods. Unlike existing methods, our system does not require learning a language model over symbolic mathematics, making it scalable, compact, and easily adaptable for a variety of tasks and configurations. The system is designed to always return a valid symbolic formula, generating a useful approximation when an exact analytic solution to a differential equation is not, or cannot be, found. We demonstrate through examples the way our method can be applied on a number of differential equations that are rooted in the natural sciences, often obtaining symbolic approximations that are useful or insightful. Furthermore, we show how the system can be effortlessly generalized to find symbolic solutions to other mathematical tasks, including integration and functional equations. We then introduce a novel method for discovering implicit relationships between variables in structured datasets in an unsupervised way. Rather than explicitly designating a causal relationship between input and output variables, our method finds mathematical relationships between variables without treating any variable as distinguished from any other. As a result, properties about the data itself can be discovered, rather than rules for predicting one variable from the others. We showcase examples of our method in the domain of geometry, demonstrating how we can re-discover famous geometric identities automatically from artificially generated data. In total, this thesis aims to strengthen the connection between neural networks and problems in symbolic mathematics. Our proposed SFL is the main tool that we show can be applied to a variety of tasks, including but not limited to symbolic regression. We show how using this approach to symbolic function learning paves the way for future developments in automated scientific knowledge discovery.

University of Waterloo's Institutional Repository