Modular lifelong machine learning
Deep learning has drastically improved the state of the art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, training a deep neural network on a machine learning problem is expensive, and the overall training cost grows further when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021) and, as a result, neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a long sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to reuse only the subset of modules that are useful for the task at hand.
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
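To make the idea concrete, the following is a minimal sketch of treating neural modules as typed, composable functions and enumerating type-correct combinations of pre-trained and new modules. It is written in generic PyTorch-style Python; all names and the brute-force enumeration are illustrative assumptions, not HOUDINI's actual type-directed program synthesis.

```python
# Illustrative sketch (not HOUDINI's API): modular networks as compositions
# of typed modules, reusing frozen pre-trained parts from a library.
import itertools
import torch.nn as nn

class TypedModule(nn.Module):
    """A neural module annotated with input/output types for composition."""
    def __init__(self, net, in_type, out_type, pretrained=False):
        super().__init__()
        self.net, self.in_type, self.out_type = net, in_type, out_type
        if pretrained:                      # freeze reused library modules
            for p in self.net.parameters():
                p.requires_grad = False

    def forward(self, x):
        return self.net(x)

def compose(modules):
    """Chain modules whose types line up into a single program."""
    return nn.Sequential(*modules)

def candidate_programs(library, new_modules, in_type, out_type, depth=2):
    """Enumerate type-correct pipelines mixing library and new modules."""
    pool = library + new_modules
    for combo in itertools.product(pool, repeat=depth):
        types_ok = combo[0].in_type == in_type and combo[-1].out_type == out_type
        chained = all(a.out_type == b.in_type for a, b in zip(combo, combo[1:]))
        if types_ok and chained:
            yield compose(combo)
```

In HOUDINI itself, the search over such programs is performed by type-directed program synthesis and the selected program is then trained by gradient descent; the flat enumeration above only illustrates the shape of the search space.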
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
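As a rough illustration of the kind of method meant here, the sketch below combines a successive-halving style multi-fidelity schedule with a simple surrogate model. It is a toy example under my own assumptions (random-forest surrogate, synthetic evaluate function), not the HPO algorithm developed in the thesis.

```python
# Toy sketch of multi-fidelity, model-based HPO (illustrative only):
# cheap low-budget evaluations feed a surrogate that helps decide which
# configurations are promoted to, or injected at, higher budgets.
import random
from sklearn.ensemble import RandomForestRegressor

def propose(n):
    """Sample random configurations (here: learning rate and width)."""
    return [{"lr": 10 ** random.uniform(-4, -1),
             "width": random.choice([32, 64, 128])} for _ in range(n)]

def evaluate(cfg, budget):
    """Stand-in for training `cfg` for `budget` epochs and scoring it."""
    return -abs(cfg["lr"] - 0.01) + 0.0005 * cfg["width"] + 0.01 * budget

def features(cfg, budget):
    return [cfg["lr"], cfg["width"], budget]

def multi_fidelity_search(budgets=(1, 3, 9), n_init=27, keep=1 / 3):
    configs, xs, ys = propose(n_init), [], []
    surrogate = RandomForestRegressor(n_estimators=50, random_state=0)
    for budget in budgets:
        scores = [evaluate(c, budget) for c in configs]
        xs += [features(c, budget) for c in configs]
        ys += scores
        surrogate.fit(xs, ys)                          # model-based component
        ranked = [c for _, c in sorted(zip(scores, configs),
                                       key=lambda t: -t[0])]
        survivors = ranked[: max(1, int(len(configs) * keep))]
        fresh = propose(20)                            # surrogate ranks fresh candidates
        preds = surrogate.predict([features(c, budget) for c in fresh])
        best_fresh = [c for _, c in sorted(zip(preds, fresh),
                                           key=lambda t: -t[0])][:2]
        configs = survivors + best_fresh               # promote to the next budget
    return max(configs, key=lambda c: evaluate(c, budgets[-1]))

best_config = multi_fidelity_search()
```

The surrogate is refit on all (configuration, budget, score) observations gathered so far, which is what lets low-fidelity evaluations improve anytime performance rather than being discarded.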
Overall, this thesis identifies a number of important LML properties, not all of which have been attained by past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
Bayesian Forecasting in Economics and Finance: A Modern Review
The Bayesian statistical paradigm provides a principled and coherent approach to probabilistic forecasting. Uncertainty about all unknowns that characterize any forecasting problem -- model, parameters, latent states -- can be quantified explicitly and factored into the forecast distribution via the process of integration or averaging. Allied with the elegance of the method, Bayesian forecasting is now underpinned by the burgeoning field of Bayesian computation, which enables Bayesian forecasts to be produced for virtually any problem, no matter how large or complex. The current state of play in Bayesian forecasting in economics and finance is the subject of this review. The aim is to provide the reader with an overview of modern approaches to the field, set in some historical context, and with sufficient computational detail to assist the reader with implementation.
Comment: The paper is now published online at: https://doi.org/10.1016/j.ijforecast.2023.05.00
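In generic notation (the symbols below are mine, not taken from the review), the forecast distribution referred to here is the posterior predictive obtained by averaging over models $m$, their parameters $\theta_m$, and latent states $z_{1:T}$:

$$
p(y_{T+1} \mid y_{1:T}) \;=\; \sum_{m} p(m \mid y_{1:T}) \int p\big(y_{T+1} \mid \theta_m, z_{1:T}, m, y_{1:T}\big)\, p\big(\theta_m, z_{1:T} \mid m, y_{1:T}\big)\, \mathrm{d}\theta_m\, \mathrm{d}z_{1:T}.
$$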
Rigorous Experimentation For Reinforcement Learning
Scientific fields make advancements by leveraging the knowledge created by others to push the boundary of understanding. The primary tool in many fields for generating knowledge is empirical experimentation. Although common, generating accurate knowledge from empirical experiments is often challenging due to inherent randomness in execution and confounding variables that can obscure the correct interpretation of the results. As such, researchers must hold themselves and others to a high degree of rigor when designing experiments. Unfortunately, most reinforcement learning (RL) experiments lack this rigor, making the knowledge generated from experiments dubious. This dissertation proposes methods to address central issues in RL experimentation.
Evaluating the performance of an RL algorithm is the most common type of experiment in the RL literature. Yet most performance evaluations are incapable of answering a specific research question and can produce misleading results. Thus, the first issue we address is how to create a performance evaluation procedure that holds up to scientific standards.
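As one minimal illustration of an ingredient such a procedure needs (this example is mine, not the evaluation procedure proposed in the dissertation), performance should at least be reported with uncertainty estimates computed over independent training runs, e.g. a bootstrap confidence interval over final returns rather than a single best seed:

```python
# Minimal illustration (not the dissertation's protocol): report an
# algorithm's performance with a bootstrap confidence interval over
# independent training runs rather than a single best seed.
import numpy as np

def bootstrap_ci(returns_per_run, n_boot=10_000, alpha=0.05, seed=0):
    """Mean return across runs with a 95% bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns_per_run, dtype=float)
    means = [rng.choice(returns, size=len(returns), replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return returns.mean(), (lo, hi)

# e.g. final returns from 10 independent runs of one algorithm
mean, (lo, hi) = bootstrap_ci([212.0, 187.5, 240.1, 198.3, 225.7,
                               176.9, 231.4, 205.0, 219.8, 190.2])
```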
Despite the prevalence of performance evaluation, these types of experiments produce limited knowledge, e.g., they can only show how well an algorithm worked and not why, and they require significant amounts of time and computational resources. As an alternative, this dissertation proposes that scientific testing, the process of conducting carefully controlled experiments designed to further the knowledge and understanding of how an algorithm works, should be the primary form of experimentation.
Lastly, this dissertation provides a case study using policy gradient methods, showing how scientific testing can replace performance evaluation as the primary form of experimentation. In doing so, this dissertation aims to motivate others in the field to adopt more rigorous experimental practices.
Applications and Properties of Magnetic Nanoparticles
This Special Issue aimed to cover new developments in the synthesis and characterization of magnetic nanoconstructs, ranging from conventional metal oxide nanoparticles to novel molecule-based or hybrid multifunctional nano-objects. At the same time, the focus was on the potential of these novel magnetic nanoconstructs in several possible applications, e.g., sensing, energy storage, and nanomedicine.
Modeling and Simulation in Engineering
The Special Issue “Modeling and Simulation in Engineering”, belonging to the Engineering Mathematics section of the journal Mathematics, publishes original research papers dealing with advanced simulation and modeling techniques. The present book, “Modeling and Simulation in Engineering I, 2022”, contains 14 papers accepted after peer review by recognized specialists in the field. The papers address different topics occurring in engineering, such as ferrofluid transport in magnetic fields, non-fractal signal analysis, fractional derivatives, applications of swarm and evolutionary algorithms (genetic algorithms), inverse methods for inverse problems, numerical analysis of heat and mass transfer, numerical solutions of fractional differential equations, Kriging modelling, the theory of modelling methodology, and artificial neural networks for fault diagnosis in electric circuits. It is hoped that the papers selected for this issue will attract a significant audience in the scientific community and will further stimulate research involving modelling and simulation in mathematical physics and in engineering.
Towards High-Accuracy Simulations of Strongly Correlated Materials Using Tensor Networks
Accurate and verifiable computation of the properties of real materials with strong electron correlation has been a long-standing challenge in the fields of chemistry, physics, and material science. Most existing algorithms suffer from either approximations that are too inaccurate, or fundamental computational complexity that is too high. In studies of simplified models of strongly-correlated materials, tensor network algorithms have demonstrated the potential to overcome these limitations. This thesis describes our research efforts to develop new algorithms for two-dimensional (2D) tensor networks that extend their range of applicability beyond simple models and toward simulations of realistic materials.
We begin by describing three algorithms for projected entangled-pair states (PEPS, a type of 2D tensor network) that address three of their major limitations: numerical stability, long-range interactions, and computational efficiency of operators. We first describe (Ch. 2) a technique for converting a PEPS into a canonical form. By generalizing the QR matrix factorization to entire columns of a PEPS, we approximately generate a PEPS with analogous properties to the well-studied canonical 1D tensor network. This connection enables enhanced numerical stability and ground state optimization protocols. Next, we describe (Ch. 3) a technique to efficiently represent physically realistic long-range interactions between particles in a 2D tensor network operator, a projected entangled-pair operator (PEPO). We express the long-range interaction as a linear combination of correlation functions of an auxiliary system with only nearest-neighbor interactions. This allows us to represent long-range pairwise interactions with linear scaling in the system size. The third algorithm we present (Ch. 4) is a method to rewrite the 2D PEPO in terms of a set of quasi-1D tensor network operators, by exploiting intrinsic redundancies in the PEPO representation. We also report an on-the-fly contraction algorithm using these operators that allows for a significant reduction in computational complexity, enabling larger scale simulations of more complex problems.
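Schematically, and in my own notation rather than the thesis's, one standard way to realise the Ch. 3 construction described above is to approximate a decaying pairwise potential by a short sum of exponentials,

$$
V(r) \;\approx\; \sum_{k=1}^{K} c_k\, \lambda_k^{\,r}, \qquad 0 < \lambda_k < 1,
$$

where each exponentially decaying term is the two-point correlation function of an auxiliary degree of freedom with only nearest-neighbour couplings; folding these auxiliary systems into the PEPO keeps its bond dimension proportional to $K$ rather than to the range of the interaction, consistent with the linear scaling in system size stated above.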
We then move on to describe (Ch. 5) an extensive study of a "synthetic 2D material"---a two-dimensional square array of ultracold Rydberg atoms---enabled by some of the new algorithms. We investigate the ground state quantum phases of this system in the bulk and on large finite arrays directly comparable to recent quantum simulation experiments. We find a greatly altered phase diagram compared to earlier numerical and experimental studies, and in particular, we uncover an unexpected entangled nematic phase that appears in the absence of geometric frustration.
Finally, we describe (Ch. 6) a somewhat unrelated, but topically similar, project in which we investigate the feasibility of laser cooling small molecules with two metal atoms to ultracold temperatures. We study in detail the properties of the molecules YbCCCa and YbCCAl for application in precision measurement experiments.
Production Optimization Indexed to the Market Demand Through Neural Networks
Connectivity, mobility and real-time data analytics are the prerequisites for a new model of intelligent production management that facilitates communication between machines, people and processes, and uses technology as the main driver.
Many works in the literature treat maintenance and production management separately, yet the two areas are linked: maintenance and its actions aim to ensure the smooth operation of equipment and to avoid unnecessary downtime in production.
With the advent of technology, companies are rushing to solve their problems by adopting technologies that fit the most advanced technological concepts, such as Industry 4.0 and 5.0, which are based on the principle of process automation. This approach brings together database technologies, making it possible to monitor the operation of equipment and to study patterns in the data that can warn of possible failures.
The present thesis aims to forecast pulp production indexed to stock market values. The forecast is made from the production variables of the pulp presses and from stock exchange variables, supported by artificial intelligence (AI) technologies, with the goal of achieving effective planning. To support efficient production management decisions, algorithms were developed in this thesis and validated with data from five pulp presses, as well as data from other sources, such as steel production and stock exchanges, which were relevant for validating the robustness of the models.
This thesis demonstrated the importance of data preprocessing methods and their relevance to the model inputs, since they facilitate the process of training and testing the models. The chosen technologies demonstrated good efficiency and versatility in predicting the values of the equipment variables, while also demonstrating robustness and computational efficiency. The thesis also presents proposals for future developments, namely further exploration of these technologies, so that market variables can calibrate production through forecasts supported on those same variables.
Rare-Event Estimation and Calibration for Large-Scale Stochastic Simulation Models
Stochastic simulation has been widely applied in many domains. More recently, however, the rapid surge of sophisticated problems such as safety evaluation of intelligent systems has posed various challenges to conventional statistical methods. Motivated by these challenges, in this thesis, we develop novel methodologies with theoretical guarantees and numerical applications to tackle them from different perspectives.
In particular, our work can be categorized into two areas: (1) rare-event estimation (Chapters 2 to 5), where we develop approaches to estimating the probabilities of rare events via simulation; and (2) model calibration (Chapters 6 and 7), where we aim to calibrate the simulation model so that it is close to reality.
In Chapter 2, we study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests. We investigate an importance sampling scheme that integrates the dominating point machinery in large deviations and sequential mixed integer programming to locate the underlying dominating points. We provide efficiency guarantees and numerical demonstration of our approach.
In Chapter 3, we propose a new efficiency criterion for importance sampling, which we call probabilistic efficiency. Conventionally, an estimator is regarded as efficient if its relative error is sufficiently controlled. It is widely known that when a rare-event set contains multiple "important regions" encoded by the dominating points, importance sampling needs to account for all of them via mixing to achieve efficiency. We argue that this traditional analysis recipe can suffer from intrinsic looseness by using relative error as the efficiency criterion, and we propose the new efficiency notion to tighten this gap. In particular, we show that under the standard Gartner-Ellis large deviations regime, an importance sampling scheme that uses only the most significant dominating points is sufficient to attain this efficiency notion.
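For context, in generic notation (symbols mine, not the chapter's), an importance sampling estimator of a rare-event probability and the relative-error criterion discussed here are

$$
Z \;=\; \mathbb{1}\{X \in A_\gamma\}\,\frac{f(X)}{q(X)}, \quad X \sim q,
\qquad \mathrm{RE}(Z) \;=\; \frac{\sqrt{\operatorname{Var}_q(Z)}}{\Pr(X \in A_\gamma)},
$$

where $f$ is the original density and $q$ the sampling density. Requiring bounded relative error as the event grows rarer classically forces $q$ to be a mixture with one component tilted towards each dominating point; the probabilistic-efficiency notion proposed in this chapter relaxes exactly that requirement.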
In Chapter 4, we consider the estimation of rare-event probabilities using sample proportions output by crude Monte Carlo. Due to the recent surge of sophisticated rare-event problems, efficiency-guaranteed variance reduction may face implementation challenges, which motivate one to look at naive estimators. In this chapter we construct confidence intervals for the target probability using this naive estimator from various techniques, and then analyze their validity as well as tightness respectively quantified by the coverage probability and relative half-width.
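As an illustration of the kind of intervals meant here (the specific choices below, a Wald and a Clopper-Pearson interval, are examples of mine and not necessarily the ones analysed in the chapter), one can construct them directly from the crude Monte Carlo hit count and report the relative half-width alongside coverage:

```python
# Illustrative confidence intervals for a rare-event probability estimated
# by a crude Monte Carlo sample proportion (interval choices are examples).
import numpy as np
from scipy import stats

def crude_mc_intervals(hits, n, alpha=0.05):
    p_hat = hits / n
    # Normal (Wald) interval: simple but poorly behaved when hits is small
    z = stats.norm.ppf(1 - alpha / 2)
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    wald = (max(p_hat - half, 0.0), p_hat + half)
    # Clopper-Pearson (exact) interval from the beta distribution
    lo = stats.beta.ppf(alpha / 2, hits, n - hits + 1) if hits > 0 else 0.0
    hi = stats.beta.ppf(1 - alpha / 2, hits + 1, n - hits)
    # Relative half-width: tightness measure reported alongside coverage
    rel_half_width = (hi - lo) / (2 * p_hat) if hits > 0 else np.inf
    return p_hat, wald, (lo, hi), rel_half_width

print(crude_mc_intervals(hits=7, n=1_000_000))
```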
In Chapter 5, we propose the use of extreme value analysis, in particular the peak-over-threshold method which is popularly employed for extremal estimation of real datasets, in the simulation setting. More specifically, we view crude Monte Carlo samples as data to fit on a generalized Pareto distribution. We test this idea on several numerical examples. The results show that in the absence of efficient variance reduction schemes, it appears to offer potential benefits to enhance crude Monte Carlo estimates.
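A minimal sketch of this peak-over-threshold idea applied to simulation output is given below: fit a generalized Pareto distribution to exceedances over a high threshold of crude Monte Carlo samples and extrapolate the tail probability. The toy output model and threshold choice are illustrative assumptions, not the chapter's experiments.

```python
# Minimal peak-over-threshold sketch: fit a generalized Pareto distribution
# to threshold exceedances of crude Monte Carlo output and extrapolate a
# tail probability beyond most observed samples (toy example).
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)      # stand-in for simulation output
threshold = np.quantile(samples, 0.99)      # keep the top 1% as exceedances
exceedances = samples[samples > threshold] - threshold

# Fit the GPD to exceedances (location fixed at 0)
shape, _, scale = genpareto.fit(exceedances, floc=0)

def tail_prob(level):
    """Estimate P(output > level) for a level above the threshold."""
    p_exceed = (samples > threshold).mean()
    return p_exceed * genpareto.sf(level - threshold, shape, loc=0, scale=scale)

print(tail_prob(4.5))   # rare-event estimate in the extrapolated tail
```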
In Chapter 6, we investigate a framework for developing calibration schemes in parametric settings which satisfy rigorous frequentist statistical guarantees, via a basic notion we call the eligibility set, designed to bypass non-identifiability through set-based estimation. We investigate a feature extraction-then-aggregation approach to construct these sets, targeting multivariate outputs. We demonstrate our methodology on several numerical examples, including an application to the calibration of a limit order book market simulator.
In Chapter 7, we study a methodology to tackle the NASA Langley Uncertainty Quantification Challenge, a model calibration problem under both aleatory and epistemic uncertainties. Our methodology is based on an integration of distributionally robust optimization and importance sampling. The main computational machinery in this integrated methodology amounts to solving sampled linear programs. We present theoretical statistical guarantees of our approach via connections to nonparametric hypothesis testing, and report numerical performance on parameter calibration and downstream decision and risk evaluation tasks.
Chance-constrained generic energy storage operations under decision-dependent uncertainty
Compared with large-scale physical batteries, aggregated and coordinated generic energy storage (GES) resources provide low-cost, but uncertain, flexibility for power grid operations. While GES can be characterized by different types of uncertainty, the literature mostly focuses on decision-independent uncertainties (DIUs), such as exogenous stochastic disturbances caused by weather conditions. Instead, this manuscript focuses on newly introduced decision-dependent uncertainties (DDUs) and considers an optimal GES dispatch that accounts for uncertain available state-of-charge (SoC) bounds that are affected by incentive signals and discomfort levels. To incorporate DDUs, we present a novel chance-constrained optimization (CCO) approach for the day-ahead economic dispatch of GES units. Two tractable methods are presented to solve the proposed CCO problem with DDUs: (i) a robust reformulation for general but incomplete distributions of DDUs, and (ii) an iterative algorithm for specific and known distributions of DDUs. Furthermore, reliability indices are introduced to verify the applicability of the proposed approach with respect to the reliability of the response of GES units. Simulation-based analysis shows that the proposed methods yield conservative, but credible, GES dispatch strategies and a reduced penalty cost by incorporating DDUs in the constraints and leveraging data-driven parameter identification. This results in improved availability and performance of coordinated GES units.
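The abstract does not give the dispatch model, but schematically (the notation below is entirely mine) a chance constraint with decision-dependent uncertainty of the kind described could read

$$
\Pr_{\xi \sim F(\cdot \mid x)}\!\big( \underline{S}(\xi) \le \mathrm{SoC}_t(x) \le \overline{S}(\xi) \;\; \forall t \big) \;\ge\; 1 - \epsilon,
$$

where the dispatch and incentive decisions $x$ enter not only the state of charge $\mathrm{SoC}_t(x)$ but also the distribution $F(\cdot \mid x)$ of the available SoC bounds $\underline{S}, \overline{S}$; this dependence of the uncertainty's distribution on the decision is what distinguishes DDUs from DIUs.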
Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy
Probabilistic (Bayesian) modeling has experienced a surge of applications in almost all quantitative sciences and industrial areas. This development is driven by a combination of several factors, including better probabilistic estimation algorithms, flexible software, increased computing power, and a growing awareness of the benefits of probabilistic learning. However, a principled Bayesian model building workflow is far from complete and many challenges remain. To aid future research and applications of a principled Bayesian workflow, we ask and provide answers for what we perceive as two fundamental questions of Bayesian modeling, namely (a) "What actually is a Bayesian model?" and (b) "What makes a good Bayesian model?". As an answer to the first question, we propose the PAD model taxonomy that defines four basic kinds of Bayesian models, each representing some combination of the assumed joint distribution of all (known or unknown) variables (P), a posterior approximator (A), and training data (D). As an answer to the second question, we propose ten utility dimensions according to which we can evaluate Bayesian models holistically, namely, (1) causal consistency, (2) parameter recoverability, (3) predictive performance, (4) fairness, (5) structural faithfulness, (6) parsimony, (7) interpretability, (8) convergence, (9) estimation speed, and (10) robustness. Further, we propose two example utility decision trees that describe hierarchies and trade-offs between utilities depending on the inferential goals that drive model building and testing.