Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of the history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.
Comment: To appear in the Annual Review of Fluid Mechanics, 2020
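As one concrete example of the kind of data-driven analysis such a review surveys: proper orthogonal decomposition (POD), a workhorse method for extracting coherent structures from flow data, reduces to a singular value decomposition of a snapshot matrix. The sketch below is illustrative only; the snapshot data and dimensions are randomly generated placeholders, not results from the article.

```python
import numpy as np

# Hypothetical snapshot matrix: each column is a flattened velocity field
# at one time instant (n_grid_points x n_snapshots).
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 200))

# Subtract the temporal mean so the modes capture fluctuations.
X_mean = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - X_mean, full_matrices=False)

# Leading POD modes are the columns of U; s**2 gives the energy per mode.
energy = s**2 / np.sum(s**2)
n_modes = np.searchsorted(np.cumsum(energy), 0.9) + 1
print(f"{n_modes} modes capture 90% of the fluctuation energy")
```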
Advancing the Applicability of Reinforcement Learning to Autonomous Control
AnsÀtze.With data-efficient reinforcement learning (RL) methods impressive
resultscould be achieved, e.g., in the context of gas turbine control.
However, inpractice the application of RL still requires much human
intervention, whichhinders the application of RL to autonomous control.
This thesis addressessome of the remaining problems, particularly regarding
the reliability of thepolicy generation process.
The thesis first discusses RL problems with discrete state and action
spaces.In that context, often an MDP is estimated from observations. It is
describedhow to incorporate the estimators' uncertainties into the policy
generationprocess. This information can then be used to reduce the risk of
obtaining apoor policy due to flawed MDP estimates. Moreover, it is
discussed how to usethe knowledge of uncertainty for efficient exploration
and the assessment ofpolicy quality without requiring the policy's
execution.
The thesis then moves on to continuous state problems and focuses on
methodsbased on fitted Q-iteration (FQI), particularly neural fitted
Q-iteration(NFQ). Although NFQ has proven to be very data-efficient, it is
not asreliable as required for autonomous control. The thesis proposes to
useensembles to increase reliability. Several ways of ensemble usage in an
NFQcontext are discussed and evaluated on a number of benchmark domains. It
showsthat in all considered domains with ensembles good policies can be
producedmore reliably.
Next, policy assessment in continuous domains is discussed. The
thesisproposes to use fitted policy evaluation (FPE), an adaptation of FQI
to policyevaluation, combined with a different function approximator and/or
differentdataset to obtain a measure for policy quality. Results of
experiments showthat extra-tree FPE, applied to policies generated by NFQ,
produces valuefunctions that can well be used to reason about the true
policy quality.
Finally, the thesis combines ensembles and policy assessment to derive
methodsthat can deal with changing environments. The major contribution is
theevolving ensemble. The policy of the evolving ensemble changes slowly as
newpolicies are added and old policies removed. It turns out that the
evolvingensemble approaches work considerably better than simpler
approaches likesingle policies learned with recent observations or simple
ensembles
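As a rough illustration of the ensemble idea evaluated in the thesis, the sketch below uses scikit-learn MLP regressors as stand-ins for NFQ's neural networks: each member runs its own fitted Q-iteration on the same batch, and the joint policy acts greedily on the averaged Q-values. The transition-batch layout, hyperparameters, and the averaging variant are illustrative assumptions, not the thesis' exact setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=20, gamma=0.95, seed=0):
    """One NFQ-style learner: repeatedly regress Q onto Bellman targets."""
    s, a, r, s_next = transitions  # states (n,d), actions (n,), rewards (n,), next states (n,d)
    q = None
    for _ in range(n_iters):
        if q is None:
            targets = r
        else:
            # Bellman targets: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.column_stack([s_next, np.full(len(s_next), act)]))
                for act in range(n_actions)
            ])
            targets = r + gamma * q_next.max(axis=1)
        q = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500,
                         random_state=seed).fit(np.column_stack([s, a]), targets)
    return q

def ensemble_policy(q_members, state, n_actions):
    """Average the members' Q-values and act greedily -- one of several
    conceivable ways to use an ensemble in an NFQ context."""
    x = np.array([[*state, act] for act in range(n_actions)])
    mean_q = np.mean([q.predict(x) for q in q_members], axis=0)
    return int(np.argmax(mean_q))

# Members differ only in their random seeds here, e.g.:
# members = [fitted_q_iteration(batch, n_actions=3, seed=k) for k in range(5)]
```

Training the members on bootstrapped resamples of the batch, or voting over greedy actions instead of averaging Q-values, would be further usage variants in the same spirit.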
A Survey of Prediction and Classification Techniques in Multicore Processor Systems
In multicore processor systems, the ability to accurately predict future behavior opens optimization opportunities that could not otherwise be exploited. For example, an oracle able to predict a certain application's behavior on a smartphone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling (DVFS) modes that guarantee minimum levels of desired performance while saving energy and thereby prolonging battery life. Predictions enable systems to become proactive rather than continuing to operate reactively. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction is transforming from simple forecasting into sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior; this can be exploited by novel optimization techniques spanning all layers of the computing stack. In this survey paper, we discuss the most popular prediction and classification techniques in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction in the optimization of multicore processor systems.
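To make the surveyed idea concrete, here is a minimal, hypothetical sketch of prediction-driven DVFS control: a classifier learns, from hardware counter readings, which frequency mode met a performance target at the lowest energy, and then predicts the mode for the next interval. The feature names, labels, and data are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: per-interval hardware counter readings
# (IPC, last-level cache miss rate, memory bandwidth utilisation) labeled
# with the DVFS mode that met the performance target at the lowest energy.
rng = np.random.default_rng(1)
counters = rng.random((500, 3))                 # [ipc, llc_miss_rate, mem_bw_util]
best_mode = (counters[:, 0] > 0.5).astype(int)  # 0 = low freq, 1 = high freq (toy labels)

clf = DecisionTreeClassifier(max_depth=4).fit(counters, best_mode)

# At runtime, predict the mode for the next interval from current counters,
# turning a reactive governor into a proactive one.
next_interval = np.array([[0.7, 0.2, 0.4]])
print("predicted DVFS mode:", clf.predict(next_interval)[0])
```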
Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of
reinforcement learning. The optimal Bayesian solution is intractable in
general. This paper studies to what extent analytic statements about optimal
learning are possible if all beliefs are Gaussian processes. A first order
approximation of learning of both loss and dynamics, for nonlinear,
time-varying systems in continuous time and space, subject to a relatively weak
restriction on the dynamics, is described by an infinite-dimensional partial
differential equation. An approximate finite-dimensional projection gives an
impression of how this result may be helpful.
Comment: final pre-conference version of this NIPS 2011 paper. Once again, please note some nontrivial changes to the exposition and interpretation of the results, in particular in Equation (9) and Eqs. 11-14. The algorithm and results have remained the same, but their theoretical interpretation has changed.
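Since all beliefs in this setting are Gaussian processes, the basic object underneath is a GP posterior over the unknown dynamics (and loss). Below is a minimal sketch of that belief update, with an assumed squared-exponential kernel and made-up observations; this is plain GP regression, not the paper's infinite-dimensional PDE treatment.

```python
import numpy as np

def rbf(a, b, ell=0.5, sf=1.0):
    """Squared-exponential kernel, a standard GP prior covariance."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

# Made-up observations of an unknown scalar dynamics function f(x).
x_train = np.array([-1.0, -0.3, 0.4, 1.2])
y_train = np.sin(3 * x_train)
noise = 1e-2

# GP posterior at test inputs: mean and variance of the belief over f.
x_test = np.linspace(-2, 2, 5)
K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
K_star = rbf(x_test, x_train)
mean = K_star @ np.linalg.solve(K, y_train)
var = rbf(x_test, x_test).diagonal() - np.einsum(
    "ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
print(mean, var)
```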
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
In many applications, e.g. in healthcare and e-commerce, the goal of a
contextual bandit may be to learn an optimal treatment assignment policy at the
end of the experiment. That is, to minimize simple regret. However, this
objective remains understudied. We propose a new family of computationally
efficient bandit algorithms for the stochastic contextual bandit setting, where
a tuning parameter determines the weight placed on cumulative regret
minimization (where we establish near-optimal minimax guarantees) versus simple
regret minimization (where we establish state-of-the-art guarantees). Our
algorithms work with any function class, are robust to model misspecification,
and can be used in continuous arm settings. This flexibility comes from
constructing and relying on "conformal arm sets" (CASs). CASs provide a set of
arms for every context, encompassing the context-specific optimal arm with a
certain probability across the context distribution. Our positive results on
simple and cumulative regret guarantees are contrasted with a negative result,
which shows that no algorithm can achieve instance-dependent simple regret
guarantees while simultaneously achieving minimax optimal cumulative regret
guarantees.
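To convey the flavor of conformal arm sets, the following is a loose sketch, not the paper's construction: for each context, keep every arm whose predicted reward, inflated by a conformal-style error margin, still reaches the deflated prediction of the best arm. The reward model and the fixed margin here are placeholder assumptions.

```python
import numpy as np

def conformal_arm_set(reward_pred, reward_err, context, arms, alpha=0.1):
    """Loose sketch of a CAS-style set: arms plausibly optimal for this
    context, given a predictor and an error margin at level alpha."""
    preds = np.array([reward_pred(context, a) for a in arms])
    margin = reward_err(alpha)  # e.g. a conformal quantile of residuals
    best_lower = preds.max() - margin
    return [a for a, p in zip(arms, preds) if p + margin >= best_lower]

# Hypothetical usage: a linear reward model and a fixed residual quantile.
theta = np.array([0.3, -0.2])
reward_pred = lambda ctx, arm: float(theta @ ctx) * (arm + 1) / 3
reward_err = lambda alpha: 0.15
print(conformal_arm_set(reward_pred, reward_err, np.array([1.0, 0.5]), [0, 1, 2]))
```

A tuning parameter weighting cumulative against simple regret would then govern how an algorithm samples within such a set, e.g. exploiting greedily versus spreading exploration over the plausible arms.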
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
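A toy sketch of the on-the-fly reward shaping idea: run a hand-coded automaton alongside the environment and pay reward only when it enters an accepting state, so that a standard RL algorithm ends up maximising the probability of satisfying the property. The two-state automaton and the gym-style environment interface below are illustrative assumptions, not the paper's LDBA construction.

```python
# Toy automaton for the property "eventually reach goal" (not from the paper):
# state 0 = goal not yet seen, state 1 = goal seen (accepting, absorbing).
ACCEPTING = {1}

def automaton_step(q, label):
    """Advance the automaton on the label emitted by the environment state."""
    return 1 if (q == 1 or label == "goal") else 0

def shaped_step(env, action, q):
    """One step of the synchronous product: environment transition plus
    automaton transition, with reward paid on entering the accepting set."""
    obs, _, done, info = env.step(action)  # assumed gym-style interface
    q_next = automaton_step(q, info.get("label", ""))
    reward = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return obs, reward, done, q_next
```

The learner then treats the pair (obs, q_next) as its state, so the shaped reward signal carries the memory the temporal property requires.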