Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of the history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.
Comment: To appear in the Annual Review of Fluid Mechanics, 2020
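As one concrete example of the kind of data-driven analysis such a review surveys: proper orthogonal decomposition (POD), a workhorse method for extracting coherent structures from flow data, reduces to a singular value decomposition of a snapshot matrix. The sketch below is illustrative only; the snapshot data and dimensions are randomly generated placeholders, not results from the article.

```python
import numpy as np

# Hypothetical snapshot matrix: each column is a flattened velocity field
# at one time instant (n_grid_points x n_snapshots).
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 200))

# Subtract the temporal mean so the modes capture fluctuations.
X_mean = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - X_mean, full_matrices=False)

# Leading POD modes are the columns of U; s**2 gives the energy per mode.
energy = s**2 / np.sum(s**2)
n_modes = np.searchsorted(np.cumsum(energy), 0.9) + 1
print(f"{n_modes} modes capture 90% of the fluctuation energy")
```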
Advancing the Applicability of Reinforcement Learning to Autonomous Control
AnsÀtze.With data-efficient reinforcement learning (RL) methods impressive
resultscould be achieved, e.g., in the context of gas turbine control.
However, inpractice the application of RL still requires much human
intervention, whichhinders the application of RL to autonomous control.
This thesis addressessome of the remaining problems, particularly regarding
the reliability of thepolicy generation process.
The thesis first discusses RL problems with discrete state and action
spaces.In that context, often an MDP is estimated from observations. It is
describedhow to incorporate the estimators' uncertainties into the policy
generationprocess. This information can then be used to reduce the risk of
obtaining apoor policy due to flawed MDP estimates. Moreover, it is
discussed how to usethe knowledge of uncertainty for efficient exploration
and the assessment ofpolicy quality without requiring the policy's
execution.
The thesis then moves on to continuous state problems and focuses on
methodsbased on fitted Q-iteration (FQI), particularly neural fitted
Q-iteration(NFQ). Although NFQ has proven to be very data-efficient, it is
not asreliable as required for autonomous control. The thesis proposes to
useensembles to increase reliability. Several ways of ensemble usage in an
NFQcontext are discussed and evaluated on a number of benchmark domains. It
showsthat in all considered domains with ensembles good policies can be
producedmore reliably.
Next, policy assessment in continuous domains is discussed. The
thesisproposes to use fitted policy evaluation (FPE), an adaptation of FQI
to policyevaluation, combined with a different function approximator and/or
differentdataset to obtain a measure for policy quality. Results of
experiments showthat extra-tree FPE, applied to policies generated by NFQ,
produces valuefunctions that can well be used to reason about the true
policy quality.
Finally, the thesis combines ensembles and policy assessment to derive
methodsthat can deal with changing environments. The major contribution is
theevolving ensemble. The policy of the evolving ensemble changes slowly as
newpolicies are added and old policies removed. It turns out that the
evolvingensemble approaches work considerably better than simpler
approaches likesingle policies learned with recent observations or simple
ensembles
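As a rough illustration of the ensemble idea evaluated in the thesis, the sketch below uses scikit-learn MLP regressors as stand-ins for NFQ's neural networks: each member runs its own fitted Q-iteration on the same batch, and the joint policy acts greedily on the averaged Q-values. The transition-batch layout, hyperparameters, and the averaging variant are illustrative assumptions, not the thesis' exact setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=20, gamma=0.95, seed=0):
    """One NFQ-style learner: repeatedly regress Q onto Bellman targets."""
    s, a, r, s_next = transitions  # states (n,d), actions (n,), rewards (n,), next states (n,d)
    q = None
    for _ in range(n_iters):
        if q is None:
            targets = r
        else:
            # Bellman targets: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.column_stack([s_next, np.full(len(s_next), act)]))
                for act in range(n_actions)
            ])
            targets = r + gamma * q_next.max(axis=1)
        q = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500,
                         random_state=seed).fit(np.column_stack([s, a]), targets)
    return q

def ensemble_policy(q_members, state, n_actions):
    """Average the members' Q-values and act greedily -- one of several
    conceivable ways to use an ensemble in an NFQ context."""
    x = np.array([[*state, act] for act in range(n_actions)])
    mean_q = np.mean([q.predict(x) for q in q_members], axis=0)
    return int(np.argmax(mean_q))

# Members differ only in their random seeds here, e.g.:
# members = [fitted_q_iteration(batch, n_actions=3, seed=k) for k in range(5)]
```

Training the members on bootstrapped resamples of the batch, or voting over greedy actions instead of averaging Q-values, would be further usage variants in the same spirit.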
A Survey of Prediction and Classification Techniques in Multicore Processor Systems
In multicore processor systems, the ability to accurately predict future behavior opens optimization opportunities that could not otherwise be exploited. For example, an oracle able to predict a certain application's behavior on a smartphone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling (DVFS) modes that guarantee minimum levels of desired performance while saving energy and thereby prolonging battery life. Predictions enable systems to become proactive rather than continuing to operate reactively. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction is transforming from simple forecasting into sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior; this can be exploited by novel optimization techniques spanning all layers of the computing stack. In this survey paper, we discuss the most popular prediction and classification techniques in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction in the optimization of multicore processor systems.
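To make the surveyed idea concrete, here is a minimal, hypothetical sketch of prediction-driven DVFS control: a classifier learns, from hardware counter readings, which frequency mode met a performance target at the lowest energy, and then predicts the mode for the next interval. The feature names, labels, and data are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: per-interval hardware counter readings
# (IPC, last-level cache miss rate, memory bandwidth utilisation) labeled
# with the DVFS mode that met the performance target at the lowest energy.
rng = np.random.default_rng(1)
counters = rng.random((500, 3))                 # [ipc, llc_miss_rate, mem_bw_util]
best_mode = (counters[:, 0] > 0.5).astype(int)  # 0 = low freq, 1 = high freq (toy labels)

clf = DecisionTreeClassifier(max_depth=4).fit(counters, best_mode)

# At runtime, predict the mode for the next interval from current counters,
# turning a reactive governor into a proactive one.
next_interval = np.array([[0.7, 0.2, 0.4]])
print("predicted DVFS mode:", clf.predict(next_interval)[0])
```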
Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of
reinforcement learning. The optimal Bayesian solution is intractable in
general. This paper studies to what extent analytic statements about optimal
learning are possible if all beliefs are Gaussian processes. A first order
approximation of learning of both loss and dynamics, for nonlinear,
time-varying systems in continuous time and space, subject to a relatively weak
restriction on the dynamics, is described by an infinite-dimensional partial
differential equation. An approximate finite-dimensional projection gives an
impression of how this result may be helpful.
Comment: final pre-conference version of this NIPS 2011 paper. Once again, please note some nontrivial changes to the exposition and interpretation of the results, in particular in Equation (9) and Eqs. 11-14. The algorithm and results have remained the same, but their theoretical interpretation has changed.
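Since all beliefs in this setting are Gaussian processes, the basic object underneath is a GP posterior over the unknown dynamics (and loss). Below is a minimal sketch of that belief update, with an assumed squared-exponential kernel and made-up observations; this is plain GP regression, not the paper's infinite-dimensional PDE treatment.

```python
import numpy as np

def rbf(a, b, ell=0.5, sf=1.0):
    """Squared-exponential kernel, a standard GP prior covariance."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

# Made-up observations of an unknown scalar dynamics function f(x).
x_train = np.array([-1.0, -0.3, 0.4, 1.2])
y_train = np.sin(3 * x_train)
noise = 1e-2

# GP posterior at test inputs: mean and variance of the belief over f.
x_test = np.linspace(-2, 2, 5)
K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
K_star = rbf(x_test, x_train)
mean = K_star @ np.linalg.solve(K, y_train)
var = rbf(x_test, x_test).diagonal() - np.einsum(
    "ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
print(mean, var)
```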
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
In many applications, e.g. in healthcare and e-commerce, the goal of a
contextual bandit may be to learn an optimal treatment assignment policy at the
end of the experiment. That is, to minimize simple regret. However, this
objective remains understudied. We propose a new family of computationally
efficient bandit algorithms for the stochastic contextual bandit setting, where
a tuning parameter determines the weight placed on cumulative regret
minimization (where we establish near-optimal minimax guarantees) versus simple
regret minimization (where we establish state-of-the-art guarantees). Our
algorithms work with any function class, are robust to model misspecification,
and can be used in continuous arm settings. This flexibility comes from
constructing and relying on "conformal arm sets" (CASs). CASs provide a set of
arms for every context, encompassing the context-specific optimal arm with a
certain probability across the context distribution. Our positive results on
simple and cumulative regret guarantees are contrasted with a negative result,
which shows that no algorithm can achieve instance-dependent simple regret
guarantees while simultaneously achieving minimax optimal cumulative regret
guarantees.
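To convey the flavor of conformal arm sets, the following is a loose sketch, not the paper's construction: for each context, keep every arm whose predicted reward, inflated by a conformal-style error margin, still reaches the deflated prediction of the best arm. The reward model and the fixed margin here are placeholder assumptions.

```python
import numpy as np

def conformal_arm_set(reward_pred, reward_err, context, arms, alpha=0.1):
    """Loose sketch of a CAS-style set: arms plausibly optimal for this
    context, given a predictor and an error margin at level alpha."""
    preds = np.array([reward_pred(context, a) for a in arms])
    margin = reward_err(alpha)  # e.g. a conformal quantile of residuals
    best_lower = preds.max() - margin
    return [a for a, p in zip(arms, preds) if p + margin >= best_lower]

# Hypothetical usage: a linear reward model and a fixed residual quantile.
theta = np.array([0.3, -0.2])
reward_pred = lambda ctx, arm: float(theta @ ctx) * (arm + 1) / 3
reward_err = lambda alpha: 0.15
print(conformal_arm_set(reward_pred, reward_err, np.array([1.0, 0.5]), [0, 1, 2]))
```

A tuning parameter weighting cumulative against simple regret would then govern how an algorithm samples within such a set, e.g. exploiting greedily versus spreading exploration over the plausible arms.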
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
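A toy sketch of the on-the-fly reward shaping idea: run a hand-coded automaton alongside the environment and pay reward only when it enters an accepting state, so that a standard RL algorithm ends up maximising the probability of satisfying the property. The two-state automaton and the gym-style environment interface below are illustrative assumptions, not the paper's LDBA construction.

```python
# Toy automaton for the property "eventually reach goal" (not from the paper):
# state 0 = goal not yet seen, state 1 = goal seen (accepting, absorbing).
ACCEPTING = {1}

def automaton_step(q, label):
    """Advance the automaton on the label emitted by the environment state."""
    return 1 if (q == 1 or label == "goal") else 0

def shaped_step(env, action, q):
    """One step of the synchronous product: environment transition plus
    automaton transition, with reward paid on entering the accepting set."""
    obs, _, done, info = env.step(action)  # assumed gym-style interface
    q_next = automaton_step(q, info.get("label", ""))
    reward = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return obs, reward, done, q_next
```

The learner then treats the pair (obs, q_next) as its state, so the shaped reward signal carries the memory the temporal property requires.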