Advancing the Applicability of Reinforcement Learning to Autonomous Control
With data-efficient reinforcement learning (RL) methods, impressive
results could be achieved, e.g., in the context of gas turbine control.
However, in practice the application of RL still requires much human
intervention, which hinders the application of RL to autonomous control.
This thesis addresses some of the remaining problems, particularly regarding
the reliability of the policy generation process.
The thesis first discusses RL problems with discrete state and action
spaces. In that context, often an MDP is estimated from observations. It is
described how to incorporate the estimators' uncertainties into the policy
generation process. This information can then be used to reduce the risk of
obtaining a poor policy due to flawed MDP estimates. Moreover, it is
discussed how to use the knowledge of uncertainty for efficient exploration
and the assessment of policy quality without requiring the policy's
execution.
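As an illustration of estimating an MDP from observations together with a measure of estimator uncertainty, here is a minimal sketch. It is not the thesis's method: the function name `estimate_transitions`, the symmetric Dirichlet prior, and the use of the posterior standard deviation as the uncertainty measure are all assumptions made for this example.

```python
import numpy as np

def estimate_transitions(transitions, n_states, n_actions, prior=1.0):
    """Estimate P(s' | s, a) from observed (s, a, s') tuples.

    Assumption of this sketch: a symmetric Dirichlet prior with
    concentration `prior` on every transition distribution.
    Returns the posterior mean and the per-entry posterior std. dev.
    """
    counts = np.full((n_states, n_actions, n_states), prior, dtype=float)
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    total = counts.sum(axis=2, keepdims=True)
    mean = counts / total
    # Variance of a Dirichlet marginal: m * (1 - m) / (alpha_0 + 1)
    std = np.sqrt(mean * (1.0 - mean) / (total + 1.0))
    return mean, std

# Ten observations for (s=0, a=0), none for (s=1, a=0).
observed = [(0, 0, 1)] * 8 + [(0, 0, 0)] * 2
p_mean, p_std = estimate_transitions(observed, n_states=2, n_actions=1)
```

State-action pairs with few observations receive a larger standard deviation, which a policy generation step could use to avoid relying on poorly estimated transitions.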
The thesis then moves on to continuous state problems and focuses on
methods based on fitted Q-iteration (FQI), particularly neural fitted
Q-iteration (NFQ). Although NFQ has proven to be very data-efficient, it is
not as reliable as required for autonomous control. The thesis proposes to
use ensembles to increase reliability. Several ways of ensemble usage in an
NFQ context are discussed and evaluated on a number of benchmark domains.
The results show that, in all considered domains, ensembles produce good
policies more reliably.
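The idea of combining several FQI runs into an ensemble can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: a linear least-squares regressor stands in for the neural network used by NFQ, members are trained on bootstrap resamples, and the ensemble acts greedily on the averaged Q-values. The names `features`, `fitted_q_iteration`, `train_ensemble`, and `ensemble_action` are hypothetical.

```python
import numpy as np

def features(S):
    """Hypothetical feature map: a bias term plus the raw state variables."""
    S = np.atleast_2d(S)
    return np.hstack([np.ones((S.shape[0], 1)), S])

def fitted_q_iteration(S, A, R, S_next, n_actions, gamma=0.9, n_iter=20):
    """FQI with one linear Q-function per action (stand-in for a neural net)."""
    phi, phi_next = features(S), features(S_next)
    W = np.zeros((n_actions, phi.shape[1]))
    for _ in range(n_iter):
        # Regression targets from the previous iteration's Q-function.
        targets = R + gamma * (phi_next @ W.T).max(axis=1)
        W_new = W.copy()
        for a in range(n_actions):
            mask = A == a
            if mask.any():
                W_new[a] = np.linalg.lstsq(phi[mask], targets[mask], rcond=None)[0]
        W = W_new
    return W

def train_ensemble(S, A, R, S_next, n_actions, n_members=5, seed=0, **kwargs):
    """Train each member on a bootstrap resample of the same batch."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(S), size=len(S))
        members.append(
            fitted_q_iteration(S[idx], A[idx], R[idx], S_next[idx], n_actions, **kwargs)
        )
    return members

def ensemble_action(members, state):
    """Act greedily with respect to the Q-values averaged over all members."""
    q = np.mean([features(state) @ W.T for W in members], axis=0)
    return int(q.argmax())
```

Averaging Q-values is only one of several conceivable ways to use an ensemble; majority voting over the members' greedy actions would be another.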
Next, policy assessment in continuous domains is discussed. The
thesis proposes to use fitted policy evaluation (FPE), an adaptation of FQI
to policy evaluation, combined with a different function approximator and/or
a different dataset to obtain a measure for policy quality. Results of
experiments show that extra-tree FPE, applied to policies generated by NFQ,
produces value functions that are well suited to reasoning about the true
policy quality.
Finally, the thesis combines ensembles and policy assessment to derive
methods that can deal with changing environments. The major contribution is
the evolving ensemble. The policy of the evolving ensemble changes slowly as
new policies are added and old policies removed. It turns out that the
evolving ensemble approaches work considerably better than simpler
approaches like single policies learned with recent observations or simple
ensembles.
How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review
Context: Machine Learning (ML) has been at the heart of many innovations over
the past years. However, including it in so-called 'safety-critical' systems
such as automotive or aeronautic has proven to be very challenging, since the
shift in paradigm that ML brings completely changes traditional certification
approaches.
Objective: This paper aims to elucidate challenges related to the
certification of ML-based safety-critical systems, as well as the solutions
that are proposed in the literature to tackle them, answering the question 'How
to Certify Machine Learning Based Safety-critical Systems?'.
Method: We conduct a Systematic Literature Review (SLR) of research papers
published between 2015 and 2020, covering topics related to the certification of
ML systems. In total, we identified 217 papers covering topics considered to be
the main pillars of ML certification: Robustness, Uncertainty, Explainability,
Verification, Safe Reinforcement Learning, and Direct Certification. We
analyzed the main trends and problems of each sub-field and provided summaries
of the papers extracted.
Results: The SLR results highlighted the enthusiasm of the community for this
subject, as well as the lack of diversity in terms of datasets and types of
models. It also emphasized the need to further develop connections between
academia and industry to deepen the domain study. Finally, it
illustrated the necessity of building connections between the above-mentioned main
pillars, which are for now mainly studied separately.
Conclusion: We highlight current efforts deployed to enable the
certification of ML-based software systems, and discuss some future research
directions.
Comment: 60 pages (92 pages with references and complements), submitted to a
journal (Automated Software Engineering). Changes: Emphasizing the difference
between the traditional software engineering and ML approaches. Adding Related Works, Threats
to Validity and Complementary Materials. Adding a table listing paper
references for each section/subsection
Designing Trustworthy Autonomous Systems
The design of autonomous systems is challenging, and ensuring their trustworthiness can have different meanings, such as i) ensuring consistency and completeness of the requirements through a correct elicitation and formalization process; ii) ensuring that requirements are correctly mapped to system implementations so that system behaviors never violate the requirements; iii) maximizing the reuse of available components and subsystems in order to cope with the design complexity; and iv) ensuring correct coordination of the system with its environment. Several techniques have been proposed over the years to cope with specific problems. However, a holistic design framework that leverages existing tools and methodologies to practically help the analysis and design of autonomous systems is still missing. This thesis explores the problem of building trustworthy autonomous systems from different angles. We have analyzed how current approaches of formal verification can provide assurances: 1) to the requirement corpus itself, by formalizing requirements with assume/guarantee contracts to detect incompleteness and conflicts; 2) to the reward function used to train the system, so that the requirements do not get misinterpreted; 3) to the execution of the system, by run-time monitoring and enforcing certain invariants; 4) to the coordination of the system with other external entities in a system-of-systems scenario; and 5) to system behaviors, by automatically synthesizing a policy which is correct
Understanding, Assessing, and Mitigating Safety Risks in Artificial Intelligence Systems
Prepared for: Naval Air Warfare Development Center (NAVAIR)
Traditional software safety techniques rely on validating software against a deductively defined specification of how the software should behave in particular
situations. In the case of AI systems, specifications are often implicit or inductively defined. Data-driven methods are subject to sampling error since practical
datasets cannot provide exhaustive coverage of all possible events in a real physical environment. Traditional software verification and validation approaches may
not apply directly to these novel systems, complicating the operation of systems safety analysis (such as that implemented in MIL-STD-882). However, AI offers
advanced capabilities, and it is desirable to ensure the safety of systems that rely on these capabilities. When AI technology is deployed in a weapon system, robot, or
planning system, unwanted events are possible. Several techniques can support the evaluation process for understanding the nature and likelihood of unwanted
events in AI systems and making risk decisions on naval employment. This research considers the state of the art, evaluating which techniques are most likely to be
employable, usable, and correct. Techniques include software analysis, simulation environments, and mathematical determinations.
Naval Air Warfare Development Center
Naval Postgraduate School, Naval Research Program (PE 0605853N/2098)
Approved for public release. Distribution is unlimite