CERN for AGI: A Theoretical Framework for Autonomous Simulation-Based Artificial Intelligence Testing and Alignment
This paper explores the potential of a multidisciplinary approach to testing
and aligning artificial general intelligence (AGI) and large language models (LLMs). Due to the rapid
development and wide application of LLMs, challenges such as ethical alignment,
controllability, and predictability of these models have become important
research topics. This study investigates an innovative simulation-based
multi-agent system within a virtual reality framework that replicates the
real-world environment. The framework is populated by automated 'digital
citizens,' simulating complex social structures and interactions to examine and
optimize AGI. Application of various theories from the fields of sociology,
social psychology, computer science, physics, biology, and economics
demonstrates the possibility of a more human-aligned and socially responsible
AGI. The purpose of such a digital environment is to provide a dynamic platform
where advanced AI agents can interact and make independent decisions, thereby
mimicking realistic scenarios. The actors in this digital city, operated by the
LLMs, serve as the primary agents, exhibiting high degrees of autonomy. While
this approach shows immense potential, there are notable challenges and
limitations, most significantly the unpredictable nature of real-world social
dynamics. This research endeavors to contribute to the development and
refinement of AGI, emphasizing the integration of social, ethical, and
theoretical dimensions for future research. Comment: 32 pages, 4 figures, 2 tables
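The simulation loop the abstract describes, LLM-driven "digital citizens" acting autonomously in a shared environment, can be sketched minimally as below. This is an illustrative assumption, not the paper's implementation: the agent names are hypothetical and the LLM decision policy is stubbed with a scripted random choice.

```python
import random

class DigitalCitizen:
    """A hypothetical 'digital citizen' agent. In the paper the agents are
    operated by LLMs; here the decision policy is stubbed for illustration."""

    def __init__(self, name, rng):
        self.name = name
        self.rng = rng
        self.log = []  # record of (observation, action) pairs

    def act(self, observation):
        # An LLM call would go here; we stub it with a random scripted policy.
        action = self.rng.choice(["cooperate", "defect"])
        self.log.append((observation, action))
        return action

def run_simulation(n_agents=4, steps=3, seed=0):
    """Run a toy round-based social simulation; return each round's actions."""
    rng = random.Random(seed)
    agents = [DigitalCitizen(f"citizen-{i}", rng) for i in range(n_agents)]
    history = []
    for step in range(steps):
        observation = {"step": step, "last_round": history[-1] if history else None}
        history.append({agent.name: agent.act(observation) for agent in agents})
    return history

history = run_simulation()
```

Because every agent sees the previous round before acting, even this stub exhibits the feedback between individual decisions and the shared social state that the framework is meant to study.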
Controlled time series generation for automotive software-in-the-loop testing using GANs
Testing automotive mechatronic systems partly uses the software-in-the-loop
approach, where systematically covering inputs of the system-under-test remains
a major challenge. In current practice, there are two major techniques of input
stimulation. One approach is to craft input sequences, which eases control and
feedback of the test process but falls short of exposing the system to
realistic scenarios. The other is to replay sequences recorded from field
operations, which accounts for reality but requires collecting a well-labeled
dataset of sufficient size for widespread use, which is expensive. This
work applies the well-known unsupervised learning framework of Generative
Adversarial Networks (GAN) to learn an unlabeled dataset of recorded in-vehicle
signals and uses it for generation of synthetic input stimuli. Additionally, a
metric-based linear interpolation algorithm is demonstrated, which guarantees
that generated stimuli follow a customizable similarity relationship with
specified references. This combination of techniques enables controlled
generation of a rich range of meaningful and realistic input patterns,
improving virtual test coverage and reducing the need for expensive field
tests. Comment: Preprint of paper accepted at The Second IEEE International
Conference on Artificial Intelligence Testing, April 13-16, 2020, Oxford, U
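The "metric-based linear interpolation" the abstract mentions can be illustrated with a small sketch. This is an assumption about the mechanism, not the paper's code: it operates on GAN latent codes (the generator that decodes them into time series is omitted), and exploits the fact that under linear interpolation the L2 distance from the reference grows linearly with the interpolation coefficient.

```python
import math
import random

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def interpolate_at_distance(z_ref, z_other, target_dist):
    """Linearly interpolate from z_ref toward z_other so the result lies at
    exactly target_dist (in L2) from z_ref.

    With z = (1 - a) * z_ref + a * z_other, the distance from z_ref is
    a * ||z_other - z_ref||, so choose a = target_dist / ||z_other - z_ref||.
    """
    span = l2(z_ref, z_other)
    a = target_dist / span
    return [(1 - a) * r + a * o for r, o in zip(z_ref, z_other)]

# Hypothetical usage: draw a random latent code and pin its distance to a
# reference latent code, controlling similarity of the decoded stimulus.
rng = random.Random(42)
z_ref = [0.0] * 8
z_rand = [rng.gauss(0, 1) for _ in range(8)]
z_new = interpolate_at_distance(z_ref, z_rand, target_dist=0.5)
```

Pinning the latent-space distance is one simple way to make "similarity to a specified reference" a customizable, guaranteed property of each generated stimulus.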
Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles
While autonomous vehicles (AVs) may perform remarkably well in generic
real-life cases, their irrational actions in some unforeseen cases raise
critical safety concerns. This paper introduces the concept of collaborative
reinforcement learning (RL) to generate challenging test cases for the AV
planning and decision-making module. One of the critical challenges for collaborative RL
is the credit assignment problem, where a proper assignment of rewards to
multiple agents interacting in the traffic scenario, considering all parameters
and timing, turns out to be non-trivial. In order to address this challenge, we
propose a novel potential-based reward-shaping approach inspired by
counterfactual analysis for solving the credit-assignment problem. The
evaluation in a simulated environment demonstrates the superiority of our
proposed approach over methods using local and global rewards. Comment: Accepted at IJCNN'2
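Potential-based reward shaping, the general technique the abstract builds on, has a standard form: the shaping term is F(s, s') = γΦ(s') − Φ(s), which provably preserves the optimal policy (Ng et al., 1999). The sketch below is a minimal illustration; the state representation and potential function (distance of an adversarial agent to the AV under test) are hypothetical, not the paper's.

```python
GAMMA = 0.99  # discount factor

def potential(state):
    """Hypothetical potential: negative distance of the adversarial traffic
    agent to the AV under test, so closing in raises the potential."""
    return -state["distance_to_av"]

def shaped_reward(base_reward, state, next_state, gamma=GAMMA):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    Shaping of this form does not change which policies are optimal."""
    return base_reward + gamma * potential(next_state) - potential(state)

# The agent moves from 10 m to 8 m away from the AV: shaping yields a
# positive bonus for making progress toward a challenging interaction.
s = {"distance_to_av": 10.0}
s_next = {"distance_to_av": 8.0}
r = shaped_reward(0.0, s, s_next)
```

In a multi-agent setting, giving each traffic agent its own potential is one way to attribute credit for jointly produced critical scenarios, which is the credit-assignment problem the paper targets.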
Towards Structured Evaluation of Deep Neural Network Supervisors
Deep Neural Networks (DNN) have improved the quality of several non-safety
related products in the past years. However, before DNNs should be deployed to
safety-critical applications, their robustness needs to be systematically
analyzed. A common challenge for DNNs occurs when input is dissimilar to the
training set, which might lead to high-confidence predictions despite the
network lacking proper knowledge of the input. Several previous studies have proposed to complement
DNNs with a supervisor that detects when inputs are outside the scope of the
network. Most of these supervisors, however, are developed and tested for a
selected scenario using a specific performance metric. In this work, we
emphasize the need to assess and compare the performance of supervisors in a
structured way. We present a framework constituted by four datasets organized
in six test cases combined with seven evaluation metrics. The test cases
provide varying complexity and include data from publicly available sources as
well as a novel dataset consisting of images from simulated driving scenarios.
The latter we plan to make publicly available. Our framework can be used to
support DNN supervisor evaluation, which in turn could be used to motivate
development, validation, and deployment of DNNs in safety-critical
applications. Comment: Preprint of paper accepted for presentation at The First IEEE
International Conference on Artificial Intelligence Testing, April 4-9, 2019,
San Francisco East Bay, California, US
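A common baseline for the kind of supervisor the abstract describes is to reject an input as out-of-scope when the network's maximum softmax probability falls below a threshold (the "max softmax" baseline of Hendrycks and Gimpel). The sketch below is illustrative, not the paper's framework; the class name and threshold are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ThresholdSupervisor:
    """Baseline supervisor: accept an input only when the network's top
    softmax probability meets a confidence threshold; otherwise flag the
    input as outside the scope of the network."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def accepts(self, logits):
        return max(softmax(logits)) >= self.threshold

sup = ThresholdSupervisor(threshold=0.8)
confident = sup.accepts([5.0, 0.0, 0.0])  # peaked distribution -> accepted
uncertain = sup.accepts([1.0, 0.9, 1.1])  # flat distribution -> rejected
```

Evaluating such a supervisor across datasets of varying dissimilarity to the training set, under several metrics rather than one, is exactly the structured comparison the paper's framework argues for.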
An Industrial Workbench for Test Scenario Identification for Autonomous Driving Software
Testing of autonomous vehicles involves enormous challenges for the automotive industry. The number of real-world driving scenarios is extremely large, and choosing effective test scenarios is essential, as is combining simulated and real-world testing. We present an industrial workbench of tools and workflows to generate efficient and effective test scenarios for active safety and autonomous driving functions. The workbench is based on existing engineering tools and helps smoothly integrate simulated testing with real vehicle parameters and software. We aim to validate the workbench with real cases and to further refine the input model parameters and distributions.
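Generating concrete test scenarios from parameter distributions, as the workbench aims to do, can be sketched as sampling from a declared parameter space. The scenario type, parameter names, and ranges below are hypothetical placeholders, not the paper's model.

```python
import random

# Hypothetical parameter ranges for a cut-in scenario; in the workbench such
# distributions would be refined from real vehicle data, not fixed by hand.
SCENARIO_SPACE = {
    "ego_speed_kmh": (60.0, 120.0),   # speed of the vehicle under test
    "cut_in_gap_m": (5.0, 40.0),      # gap at which the other car cuts in
    "lead_decel_ms2": (0.5, 6.0),     # braking of the lead vehicle
}

def sample_scenarios(n, seed=0):
    """Draw n concrete test scenarios, one uniform sample per parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in SCENARIO_SPACE.items()}
        for _ in range(n)
    ]

scenarios = sample_scenarios(10)
```

Each sampled dictionary would then parameterize one simulated run, letting the test campaign cover the scenario space systematically instead of relying on hand-picked cases.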
Exploring ML testing in practice - Lessons learned from an interactive rapid review with Axis Communications
There is a growing interest in industry and academia in machine learning (ML) testing. We believe that industry and academia need to learn together to produce rigorous and relevant knowledge. In this study, we initiate a collaboration between stakeholders from one case company, one research institute, and one university. To establish a common view of the problem domain, we applied an interactive rapid review of the state of the art. Four researchers from Lund University and RISE Research Institutes and four practitioners from Axis Communications reviewed a set of 180 primary studies on ML testing. We developed a taxonomy for the communication around ML testing challenges and results and identified a list of 12 review questions relevant for Axis Communications. The three most important questions (data testing, metrics for assessment, and test generation) were mapped to the literature, and an in-depth analysis of the 35 primary studies matching the most important question (data testing) was made. A final set of the five best matches was analysed, and we reflect on the criteria for applicability and relevance for the industry. The taxonomies are helpful for communication but not final. Furthermore, there was no perfect match to the case company’s investigated review question (data testing). However, we extracted relevant approaches from the five studies on a conceptual level to support later context-specific improvements. We found the interactive rapid review approach useful for triggering and aligning communication between the different stakeholders.