127 research outputs found

    CERN for AGI: A Theoretical Framework for Autonomous Simulation-Based Artificial Intelligence Testing and Alignment

    This paper explores the potential of a multidisciplinary approach to testing and aligning artificial general intelligence (AGI) and LLMs. Due to the rapid development and wide application of LLMs, challenges such as ethical alignment, controllability, and predictability of these models have become important research topics. This study investigates an innovative simulation-based multi-agent system within a virtual reality framework that replicates the real-world environment. The framework is populated by automated 'digital citizens,' simulating complex social structures and interactions to examine and optimize AGI. Application of various theories from the fields of sociology, social psychology, computer science, physics, biology, and economics demonstrates the possibility of a more human-aligned and socially responsible AGI. The purpose of such a digital environment is to provide a dynamic platform where advanced AI agents can interact and make independent decisions, thereby mimicking realistic scenarios. The actors in this digital city, operated by the LLMs, serve as the primary agents, exhibiting high degrees of autonomy. While this approach shows immense potential, there are notable challenges and limitations, most significantly the unpredictable nature of real-world social dynamics. This research endeavors to contribute to the development and refinement of AGI, emphasizing the integration of social, ethical, and theoretical dimensions for future research.
    Comment: 32 pages, 4 figures, 2 table

    Controlled time series generation for automotive software-in-the-loop testing using GANs

    Testing automotive mechatronic systems partly uses the software-in-the-loop approach, where systematically covering inputs of the system-under-test remains a major challenge. In current practice, there are two major techniques of input stimulation. One is to craft input sequences, which eases control and feedback of the test process but falls short of exposing the system to realistic scenarios. The other is to replay sequences recorded from field operations, which accounts for reality but requires collecting a well-labeled dataset of sufficient capacity for widespread use, which is expensive. This work applies the well-known unsupervised learning framework of Generative Adversarial Networks (GAN) to learn an unlabeled dataset of recorded in-vehicle signals and uses it to generate synthetic input stimuli. Additionally, a metric-based linear interpolation algorithm is demonstrated, which guarantees that generated stimuli follow a customizable similarity relationship with specified references. This combination of techniques enables controlled generation of a rich range of meaningful and realistic input patterns, improving virtual test coverage and reducing the need for expensive field tests.
    Comment: Preprint of paper accepted at The Second IEEE International Conference on Artificial Intelligence Testing, April 13-16, 2020, Oxford, U

    Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles

    While autonomous vehicles (AVs) may perform remarkably well in generic real-life cases, their irrational actions in some unforeseen cases lead to critical safety concerns. This paper introduces the concept of collaborative reinforcement learning (RL) to generate challenging test cases for AV planning and decision-making modules. One of the critical challenges for collaborative RL is the credit assignment problem: properly assigning rewards to multiple agents interacting in a traffic scenario, considering all parameters and timing, turns out to be non-trivial. To address this challenge, we propose a novel potential-based reward-shaping approach inspired by counterfactual analysis for solving the credit-assignment problem. The evaluation in a simulated environment demonstrates the superiority of our proposed approach over other methods using local and global rewards.
    Comment: Accepted at IJCNN'2
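For readers unfamiliar with potential-based reward shaping, the following is a minimal sketch of the classic single-agent form, in which the shaping term is the discounted potential of the next state minus the potential of the current state. It is not the paper's method: the counterfactual-inspired potential is not described in the abstract, and the names `shaped_reward` and `distance_potential`, the 1-D state, and the goal position are all hypothetical.

```python
from typing import Callable

GOAL = 10  # hypothetical goal position on a 1-D line (assumption)


def distance_potential(state: int) -> float:
    """Hypothetical potential: higher (less negative) closer to the goal."""
    return -abs(GOAL - state)


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    potential: Callable[[int], float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Return reward + gamma * phi(next_state) - phi(state).

    The potential of a terminal state is treated as zero, a common
    convention for episodic tasks.
    """
    next_phi = 0.0 if terminal else potential(next_state)
    return reward + gamma * next_phi - potential(state)


# Moving one step toward the goal earns a positive shaping bonus.
bonus = shaped_reward(0.0, state=3, next_state=4,
                      potential=distance_potential, gamma=1.0)
```

Because the shaping term is a difference of potentials, it telescopes along a trajectory, which is why this form is generally used when the shaped task should keep the same optimal policies as the original one.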

    Towards Structured Evaluation of Deep Neural Network Supervisors

    Deep Neural Networks (DNN) have improved the quality of several non-safety related products in the past years. However, before DNNs are deployed to safety-critical applications, their robustness needs to be systematically analyzed. A common challenge for DNNs occurs when input is dissimilar to the training set, which might lead to high-confidence predictions even though the network lacks proper knowledge of the input. Several previous studies have proposed to complement DNNs with a supervisor that detects when inputs are outside the scope of the network. Most of these supervisors, however, are developed and tested for a selected scenario using a specific performance metric. In this work, we emphasize the need to assess and compare the performance of supervisors in a structured way. We present a framework constituted by four datasets organized in six test cases combined with seven evaluation metrics. The test cases provide varying complexity and include data from publicly available sources as well as a novel dataset consisting of images from simulated driving scenarios. The latter we plan to make publicly available. Our framework can be used to support DNN supervisor evaluation, which in turn could be used to motivate development, validation, and deployment of DNNs in safety-critical applications.
    Comment: Preprint of paper accepted for presentation at The First IEEE International Conference on Artificial Intelligence Testing, April 4-9, 2019, San Francisco East Bay, California, US
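To make the idea of a supervisor concrete, here is a minimal sketch of one common baseline: accept a prediction only when the maximum softmax probability clears a threshold. The abstract does not say which supervisors the framework evaluates, so this is an illustration only; `supervisor_accepts` and the threshold of 0.9 are hypothetical choices.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervisor_accepts(logits: List[float], threshold: float = 0.9) -> bool:
    """Accept the network's prediction only if it is confident enough.

    The threshold is an assumed value; in practice it would be tuned on
    held-out data against the chosen evaluation metric.
    """
    return max(softmax(logits)) >= threshold


confident = supervisor_accepts([10.0, 0.0, 0.0])   # one class dominates
uncertain = supervisor_accepts([1.0, 1.0, 1.0])    # uniform over classes
```

A baseline like this can still accept inputs far from the training set when the network is confidently wrong, which is the failure mode the abstract describes and one reason supervisors benefit from being compared across several test cases and metrics.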

    An Industrial Workbench for Test Scenario Identification for Autonomous Driving Software

    Testing of autonomous vehicles involves enormous challenges for the automotive industry. The number of real-world driving scenarios is extremely large, so choosing effective test scenarios is essential, as is combining simulated and real-world testing. We present an industrial workbench of tools and workflows to generate efficient and effective test scenarios for active safety and autonomous driving functions. The workbench is based on existing engineering tools and helps smoothly integrate simulated testing with real vehicle parameters and software. We aim to validate the workbench with real cases and further refine the input model parameters and distributions.

    Exploring ML testing in practice - Lessons learned from an interactive rapid review with Axis Communications

    There is a growing interest in industry and academia in machine learning (ML) testing. We believe that industry and academia need to learn together to produce rigorous and relevant knowledge. In this study, we initiate a collaboration between stakeholders from one case company, one research institute, and one university. To establish a common view of the problem domain, we applied an interactive rapid review of the state of the art. Four researchers from Lund University and RISE Research Institutes and four practitioners from Axis Communications reviewed a set of 180 primary studies on ML testing. We developed a taxonomy for the communication around ML testing challenges and results and identified a list of 12 review questions relevant for Axis Communications. The three most important questions (data testing, metrics for assessment, and test generation) were mapped to the literature, and an in-depth analysis of the 35 primary studies matching the most important question (data testing) was made. A final set of the five best matches was analysed, and we reflect on the criteria for applicability and relevance for the industry. The taxonomies are helpful for communication but not final. Furthermore, there was no perfect match to the case company’s investigated review question (data testing). However, we extracted relevant approaches from the five studies on a conceptual level to support later context-specific improvements. We found the interactive rapid review approach useful for triggering and aligning communication between the different stakeholders.