
    Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning

    Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision-making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues. In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings. For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without the need to manually design and fine-tune a reward function, while also allowing interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. 
Our multi-objective learning scheme adaptively balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies. For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational-autoencoder-based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves on the current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and in more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory predictions. Our approach processes context conditions and the social conditions arising during agent-agent interactions in an integrated manner through a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks.
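The abstract does not specify the discriminator architecture or the reward derived from it. As a generic illustration only, the sketch below shows a GAIL-style adversarial imitation reward: a discriminator is trained to tell reference-motion transitions from policy transitions, and the policy is rewarded with r = -log(1 - D(x)). The linear logistic discriminator, the toy 2-D feature vectors, the learning rate, and the names `LogisticDiscriminator` and `reward` are assumptions for illustration, not the thesis's networks or exact reward.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """Hypothetical linear discriminator D(x) = sigmoid(w.x + b).

    D(x) -> 1 for reference-motion transitions, D(x) -> 0 for policy transitions.
    """

    def __init__(self, dim: int, lr: float = 0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def logit(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, reference_batch, policy_batch):
        # Per-sample gradient steps on the binary cross-entropy loss:
        # label 1 for reference transitions, label 0 for policy transitions.
        for batch, label in ((reference_batch, 1.0), (policy_batch, 0.0)):
            for x in batch:
                err = label - sigmoid(self.logit(x))
                self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b += self.lr * err

    def reward(self, x) -> float:
        # GAIL-style reward r = -log(1 - D(x)), which equals softplus(logit);
        # it grows as the transition looks more like the reference motion.
        z = self.logit(x)
        return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Toy 2-D transition features (illustrative values, not motion-capture data).
reference = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
policy = [[-1.0, -1.0], [-0.8, -1.1], [-1.2, -0.9]]
disc = LogisticDiscriminator(dim=2)
for _ in range(50):
    disc.update(reference, policy)
print(disc.reward([1.0, 1.0]) > disc.reward([-1.0, -1.0]))  # True
```

In a full pipeline this learned reward would stand in for a hand-designed reward inside an RL update, with the discriminator retrained as the policy improves; that outer loop is omitted here.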

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving relies heavily on deep neural networks, which face many challenges. How much data do we need for training and testing? How can synthetic data be used to save labeling costs for training? How do we increase robustness and decrease memory usage? Under inevitably poor conditions, how do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: what are useful validation techniques, and what about safety? This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at its core. This book is unique: its first part provides an extended survey of all the relevant aspects. The second part contains the detailed technical elaboration of the various questions mentioned above.

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Includes bibliographical references. 2020 Summer. Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field of anomaly detection has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming, multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA)-enabled devices, the internet-of-things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four (4) non-streaming, static anomaly detection multivariate datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled spatiotemporal streaming data using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. STADE streaming algorithms are based on geographically unique, persistently executing neural networks using online stochastic gradient descent (SGD). 
STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud on six continents: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated: the first processes commercial air traffic flows, the second processes global earthquake measurements, and the third processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can be easily extended to additional application domains and use cases.
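To make the SAD/FAD split concrete, here is a minimal sketch of the data flow under heavy simplifying assumptions. STADE uses neural networks trained by online SGD and DNN-based federated learning; this sketch instead uses a linear autoregressive predictor (one SGD step per sample) as the per-site detector, scores each sample by the z-score of its squared prediction error, and lets the central server flag a time step only when several sites exceed a threshold together. The names `StreamDetector` and `FederatedAggregator`, the AR order, the learning rate, the warm-up length, the threshold, and the voting rule are all hypothetical.

```python
import math
from collections import defaultdict


class StreamDetector:
    """Hypothetical per-site detector (simplified stand-in for a STADE SAD).

    An online AR(p) linear predictor is updated by one SGD step per sample;
    the anomaly score is the z-score of the squared prediction error against
    running (Welford) statistics of past squared errors.
    """

    def __init__(self, order: int = 3, lr: float = 0.01, warmup: int = 30):
        self.order, self.lr, self.warmup = order, lr, warmup
        self.w = [0.0] * order
        self.history = []
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def score(self, x: float) -> float:
        """Score x first, then learn from it (test-then-train)."""
        if len(self.history) < self.order:
            self.history.append(x)
            return 0.0
        window = self.history[-self.order:]
        err = x - sum(w * h for w, h in zip(self.w, window))
        sq = err * err
        z = 0.0
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / self.n)
            if std > 0:
                z = (sq - self.mean) / std
        # Welford update of the running mean/variance of squared errors.
        self.n += 1
        delta = sq - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (sq - self.mean)
        # One online SGD step on the squared prediction error.
        self.w = [w + self.lr * err * h for w, h in zip(self.w, window)]
        self.history.append(x)
        self.history = self.history[-self.order:]
        return z


class FederatedAggregator:
    """Hypothetical central server (simplified stand-in for a STADE FAD).

    Collects per-site scores and flags time steps at which at least
    `min_sites` sites exceed `threshold` simultaneously.
    """

    def __init__(self, threshold: float = 4.0, min_sites: int = 2):
        self.threshold, self.min_sites = threshold, min_sites
        self.scores = defaultdict(dict)  # time step -> {site: score}

    def report(self, t: int, site: str, score: float) -> None:
        self.scores[t][site] = score

    def global_events(self):
        return sorted(
            t for t, by_site in self.scores.items()
            if sum(s > self.threshold for s in by_site.values()) >= self.min_sites
        )


# Three simulated sites with phase-shifted sine signals; sites A and B share
# a spike at t=150, while site C alone has a spike at t=250.
agg = FederatedAggregator()
sites = {name: StreamDetector() for name in ("A", "B", "C")}
for t in range(300):
    for k, (name, det) in enumerate(sites.items()):
        x = math.sin(0.3 * t + k)
        if (t == 150 and name in ("A", "B")) or (t == 250 and name == "C"):
            x += 10.0
        agg.report(t, name, det.score(x))
print(150 in agg.global_events(), 250 in agg.global_events())  # True False
```

The voting rule illustrates why a central aggregator adds information: the isolated spike at site C is anomalous locally, but only the event seen at two sites at once is reported as a global pattern.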

    Automated Detection of Complex Tactical Patterns in Football—Using Machine Learning Techniques to Identify Tactical Behavior

    Football tactics is a topic of public interest, where decisions are predominantly made based on the gut instincts of domain experts. Sport science literature often highlights the need for evidence-based research on football tactics; however, limited capabilities in modeling the dynamics of football have prevented researchers from gaining usable insights. Recent technological advances have made high-quality football data more available and affordable. In particular, positional data providing player and ball coordinates at every instant of a match can be combined with event data containing spatio-temporal information on every event taking place on the pitch (e.g. passes, shots, fouls). At the same time, the application of machine learning methods to domain-specific problems is driving a paradigm shift in many industries, including sports. The need for more informed decisions, as well as for automating time-consuming processes, accelerated by the availability of data, has motivated many scientific investigations in football analytics. This thesis is part of a research program combining methodologies from sports and data science to address the following problems: the synchronization of positional and event data, the objective quantification of offensive actions, and the detection of tactical patterns. Although various basic insights from the overall research program are integrated, this thesis focuses primarily on the last of these. Specifically, positional and event data are used to apply machine learning techniques to identify eight established tactical patterns in football: high-/mid-/low-block defending, build-up/attacking play in the offense, counterpressing and counterattacks during transitions, and patterns when defending corner kicks, e.g. player-/zonal- or post-marking. For each pattern, we consolidate definitions with football experts and label large amounts of data manually using video recordings. 
The inter-labeler reliability is used to ensure that each pattern is well-defined. Unsupervised techniques are used for exploration, and supervised machine learning methods trained on expert-labeled data are used for the final detection. As an outlook, semi-supervised methods were used to reduce the labeling effort. This thesis demonstrates that the detection of tactical patterns can optimize everyday processes in professional clubs and advance tactical analysis in sport science by yielding new insights. Additionally, we add value to the machine learning domain by evaluating recent supervised and semi-supervised machine learning methods on challenging, real-world problems.
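The abstract does not name the inter-labeler reliability statistic. Cohen's kappa for two annotators is one common choice and is assumed here purely for illustration: it compares the observed agreement with the agreement expected by chance from each annotator's label frequencies. The function name and the example labels ("high", "mid", "low" for block-defending phases) are hypothetical, not data from the thesis.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Illustrative labels for six match phases (not data from the thesis).
a = ["high", "high", "mid", "low", "mid", "high"]
b = ["high", "mid", "mid", "low", "mid", "high"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

Here the annotators agree on 5 of 6 phases (p_o = 5/6) while chance agreement is p_e = 13/36, giving kappa = 17/23. Kappa is lower than raw agreement because it discounts matches that would occur by chance.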

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
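The abstract reports results on test images perturbed with Gaussian and uniform noise at several levels but does not give the exact noise model. A minimal sketch of such a perturbation protocol follows, assuming pixel intensities scaled to [0, 1], Gaussian levels given as standard deviations, uniform levels given as half-widths, and clipping back to the valid range. The function names and the noise levels are illustrative, and the CORF operator itself is not reproduced here.

```python
import random


def sample_noise(kind, level, rng):
    """Draw one noise value: Gaussian (std = level) or uniform in [-level, level]."""
    if kind == "gaussian":
        return rng.gauss(0.0, level)
    if kind == "uniform":
        return rng.uniform(-level, level)
    raise ValueError(f"unknown noise kind: {kind!r}")


def perturb(image, kind, level, rng):
    """Return a noisy copy of a grayscale image whose pixels lie in [0, 1].

    Each pixel receives independent noise and is clipped back to [0, 1].
    """
    return [[min(1.0, max(0.0, p + sample_noise(kind, level, rng))) for p in row]
            for row in image]


# Build noisy variants of a (here: constant) 28x28 test image at several levels.
rng = random.Random(0)
clean = [[0.5] * 28 for _ in range(28)]
noisy_sets = {
    (kind, level): perturb(clean, kind, level, rng)
    for kind in ("gaussian", "uniform")
    for level in (0.05, 0.1, 0.2)
}
```

Using a seeded `random.Random` instance keeps each noisy test set reproducible, so every pipeline (with or without delineation maps) can be evaluated on identical perturbed images.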

    State of the art of audio- and video based solutions for AAL

    Working Group 3. Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing increasingly crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to their high potential for enabling remote care and support. Broadly speaking, AAL can be defined as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply persons in need with smart assistance, responding to their needs for autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages in terms of unobtrusiveness and information richness. 
Indeed, cameras and microphones are far less obtrusive than wearable sensors, which may hinder one’s activities. In addition, a single camera placed in a room can record most of the activities performed there, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large sensing range, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals’ activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL ensuring ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. 
It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethics-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gain an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential of the silver economy is surveyed.

    Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

    Three recent breakthroughs due to AI in arts and science serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) rely on the Long Short-Term Memory (LSTM) architecture, while method (3) relies on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which could be combined with an attention mechanism to address discontinuous solutions. Both LSTM and attention architectures, together with modern and generalized classic optimizers that include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are covered in sufficient depth for more advanced works such as shallow networks with infinite width. The review addresses not only experts: readers are assumed to be familiar with computational mechanics but not with DL, whose concepts and applications are built up from the basics, aiming to bring first-time learners quickly to the forefront of research. History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.
Comment: 275 pages, 158 figures. Appeared online on 2023.03.01 at CMES-Computer Modeling in Engineering & Science
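The review lists Gaussian processes among the kernel machines it covers. As a generic textbook illustration (not code from the review), the sketch below computes the GP regression posterior with the standard Cholesky-based algorithm: factor K + σ²I = LLᵀ, solve for α = (K + σ²I)⁻¹y, then take mean k*ᵀα and variance k(x*, x*) − vᵀv with v = L⁻¹k*. The squared-exponential kernel, its hyperparameters, the noise level, and the toy sine data are assumptions; a practical implementation would use a linear-algebra library and fit the hyperparameters.

```python
import math


def rbf(x1, x2, length=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub_transpose(L, y):
    """Solve L^T x = y, reading L^T directly from the lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-4, length=1.0, variance=1.0):
    """Posterior mean and variance of f(x_star) for GP regression."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub_transpose(L, forward_sub(L, ys))  # alpha = K^{-1} y
    k_star = [rbf(x, x_star, length, variance) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf(x_star, x_star, length, variance) - sum(vi * vi for vi in v)
    return mean, var


xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.sin(x) for x in xs]
print(gp_predict(xs, ys, 1.5))   # between training points: small variance
print(gp_predict(xs, ys, 20.0))  # far from data: mean near 0, variance near prior
```

The Cholesky route avoids forming an explicit inverse and reuses one factorization for both the mean and the variance, which is why it is the usual choice for exact GP inference.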

    Unsupervised learning for anomaly detection in Australian medical payment data

    Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3–8% of total expenditure due to fraud. As Medicare Australia, Australia’s universal public health insurer, spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, wasted spending of A$1–2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges that require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented that rank providers by potentially recoverable costs. These methods use association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison to the typical models, using metrics based on costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison to existing techniques show that the methods perform well. 
The methods are implemented in a software framework that enables rapid prototyping and quality assurance. The code is implemented at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
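The abstract describes ranking providers by the cost of services billed in excess of a model of typical claims. The thesis learns those typical models with association analysis, topic modelling and sequential pattern mining; the sketch below replaces them with a fixed, hand-written table of expected item counts per episode, which is an assumption, and it scores only excess services, not upgraded ones. Item codes, fees, and provider IDs are hypothetical and are not real MBS data.

```python
from collections import Counter


def excess_cost(episodes, typical, fees):
    """Cost of services billed beyond the typical per-episode item counts."""
    total = 0.0
    for episode in episodes:
        for item, count in Counter(episode).items():
            surplus = count - typical.get(item, 0)
            if surplus > 0:
                total += surplus * fees[item]
    return total


def rank_providers(claims, typical, fees):
    """Providers sorted by potentially recoverable (excess) cost, highest first."""
    scores = {p: excess_cost(eps, typical, fees) for p, eps in claims.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical item codes, fees (A$) and typical model; not real MBS data.
typical = {"CONSULT": 1, "TEST": 1}
fees = {"CONSULT": 40.0, "TEST": 20.0, "PROCEDURE": 100.0}
claims = {
    "P1": [["CONSULT", "TEST"], ["CONSULT"]],
    "P2": [["CONSULT", "CONSULT", "TEST", "TEST"]],
    "P3": [["CONSULT", "PROCEDURE"]],
}
print(rank_providers(claims, typical, fees))
# [('P3', 100.0), ('P2', 60.0), ('P1', 0.0)]
```

Because each score is a sum of item-level surpluses times fees, an expert reviewing a flagged provider can trace exactly which services drove the ranking, which matches the interpretability requirement stated above.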