73 research outputs found

    Automotive Interior Sensing - Anomaly Detection

    With the appearance of SAVs (Shared Autonomous Vehicles), there will no longer be a driver responsible for maintaining the car interior and the well-being of passengers. It is therefore imperative to have a system that can detect abnormal behaviours, e.g., violence between passengers, and trigger the appropriate response. Furthermore, the types of anomalous activity can be so diverse that a dataset covering most use cases is unattainable, which makes traditional classification algorithms poorly suited to this kind of application. In this sense, anomaly detection algorithms are a good approach for building a discriminative model. Taking this into account, this work focuses on deep learning techniques, more precisely spatiotemporal auto-encoder based frameworks, which are trained only on frame sequences of normal behaviour and tested on normal and abnormal human interactions from Bosch's internal datasets. Initially, the model was trained on a single non-violent action category. Final iterations treated all of the identified non-violent actions as normal data. The network architecture comprises a group of convolutional layers that encode and decode spatial data, and a temporal encoder/decoder structure, implemented as a convolutional Long Short-Term Memory (LSTM) network, that is responsible for learning motion information about the passengers. The network learns how to reconstruct the 'normal' frame sequences, and during testing each sequence is classified as normal or abnormal based on its reconstruction error. From these reconstruction errors, regularity scores are computed that indicate the regularity the model predicts for each frame.
The resulting framework is a viable addition to traditional action recognition algorithms, since it can work as a tool for detecting unknown actions and strange or abnormal behaviours, and can aid in understanding the meaning of such human interactions.
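As a rough illustration of how per-frame reconstruction errors can be turned into regularity scores, the sketch below uses min-max normalisation over a sequence. This is an assumption: the abstract does not give the thesis's exact formula, and the function names and the threshold value are hypothetical.

```python
from typing import List, Sequence


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Map per-frame reconstruction errors to scores in [0, 1].

    Assumed min-max normalisation: the frame with the lowest error
    scores 1.0 (most regular), the frame with the highest scores 0.0.
    """
    if not errors:
        return []
    lo, hi = min(errors), max(errors)
    span = hi - lo
    if span == 0:
        # All frames reconstruct equally well: treat them all as regular.
        return [1.0] * len(errors)
    return [1.0 - (e - lo) / span for e in errors]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Flag frames whose regularity falls below a hypothetical threshold."""
    return [s < threshold for s in scores]


scores = regularity_scores([1.0, 3.0, 5.0])
flags = flag_anomalies(scores)
```

Because the normalisation is relative to the sequence, a score only says how regular a frame is compared with the other frames in the same sequence; a threshold would have to be tuned on held-out data.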

    Anomaly Detection in Traffic Surveillance Videos Using Deep Learning

    In the recent past, a huge number of cameras have been placed in a variety of public and private areas for the purposes of surveillance, the monitoring of abnormal human actions, and traffic surveillance. The detection and recognition of abnormal activity in a real-world environment is a big challenge, as there can be many types of alarming and abnormal activities, such as theft, violence, and accidents. This research deals with accidents in traffic videos. In the modern world, video traffic surveillance cameras (VTSS) are used for traffic surveillance and monitoring. As the population is increasing drastically, the likelihood of accidents is also increasing. The VTSS is used to detect abnormal events or incidents regarding traffic on different roads and highways, such as traffic jams, traffic congestion, and vehicle accidents. In many accidents, victims are helpless, and some die because emergency treatment is unavailable on long highways and in places far from cities. This research proposes a methodology for detecting accidents automatically through surveillance videos. A review of the literature suggests that convolutional neural networks (CNNs), a specialized deep learning approach designed for grid-like data, are effective in image and video analysis. This research uses CNNs to find anomalies (accidents) in videos captured by the VTSS and implements a rolling prediction algorithm to achieve high accuracy. To train the CNN model, a vehicle accident image dataset (VAID), composed of images with anomalies, was constructed and used. To test the proposed methodology, the trained CNN model was checked on multiple videos, and the results were collected and analyzed. The results of this research show the successful detection of traffic accident events with an accuracy of 82% in the traffic surveillance system videos.
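The abstract does not spell out the rolling prediction algorithm. A common form of the idea is to average the classifier's per-frame outputs over a sliding window so that a single noisy frame does not flip the decision. The sketch below assumes that form; the function name, window size, and threshold are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_predictions(
    frame_probs: Iterable[float], window: int = 3, threshold: float = 0.5
) -> Iterator[bool]:
    """Yield one smoothed accident/no-accident decision per frame.

    frame_probs: per-frame accident probabilities from some classifier
    (assumed here; any model producing values in [0, 1] would do).
    """
    recent = deque(maxlen=window)  # keeps only the last `window` values
    for p in frame_probs:
        recent.append(p)
        yield (sum(recent) / len(recent)) >= threshold


# A single spike is suppressed, while a sustained run is detected.
spike = list(rolling_predictions([0.0, 0.0, 1.0, 0.0, 0.0]))
sustained = list(rolling_predictions([0.0, 1.0, 1.0, 1.0]))
```

A larger window smooths more aggressively but delays detection by a few frames, which matters if the goal is a fast emergency response.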

    Deep Learning for Crowd Anomaly Detection

    Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage has produced an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have brought to light automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those that apply dense optical flow in a feature extraction stage to the crowd anomaly detection problem. To this end, five different deep learning architectures are trained using optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.

    Weakly supervised Video Anomaly Detection based on 3D Convolution and LSTM

    Weakly supervised video anomaly detection is a recent focus of computer vision research thanks to the availability of large-scale weakly supervised video datasets. However, most existing research works are limited to the frame-level classification with emphasis on finding the presence of specific objects or activities. In this article, a new neural network architecture is proposed to efficiently extract the prominent features for detecting whether a video contains anomalies. A video is treated as an integral input and the detection follows the procedure of video-label assignment. The extraction of spatial and temporal features is carried out by three-dimensional convolutions, and then their relationship is further modeled using an LSTM network. The concise structure of the proposed method enables high computational efficiency, and extensive experiments demonstrate its effectiveness. (c) 2021 by the authors. Licensee MDPI, Basel, Switzerland

    Unsupervised video anomaly detection in UAVs: a new approach based on learning and inference

    This paper introduces an approach to detecting anomalous occurrences in video data without supervision. It leverages contextual data derived from visual characteristics and addresses the semantic gap between visual information and the interpretation of atypical incidents. Our work uses Unmanned Aerial Vehicles (UAVs) to capture video data from a different perspective and to provide a unique set of visual features. Specifically, we put forward a technique for discerning context through scene comprehension, which entails constructing a spatio-temporal contextual graph to represent various aspects of visual information. These aspects encompass the appearance of objects, their interrelations within the spatio-temporal domain, and the categorization of the scenes captured by UAVs. To encode context information, we use a Transformer with message passing to update the graph's nodes and edges. Furthermore, we have designed a graph-oriented deep Variational Autoencoder (VAE) approach for unsupervised categorization of scenes, enabling the extraction of the spatio-temporal context graph across diverse settings. Finally, using this contextual data, we compute frame-level anomaly scores to identify atypical occurrences. We assessed the approach on three challenging datasets, UCF-Crime, Avenue, and ShanghaiTech, and the results provided substantial evidence of the method's successful performance.

    Visual Representation Learning with Minimal Supervision

    Computer vision aims to give computers the human ability to understand and interpret their visual surroundings. An essential element in comprehending the environment is to extract relevant information from complex visual data so that the desired task can be solved. For instance, to distinguish cats from dogs the feature 'body shape' is more relevant than 'eye color' or the 'number of legs'. In traditional computer vision it is conventional to develop handcrafted functions that extract specific low-level features such as edges from visual data. However, in order to solve a particular task satisfactorily we require a combination of several features. Thus, the approach of traditional computer vision has the disadvantage that whenever a new task is addressed, a developer needs to manually specify all the features the computer should look for. For that reason, recent works have primarily focused on developing new algorithms that teach the computer to autonomously detect relevant and task-specific features. Deep learning has been particularly successful in this respect. In deep learning, artificial neural networks automatically learn to extract informative features directly from visual data. The majority of developed deep learning strategies require a dataset with annotations which indicate the solution of the desired task. The main bottleneck is that creating such a dataset is very tedious and time-intensive considering that every sample needs to be annotated manually. This thesis presents new techniques that attempt to keep the amount of human supervision to a minimum while still reaching satisfactory performance on various visual understanding tasks. In particular, this thesis focuses on self-supervised learning algorithms that train a neural network on a surrogate task where no human supervision is required. We create an artificial supervisory signal by breaking the order of visual patterns and asking the network to recover the original structure.
Besides demonstrating the abilities of our model on common computer vision tasks such as action recognition, we additionally apply our model to biomedical scenarios. Many research projects in medicine involve extensive manual processes that lengthen the development of successful treatments. Taking the example of analyzing the motor function of neurologically impaired patients, we show that our self-supervised method can help to automate tedious, visually based processes in medical research. In order to perform a detailed analysis of motor behavior and, thus, provide a suitable treatment, it is important to discover and identify the negatively affected movements. Therefore, we propose a magnification tool that can detect and enhance subtle changes in motor function, including motor behavior differences across individuals. In this way, our automatic diagnostic system not only analyzes apparent behavior but also facilitates the perception and discovery of impaired movements. Learning a feature representation without requiring annotations significantly reduces human supervision. However, using annotated datasets generally leads to better performance than self-supervised learning methods. Hence, we additionally examine semi-supervised approaches which efficiently combine a few annotated samples with large unlabeled datasets. Consequently, semi-supervised learning represents a good trade-off between annotation time and accuracy.
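The surrogate task mentioned in this abstract (breaking the order of visual patterns and asking the network to recover it) can be illustrated with a small data-generation sketch. It assumes the common formulation in which a clip's frames are shuffled by a permutation and the permutation's index serves as a free training label; the thesis's actual task may differ, and all names here are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def make_permutation_sample(
    frames: Sequence[str], rng: random.Random
) -> Tuple[List[str], int]:
    """Shuffle frames with a random permutation; return (shuffled, label).

    The label is the index of the permutation used, so it costs no
    human annotation. Practical only for short clips (n! permutations).
    """
    perms = list(itertools.permutations(range(len(frames))))
    label = rng.randrange(len(perms))
    shuffled = [frames[i] for i in perms[label]]
    return shuffled, label


def restore_order(shuffled: Sequence[str], label: int) -> List[str]:
    """Invert the permutation identified by `label`."""
    perm = list(itertools.permutations(range(len(shuffled))))[label]
    restored = [""] * len(shuffled)
    for position, source_index in enumerate(perm):
        restored[source_index] = shuffled[position]
    return restored


rng = random.Random(0)
clip = ["frame_a", "frame_b", "frame_c"]  # stand-ins for image data
shuffled, label = make_permutation_sample(clip, rng)
recovered = restore_order(shuffled, label)
```

A network trained to predict `label` from `shuffled` has to pick up on temporal or spatial structure, which is the kind of representation such a surrogate task is meant to encourage.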