
    Multi-Label/Multi-Class Deep Learning Classification of Spatiotemporal Data

    Human senses allow for the detection of simultaneous changes in our environment. An unobstructed field of view lets us notice concurrent variations in different parts of a scene; when playing a video game, for example, a player often needs to be aware of what is happening across the entire screen. Likewise, our hearing makes us aware of various simultaneous sounds around us. Human perception is limited by the brain's cognitive ability and the acuity of the senses. This is not a factor for machines: given a signal and instructions for analyzing it and extracting useful information, a system can complete the task repeatedly, given enough processing power. Automated, simultaneous detection of activity in machine learning requires multi-labels. To detect concurrent occurrences spatially, the labels should represent the regions of interest for a particular application. In this thesis, the regions of interest are either different quadrants of a parking lot captured on surveillance videos, four auscultation sites on patients' lungs, or the two sides (left and right) of the brain's motor cortex. Since the labels within the multi-labels represent not only spatial locations but also different levels or types of occurrences, a multi-class/multi-level schema is necessary. In the first study, each label is assigned one of three levels of activity within a specific quadrant. In the second study, each label is assigned one of four types of respiratory sounds. In the third study, each label is assigned one of three finger-tapping frequencies. This novel multi-label/multi-class schema is one part of detecting useful information in the data; the other part lies in the machine learning algorithm, the network model.
In order to capture the spatiotemporal characteristics of the data, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network-based algorithms are a fitting basis for the models. The following classifications are described in this thesis: 1. In the first study, one of three motion densities is identified simultaneously in four quadrants of two sets of surveillance videos; the spatiotemporal data are publicly available video recordings. 2. In the second study, one of four types of breathing sounds is classified simultaneously at four auscultation sites; the spatiotemporal data are publicly available respiratory sound recordings. 3. In the third study, one of three finger-tapping rates is detected simultaneously in two regions of interest, the right and left sides of the brain's motor cortex; the spatiotemporal data are fNIRS channel readings gathered during an index-finger-tapping experiment. Classification results are based on test data that is not part of model training and validation. Success is measured with Hamming Loss and Subset Accuracy as well as Accuracy, F-Score, Sensitivity, and Specificity metrics. In the last study, model explanation is performed by computing Shapley Additive Explanation (SHAP) values and plotting them on an image-like background, a representation of the fNIRS channel layout used as data input. Overall, promising findings support the use of this approach for classifying spatiotemporal data when the goal is to detect different levels or types of occurrences simultaneously in several regions of interest.
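The Hamming Loss and Subset Accuracy measures mentioned above can be sketched for this multi-label/multi-class setting. The sketch below is illustrative only: the label layout (three samples, four regions, classes 0-2) is invented for the example and does not come from the thesis's data.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    # fraction of individual label positions predicted incorrectly
    return float(np.mean(y_true != y_pred))

def subset_accuracy(y_true, y_pred):
    # fraction of samples whose entire label vector is predicted exactly
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# hypothetical labels: 3 samples, 4 regions of interest, each assigned class 0-2
y_true = np.array([[0, 1, 2, 1],
                   [2, 2, 0, 0],
                   [1, 0, 1, 2]])
y_pred = np.array([[0, 1, 2, 1],
                   [2, 1, 0, 0],   # one region misclassified
                   [1, 0, 1, 2]])

print(hamming_loss(y_true, y_pred))     # 1 wrong position out of 12
print(subset_accuracy(y_true, y_pred))  # 2 of 3 samples exactly right
```

Note that strict Hamming loss is usually defined over binary label vectors; averaging per-position mismatches, as here, is a natural generalization to the multi-class-per-region case.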

    Automotive Interior Sensing - Anomaly Detection

    With the appearance of Shared Autonomous Vehicles (SAVs), there will no longer be a driver responsible for maintaining the car interior and the well-being of passengers. It is therefore imperative to have a system able to detect abnormal behaviour, e.g., violence between passengers, and trigger the appropriate response. Furthermore, the types of anomalous activity can be so diverse that assembling a dataset covering most use cases is unattainable, making traditional classification algorithms ill-suited to this kind of application. In this sense, anomaly detection algorithms are a good approach for building a discriminative model. Taking this into account, this work focuses on deep learning techniques, more precisely spatiotemporal autoencoder-based frameworks, which are trained only on frame sequences of normal behaviour and tested on normal and abnormal human interactions from Bosch's internal datasets. Initially, the model was trained on a single non-violent action category; final iterations treated all of the identified non-violent actions as normal data. The network architecture comprises a group of convolutional layers that encode and decode spatial data, and a temporal encoder/decoder structure, implemented with convolutional Long Short-Term Memory (LSTM) cells, responsible for learning motion information. The network learns how to properly reconstruct the 'normal' frame sequences, and during testing each sequence is classified as normal or abnormal based on its reconstruction error. From these errors, regularity scores are inferred, indicating the predicted regularity of each frame.
The resulting framework is a viable addition to traditional action recognition algorithms, since it can work as a tool for detecting unknown actions and strange/abnormal behaviours, and aid in understanding the meaning of such human interactions.
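The regularity-score step described above is commonly computed by min-max normalizing per-frame reconstruction errors and inverting them; the sketch below assumes that convention. The error values and the 0.5 threshold are hypothetical, not taken from the dissertation.

```python
import numpy as np

def regularity_scores(errors):
    # Min-max normalize per-frame reconstruction errors to [0, 1] and invert,
    # so a high score means the frame looks like learned "normal" behaviour.
    errors = np.asarray(errors, dtype=float)
    e_min, e_max = errors.min(), errors.max()
    return 1.0 - (errors - e_min) / (e_max - e_min + 1e-8)

def classify(errors, threshold=0.5):
    # Frames whose regularity falls below the threshold are flagged anomalous.
    return regularity_scores(errors) < threshold

# hypothetical per-frame reconstruction MSEs; frames 4-5 reconstruct poorly
errs = [0.8, 0.9, 0.7, 3.5, 3.9, 1.0]
print(classify(errs))
```

Because the autoencoder is trained only on normal sequences, abnormal motion yields large reconstruction error, hence low regularity, without any anomalous examples at training time.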

    Biolocomotion Detection in Videos

    Animals locomote for various reasons: to search for food, to find suitable habitat, to pursue prey, to escape from predators, or to seek a mate. The grand scale of biodiversity contributes to the great diversity of locomotory designs and modes. In this dissertation, the locomotion of biological species in general is referred to as biolocomotion. The goal of this dissertation is to develop a computational approach to detecting biolocomotion in any unprocessed video. The ways biological entities locomote through an environment are extremely diverse: various creatures make use of legs, wings, fins, and other means to move through the world. Significantly, the motion exhibited by body parts navigating an environment can be modelled as a combination of an overall positional advance with an overlaid asymmetric oscillatory pattern, a distinctive signature that tends to be absent from non-biological objects in motion. In this dissertation, this key trait of positional advance with asymmetric oscillation, along with differences between an object's common motion (extrinsic motion) and the localized motion of its parts (intrinsic motion), is exploited to detect biolocomotion. In particular, a computational algorithm is developed to measure the presence of these traits in tracked objects and determine whether they correspond to a biological entity in locomotion. An alternative algorithm, based on generic handcrafted features combined with learning and assembled from components of allied areas of investigation, is also presented as a basis of comparison to the main proposed algorithm. A novel biolocomotion dataset encompassing a wide range of moving biological and non-biological objects in natural settings is provided. Additionally, biolocomotion annotations for an extant camouflaged-animals dataset are also provided.
Quantitative results indicate that the proposed algorithm considerably outperforms the alternative approach, supporting the hypothesis that biolocomotion can be detected reliably from its distinct signature of positional advance with asymmetric oscillation and extrinsic/intrinsic motion dissimilarity.
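As an illustration of the "positional advance with overlaid oscillation" idea (a toy proxy, not the dissertation's actual algorithm), one can detrend a tracked coordinate to remove the overall advance and then measure how concentrated the residual's spectrum is: a limb-like track leaves a strong oscillatory residual, while a rigidly translating object leaves roughly flat noise. All signals below are synthetic.

```python
import numpy as np

def oscillation_score(track):
    # track: 1-D array of a tracked body-part coordinate over time.
    # Remove the overall positional advance (linear trend), then measure
    # what fraction of the residual's spectral energy sits in one dominant
    # frequency bin (1.0 = pure oscillation, near 0 = unstructured noise).
    t = np.arange(len(track))
    slope, intercept = np.polyfit(t, track, 1)
    residual = track - (slope * t + intercept)
    spectrum = np.abs(np.fft.rfft(residual))[1:]  # drop the DC component
    total = spectrum.sum()
    return float(spectrum.max() / total) if total > 0 else 0.0

t = np.linspace(0, 4 * np.pi, 128)
limb = 0.5 * t + np.sin(3 * t)  # advance plus oscillation, limb-like
rigid = 0.5 * t + 0.01 * np.random.default_rng(0).standard_normal(128)

print(oscillation_score(limb) > oscillation_score(rigid))  # True
```

The dissertation's signature additionally requires the oscillation to be asymmetric and compares extrinsic versus intrinsic motion; this sketch captures only the oscillation-versus-rigid-motion contrast.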

    Embodied Visual Perception Models For Human Behavior Understanding

    Many modern applications require extracting the core attributes of human behavior, such as a person's attention, intent, or skill level, from visual data. There are two main challenges related to this problem. First, we need models that can represent visual data in terms of object-level cues. Second, we need models that can infer core behavioral attributes from the visual data. We refer to these two challenges as "learning to see" and "seeing to learn", respectively. In this PhD thesis, we have made progress towards addressing both challenges. We tackle the problem of "learning to see" by developing methods that extract object-level information directly from raw visual data. These include two top-down contour detectors, DeepEdge and HfL, which can be used to aid high-level vision tasks such as object detection. Furthermore, we also present two semantic object segmentation methods, Boundary Neural Fields (BNFs) and Convolutional Random Walk Networks (RWNs), which integrate low-level affinity cues into the object segmentation process. We then shift our focus to video-level understanding and present a Spatiotemporal Sampling Network (STSN), which can be used for video object detection and discriminative motion feature learning. Afterwards, we transition to the second subproblem, "seeing to learn", for which we leverage first-person GoPro cameras that record what people see during a particular activity. We aim to infer core behavioral attributes such as a person's attention, intention, and skill level from such first-person data. To do so, we first propose the concept of action-objects: the objects that capture a person's conscious visual (watching a TV) or tactile (taking a cup) interactions. We then introduce two models, EgoNet and Visual-Spatial Network (VSN), which detect action-objects in supervised and unsupervised settings, respectively.
Afterwards, we focus on a behavior understanding task in a complex basketball activity. We present a method for evaluating players' skill level from their first-person basketball videos, and also a model that predicts a player's future motion trajectory from a single first-person image.

    The 3rd Anti-UAV Workshop & Challenge: Methods and Results

    The 3rd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking. The Anti-UAV dataset used for the challenge has been publicly released. There are two main differences between this year's competition and the previous two. First, we have expanded the existing dataset and, for the first time, released a training set so that participants can focus on improving their models. Second, we set up two tracks for the first time: Anti-UAV Tracking and Anti-UAV Detection & Tracking. Around 76 teams from around the globe competed in the 3rd Anti-UAV Challenge. In this paper, we provide a brief summary of the 3rd Anti-UAV Workshop & Challenge, including brief introductions to the top three methods in each track. The submission leaderboard will be reopened for researchers who are interested in the Anti-UAV challenge. The benchmark dataset and other information can be found at https://anti-uav.github.io/. Comment: Technical report for the 3rd Anti-UAV Workshop and Challenge. arXiv admin note: text overlap with arXiv:2108.0990.