186 research outputs found

    Anomaly Detection in Video

    Anomaly detection is an area of video analysis that has great importance in automated surveillance. Although it has been extensively studied, there has been little work on using deep convolutional neural networks to learn spatio-temporal feature representations. In this thesis we present novel approaches for learning motion features and modelling normal spatio-temporal dynamics for anomaly detection. The contributions are divided into two main chapters. The first introduces a method that uses a convolutional autoencoder to learn motion features from foreground optical flow patches. The autoencoder is coupled with a spatial sparsity constraint, known as Winner-Take-All, to learn shift-invariant and generic flow features. This method avoids the hand-crafted feature representations used in state-of-the-art methods. Moreover, to capture variations in the scale of motion patterns as an object moves in depth through the scene, we also divide the image plane into regions and learn a separate normality model in each region. We compare the methods with state-of-the-art approaches on two datasets and demonstrate improved performance. The second main chapter presents an end-to-end method that learns normal spatio-temporal dynamics from video volumes using a sequence-to-sequence encoder-decoder for prediction and reconstruction. This work is based on the intuition that the encoder-decoder learns to estimate normal sequences in a training set with low error, and thus estimates an abnormal sequence with high error. The error between the network's output and the target is used to classify a video volume as normal or abnormal. In addition to reconstruction error, we also use prediction error for anomaly detection. We evaluate the second method on three datasets. The prediction models show performance comparable with state-of-the-art methods. In comparison with the first proposed method, performance is improved on one dataset. Moreover, running time is significantly faster
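The error-based decision rule described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the mean-squared-error measure, the quantile-based threshold, and the function names `reconstruction_errors`, `fit_threshold` and `classify` are all assumptions made for illustration.

```python
import numpy as np

def reconstruction_errors(targets, outputs):
    """Mean squared error per video volume; arrays have shape (n_volumes, ...)."""
    diff = (targets - outputs).reshape(len(targets), -1)
    return np.mean(diff ** 2, axis=1)

def fit_threshold(normal_errors, quantile=0.95):
    """Choose a threshold from errors on held-out normal data.

    The 0.95 quantile is an assumed default, not a value from the thesis.
    """
    return float(np.quantile(normal_errors, quantile))

def classify(errors, threshold):
    """Return a boolean array; True marks a volume flagged as abnormal."""
    return errors > threshold
```

In practice the threshold trades off false alarms against missed detections, so it would normally be swept to produce an ROC curve rather than fixed at one value.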

    Movement Analytics: Current Status, Application to Manufacturing, and Future Prospects from an AI Perspective

    Data-driven decision making is becoming an integral part of manufacturing companies. Data is collected and commonly used to improve efficiency and produce high quality items for the customers. IoT-based and other forms of object tracking are an emerging tool for collecting movement data of objects/entities (e.g. human workers, moving vehicles, trolleys etc.) over space and time. Movement data can provide valuable insights like process bottlenecks, resource utilization, effective working time etc. that can be used for decision making and improving efficiency. Turning movement data into valuable information for industrial management and decision making requires analysis methods. We refer to this process as movement analytics. The purpose of this document is to review the current state of work for movement analytics both in manufacturing and more broadly. We survey relevant work from both a theoretical perspective and an application perspective. From the theoretical perspective, we put an emphasis on useful methods from two research areas: machine learning, and logic-based knowledge representation. We also review their combinations in view of movement analytics, and we discuss promising areas for future development and application. Furthermore, we touch on constraint optimization. From an application perspective, we review applications of these methods to movement analytics in a general sense and across various industries. We also describe currently available commercial off-the-shelf products for tracking in manufacturing, and we overview main concepts of digital twins and their applications

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available, whole body, parts or point trajectories. Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, that corrects motion leakage between correctly detected objects, while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits long range motion of point trajectories and large spatial support of image regions. We show resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard to detect, fast moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. 
We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation in popular datasets

    Information processing in visual systems

    One of the goals of neuroscience is to understand how animals perceive sensory information. This thesis focuses on visual systems, to unravel how neuronal structures process aspects of the visual environment. To characterise the receptive field of a neuron, we developed spike-triggered independent component analysis. Alongside characterising the receptive field of a neuron, this method provides an insight into its underlying network structure. When applied to recordings from the H1 neuron of blowflies, it accurately recovered the sub-structure of the neuron. This sub-structure was studied further by recording H1's response to plaid stimuli. Based on the response, H1 can be classified as a component cell. We then fitted an anatomically inspired model to the response, and found that the critical component for explaining H1's response is a sigmoid non-linearity at the output of the elementary movement detectors. The simpler blowfly visual system can help us understand elementary sensory information processing mechanisms. How does the more complex mammalian cortex implement these principles in its network? To study this, we used multi-electrode arrays to characterise the receptive field properties of neurons in the visual cortex of anaesthetised mice. Based on these recordings, we estimated the cortical limits on the performance of a visual task; the behavioural performance observed by Prusky and Douglas (2004) is within these limits. Our recordings were carried out in anaesthetised animals. During anaesthesia, cortical UP states are considered "fragments of wakefulness", and from simultaneous whole-cell and extracellular recordings we found these states to be revealed in the phase of local field potentials. This finding was used to develop a method of detecting cortical state based on extracellular recordings, which allows us to explore information processing during different cortical states. 
Across this thesis, we have developed, tested and applied methods that help improve our understanding of information processing in visual systems

    Anomaly Detection Using Predictive Convolutional Long Short-Term Memory Units

    Automating the segmentation of anomalous activities within long video sequences is complicated by the ambiguity of how such events are defined. This thesis approaches the problem by learning generative models with which meaningful sequences can be identified in videos using limited supervision. We propose two types of end-to-end trainable Convolutional Long Short-Term Memory (Conv-LSTM) networks that are able to predict the subsequent video sequence from a given input. The first is an encoder-decoder-based model that learns spatio-temporal features from stacked non-overlapping image patches, and the second is an autoencoder-based model that utilizes max-pooling layers to learn an abstraction of the entire image. The networks learn to model “normal” activities from usual events. Regularity scores are derived from the reconstruction errors of a set of predictions, with abnormal video sequences yielding lower regularity scores as they diverge further from the actual sequence over time. The models utilize a composite structure and examine the effects of “conditioning” to learn more meaningful representations. The best model is chosen based on the reconstruction and prediction accuracies. The Conv-LSTM models are evaluated both qualitatively and quantitatively, demonstrating competitive results on multiple anomaly detection datasets. Conv-LSTM units are shown to provide competitive results for modeling and predicting learned events when compared to state-of-the-art methods
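One way to turn per-frame reconstruction errors into regularity scores is min-max normalization over a video, so that the frame with the highest error scores 0 and the frame with the lowest error scores 1. The abstract does not give the exact formula, so the sketch below is an assumption, and the function name `regularity_scores` is hypothetical.

```python
import numpy as np

def regularity_scores(errors):
    """Map reconstruction errors to scores in [0, 1]; high error means low regularity.

    Min-max normalization is an assumed choice, not necessarily the
    formula used in the thesis.
    """
    errors = np.asarray(errors, dtype=float)
    span = errors.max() - errors.min()
    if span == 0:
        # All frames are equally well reconstructed, so treat them all as regular.
        return np.ones_like(errors)
    return 1.0 - (errors - errors.min()) / span
```

Because the normalization is computed per video, scores are comparable within a sequence but not directly across sequences.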

    The potential of error-related potentials. Analysis and decoding for control, neuro-rehabilitation and motor substitution

    Brain-machine interfaces (BMIs) allow the decoding of cortical activation patterns from the user's brain to provide people with severely limited mobility, due to an accident or a neurodegenerative disease, with a way to establish a direct connection between their brain and a device. In this sense, BMIs based on noninvasive recordings, such as the electroencephalogram (EEG), have offered these users new opportunities to regain control over activities of their daily life that they could not otherwise perform, especially in the areas of communication and control of their environment. 
Over the past years, and with the latest technological advancements, devices have grown significantly in complexity, expanding the number of possibilities to control complex robotic devices and prostheses with numerous degrees of freedom, or even to apply compound patterns of electrical stimulation to the subject's own paralyzed extremities to execute precise movements. However, the bandwidth of communication between brain and devices is still very limited, both in terms of the number of neural commands that can be decoded and the speed at which they can be decoded, and thus relying solely on neural signals does not guarantee accurate control of them. To benefit from these technologies, the field of BMIs adopted the well-known approach of shared control. This strategy intends to create a cooperative system between the user and an intelligent device, freeing the user from the burdensome parts of the task without losing the feeling of being in control. Here, users only need to focus their attention on high-level commands (e.g. choosing the final destination to reach, or a specific item to grab) while the intelligent agent resolves low-level problems (e.g. trajectory planning, obstacle avoidance, etc.) to perform the designated task in the optimal way. In particular, this thesis revolves around a high-level cognitive neural signal that originates from the mismatch between the expectations of the user and the actual actions executed by the intelligent devices. These signals, denoted as error-related potentials (ErrPs), are thought of as a natural way for our brain to communicate with machines: users only need to monitor the actions of a device and mentally assess whether it is behaving correctly or not. This can be seen as a way to supervise the device’s behavior, in which the decoding of these mental assessments is used to provide these devices with feedback directly related to the performance of a given task so they can learn and adapt to the user’s preferences. 
Since the ErrP’s neural response is associated with an exogenous event (a device committing an erroneous action), most previous work has attempted to distinguish whether an action is correct or erroneous by exploiting discrete events under well-controlled scenarios. This thesis presents the first attempt to shift towards asynchronous settings that focus on tasks related to the augmentation of motor capabilities, with the objective of developing interfaces for users with limited mobility. In this type of setup, two important challenges are that correct or erroneous events are not clearly defined and users have to continuously evaluate the executed task, while classification of EEG signals has to be performed asynchronously. As a result, the decoders have to constantly deal with background EEG activity, which typically leads to a large number of misdetections of error signatures. To overcome these challenges, this thesis addresses two main lines of work. First, it explores the neurophysiology of the evoked neural signatures associated with the perception of errors during the interactive use of a BMI in continuous and more realistic scenarios. Two studies were performed to find alternative features based on the frequency domain as a way of dealing with the high variability of EEG signals. Results revealed that there exists a stable pattern, represented as theta oscillations, that enhances generalization during classification. Also, state-of-the-art machine learning techniques were used to apply transfer learning to asynchronously discriminate errors when they were introduced in a gradual fashion and the onset that triggers the ErrPs is presumably not known. Furthermore, neurophysiology analyses shed some light on the underlying cognitive mechanisms that elicit ErrPs during continuous tasks, suggesting the existence of neural models in our brain that accumulate evidence and only make a decision upon reaching a certain threshold. 
Secondly, this thesis evaluates the implementation of these error-related potentials in three user-oriented applications. These studies not only explore how to maximize the decoding performance of ErrP signatures but also investigate the underlying neural mechanisms and how different factors affect the elicited signals. The first application of this thesis presents a new way to guide a mobile robot moving in a continuous environment using only error potentials as feedback, which could be used for the direct control of assistive devices. For this purpose, we propose an algorithm based on policy matching for inverse reinforcement learning to infer the user's goal from brain signals. The second application presented in this thesis takes the first steps towards a hybrid BMI for grasping, aimed at assisting people who have lost motor functionality of their upper limb. This BMI combines the decoding of the type of grasp from low-frequency EEG signals with error-related potentials elicited as the result of monitoring an erroneous grasp. The results show that ErrPs are elicited in combination with motor signatures from the low-frequency spectrum originating from single-repetition grasping tasks. In addition, an evaluation of different design factors (such as the speed of the stimuli, type of grasp or mental task) shows how they affect the morphology of the subsequent evoked ErrP. The third application investigates the neural correlates and the underlying cognitive processes associated with somatosensory mismatches produced by unexpected disturbances during neuromuscular electrical stimulation on a user’s arm. This study simulates possible errors that occur during neurorehabilitation therapy, in which the simultaneous activation of afferent stimulation while the subjects are concentrating on performing a motor task is crucial for optimal recovery. 
The results showed that errors may increase the subject’s attention on the task and trigger learning mechanisms that at the same time could promote motor neuroplasticity. In summary, throughout this thesis, several experimental paradigms have been designed to improve the understanding of how error-related potentials are generated during the interactive use of BMIs in user-oriented applications. Different methods have been proposed to shift from time-locked to asynchronous settings, both in terms of decoding and perception of the erroneous events, and three applications related to the augmentation of motor capabilities have been explored, in which ErrPs can be used for control of devices, motor substitution and neurorehabilitation.
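The frequency-domain features mentioned in this abstract can be illustrated by estimating theta-band power from a single EEG channel. This is only a sketch under stated assumptions: the 4 to 8 Hz band limits are the conventional theta range rather than values given in the abstract, the estimate is a plain periodogram, and the function name `band_power` is hypothetical.

```python
import numpy as np

def band_power(signal, fs, low=4.0, high=8.0):
    """Sum periodogram power between `low` and `high` Hz for a 1-D signal.

    `fs` is the sampling rate in Hz. The default band (4-8 Hz) is the
    conventional theta range, assumed here for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    mask = (freqs >= low) & (freqs <= high)
    return float(power[mask].sum())
```

A feature vector for a classifier could then be built by applying `band_power` to each channel of a sliding window, which suits the asynchronous setting because it does not require a known event onset.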

    Evaluation and Advancement of Electrocorticographic Brain-Machine Interfaces for Individuals with Upper-Limb Paralysis

    Brain-machine interface (BMI) technology aims to provide individuals with movement paralysis a natural and intuitive means for the restoration of function. Electrocorticography (ECoG), in which disc electrodes are placed on either the surface of the dura or the cortex to record field potential activity, has been proposed as a viable neural recording modality for BMI systems, potentially providing stable, long-term recordings of cortical activity with high spatial and temporal resolution. Previous demonstrations of BMI control using ECoG have consisted of short-term periods of control by able-bodied subjects utilizing basic processing and decoding techniques. This dissertation presents work seeking to advance the current state of ECoG BMIs through an assessment of the ability of individuals with movement paralysis to control an ECoG BMI, an investigation into adaptation during BMI skill acquisition, an evaluation of chronic implantation of an ECoG electrode grid, and improved extraction of BMI command signals from ECoG recordings. Two individuals with upper-limb paralysis were implanted with high-density ECoG electrode grids over sensorimotor cortical areas for up to 30 days, with both subjects found to be capable of voluntarily modulating their cortical activity to control movement of a computer cursor with up to three degrees of freedom. Analysis of control signal angular error and the tuning characteristics of ECoG spectral features during the acquisition of brain control revealed that both decoder calibration and fixed-decoder training could facilitate performance improvements. In addition, to better understand the capability of ECoG to provide robust, long-term recordings, work was conducted assessing the effects of chronic implantation of an ECoG electrode grid in a non-human primate, demonstrating that movement-related modulation could be recorded from electrodes nearly two years post-implantation despite the presence of substantial fibrotic encapsulation. 
Finally, it was found that the extraction of command signals from ECoG recordings could be improved through the use of a decoding method incorporating weight-space priors accounting for the expected correlation structure of electrical field potentials. Combined, this work demonstrates the feasibility of ECoG-based BMI systems and addresses some of the key challenges that must be overcome before such systems are translated to the clinical realm
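The control signal angular error mentioned in this abstract is commonly understood as the angle between the decoded movement command and the ideal direction towards the target. The sketch below assumes that definition, which the abstract does not spell out; the function name `angular_error_deg` is hypothetical.

```python
import numpy as np

def angular_error_deg(decoded, target):
    """Angle in degrees between a decoded velocity vector and the ideal direction.

    Both inputs are non-zero vectors of the same dimension (for example
    2-D or 3-D cursor velocities). The definition is an assumption.
    """
    decoded = np.asarray(decoded, dtype=float)
    target = np.asarray(target, dtype=float)
    cos = np.dot(decoded, target) / (np.linalg.norm(decoded) * np.linalg.norm(target))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Averaging this value over the time steps of a trial gives a single performance figure that is independent of cursor speed, which makes it convenient for tracking learning across sessions.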