403 research outputs found
Big Data Analytics for Network Level Short-Term Travel Time Prediction with Hierarchical LSTM and Attention
The travel time data collected from widespread traffic monitoring sensors
necessitate big data analytic tools for querying, visualization, and
identifying meaningful traffic patterns. This paper utilizes a large-scale
travel time dataset from the Caltrans Performance Measurement System (PeMS)
that overwhelms traditional data processing and modeling tools. To
overcome the challenges of the massive amount of data, the big data analytic
engines Apache Spark and Apache MXNet are applied for data wrangling and
modeling. Seasonality and autocorrelation analyses were performed to explore and
visualize the trend of time-varying data. Inspired by the success of the
hierarchical architecture for many artificial intelligence (AI) tasks, we
consolidate the cell and hidden states passed from low-level to the high-level
LSTM with an attention pooling similar to how the human perception system
operates. The designed hierarchical LSTM model can consider the dependencies at
different time scales to capture the spatial-temporal correlations of
network-level travel time. Another self-attention module is then devised to
connect LSTM extracted features to the fully connected layers, predicting
travel time for all corridors instead of a single link/route. The comparison
results show that the Hierarchical LSTM with Attention (HierLSTMat) model gives
the best prediction results at 30-minute and 45-minute horizons and can
successfully forecast unusual congestion. The efficiency gained from big data
analytic tools was evaluated by comparing them with popular data science and
deep learning frameworks.
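The attention pooling mentioned in the abstract above can be illustrated with a minimal sketch. It assumes scaled dot-product scoring against a single query vector; the abstract does not state the paper's actual scoring function, and the names `attention_pool`, `states`, and `query` are hypothetical.

```python
import math

def attention_pool(hidden_states, query):
    """Collapse a sequence of hidden-state vectors into one vector.

    Assumption: scores are scaled dot products with a query vector.
    This is a generic attention pooling, not the paper's exact module.
    """
    d = len(query)
    # One score per time step
    scores = [
        sum(h_i * q_i for h_i, q_i in zip(h, query)) / math.sqrt(d)
        for h in hidden_states
    ]
    # Softmax over time steps (shifted by the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states
    pooled = [
        sum(w * h[j] for w, h in zip(weights, hidden_states))
        for j in range(d)
    ]
    return pooled, weights

# Toy example: three time steps of a 2-unit low-level LSTM (made-up values)
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(states, query)
```

In a hierarchical setup, a pooled vector of this kind could stand in for the state handed from a low-level LSTM to a higher-level one; how the paper wires this is not specified in the abstract.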
Backwards is the way forward: feedback in the cortical hierarchy predicts the expected future
Clark offers a powerful description of the brain as a prediction machine, which makes progress on two distinct levels. First, on an abstract conceptual level, it provides a unifying framework for perception, action, and cognition (including subdivisions such as attention, expectation, and imagination). Second, hierarchical prediction offers progress on a concrete descriptive level for testing and constraining conceptual elements and mechanisms of predictive coding models (estimation of predictions, prediction errors, and internal models).
Human activity recognition using a wearable camera
Advances in wearable technologies are facilitating the understanding of human activities using first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose robust multiple motion features for human activity recognition from first-person videos. The proposed features encode discriminant characteristics from the magnitude, direction and dynamics of motion estimated using optical flow. Moreover, we design novel virtual-inertial features from video, without using an actual inertial sensor, from the movement of the intensity centroid across frames. Results on multiple datasets demonstrate that centroid-based inertial features improve the recognition performance of grid-based features.
Moreover, we propose a multi-layer modelling framework that encodes hierarchical and temporal relationships among activities. The first layer operates on groups of features that effectively encode motion dynamics and temporal variations of intra-frame appearance descriptors of activities with a hierarchical topology. The second layer exploits the temporal context by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding smoothing technique utilises decisions on past samples based on the confidence of the current sample. We validate the proposed framework with several classifiers, and the temporal modelling is shown to improve recognition performance.
We also investigate the use of deep networks to simplify the feature engineering from first-person videos. We propose a stacking of spectrograms to represent short-term global motions; the stack contains a frequency-time representation of multiple motion components. This enables us to apply 2D convolutions to extract and learn motion features. We employ a long short-term memory recurrent network to encode long-term temporal dependency among activities. Furthermore, we apply cross-domain knowledge transfer between inertial-based and vision-based approaches for egocentric activity recognition. We propose a sparsity-weighted combination of information from different motion modalities and/or streams. Results show that the proposed approach performs competitively with existing deep frameworks, with reduced complexity.
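The virtual-inertial idea in the abstract above, deriving motion cues from the movement of the intensity centroid across frames, can be sketched as follows. This is an illustration under stated assumptions, not the thesis's feature pipeline: frames are plain nested lists of grayscale values, the centroid is the intensity-weighted mean pixel position, and the function names are hypothetical.

```python
def intensity_centroid(frame):
    """Return the (x, y) intensity-weighted centroid of a 2D grayscale frame."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        # Assumption: an all-black frame falls back to the geometric centre
        return ((len(frame[0]) - 1) / 2, (len(frame) - 1) / 2)
    cx = sum(x * v for row in frame for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(frame) for v in row) / total
    return (cx, cy)

def centroid_velocity(frames):
    """Frame-to-frame centroid displacement, a velocity-like signal."""
    cents = [intensity_centroid(f) for f in frames]
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(cents, cents[1:])]

# Toy clip: a single bright pixel moving one column to the right per frame
frames = [
    [[0, 0, 0], [9, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 9], [0, 0, 0]],
]
velocity = centroid_velocity(frames)
```

Differencing the displacement sequence once more would give an acceleration-like signal, loosely analogous to an accelerometer reading; whether the thesis computes it this way is not stated in the abstract.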
Human activity recognition using a wearable camera
Thesis under joint supervision (cotutelle) between Universitat Politècnica de Catalunya and Queen Mary, University of London.
This PhD thesis was developed in the framework of, and according to the rules of, the Erasmus Mundus Joint Doctorate on Interactive and Cognitive Environments EMJD ICE [FPA n° 2010-0012].
Postprint (published version)
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
A study of deep neural networks for human activity recognition
Human activity recognition and deep learning are two fields that have attracted attention in recent years. The former due to its relevance in many application domains, such as ambient assisted living or health monitoring, and the latter for its recent and excellent performance achievements in different domains of application such as image and speech recognition. In this article, an extensive analysis of the most suited deep learning architectures for activity recognition is conducted to compare their performance in terms of accuracy, speed, and memory requirements. In particular, convolutional neural networks (CNN), long short‐term memory networks (LSTM), bidirectional LSTM (biLSTM), gated recurrent unit networks (GRU), and deep belief networks (DBN) have been tested on a total of 10 publicly available datasets, with different sensors, sets of activities, and sampling rates. All tests have been designed under a multimodal approach to take advantage of synchronized raw sensor signals. Results show that CNNs are efficient at capturing local temporal dependencies of activity signals, as well as at identifying correlations among sensors. Their performance in activity classification is comparable with, and in most cases better than, the performance of recurrent models. Their faster response and lower memory footprint make them the architecture of choice for wearable and IoT devices.
Electromagnetic Source Imaging via a Data-Synthesis-Based Convolutional Encoder-Decoder Network
Electromagnetic source imaging (ESI) requires solving a highly ill-posed
inverse problem. To seek a unique solution, traditional ESI methods impose
various forms of priors that may not accurately reflect the actual source
properties, which may hinder their broad applications. To overcome this
limitation, in this paper a novel data-synthesized spatio-temporally
convolutional encoder-decoder network method termed DST-CedNet is proposed for
ESI. DST-CedNet recasts ESI as a machine learning problem, where discriminative
learning and latent-space representations are integrated in a convolutional
encoder-decoder network (CedNet) to learn a robust mapping from the measured
electroencephalography/magnetoencephalography (E/MEG) signals to the brain
activity. In particular, by incorporating prior knowledge regarding dynamical
brain activities, a novel data synthesis strategy is devised to generate
large-scale samples for effectively training CedNet. This stands in contrast to
traditional ESI methods where the prior information is often enforced via
constraints primarily aimed at mathematical convenience. Extensive numerical
experiments as well as analysis of a real MEG and Epilepsy EEG dataset
demonstrate that DST-CedNet outperforms several state-of-the-art ESI methods in
robustly estimating source signals under a variety of source configurations.
Comment: 15 pages, 14 figures, and journa