    Human activity recognition using a wearable camera

    Thesis under joint supervision (cotutelle) between Universitat Politècnica de Catalunya and Queen Mary, University of London. This PhD thesis has been developed in the framework of, and according to the rules of, the Erasmus Mundus Joint Doctorate on Interactive and Cognitive Environments (EMJD ICE) [FPA n° 2010-0012].

    Advances in wearable technologies are facilitating the understanding of human activities using first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose multiple robust motion features for human activity recognition from first-person videos. The proposed features encode discriminative characteristics of the magnitude, direction and dynamics of motion estimated using optical flow. Moreover, we design novel virtual-inertial features from video, without using an actual inertial sensor, based on the movement of the intensity centroid across frames. Results on multiple datasets demonstrate that the centroid-based inertial features improve recognition performance compared with grid-based features. We also propose a multi-layer modelling framework that encodes hierarchical and temporal relationships among activities. The first layer operates on groups of features that effectively encode motion dynamics and the temporal variations of intra-frame appearance descriptors of activities, arranged in a hierarchical topology. The second layer exploits temporal context by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding smoothing technique uses decisions on past samples, based on the confidence of the current sample. We validate the proposed framework with several classifiers and show that temporal modelling improves recognition performance.

    We also investigate the use of deep networks to simplify feature engineering for first-person videos. We propose stacking spectrograms to represent short-term global motion; the stack contains a frequency-time representation of multiple motion components, which enables us to apply 2D convolutions to learn motion features. We employ a long short-term memory (LSTM) recurrent network to encode long-term temporal dependencies among activities. Furthermore, we apply cross-domain knowledge transfer between inertial-based and vision-based approaches for egocentric activity recognition. We propose a sparsity-weighted combination of information from different motion modalities and/or streams. Results show that the proposed approach performs competitively with existing deep frameworks while reducing complexity.

    Postprint (published version)
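The idea of deriving virtual-inertial signals from the intensity centroid can be illustrated with a minimal sketch. It assumes grayscale frames given as nested lists of intensities and treats the frame-to-frame displacement of the intensity-weighted centroid as a "velocity" and the change in that displacement as an "acceleration". The function names, the fallback for an all-black frame, and these particular derived signals are illustrative assumptions, not the thesis's exact formulation.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale intensities, indexed frame[row][col]
Vec = Tuple[float, float]


def intensity_centroid(frame: Frame) -> Vec:
    """Intensity-weighted centroid (x, y) of one grayscale frame."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        # Hypothetical fallback: use the geometric centre of an all-black frame.
        return ((len(frame[0]) - 1) / 2.0, (len(frame) - 1) / 2.0)
    cx = sum(x * v for row in frame for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(frame) for v in row) / total
    return cx, cy


def virtual_inertial(frames: List[Frame]) -> Tuple[List[Vec], List[Vec]]:
    """Centroid 'velocity' (first difference) and 'acceleration' (second difference)."""
    cs = [intensity_centroid(f) for f in frames]
    vel = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(cs, cs[1:])]
    acc = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(vel, vel[1:])]
    return vel, acc
```

In a full pipeline these signals would presumably be windowed and summarised (for example by mean and variance) before classification, much as readings from a real accelerometer would be.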

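The post-decoding smoothing step described in the thesis abstract can be sketched under stated assumptions: each decoded sample carries a label and a confidence score, and when the confidence falls below a threshold the label is replaced by the most frequent of the last few smoothed decisions. The threshold, the window size, the majority-vote rule and the function name are hypothetical choices for illustration, not the thesis's published parameters.

```python
from collections import Counter, deque
from typing import Deque, List, Sequence, Tuple


def confidence_smoothing(
    predictions: Sequence[Tuple[str, float]],
    threshold: float = 0.6,  # hypothetical confidence threshold
    window: int = 3,         # hypothetical number of past decisions to consult
) -> List[str]:
    """Replace low-confidence labels with the majority of recent decisions."""
    history: Deque[str] = deque(maxlen=window)
    smoothed: List[str] = []
    for label, conf in predictions:
        if conf < threshold and history:
            # Low confidence: fall back to the most frequent recent decision.
            label = Counter(history).most_common(1)[0][0]
        history.append(label)
        smoothed.append(label)
    return smoothed


preds = [("walk", 0.9), ("walk", 0.8), ("run", 0.4), ("run", 0.9), ("sit", 0.95)]
print(confidence_smoothing(preds))  # ['walk', 'walk', 'walk', 'run', 'sit']
```

Feeding the smoothed labels back into the history makes the filter more conservative; keeping the raw labels in the history would be an equally plausible variant.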

    Domain-agnostic and Multi-level Evaluation of Generative Models

    While the capabilities of generative models have improved considerably across domains (images, text, graphs, molecules, etc.), their evaluation metrics largely remain based on simplified quantities or manual inspection, which limits their practicality. To address this, we propose a framework for Multi-level Performance Evaluation of Generative mOdels (MPEGO) that can be employed across domains. MPEGO quantifies generation performance hierarchically, from a low-level evaluation based on sub-features to a high-level evaluation based on global features. MPEGO is highly customizable: the features it employs are entirely user-defined, so they can be domain- or problem-specific and arbitrarily complex (e.g., outcomes of experimental procedures). We validate MPEGO using multiple generative models across several datasets from the materials discovery domain. An ablation study assesses the plausibility of the intermediate steps in MPEGO. Results demonstrate that MPEGO provides a flexible, user-driven, multi-level evaluation framework that yields practical insights into generation quality. The framework, source code, and experiments will be available at https://github.com/GT4SD/mpego
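A domain-agnostic, two-level evaluation of this kind can be sketched as below. This is an illustrative assumption, not MPEGO's actual API (the linked repository is the authoritative source): each user-supplied feature function maps a sample to a number, the low level compares real and generated feature distributions with a 1-D Wasserstein distance, and the high level averages those per-feature distances into a single score. The function names, the choice of distance and the plain mean as the aggregation are hypothetical.

```python
from statistics import mean
from typing import Any, Callable, Dict, Sequence, Tuple


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-D Wasserstein (earth mover's) distance for two equal-size samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size empirical distributions the optimal matching pairs sorted values.
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def multi_level_scores(
    real: Sequence[Any],
    generated: Sequence[Any],
    features: Dict[str, Callable[[Any], float]],
) -> Tuple[Dict[str, float], float]:
    """Low level: one distance per user-defined feature. High level: their mean."""
    low = {
        name: wasserstein_1d([f(s) for s in real], [f(s) for s in generated])
        for name, f in features.items()
    }
    high = mean(low.values())
    return low, high


# Toy string samples with two user-defined features.
low, high = multi_level_scores(
    ["CCO", "CCCO"], ["CO", "CCCO"],
    {"length": len, "carbons": lambda s: s.count("C")},
)
print(low, high)  # {'length': 0.5, 'carbons': 0.5} 0.5
```

Because the features are plain callables, the same scaffold works for images, graphs or molecules as long as the user can map each sample to a number.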

    Feasibility of wearable monitors to detect heart rate variability in children with hand, foot and mouth disease

    Hand, foot and mouth disease (HFMD) is caused by a variety of enteroviruses and occurs in large outbreaks in which a small proportion of children deteriorate rapidly with cardiopulmonary failure. Determining which children are likely to deteriorate is difficult, and health systems may become overloaded during outbreaks because many children require hospitalization for monitoring. Heart rate variability (HRV) may help distinguish those with more severe disease, but this requires simple, scalable methods to collect ECG data. We carried out a prospective observational study to examine the feasibility of using wearable devices to measure HRV in 142 children admitted with HFMD to a children's hospital in Vietnam. ECG data were collected in all children. The calculated HRV indices were lower in children with enterovirus A71-associated HFMD than in those with other viral pathogens. HRV analysis of data collected from wearable devices is feasible in a low- and middle-income country (LMIC) and may help classify disease severity in HFMD.
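The abstract does not name the HRV indices used. As a hedged illustration, two standard time-domain indices, SDNN (standard deviation of normal-to-normal RR intervals) and RMSSD (root mean square of successive RR differences), can be computed from RR intervals extracted from the wearable ECG as follows. Beat detection and artefact removal are assumed to happen upstream, and the function names are illustrative.

```python
import math
from statistics import stdev
from typing import Sequence


def sdnn(rr_ms: Sequence[float]) -> float:
    """SDNN: sample standard deviation of RR intervals in ms (needs >= 2 intervals)."""
    return stdev(rr_ms)


def rmssd(rr_ms: Sequence[float]) -> float:
    """RMSSD: root mean square of successive RR-interval differences in ms."""
    if len(rr_ms) < 2:
        raise ValueError("RMSSD needs at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


rr = [800, 810, 790, 800]  # toy RR intervals in milliseconds
print(round(rmssd(rr), 2), round(sdnn(rr), 2))  # 14.14 8.16
```

Some tools compute SDNN with the population rather than the sample standard deviation, so conventions should be checked before comparing values across studies.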

    DMLR: Data-centric Machine Learning Research -- Past, Present and Future

    Drawing on discussions at the inaugural DMLR workshop at ICML 2023 and earlier meetings, in this report we outline the relevance of community engagement and infrastructure development for creating next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods, towards positive scientific, societal and business impact. Comment: This editorial report accompanies the inaugural Data-centric Machine Learning Research (DMLR) Workshop that took place at ICML 2023, https://dmlr.ai

    Towards Creativity Characterization of Generative Models via Group-based Subset Scanning

    Deep generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have been employed widely in computational creativity research. However, such models discourage out-of-distribution generation to avoid spurious samples, which limits their creativity. Incorporating research on human creativity into generative deep learning techniques therefore presents an opportunity to make their outputs more compelling and human-like. As generative models directed toward creativity research emerge, machine learning-based surrogate metrics to characterize their creative output are urgently needed. We propose group-based subset scanning to identify, quantify, and characterize creative processes by detecting a subset of anomalous node activations in the hidden layers of generative models. Our experiments on standard image benchmarks and their "creatively generated" variants reveal that the distribution of the proposed subset scores is more useful for detecting creative processes in the activation space than in the pixel space. Further, we find that creative samples generate larger subsets of anomalies than normal or non-creative samples across datasets. The node activations highlighted during the creative decoding process differ from those responsible for normal sample generation. Lastly, we assess whether the images from the subsets selected by our method were also judged creative by human evaluators, presenting a link between human creativity perception and node activations within deep neural networks. Comment: Accepted to IJCAI 2022 - Creativity Track - Extended version from Synthetic Data Generation Workshop at ICLR'21 submission (arXiv:2104.00479). arXiv admin note: text overlap with arXiv:2105.1247
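The core of subset scanning over node activations can be sketched with a non-parametric scan statistic. This simplified, hypothetical version scans the nodes of a single test sample rather than a group of samples. It converts each node's activation into an empirical p-value against activations from background (normal) samples, then tries thresholds alpha and scores the nodes whose p-values are at most alpha with the Berk-Jones statistic. The function names, the one-tailed p-value, the Berk-Jones choice and `alpha_max` are assumptions for illustration, not the paper's exact procedure.

```python
import math
from typing import List, Optional, Sequence, Tuple


def empirical_pvalues(test_acts: Sequence[float], background: Sequence[Sequence[float]]) -> List[float]:
    """One-tailed p-value per node: how often background activations are at least as high."""
    m = len(background)
    return [(1 + sum(1 for row in background if row[j] >= a)) / (1 + m) for j, a in enumerate(test_acts)]


def kl_bernoulli(x: float, y: float) -> float:
    """KL divergence between Bernoulli(x) and Bernoulli(y), with 0 * log 0 taken as 0."""
    out = 0.0
    if x > 0:
        out += x * math.log(x / y)
    if x < 1:
        out += (1 - x) * math.log((1 - x) / (1 - y))
    return out


def berk_jones(n_alpha: int, n: int, alpha: float) -> float:
    """Berk-Jones score: large when more p-values than expected fall at or below alpha."""
    frac = n_alpha / n
    return n * kl_bernoulli(frac, alpha) if frac > alpha else 0.0


def scan_nodes(pvalues: Sequence[float], alpha_max: float = 0.5) -> Tuple[float, List[int]]:
    """Return the best score and the most anomalous node subset."""
    n = len(pvalues)
    best_score = 0.0
    best_alpha: Optional[float] = None
    for alpha in sorted({p for p in pvalues if p <= alpha_max}):
        # For a fixed alpha the score-maximising subset is every node with p <= alpha.
        n_alpha = sum(1 for p in pvalues if p <= alpha)
        score = berk_jones(n_alpha, n, alpha)
        if score > best_score:
            best_score, best_alpha = score, alpha
    if best_alpha is None:
        return 0.0, []
    return best_score, [j for j, p in enumerate(pvalues) if p <= best_alpha]


background = [[0.1 * i] * 4 for i in range(9)]  # 9 toy background samples, 4 nodes
pvals = empirical_pvalues([5.0, 5.0, 0.0, 0.0], background)
print(pvals, scan_nodes(pvals)[1])  # [0.1, 0.1, 1.0, 1.0] [0, 1]
```

Extending this to the group-based setting would mean aggregating p-values over a group of test samples before scoring, which this sketch does not attempt.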