143 research outputs found

    Egocentric Activity Recognition Using HOG, HOF and MBH Features

    Get PDF
    recognizing egocentric actions is a challenging task that has to be addressed in recent years. The recognition of first person activities helps in assisting elderly people, disabled patients and so on. Here, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. In this research work, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Histogram of optical Flow (HOF) and Motion Boundary Histogram (MBH). The extracted features are given as input to the classifiers like Support Vector Machine (SVM) and k Nearest Neighbor (kNN). The performance results showed that SVM gave better results than kNN classifier for both categories

    Human activity recognition using a wearable camera

    Get PDF
    Advances in wearable technologies are facilitating the understanding of human activities using first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose robust multiple motion features for human activity recognition from first­ person videos. The proposed features encode discriminant characteristics form magnitude, direction and dynamics of motion estimated using optical flow. M:>reover, we design novel virtual-inertial features from video, without using the actual inertial sensor, from the movement of intensity centroid across frames. Results on multiple datasets demonstrate that centroid-based inertial features improve the recognition performance of grid-based features. Moreover, we propose a multi-layer modelling framework that encodes hierarchical and temporal relationships among activities. The first layer operates on groups of features that effectively encode motion dynamics and temporal variaitons of intra-frame appearance descriptors of activities with a hierarchical topology. The second layer exploits the temporal context by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding smoothing technique utilises decisions on past samples based on the confidence of the current sample. We validate the proposed framework with several classi fiers, and the temporal modelling is shown to improve recognition performance. We also investigate the use of deep networks to simplify the feature engineering from first-person videos. We propose a stacking of spectrograms to represent short-term global motions that contains a frequency-time representation of multiplemotion components. This enables us to apply 2D convolutions to extract/learn motion features. We employ long short-term memory recurrent network to encode long-term temporal dependency among activiites. Furthermore, we apply cross-domain knowledge transfer between inertial­ based and vision-based approaches for egocentric activity recognition. We propose sparsity weightedcombination of information from different motion modalities and/or streams . Results show that the proposed approach performs competitively with existing deep frameworks, moreover, with reduced complexity.Los avances en tecnologías wearables facilitan la comprensión de actividades humanas utilizando cuando se usan videos grabados en primera persona para una amplia gama de aplicaciones. En esta tesis, proponemos características robustas de movimiento para el reconocimiento de actividades humana a partir de videos en primera persona. Las características propuestas codifican características discriminativas estimadas a partir de optical flow como magnitud, dirección y dinámica de movimiento. Además, diseñamos nuevas características de inercia virtual a partir de video, sin usar sensores inerciales, utilizando el movimiento del centroide de intensidad a través de los fotogramas. Los resultados obtenidos en múltiples bases de datos demuestran que las características inerciales basadas en centroides mejoran el rendimiento de reconocimiento en comparación con grid-based características. Además, proponemos un algoritmo multicapa que codifica las relaciones jerárquicas y temporales entre actividades. La primera capa opera en grupos de características que codifican eficazmente las dinámicas del movimiento y las variaciones temporales de características de apariencia entre múltiples fotogramas utilizando una jerarquía. La segunda capa aprovecha el contexto temporal ponderando las salidas de la jerarquía durante el modelado. Además, diseñamos una técnica de postprocesado para filtrar las decisiones utilizando estimaciones pasadas y la confianza de la estimación actual. Validamos el algoritmo propuesto utilizando varios clasificadores. El modelado temporal muestra una mejora del rendimiento en el reconocimiento de actividades. También investigamos el uso de redes profundas (deep networks) para simplificar el diseño manual de características a partir de videos en primera persona. Proponemos apilar espectrogramas para representar movimientos globales a corto plazo. Estos espectrogramas contienen una representación espaciotemporal de múltiples componentes de movimiento. Esto nos permite aplicar convoluciones bidimensionales para aprender funciones de movimiento. Empleamos long short-term memory recurrent networks para codificar la dependencia temporal a largo plazo entre las actividades. Además, aplicamos transferencia de conocimiento entre diferentes dominios (cross-domain knowledge) entre enfoques inerciales y basados en la visión para el reconocimiento de la actividad en primera persona. Proponemos una combinación ponderada de información de diferentes modalidades de movimiento y/o secuencias. Los resultados muestran que el algoritmo propuesto obtiene resultados competitivos en comparación con existentes algoritmos basados en deep learning, a la vez que se reduce la complejidad

    Human activity recognition using a wearable camera

    Get PDF
    Advances in wearable technologies are facilitating the understanding of human activities using first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose robust multiple motion features for human activity recognition from first­ person videos. The proposed features encode discriminant characteristics form magnitude, direction and dynamics of motion estimated using optical flow. M:>reover, we design novel virtual-inertial features from video, without using the actual inertial sensor, from the movement of intensity centroid across frames. Results on multiple datasets demonstrate that centroid-based inertial features improve the recognition performance of grid-based features. Moreover, we propose a multi-layer modelling framework that encodes hierarchical and temporal relationships among activities. The first layer operates on groups of features that effectively encode motion dynamics and temporal variaitons of intra-frame appearance descriptors of activities with a hierarchical topology. The second layer exploits the temporal context by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding smoothing technique utilises decisions on past samples based on the confidence of the current sample. We validate the proposed framework with several classi fiers, and the temporal modelling is shown to improve recognition performance. We also investigate the use of deep networks to simplify the feature engineering from first-person videos. We propose a stacking of spectrograms to represent short-term global motions that contains a frequency-time representation of multiplemotion components. This enables us to apply 2D convolutions to extract/learn motion features. We employ long short-term memory recurrent network to encode long-term temporal dependency among activiites. Furthermore, we apply cross-domain knowledge transfer between inertial­ based and vision-based approaches for egocentric activity recognition. We propose sparsity weightedcombination of information from different motion modalities and/or streams . Results show that the proposed approach performs competitively with existing deep frameworks, moreover, with reduced complexity.Los avances en tecnologías wearables facilitan la comprensión de actividades humanas utilizando cuando se usan videos grabados en primera persona para una amplia gama de aplicaciones. En esta tesis, proponemos características robustas de movimiento para el reconocimiento de actividades humana a partir de videos en primera persona. Las características propuestas codifican características discriminativas estimadas a partir de optical flow como magnitud, dirección y dinámica de movimiento. Además, diseñamos nuevas características de inercia virtual a partir de video, sin usar sensores inerciales, utilizando el movimiento del centroide de intensidad a través de los fotogramas. Los resultados obtenidos en múltiples bases de datos demuestran que las características inerciales basadas en centroides mejoran el rendimiento de reconocimiento en comparación con grid-based características. Además, proponemos un algoritmo multicapa que codifica las relaciones jerárquicas y temporales entre actividades. La primera capa opera en grupos de características que codifican eficazmente las dinámicas del movimiento y las variaciones temporales de características de apariencia entre múltiples fotogramas utilizando una jerarquía. La segunda capa aprovecha el contexto temporal ponderando las salidas de la jerarquía durante el modelado. Además, diseñamos una técnica de postprocesado para filtrar las decisiones utilizando estimaciones pasadas y la confianza de la estimación actual. Validamos el algoritmo propuesto utilizando varios clasificadores. El modelado temporal muestra una mejora del rendimiento en el reconocimiento de actividades. También investigamos el uso de redes profundas (deep networks) para simplificar el diseño manual de características a partir de videos en primera persona. Proponemos apilar espectrogramas para representar movimientos globales a corto plazo. Estos espectrogramas contienen una representación espaciotemporal de múltiples componentes de movimiento. Esto nos permite aplicar convoluciones bidimensionales para aprender funciones de movimiento. Empleamos long short-term memory recurrent networks para codificar la dependencia temporal a largo plazo entre las actividades. Además, aplicamos transferencia de conocimiento entre diferentes dominios (cross-domain knowledge) entre enfoques inerciales y basados en la visión para el reconocimiento de la actividad en primera persona. Proponemos una combinación ponderada de información de diferentes modalidades de movimiento y/o secuencias. Los resultados muestran que el algoritmo propuesto obtiene resultados competitivos en comparación con existentes algoritmos basados en deep learning, a la vez que se reduce la complejidad

    Human activity recognition using a wearable camera

    Get PDF
    Tesi en modalitat cotutela Universitat Politècnica de Catalunya i Queen Mary, University of London. This PhD Thesis has been developed in the framework of, and according to, the rules of the Erasmus Mundus Joint Doctorate on Interactive and Cognitive Environments EMJD ICE [FPA n° 2010-0012]Advances in wearable technologies are facilitating the understanding of human activities using first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose robust multiple motion features for human activity recognition from first­ person videos. The proposed features encode discriminant characteristics form magnitude, direction and dynamics of motion estimated using optical flow. M:>reover, we design novel virtual-inertial features from video, without using the actual inertial sensor, from the movement of intensity centroid across frames. Results on multiple datasets demonstrate that centroid-based inertial features improve the recognition performance of grid-based features. Moreover, we propose a multi-layer modelling framework that encodes hierarchical and temporal relationships among activities. The first layer operates on groups of features that effectively encode motion dynamics and temporal variaitons of intra-frame appearance descriptors of activities with a hierarchical topology. The second layer exploits the temporal context by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding smoothing technique utilises decisions on past samples based on the confidence of the current sample. We validate the proposed framework with several classi fiers, and the temporal modelling is shown to improve recognition performance. We also investigate the use of deep networks to simplify the feature engineering from first-person videos. We propose a stacking of spectrograms to represent short-term global motions that contains a frequency-time representation of multiplemotion components. This enables us to apply 2D convolutions to extract/learn motion features. We employ long short-term memory recurrent network to encode long-term temporal dependency among activiites. Furthermore, we apply cross-domain knowledge transfer between inertial­ based and vision-based approaches for egocentric activity recognition. We propose sparsity weightedcombination of information from different motion modalities and/or streams . Results show that the proposed approach performs competitively with existing deep frameworks, moreover, with reduced complexity.Los avances en tecnologías wearables facilitan la comprensión de actividades humanas utilizando cuando se usan videos grabados en primera persona para una amplia gama de aplicaciones. En esta tesis, proponemos características robustas de movimiento para el reconocimiento de actividades humana a partir de videos en primera persona. Las características propuestas codifican características discriminativas estimadas a partir de optical flow como magnitud, dirección y dinámica de movimiento. Además, diseñamos nuevas características de inercia virtual a partir de video, sin usar sensores inerciales, utilizando el movimiento del centroide de intensidad a través de los fotogramas. Los resultados obtenidos en múltiples bases de datos demuestran que las características inerciales basadas en centroides mejoran el rendimiento de reconocimiento en comparación con grid-based características. Además, proponemos un algoritmo multicapa que codifica las relaciones jerárquicas y temporales entre actividades. La primera capa opera en grupos de características que codifican eficazmente las dinámicas del movimiento y las variaciones temporales de características de apariencia entre múltiples fotogramas utilizando una jerarquía. La segunda capa aprovecha el contexto temporal ponderando las salidas de la jerarquía durante el modelado. Además, diseñamos una técnica de postprocesado para filtrar las decisiones utilizando estimaciones pasadas y la confianza de la estimación actual. Validamos el algoritmo propuesto utilizando varios clasificadores. El modelado temporal muestra una mejora del rendimiento en el reconocimiento de actividades. También investigamos el uso de redes profundas (deep networks) para simplificar el diseño manual de características a partir de videos en primera persona. Proponemos apilar espectrogramas para representar movimientos globales a corto plazo. Estos espectrogramas contienen una representación espaciotemporal de múltiples componentes de movimiento. Esto nos permite aplicar convoluciones bidimensionales para aprender funciones de movimiento. Empleamos long short-term memory recurrent networks para codificar la dependencia temporal a largo plazo entre las actividades. Además, aplicamos transferencia de conocimiento entre diferentes dominios (cross-domain knowledge) entre enfoques inerciales y basados en la visión para el reconocimiento de la actividad en primera persona. Proponemos una combinación ponderada de información de diferentes modalidades de movimiento y/o secuencias. Los resultados muestran que el algoritmo propuesto obtiene resultados competitivos en comparación con existentes algoritmos basados en deep learning, a la vez que se reduce la complejidad.Postprint (published version

    Hierarchical approach to classify food scenes in egocentric photo-streams.

    Get PDF
    Recent studies have shown that the environment where people eat can affect their nutritional behavior. In this paper, we provide automatic tools for personalized analysis of a person's health habits by the examination of daily recorded egocentric photo-streams. Specifically, we propose a new automatic approach for the classification of food-related environments, which is able to classify up to 15 such scenes. In this way, people can monitor the context around their food intake in order to get an objective insight into their daily eating routine. We propose a model that classifies food-related scenes organized in a semantic hierarchy. Additionally, we present and make available a new egocentric dataset composed of more than 33,000 images recorded by a wearable camera, over which our proposed model has been tested. Our approach obtains an accuracy and F-score of 56% and 65%, respectively, clearly outperforming the baseline methods

    Radar for Assisted Living in the Context of Internet of Things for Health and Beyond

    Get PDF
    This paper discusses the place of radar for assisted living in the context of IoT for Health and beyond. First, the context of assisted living and the urgency to address the problem is described. The second part gives a literature review of existing sensing modalities for assisted living and explains why radar is an upcoming preferred modality to address this issue. The third section presents developments in machine learning that helps improve performances in classification especially with deep learning with a reflection on lessons learned from it. The fourth section introduces recent published work from our research group in the area that shows promise with multimodal sensor fusion for classification and long short-term memory applied to early stages in the radar signal processing chain. Finally, we conclude with open challenges still to be addressed in the area and open to future research directions in animal welfare

    Human activity recognition using a wearable camera

    Get PDF
    PhDAdvances in wearable technologies are facilitating the understanding of human activities using first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose robust multiple motion features for human activity recognition from first-person videos. The proposed features encode discriminant characteristics from magnitude, direction and dynamics of motion estimated using optical flow. Moreover, we design novel virtual-inertial features from video, without using the actual inertial sensor, from the movement of intensity centroid across frames. Results on multiple datasets demonstrate that centroid-based inertial features improve the recognition performance of grid-based features. Moreover, we propose a multi-layer modelling framework that encodes hierarchical and temporal relationships among activities. The first layer operates on groups of features that effectively encode motion dynamics and temporal variations of intra-frame appearance descriptors of activities with a hierarchical topology. The second layer exploits the temporal context by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding smoothing technique utilises decisions on past samples based on the confidence of the current sample. We validate the proposed framework with several classifiers, and the temporal modelling is shown to improve recognition performance. We also investigate the use of deep networks to simplify the feature engineering from firstperson videos. We propose a stacking of spectrograms to represent short-term global motions that contains a frequency-time representation of multiple motion components. This enables us to apply 2D convolutions to extract/learn motion features. We employ long short-term memory recurrent network to encode long-term temporal dependency among activities. Furthermore, we apply cross-domain knowledge transfer between inertial-based and vision-based approaches for egocentric activity recognition. We propose sparsity weighted combination of information from different motion modalities and/or streams. Results show that the proposed approach performs competitively with existing deep frameworks, moreover, with reduced complexity
    • …
    corecore