9 research outputs found

    Boosted Multiple Kernel Learning for First-Person Activity Recognition

    Get PDF
    Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for first-person videos. An effective activity recognition system requires the selection and use of complementary features and appropriate kernels for each feature. In this study, we propose a data-driven framework for first-person activity recognition which effectively selects and combines features and their respective kernels during training. Our experimental results show that the use of Multiple Kernel Learning (MKL) and Boosted MKL for the first-person activity recognition problem improves on state-of-the-art results. In addition, these techniques enable the expansion of the framework with new features in an efficient and convenient way.
    Comment: First published in the Proceedings of the 25th European Signal Processing Conference (EUSIPCO-2017) in 2017, published by EURASIP.
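
    The paper's boosting procedure is not detailed in the abstract, but the core MKL idea it builds on, a combined kernel formed as a weighted sum of per-descriptor base kernels, can be sketched as follows. The feature sets, weights, and RBF kernel choice below are illustrative assumptions, not taken from the paper.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    # Two toy feature descriptors for 40 video clips (e.g. motion and appearance).
    X_motion = rng.normal(size=(40, 16))
    X_appear = rng.normal(size=(40, 32))
    y = rng.integers(0, 2, size=40)

    # One base kernel per descriptor; the combined kernel is their weighted sum,
    # k(x, x') = sum_m w_m * k_m(x, x'). MKL learns the weights during training;
    # they are fixed here only for illustration.
    weights = [0.7, 0.3]
    K = weights[0] * rbf_kernel(X_motion) + weights[1] * rbf_kernel(X_appear)

    clf = SVC(kernel="precomputed").fit(K, y)
    print(clf.score(K, y))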

    Social Activity Recognition on Continuous RGB-D Video Sequences

    Get PDF
    Modern service robots are provided with one or more sensors, often including RGB-D cameras, to perceive objects and humans in the environment. This paper proposes a new system for the recognition of human social activities from a continuous stream of RGB-D data. Many previous works have succeeded in recognising activities from clipped videos in datasets, but for robotic applications it is important to move to more realistic scenarios in which such activities are not manually selected. For this reason, it is useful to detect the time intervals when humans are performing social activities, the recognition of which can help trigger human-robot interactions or detect situations of potential danger. The main contributions of this work are a novel system for the recognition of social activities from continuous RGB-D data, combining temporal segmentation and classification, as well as a model for learning proximity-based priors of the social activities. A new public dataset with RGB-D videos of social and individual activities is also provided and used to evaluate the proposed solutions. The results show the good performance of the system in recognising social activities from continuous RGB-D data.
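
    As a generic illustration of the segmentation-plus-classification idea described above (a sketch, not the authors' system), a continuous stream can be cut into fixed-length sliding windows, each window classified, and consecutive windows with the same label merged into activity intervals:

    import numpy as np

    def segment_stream(features, classify, win=30, step=15):
        """features: (T, D) per-frame descriptors; classify maps a pooled
        window descriptor to a label. Returns merged (start, end, label) runs."""
        runs = []
        for s in range(0, len(features) - win + 1, step):
            label = classify(features[s:s + win].mean(axis=0))  # simple pooling
            if runs and runs[-1][2] == label and s <= runs[-1][1]:
                runs[-1] = (runs[-1][0], s + win, label)  # extend current interval
            else:
                runs.append((s, s + win, label))
        return runs

    feats = np.random.rand(200, 64)                  # toy continuous stream
    rule = lambda v: int(v.mean() > 0.5)             # stand-in classifier
    print(segment_stream(feats, rule))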

    Badminton activity recognition using accelerometer data

    Get PDF
    A thorough analysis of the sport is becoming increasingly important in the training process of badminton players at both the recreational and professional level. Nowadays, game situations are usually filmed and reviewed afterwards, but these video set-ups tend to be expensive, intrusive to set up, and difficult to analyze. In contrast, we classified badminton movements using off-the-shelf accelerometer and gyroscope data. To this end, we organized a data capturing campaign and designed a novel neural network using different frame sizes as input. This paper shows that with only accelerometer data, our novel convolutional neural network is able to distinguish nine activities with 86% precision when using a sampling frequency of 50 Hz. Adding the gyroscope data increases precision to up to 99%, compared to 79% and 88%, respectively, for a traditional convolutional neural network. In addition, our paper analyzes the impact of different sensor placement options and discusses the impact of different sampling frequencies of the sensors. As such, our approach provides a low-cost solution that is easy to use and can collect useful information for the analysis of a badminton game.
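
    A minimal sketch of this kind of network, with layer sizes and window length chosen here as assumptions (the abstract does not specify the exact architecture): a small 1D CNN over fixed-length windows of six sensor channels (3-axis accelerometer plus 3-axis gyroscope at 50 Hz) with nine output classes.

    import torch
    import torch.nn as nn

    class SensorCNN(nn.Module):
        def __init__(self, channels=6, classes=9):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),             # global pooling over time
            )
            self.head = nn.Linear(64, classes)

        def forward(self, x):                        # x: (batch, channels, samples)
            return self.head(self.features(x).squeeze(-1))

    # One 2-second window at 50 Hz = 100 samples per channel.
    logits = SensorCNN()(torch.randn(8, 6, 100))
    print(logits.shape)                              # torch.Size([8, 9])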

    Multi-stream convolutional neural networks for action recognition in video sequences based on spatio-temporal information

    Get PDF
    Advisor: Hélio Pedrini. Dissertation (Master's), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Advances in digital technology have increased event recognition capabilities through the development of devices with high resolution, small physical dimensions and high sampling rates. The recognition of complex events in videos has several relevant applications, particularly due to the wide availability of digital cameras in environments such as airports, banks and roads, among others. The large amount of data produced is the ideal scenario for the development of automatic methods based on deep learning. Despite the significant progress achieved through image-based deep neural networks, video content understanding still faces challenges in modeling spatio-temporal relations. In this dissertation, we address the problem of human action recognition in videos. A multi-stream network is our architecture of choice to incorporate temporal information, since it may benefit from pre-trained deep networks for images and from hand-crafted features for initialization. Furthermore, its training cost is usually lower than that of video-based networks. We explore visual rhythm images, since they encode longer-term information when compared to still frames and optical flow. We propose a novel method based on point tracking for deciding the best visual rhythm direction for each video. In addition, we experimented with recurrent neural networks trained on the features extracted from the streams of the proposed architecture. Experiments conducted on the challenging UCF101 and HMDB51 public datasets demonstrate that our approach is able to improve network performance, achieving accuracy rates comparable to state-of-the-art methods. Although visual rhythms are originally created from RGB images, other types of sources and strategies for their creation are explored and discussed, such as optical flow, image gradients and color histograms.
    Master's in Computer Science (Mestre em Ciência da Computação), 1736920, CAPES.
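
    The visual rhythm representation explored in the dissertation follows a standard construction: one fixed line of pixels is taken from every frame and the lines are stacked over time, producing a single 2D image per video. A minimal sketch follows (the point-tracking method for choosing the direction per video is not reproduced here):

    import numpy as np

    def visual_rhythm(frames, direction="horizontal"):
        """frames: (T, H, W) grayscale video; returns a (T, W) or (T, H) image
        whose rows are the chosen pixel line of consecutive frames."""
        T, H, W = frames.shape
        if direction == "horizontal":                # central row of each frame
            return frames[:, H // 2, :]
        return frames[:, :, W // 2]                  # central column of each frame

    video = np.random.randint(0, 256, size=(120, 240, 320), dtype=np.uint8)
    print(visual_rhythm(video).shape)                # (120, 320)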

    Energy-Sustainable IoT Connectivity: Vision, Technological Enablers, Challenges, and Future Directions

    Full text link
    Technology solutions must effectively balance economic growth, social equity, and environmental integrity to achieve a sustainable society. Notably, although the Internet of Things (IoT) paradigm constitutes a key sustainability enabler, critical issues such as increasing maintenance operations, energy consumption, and the manufacturing/disposal of IoT devices have long-term negative economic, societal, and environmental impacts and must be efficiently addressed. This calls for self-sustainable IoT ecosystems requiring minimal external resources and intervention, effectively utilizing renewable energy sources, and recycling materials whenever possible, thus encompassing energy sustainability. In this work, we focus on energy-sustainable IoT during the operation phase, although our discussions sometimes extend to other sustainability aspects and IoT lifecycle phases. Specifically, we provide a fresh look at energy-sustainable IoT and identify energy provision, energy transfer, and energy efficiency as the three main energy-related processes whose harmonious coexistence pushes toward realizing self-sustainable IoT systems. Their main related technologies, recent advances, challenges, and research directions are also discussed. Moreover, we overview relevant performance metrics to assess the energy-sustainability potential of a certain technique, technology, device, or network and list some target values for the next generation of wireless systems. Overall, this paper offers insights that are valuable for advancing sustainability goals for present and future generations.
    Comment: 25 figures, 12 tables, submitted to the IEEE Open Journal of the Communications Society.
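
    As a concrete example of the kind of performance metric surveyed, energy efficiency is commonly reported in bits per joule; the numbers below are invented for illustration.

    def energy_efficiency(bits_delivered: float, energy_joules: float) -> float:
        """Energy efficiency in bits per joule, a standard sustainability metric."""
        return bits_delivered / energy_joules

    # e.g. a 10 MB uplink payload delivered for 1.2 J of radio energy (made up)
    print(f"{energy_efficiency(10e6 * 8, 1.2):.2e} bit/J")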

    Convolutional neural networks based on visual rhythms and adaptive fusion for a multi-stream architecture applied to human action recognition

    Get PDF
    Advisors: Hélio Pedrini, Marcelo Bernardes Vieira. Thesis (PhD), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: The large amount of video data produced and released every day makes visual inspection by a human operator impracticable. However, the content of these videos can be useful for various important tasks, such as surveillance and health monitoring. Therefore, automatic methods are needed to detect and understand relevant events in videos. The problem addressed in this work is the recognition of human actions in videos, which aims to classify the action being performed by one or more actors. The complexity of the problem and the volume of video data suggest the use of deep learning-based techniques; however, unlike image-related problems, there is neither a great variety of well-established specific architectures nor annotated datasets as large as image-based ones. To circumvent these limitations, we propose and analyze a multi-stream architecture containing image-based networks pre-trained on the large ImageNet dataset. Different image representations are extracted from the videos to feed the streams, in order to provide complementary information to the system. Here, we propose new streams based on visual rhythm that encode longer-term information when compared to still frames and optical flow. As important as the definition of representative and complementary aspects is the choice of proper combination methods that exploit the strengths of each modality. Thus, we also analyze different fusion approaches to combine the modalities. In order to define the best parameters of our fusion methods using the training set, we have to reduce overfitting in individual modalities; otherwise, the 100%-accurate outputs would not offer a realistic and relevant representation for the fusion method. Thus, we investigate an early stopping technique to train individual networks. In addition to reducing overfitting, this method also reduces the training cost, since it usually requires fewer epochs to complete the classification process, and it adapts to new streams and datasets thanks to its trainable parameters. Experiments are conducted on the UCF101 and HMDB51 datasets, which are two challenging benchmarks in the context of action recognition.
    PhD in Computer Science (Doutora em Ciência da Computação). Funding: CAPES (001), FAPESP (2017/09160-1).
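
    The adaptive fusion with trainable parameters can be illustrated with a simple late-fusion form (an assumption for illustration, not the thesis' exact method): per-stream class scores are averaged with weights constrained to a simplex via a softmax.

    import numpy as np

    def fuse(stream_scores, w_logits):
        """stream_scores: list of (N, C) per-stream class scores. The stream
        weights are the softmax of w_logits, so they stay on a simplex and
        can be tuned on a validation set."""
        w = np.exp(w_logits) / np.exp(w_logits).sum()
        return sum(wi * s for wi, s in zip(w, stream_scores))

    rgb = np.random.dirichlet(np.ones(5), size=10)     # toy softmax outputs
    rhythm = np.random.dirichlet(np.ones(5), size=10)
    fused = fuse([rgb, rhythm], np.array([0.5, -0.2]))
    print(fused.argmax(axis=1))                        # fused class predictions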

    Intergenerational Transmission of Human Parenting Styles to Human–Dog Relationships

    Get PDF
    Parenting style and intergenerational transmission have been extensively studied in parent–child relationships. As dogs are increasingly recognized as integral members of the family system, there is growing interest in understanding how parenting behaviors directed towards dogs can also influence a dog's behaviors. However, the reasons why people adopt certain parenting behaviors towards dogs remain relatively unknown. This study delved into the intergenerational transmission of parenting styles from one's upbringing to caregiving for dogs. Using a mixed-methods approach with 391 dog caregivers and 10 interviews, this study employed multivariate linear regression and thematic analysis. Permissive parenting exhibited an intergenerational effect, with those who experienced it being more likely to replicate the style with their dogs. Orientation towards dogs emerged as a crucial mediator, with protectionistic attitudes reducing the likelihood of replicating authoritarian parenting. Humanistic and protectionistic orientations increased the likelihood of compensatory permissive behaviors. Insights from the interviews underscored the impact of perceived childhood experiences on adopting specific parenting behaviors. Ultimately, this study provides valuable insights that can contribute to the promotion of appropriate caregiving behaviors toward dogs. By drawing on our understanding of child–parent relationships, addressing the underlying elements of human–dog dynamics may lead to positive outcomes both for dogs and their caregivers.

    Multitype Activity Recognition in Robot-Centric Scenarios

    No full text