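    Multi-stream convolutional neural networks for action recognition in video sequences based on spatio-temporal information [Redes neurais convolucionais de múltiplos canais para reconhecimento de ações em sequências de vídeos baseado em informações espaço-temporais]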

    Advisor: Hélio Pedrini. Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação.

    Abstract: Advances in digital technology have increased event recognition capabilities through the development of devices with high resolution, small physical dimensions and high sampling rates. The recognition of complex events in videos has several relevant applications, particularly due to the large availability of digital cameras in environments such as airports, banks and roads, among others. The large amount of data produced is the ideal scenario for the development of automatic methods based on deep learning. Despite the significant progress achieved through image-based deep neural networks, video content understanding still faces challenges in modeling spatio-temporal relations. In this dissertation, we address the problem of human action recognition in videos. A multi-stream network is our architecture of choice to incorporate temporal information, since it may benefit from pre-trained deep networks for images and from hand-crafted features for initialization. Furthermore, its training cost is usually lower than that of video-based networks. We explore visual rhythm images since they encode longer-term information when compared to still frames and optical flow. We propose a novel method based on point tracking for deciding the best visual rhythm direction for each video. In addition, we experimented with recurrent neural networks trained from the features extracted from the streams of the previous architecture. Experiments conducted on the challenging UCF101 and HMDB51 public datasets demonstrated that our approach is able to improve network performance, achieving accuracy rates comparable to the state-of-the-art methods. Even though visual rhythms are originally created from RGB images, other types of sources and strategies for their creation are explored and discussed, such as optical flow, image gradients and color histograms.

    Master's degree (Mestre em Ciência da Computação), Computer Science. 1736920, CAPE
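    As an illustration of the visual rhythm idea mentioned in the abstract, the sketch below builds a rhythm image by taking one line of pixels from each frame and stacking those lines side by side, so that time runs along the horizontal axis. This is a simplified, commonly used construction and not necessarily the one used in the dissertation: the function name `visual_rhythm`, the choice of the middle row or middle column, and the list-of-lists grayscale frame format are all assumptions made for the example. The point-tracking method for choosing the direction is not reproduced here.

```python
from typing import List, Sequence

# Hypothetical frame format: a grayscale frame as rows of pixel intensities.
Frame = List[List[int]]


def visual_rhythm(frames: Sequence[Frame], direction: str = "horizontal") -> List[List[int]]:
    """Stack one pixel line per frame into a 2D image.

    Each frame contributes one column of the result, so the output has
    one column per frame (time on the x axis).
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])

    slices = []
    for frame in frames:
        if direction == "horizontal":
            # Assumption: sample the middle row of the frame.
            line = list(frame[height // 2])
        elif direction == "vertical":
            # Assumption: sample the middle column of the frame.
            line = [row[width // 2] for row in frame]
        else:
            raise ValueError("direction must be 'horizontal' or 'vertical'")
        slices.append(line)

    # Transpose: slices[t] becomes column t of the rhythm image.
    return [list(column) for column in zip(*slices)]


if __name__ == "__main__":
    # Two tiny 3x3 frames with distinguishable values.
    demo = [[[t + 10 * r + c for c in range(3)] for r in range(3)] for t in range(2)]
    for row in visual_rhythm(demo, "horizontal"):
        print(row)
```

    Under these assumptions, a video with T frames of width W yields a W x T image for the horizontal direction and an H x T image for the vertical one, which can then be fed to an image-based network like any other picture.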

    Rapid Cut Detection On Compressed Video

    The temporal segmentation of a video sequence is one of the most important aspects of video processing, analysis, indexing, and retrieval. Most existing techniques for identifying the boundary between consecutive shots have focused on the uncompressed domain. However, decoding and analyzing a video sequence are two extremely time-consuming tasks. Since video data are usually available in compressed form, it is desirable to process video material directly, without decoding. In this paper, we present a novel approach for video cut detection that works in the compressed domain. The proposed method is based both on exploiting visual features extracted from the video stream and on using a simple and fast algorithm to detect the video transitions. Experiments on a real-world video dataset with several genres show that our approach achieves high accuracy relative to state-of-the-art solutions, with a computational time that makes it suitable for online usage. © 2011 Springer-Verlag.
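    The abstract does not say which features or which detection rule the paper uses, so the following is only a generic sketch of threshold-based cut detection, not the authors' algorithm. It assumes each frame has already been reduced to a feature vector (for example, a histogram) and flags a cut wherever the L1 distance between consecutive vectors exceeds a threshold. The function name `detect_cuts`, the feature format, and the threshold value are hypothetical.

```python
from typing import List, Sequence


def detect_cuts(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return the indices i where frame i starts a new shot.

    A cut is reported when the L1 distance between the feature vectors
    of frames i-1 and i is strictly greater than `threshold`.
    """
    cuts = []
    for i in range(1, len(features)):
        previous, current = features[i - 1], features[i]
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    # Hypothetical 2-bin histograms: the content changes abruptly at frame 2.
    histograms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(detect_cuts(histograms, threshold=1.0))
```

    A single pass with one comparison per frame keeps the cost linear in the number of frames, which is the kind of property that matters for online use; how the features are obtained without decoding is the part this sketch leaves out.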

    Visual Rhythm-based Time Series Analysis For Phenology Studies

    Plant phenology has gained importance in the context of global change research, stimulating the development of new technologies for phenological observation. In this context, digital cameras have been successfully used as multi-channel imaging sensors, providing measures to estimate changes in phenological events, such as leaf flushing and senescence. We monitored leaf-changing patterns of a cerrado-savanna vegetation by taking daily digital images. To that end, we extracted leaf color information and correlated it with phenological changes. In this way, time series associated with plant species are obtained, raising the need for appropriate tools for mining patterns of interest. In this paper, we present a novel approach for representing phenological patterns of plant species. The proposed method is based on encoding time series as a visual rhythm, which is characterized by color description algorithms. A comparative analysis of different descriptors is conducted and discussed. Experimental results show that our approach achieves high accuracy in identifying plant species. © 2013 IEEE.