7,566 research outputs found

    Automatic Food Intake Assessment Using Camera Phones

    Get PDF
    Obesity is becoming an epidemic phenomenon in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. It is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manually recording and recall of food types and portions. Accuracy of the results largely relies on many uncertain factors such as user\u27s memory, food knowledge, and portion estimations. As a result, the accuracy is often compromised. Accurate and convenient dietary assessment methods are still blank and needed in both population and research societies. In this thesis, an automatic food intake assessment method using cameras, inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy life style. With this method, users use their smart phones before and after a meal to capture images or videos around the meal. The smart phone will recognize food items and calculate the volume of the food consumed and provide the results to users. The technical objective is to explore the feasibility of image based food recognition and image based volume estimation. This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods to review the literature methods, find their drawbacks and explore the feasibility to develop novel methods; (2) based on the prototype system, to investigate new food classification methods to improve the recognition accuracy to a field application level; (3) to design indexing methods for large-scale image database to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel convenient and accurate food volume estimation methods using only smart phones with cameras and IMUs. A prototype system was implemented to review existing methods. Image feature detector and descriptor were developed and a nearest neighbor classifier were implemented to classify food items. A reedit card marker method was introduced for metric scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular shape food items. To further increase the accuracy and make the algorithm applicable to arbitrary food items, new food features, new classifiers were designed. The efficiency of the algorithm was increased by means of developing novel image indexing method in large-scale image database. Finally, the volume calculation was enhanced through reducing the marker and introducing IMUs. Sensor fusion technique to combine measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as reduce noises from these sensors

    Hierarchical Hidden Markov Model in Detecting Activities of Daily Living in Wearable Videos for Studies of Dementia

    Get PDF
    International audienceThis paper presents a method for indexing activities of daily living in videos obtained from wearable cameras. In the context of dementia diagnosis by doctors, the videos are recorded at patients' houses and later visualized by the medical practitioners. The videos may last up to two hours, therefore a tool for an efficient navigation in terms of activities of interest is crucial for the doctors. The specific recording mode provides video data which are really difficult, being a single sequence shot where strong motion and sharp lighting changes often appear. Our work introduces an automatic motion based segmentation of the video and a video structuring approach in terms of activities by a hierarchical two-level Hidden Markov Model. We define our description space over motion and visual characteristics of video and audio channels. Experiments on real data obtained from the recording at home of several patients show the difficulty of the task and the promising results of our approach

    Cognitive visual tracking and camera control

    Get PDF
    Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real-time, high-level information from an observed scene, and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, will serve to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario show the effectiveness of our approach and its potential for real applications of cognitive vision

    Reconhecimento de ações em vídeos baseado na fusão de representações de ritmos visuais

    Get PDF
    Orientadores: Hélio Pedrini, David Menotti GomesTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Avanços nas tecnologias de captura e armazenamento de vídeos têm promovido uma grande demanda pelo reconhecimento automático de ações. O uso de câmeras para propó- sitos de segurança e vigilância tem aplicações em vários cenários, tais coomo aeroportos, parques, bancos, estações, estradas, hospitais, supermercados, indústrias, estádios, escolas. Uma dificuldade inerente ao problema é a complexidade da cena sob condições habituais de gravação, podendo conter fundo complexo e com movimento, múltiplas pes- soas na cena, interações com outros atores ou objetos e movimentos de câmera. Bases de dados mais recentes são construídas principalmente com gravações compartilhadas no YouTube e com trechos de filmes, situações em que não se restringem esses obstáculos. Outra dificuldade é o impacto da dimensão temporal, pois ela infla o tamanho dos da- dos, aumentando o custo computacional e o espaço de armazenamento. Neste trabalho, apresentamos uma metodologia de descrição de volumes utilizando a representação de Ritmos Visuais (VR). Esta técnica remodela o volume original do vídeo em uma imagem, em que se computam descritores bidimensionais. Investigamos diferentes estratégias para construção do ritmo visual, combinando configurações em diversos domínios de imagem e direções de varredura dos quadros. A partir disso, propomos dois métodos de extração de características originais, denominados Naïve Visual Rhythm (Naïve VR) e Visual Rhythm Trajectory Descriptor (VRTD). A primeira abordagem é a aplicação direta da técnica no volume de vídeo original, formando um descritor holístico que considera os eventos da ação como padrões e formatos na imagem de ritmo visual. A segunda variação foca na análise de pequenas vizinhanças obtidas a partir do processo das trajetórias densas, que permite que o algoritmo capture detalhes despercebidos pela descrição global. Testamos a nossa proposta em oito bases de dados públicas, sendo uma de gestos (SKIG), duas em primeira pessoa (DogCentric e JPL), e cinco em terceira pessoa (Weizmann, KTH, MuHAVi, UCF11 e HMDB51). Os resultados mostram que a técnica empregada é capaz de extrair elementos de movimento juntamente com informações de formato e de aparência, obtendo taxas de acurácia competitivas comparadas com o estado da arteAbstract: Advances in video acquisition and storage technologies have promoted a great demand for automatic recognition of actions. The use of cameras for security and surveillance purposes has applications in several scenarios, such as airports, parks, banks, stations, roads, hospitals, supermarkets, industries, stadiums, schools. An inherent difficulty of the problem is the complexity of the scene under usual recording conditions, which may contain complex background and motion, multiple people on the scene, interactions with other actors or objects, and camera motion. Most recent databases are built primarily with shared recordings on YouTube and with snippets of movies, situations where these obstacles are not restricted. Another difficulty is the impact of the temporal dimension since it expands the size of the data, increasing computational cost and storage space. In this work, we present a methodology of volume description using the Visual Rhythm (VR) representation. This technique reshapes the original volume of the video into an image, where two-dimensional descriptors are computed. We investigated different strategies for constructing the representation by combining configurations in several image domains and traversing directions of the video frames. From this, we propose two feature extraction methods, Naïve Visual Rhythm (Naïve VR) and Visual Rhythm Trajectory Descriptor (VRTD). The first approach is the straightforward application of the technique in the original video volume, forming a holistic descriptor that considers action events as patterns and formats in the visual rhythm image. The second variation focuses on the analysis of small neighborhoods obtained from the process of dense trajectories, which allows the algorithm to capture details unnoticed by the global description. We tested our methods in eight public databases, one of hand gestures (SKIG), two in first person (DogCentric and JPL), and five in third person (Weizmann, KTH, MuHAVi, UCF11 and HMDB51). The results show that the developed techniques are able to extract motion elements along with format and appearance information, achieving competitive accuracy rates compared to state-of-the-art action recognition approachesDoutoradoCiência da ComputaçãoDoutor em Ciência da Computação2015/03156-7FAPES
    corecore