
    Egocentric vision-based passive dietary intake monitoring

    Egocentric (first-person) perception captures and reveals how people perceive their surroundings. This unique perceptual view enables passive and objective monitoring of human-centric activities and behaviours. Wearable cameras are used to capture egocentric visual data. Recent advances in wearable technologies have made wearable cameras lightweight and accurate, with long battery life, making long-term passive monitoring a promising solution for healthcare and human behaviour understanding. In addition, recent progress in deep learning has provided an opportunity to accelerate the development of passive methods that enable pervasive and accurate monitoring, as well as comprehensive modelling of human-centric behaviours. This thesis investigates and proposes innovative egocentric technologies for passive dietary intake monitoring and human behaviour analysis. Conventional dietary assessment methods in nutritional epidemiology, such as 24-hour dietary recall (24HR) and food frequency questionnaires (FFQs), rely heavily on subjects' memory to recall dietary intake and on trained dietitians to collect, interpret, and analyse the dietary data; passive dietary intake monitoring can ease this burden and provide a more accurate and objective assessment of dietary intake. Egocentric vision-based passive monitoring uses wearable cameras to continuously record human-centric activities with a close-up view. This passive form of monitoring requires no active participation from the subject and records rich spatiotemporal detail for fine-grained analysis. Building on egocentric vision and passive dietary intake monitoring, this thesis proposes: 1) a novel network structure called PAR-Net that achieves accurate food recognition by mining discriminative food regions; PAR-Net has been evaluated on food intake images captured by wearable cameras as well as on non-egocentric food images to validate its effectiveness for food recognition; 2) a deep learning-based solution that recognises consumed food items and counts the number of bites taken by the subject from egocentric videos in an end-to-end manner; 3) in light of privacy concerns around egocentric data, a privacy-preserving solution for passive dietary intake monitoring that uses image captioning to summarise image content and combines captioning with 3D container reconstruction to report the actual food volume consumed. Furthermore, a novel framework that integrates food recognition, hand tracking, and face recognition has been developed to tackle the challenge of assessing individual dietary intake in food-sharing scenarios with a panoramic camera. Extensive experiments have been conducted: tested on both laboratory data (captured in London) and field-study data (captured in Africa), the proposed solutions demonstrate the feasibility and accuracy of using egocentric camera technologies with deep learning methods for individual dietary assessment and human behaviour analysis.
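    As a loose illustration only (not the thesis's actual method), the sketch below shows one simple way a system could turn per-frame bite probabilities from an egocentric video into a bite count: smooth the probability sequence, threshold it, and count contiguous runs above the threshold as individual bite events. The threshold and smoothing window are assumptions chosen for illustration.

        # Illustrative sketch: counting bite events from per-frame probabilities.
        # Threshold and smoothing window are assumptions, not values from the thesis.
        import numpy as np

        def count_bites(frame_probs: np.ndarray, thresh: float = 0.5,
                        smooth: int = 5) -> int:
            """frame_probs: 1D array of per-frame probabilities that a bite occurs."""
            # Moving-average smoothing to suppress single-frame flickers
            kernel = np.ones(smooth) / smooth
            probs = np.convolve(frame_probs, kernel, mode="same")
            active = probs > thresh
            # A bite event begins at each rising edge (False -> True transition)
            onsets = np.flatnonzero(active[1:] & ~active[:-1])
            return int(onsets.size + (1 if active[0] else 0))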

    Deep learning for time series classification: a review

    Time Series Classification (TSC) is an important and challenging problem in data mining. With the increasing availability of time series data, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising, as deep learning has seen very successful applications in recent years. DNNs have indeed revolutionized the field of computer vision, especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open-source deep learning framework to the TSC community in which we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we present the most exhaustive study of DNNs for TSC to date. Comment: Accepted at Data Mining and Knowledge Discovery
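    As an illustrative sketch, the snippet below builds a fully convolutional network (FCN) baseline of the kind benchmarked in such studies for univariate TSC: stacked Conv1D/BatchNorm/ReLU blocks followed by global average pooling and a softmax classifier. The layer widths and kernel sizes mirror common FCN configurations but are assumptions here, not necessarily the exact settings of the paper's framework.

        # Minimal FCN-style baseline for univariate time series classification.
        # Filter counts and kernel sizes are illustrative assumptions.
        import tensorflow as tf

        def build_fcn(series_length: int, n_classes: int) -> tf.keras.Model:
            inputs = tf.keras.Input(shape=(series_length, 1))  # univariate series
            x = inputs
            # Three conv blocks: Conv1D -> BatchNorm -> ReLU
            for filters, kernel in [(128, 8), (256, 5), (128, 3)]:
                x = tf.keras.layers.Conv1D(filters, kernel, padding="same")(x)
                x = tf.keras.layers.BatchNormalization()(x)
                x = tf.keras.layers.Activation("relu")(x)
            # Global average pooling aggregates features over the whole series
            x = tf.keras.layers.GlobalAveragePooling1D()(x)
            outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
            model = tf.keras.Model(inputs, outputs)
            model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
            return model

        model = build_fcn(series_length=500, n_classes=10)
        model.summary()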

    Food Recognition and Detection with Minimum Supervision

    Detecting multiple food items in one image is a challenging task. We propose a novel method that detects food items and their locations in an image with minimal supervision. In training, we generate candidate object regions for each image and extract their CNN features. We then perform region mining, selecting discriminative regions for each class via submodular optimization. With these mined regions, we train a binary SVM classifier for each class and further refine these classifiers with hard negative mining. At test time, a score is computed for each proposed region; we then select regions using non-maximum suppression and output their locations and predicted class names. Our experiments show very promising results, with an average precision of 83.78% on the test dataset. Our food detection method can easily be extended to a larger dataset, as no ground-truth bounding boxes are needed during training
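    As a minimal sketch of the non-maximum suppression step mentioned above: given scored region proposals, NMS keeps the highest-scoring boxes and discards any proposal that overlaps an already-kept box beyond an IoU threshold. The threshold value here is an illustrative assumption.

        # Greedy non-maximum suppression over scored box proposals.
        import numpy as np

        def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.3):
            """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
            x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
            areas = (x2 - x1) * (y2 - y1)
            order = scores.argsort()[::-1]  # highest-scoring first
            keep = []
            while order.size > 0:
                i = order[0]
                keep.append(i)
                # Intersection of the kept box with every remaining box
                xx1 = np.maximum(x1[i], x1[order[1:]])
                yy1 = np.maximum(y1[i], y1[order[1:]])
                xx2 = np.minimum(x2[i], x2[order[1:]])
                yy2 = np.minimum(y2[i], y2[order[1:]])
                inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
                iou = inter / (areas[i] + areas[order[1:]] - inter)
                # Keep only boxes that do not overlap the kept box too strongly
                order = order[1:][iou <= iou_thresh]
            return keep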

    One-Shot Fine-Grained Instance Retrieval

    Fine-Grained Visual Categorization (FGVC) has achieved significant progress recently. However, the number of fine-grained species can be huge and grow dynamically in real scenarios, making it difficult to recognize unseen objects under the current FGVC framework. This raises an open issue: performing large-scale fine-grained identification without a complete training set. To address this issue, we propose a retrieval task named One-Shot Fine-Grained Instance Retrieval (OSFGIR). "One-Shot" denotes the ability to identify unseen objects through a fine-grained retrieval task assisted by an incomplete auxiliary training set. This paper first presents a detailed description of the OSFGIR task and our collected OSFGIR-378K dataset. Next, we propose the Convolutional and Normalization Networks (CN-Nets), learned on the auxiliary dataset, to generate a concise and discriminative representation. Finally, we present a coarse-to-fine retrieval framework consisting of three components: coarse retrieval, fine-grained retrieval, and query expansion. The framework progressively retrieves images with similar semantics and performs fine-grained identification. Experiments show our OSFGIR framework achieves significantly better accuracy and efficiency than existing FGVC and image retrieval methods, making it a better solution for large-scale fine-grained object identification. Comment: Accepted by MM2017, 9 pages, 7 figures
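    As an illustrative sketch of the coarse-to-fine idea (not the paper's CN-Nets implementation): a cheap global descriptor first shortlists candidates, and a more discriminative descriptor then re-ranks the shortlist; query expansion could further refine results by averaging the query with its top-ranked neighbours. The descriptor inputs and shortlist size below are assumptions.

        # Two-stage retrieval: coarse shortlisting, then fine re-ranking.
        import numpy as np

        def l2_normalize(x: np.ndarray) -> np.ndarray:
            return x / np.linalg.norm(x, axis=-1, keepdims=True)

        def coarse_to_fine_retrieve(query_coarse, query_fine,
                                    db_coarse, db_fine, shortlist: int = 100):
            """Return database indices ranked by fine similarity on a coarse shortlist."""
            # Coarse stage: cosine similarity on cheap global features
            sims = l2_normalize(db_coarse) @ l2_normalize(query_coarse)
            candidates = np.argsort(-sims)[:shortlist]
            # Fine stage: re-rank the shortlist with more discriminative features
            fine_sims = l2_normalize(db_fine[candidates]) @ l2_normalize(query_fine)
            return candidates[np.argsort(-fine_sims)]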