2 research outputs found

    A Study of the Lexicography of Hand Gestures During Eating

    This paper considers the lexicographical challenge of defining actions a person takes while eating. The goal is to establish objective and repeatable gesture definitions based on discernible intent. Such a standard would support the sharing of data and results between researchers working on the problem of automatic monitoring of dietary intake. We define five gestures: taking a bite of food (bite), sipping a drink of liquid (drink), manipulating food for preparation of intake (utensiling), not moving (rest), and a non-eating category (other). To test this lexicography, we used our definitions to label a large data set and tested for inter-rater reliability. The data set consists of a total of 276 participants eating a single meal while wearing a watch-like device to track wrist motion. Video was simultaneously recorded and subsequently reviewed to label gestures. A total of 18 raters manually labeled 51,614 gestures. Every meal was labeled by at least one rater, with 95 meals labeled by two raters. Inter-rater reliability was calculated in terms of agreement, boundary ambiguity, and mistakes. Results were 92.5% agreement (75% exact agreement, 17.5% boundary ambiguity). Mistakes for intake gestures (0.6% bite and 1.9% drink) occurred much less frequently than for non-intake gestures (16.5% utensiling and 8.7% rest). Similar rates were found across all 18 raters. Finally, a comparison of gesture segments against single-index labels of bites and drinks from a previous effort showed an agreement of 95.8% with 0.6% ambiguity and 3.6% mistakes. Overall, these findings take a step towards developing a consensus lexicography of eating gestures for the research community.
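
    To make the agreement / boundary-ambiguity / mistake breakdown concrete, the sketch below compares two raters' label sequences sample by sample, treating disagreements near a segment boundary as ambiguity and all other disagreements as mistakes. This is a minimal illustration under assumed conventions (a common time axis and an illustrative tolerance window); the function names and parameters are not from the paper.

    ```python
    # Minimal sketch of a per-sample inter-rater comparison, assuming both raters'
    # labels are resampled onto a common time axis. The tolerance window and all
    # names here are illustrative, not taken from the paper.
    import numpy as np

    def boundary_mask(labels, tolerance):
        """Mark samples within `tolerance` samples of any label change."""
        labels = np.asarray(labels)
        changes = np.flatnonzero(labels[1:] != labels[:-1]) + 1  # indices where a new segment starts
        mask = np.zeros(len(labels), dtype=bool)
        for c in changes:
            lo, hi = max(0, c - tolerance), min(len(labels), c + tolerance)
            mask[lo:hi] = True
        return mask

    def compare_raters(labels_a, labels_b, tolerance=15):
        a, b = np.asarray(labels_a), np.asarray(labels_b)
        agree = a == b
        near_boundary = boundary_mask(a, tolerance) | boundary_mask(b, tolerance)
        exact = agree.mean()                          # exact agreement
        ambiguous = (~agree & near_boundary).mean()   # disagreement near a segment edge
        mistakes = (~agree & ~near_boundary).mean()   # disagreement inside a segment
        return exact, ambiguous, mistakes

    # Example: two raters labeling the same stretch of a meal
    a = ["rest"] * 10 + ["bite"] * 20 + ["utensiling"] * 10
    b = ["rest"] * 12 + ["bite"] * 18 + ["utensiling"] * 10
    print(compare_raters(a, b, tolerance=3))  # -> (0.95, 0.05, 0.0)
    ```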

    An Overview Of 3D Object Detection

    Point cloud 3D object detection has recently received major attention and becomes an active research topic in 3D computer vision community. However, recognizing 3D objects in LiDAR (Light Detection and Ranging) is still a challenge due to the complexity of point clouds. Objects such as pedestrians, cyclists, or traffic cones are usually represented by quite sparse points, which makes the detection quite complex using only point cloud. In this project, we propose a framework that uses both RGB and point cloud data to perform multiclass object recognition. We use existing 2D detection models to localize the region of interest (ROI) on the RGB image, followed by a pixel mapping strategy in the point cloud, and finally, lift the initial 2D bounding box to 3D space. We use the recently released nuScenes dataset---a large-scale dataset contains many data formats---to training and evaluate our proposed architecture
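
    The pipeline described (2D ROI on the image, pixel-to-point mapping, lifting to 3D) can be sketched generically as below: project the LiDAR points into the image with the camera calibration, keep the points whose projections fall inside the 2D box, and fit a box around them. The calibration handling and the axis-aligned box fit are simplifying assumptions for illustration, not the authors' exact method.

    ```python
    # Generic sketch of lifting a 2D detection into 3D via point-cloud mapping,
    # roughly following the pipeline described above. The 4x4 extrinsic matrix,
    # 3x3 intrinsic matrix, and axis-aligned box fit are assumptions, not the
    # paper's exact implementation.
    import numpy as np

    def project_to_image(points_lidar, T_cam_from_lidar, K):
        """Project Nx3 LiDAR points into pixels using 4x4 extrinsics T and 3x3 intrinsics K."""
        pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # homogeneous coords
        pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]                     # LiDAR frame -> camera frame
        in_front = pts_cam[:, 2] > 0                                        # keep points in front of the camera
        uv = (K @ pts_cam.T).T
        uv = uv[:, :2] / uv[:, 2:3]                                         # perspective divide
        return uv, in_front

    def lift_box_to_3d(points_lidar, box_2d, T_cam_from_lidar, K):
        """Select LiDAR points projecting inside a 2D box and fit a 3D box around them."""
        u_min, v_min, u_max, v_max = box_2d
        uv, in_front = project_to_image(points_lidar, T_cam_from_lidar, K)
        inside = (in_front
                  & (uv[:, 0] >= u_min) & (uv[:, 0] <= u_max)
                  & (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
        roi = points_lidar[inside]
        if len(roi) == 0:
            return None
        # Axis-aligned 3D box from the selected points (center and size).
        lo, hi = roi.min(axis=0), roi.max(axis=0)
        return {"center": (lo + hi) / 2, "size": hi - lo}
    ```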