A Study of the Lexicography of Hand Gestures During Eating
This paper considers the lexicographical challenge of defining actions a
person takes while eating. The goal is to establish objective and repeatable
gesture definitions based on discernible intent. Such a standard would support
the sharing of data and results between researchers working on the problem of
automatic monitoring of dietary intake. We define five gestures: taking a bite
of food (bite), sipping a drink of liquid (drink), manipulating food for
preparation of intake (utensiling), not moving (rest) and a non-eating category
(other). To test this lexicography, we used our definitions to label a large
data set and tested for inter-rater reliability. The data set consists of a
total of 276 participants eating a single meal while wearing a watch-like
device to track wrist motion. Video was simultaneously recorded and
subsequently reviewed to label gestures. A total of 18 raters manually labeled
51,614 gestures. Every meal was labeled by at least 1 rater, with 95 meals
labeled by 2 raters. Inter-rater reliability was calculated in terms of
agreement, boundary ambiguity, and mistakes. Results were 92.5% agreement (75%
exact agreement, 17.5% boundary ambiguity). Mistakes on intake gestures (0.6%
for bite and 1.9% for drink) occurred much less frequently than mistakes on
non-intake gestures (16.5% for utensiling and 8.7% for rest). Similar rates were found across all 18 raters.
Finally, a comparison of gesture segments against single index labels of bites
and drinks from a previous effort showed an agreement of 95.8% with 0.6%
ambiguity and 3.6% mistakes. Overall, these findings take a step towards
developing a consensus lexicography of eating gestures for the research
community.
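
As a concrete illustration of the scoring, below is a minimal Python sketch of one plausible way to classify each labeled sample as agreement, boundary ambiguity, or mistake when comparing two raters. The function names, the tolerance window, and the classification rule are assumptions for illustration, not the paper's actual procedure.

    # A sketch of segment-level scoring under assumed rules: samples where the
    # two raters agree count as agreement; disagreements near a label boundary
    # (within `tolerance` samples) count as boundary ambiguity; the rest are
    # counted as mistakes.

    def segment_boundaries(labels):
        """Return the indices where the gesture label changes."""
        return {i for i in range(1, len(labels)) if labels[i] != labels[i - 1]}

    def score_raters(labels_a, labels_b, tolerance=15):
        """Score two raters' per-sample labels (e.g. 'bite', 'drink',
        'utensiling', 'rest', 'other') as percentages of agreement,
        boundary ambiguity, and mistakes."""
        assert len(labels_a) == len(labels_b)
        boundaries = segment_boundaries(labels_a) | segment_boundaries(labels_b)
        counts = {"agreement": 0, "boundary": 0, "mistake": 0}
        for i, (a, b) in enumerate(zip(labels_a, labels_b)):
            if a == b:
                counts["agreement"] += 1
            elif any(abs(i - k) <= tolerance for k in boundaries):
                counts["boundary"] += 1
            else:
                counts["mistake"] += 1
        return {k: 100.0 * v / len(labels_a) for k, v in counts.items()}

Per-gesture mistake rates like those reported above would additionally require accumulating the same counts separately for each gesture class.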
An Overview Of 3D Object Detection
Point cloud 3D object detection has recently received major attention and has
become an active research topic in the 3D computer vision community. However,
recognizing 3D objects in LiDAR (Light Detection and Ranging) data remains a
challenge due to the complexity of point clouds. Objects such as pedestrians,
cyclists, or traffic cones are usually represented by only a few sparse points,
which makes them difficult to detect using the point cloud alone. In this
project, we propose a framework that uses both RGB and point cloud data to
perform multiclass object recognition. We use existing 2D detection models to
localize the region of interest (ROI) on the RGB image, followed by a pixel
mapping strategy in the point cloud, and finally lift the initial 2D bounding
box into 3D space. We use the recently released nuScenes dataset, a large-scale
dataset that contains many data formats, to train and evaluate our proposed
architecture.
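
To make the described pipeline concrete, here is a minimal Python sketch of the final lifting step: projecting LiDAR points into the image, keeping those that fall inside the 2D ROI produced by the RGB detector, and fitting a 3D box to them. The function name, the projection-matrix interface, and the axis-aligned box fit are illustrative assumptions; the actual architecture (and the nuScenes calibration API) involves more steps.

    import numpy as np

    def lift_roi_to_3d(points, proj, roi):
        """points: (N, 3) LiDAR points already transformed into the camera frame.
        proj: (3, 4) camera projection matrix from calibration (assumed given).
        roi: (x_min, y_min, x_max, y_max) 2D box from the RGB detector.
        Returns the min/max corners of an axis-aligned 3D box, or None."""
        # Project points into the image plane via homogeneous coordinates.
        pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
        uvw = pts_h @ proj.T
        in_front = uvw[:, 2] > 0          # discard points behind the camera
        uv = uvw[in_front, :2] / uvw[in_front, 2:3]
        cam_pts = points[in_front]
        # Keep points whose projection lands inside the 2D ROI.
        x0, y0, x1, y1 = roi
        inside = ((uv[:, 0] >= x0) & (uv[:, 0] <= x1)
                  & (uv[:, 1] >= y0) & (uv[:, 1] <= y1))
        if not inside.any():
            return None
        hits = cam_pts[inside]
        # Crude axis-aligned box; a real system would also filter background
        # points (e.g. by depth clustering) and estimate orientation.
        return hits.min(axis=0), hits.max(axis=0)

The ROI test effectively selects the points inside the camera frustum of the 2D box, which is why sparsely sampled objects such as pedestrians can still be localized when enough points survive the mask.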