Every day, humans perform many closely related activities that involve subtle
discriminative motions, such as putting on a shirt vs. putting on a jacket, or
shaking hands vs. giving a high five. Activity recognition by ethical visual AI
could provide insights into our patterns of daily life; however, existing
activity recognition datasets do not capture the massive diversity of these
human activities around the world. To address this limitation, we introduce
Collector, a free mobile app to record video while simultaneously annotating
objects and activities of consented subjects. This new data collection platform
was used to curate the Consented Activities of People (CAP) dataset, the first
large-scale, fine-grained activity dataset of people worldwide. The CAP dataset
contains 1.45M video clips covering 512 fine-grained activity labels of daily life,
collected by 780 subjects in 33 countries. We provide activity classification
and activity detection benchmarks for this dataset, and analyze baseline
results to gain insight into how people around the world perform common
activities. The dataset, benchmarks, evaluation tools, public leaderboards and
mobile apps are available at visym.github.io/cap.