Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
Automatic generation of textual video descriptions that are time-aligned with
video content is a long-standing goal in computer vision. The task is
challenging due to the difficulty of bridging the semantic gap between the
visual and natural language domains. This paper addresses the task of
automatically generating an alignment between a set of instructions and a
first-person video demonstrating an activity. The sparse descriptions and ambiguity
of written instructions create significant alignment challenges. The key to our
approach is the use of egocentric cues to generate a concise set of action
proposals, which are then matched to recipe steps using object recognition and
computational linguistic techniques. We obtain promising results on both the
Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions
Dataset.
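The abstract does not give implementation details, but the matching step it describes, linking visually detected action proposals to recipe steps via recognized objects and linguistic cues, can be sketched roughly as below. All data structures and names here are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch: matching temporal action proposals to recipe steps by
# object-noun overlap. Proposal/step contents are hypothetical examples.

def jaccard(a, b):
    """Jaccard similarity between two sets of object nouns."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Object labels recognized inside each temporal action proposal (assumed).
proposals = [
    {"start": 2.0, "end": 6.5, "objects": ["knife", "tomato"]},
    {"start": 8.0, "end": 15.0, "objects": ["pan", "oil", "stove"]},
]

# Nouns extracted from each written instruction step (assumed).
steps = [
    {"text": "Dice the tomato.", "nouns": ["tomato"]},
    {"text": "Heat oil in a pan.", "nouns": ["oil", "pan"]},
]

# Greedy assignment: each proposal goes to the step with the highest noun overlap.
for p in proposals:
    scores = [jaccard(p["objects"], s["nouns"]) for s in steps]
    best = max(range(len(steps)), key=lambda i: scores[i])
    print(f"{p['start']:.1f}-{p['end']:.1f}s -> step {best}: {steps[best]['text']}")
```

A real alignment would also enforce the temporal ordering of steps (e.g. via dynamic programming) rather than matching each proposal independently.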
Contextual Media Retrieval Using Natural Language Queries
The widespread integration of cameras in hand-held and head-worn devices as
well as the ability to share content online enables a large and diverse visual
capture of the world that millions of users build up collectively every day. We
envision these images as well as associated meta information, such as GPS
coordinates and timestamps, to form a collective visual memory that can be
queried while automatically taking the ever-changing context of mobile users
into account. As a first step towards this vision, in this work we present
Xplore-M-Ego: a novel media retrieval system that allows users to query a
dynamic database of images and videos using spatio-temporal natural language
queries. We evaluate our system using a new dataset of real user queries as
well as through a usability study. One key finding is that there is a
considerable amount of inter-user variability, for example in the resolution of
spatial relations in natural language utterances. We show that our retrieval
system can cope with this variability using personalisation through an online
learning-based retrieval formulation.
Comment: 8 pages, 9 figures, 1 table
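For context, once a spatio-temporal query has been parsed into a reference place, a radius, and a time window, the underlying retrieval reduces to filtering geo- and time-tagged media. A minimal sketch of that filtering step is below; the query parsing and the paper's online personalisation are omitted, and all names are assumptions.

```python
# Hedged sketch: filtering GPS/time-tagged media for a parsed query such as
# "photos taken near the cathedral last weekend". All names are assumptions.
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

@dataclass
class MediaItem:
    path: str
    lat: float
    lon: float
    timestamp: datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def retrieve(items, ref_lat, ref_lon, radius_km, start, end):
    """Return items within radius_km of the reference point and inside [start, end]."""
    return [
        m for m in items
        if haversine_km(m.lat, m.lon, ref_lat, ref_lon) <= radius_km
        and start <= m.timestamp <= end
    ]
```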
The insider on the outside: a novel system for the detection of information leakers in social networks
Confidential information is all too easily leaked by naive users posting comments. In this paper we introduce DUIL, a system for Detecting Unintentional Information Leakers. The value of DUIL lies in its ability to detect those responsible for information leakage that occurs through comments posted on news articles in a public environment, when those articles have withheld material non-public information. DUIL comprises several artefacts, each designed to analyse a different aspect of this challenge: the information, the user(s) who posted the information, and the user(s) who may be involved in the dissemination of that information. We present a design science analysis of DUIL as an information system artefact composed of social, information, and technology artefacts. We demonstrate the performance of DUIL on real data crawled from several Facebook news pages spanning two years of news articles.
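The abstract stays at the design-science level, but the basic signal it relies on, comments that reveal withheld details before those details become public, can be illustrated with a toy check like the one below. The fields, matching rule, and threshold-free logic are assumptions for illustration, not DUIL's actual artefacts.

```python
# Hedged sketch: flagging comments that mention withheld terms before the
# official disclosure time. Field names and matching rule are assumptions.
from datetime import datetime

def flag_potential_leakers(comments, withheld_terms, disclosure_time):
    """Return (user, text) pairs posted before disclosure that mention a withheld term."""
    flagged = []
    for c in comments:  # each comment: {"user": str, "text": str, "posted": datetime}
        if c["posted"] < disclosure_time and any(
            term.lower() in c["text"].lower() for term in withheld_terms
        ):
            flagged.append((c["user"], c["text"]))
    return flagged
```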
EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
Object understanding in egocentric visual data is arguably a fundamental
research topic in egocentric vision. However, existing object datasets are
either non-egocentric or have limitations in object categories, visual content,
and annotation granularities. In this work, we introduce EgoObjects, a
large-scale egocentric dataset for fine-grained object understanding. Its Pilot
version contains over 9K videos collected by 250 participants from 50+
countries using 4 wearable devices, and over 650K object annotations from 368
object categories. Unlike prior datasets containing only object category
labels, EgoObjects also annotates each object with an instance-level
identifier, and includes over 14K unique object instances. EgoObjects was
designed to capture the same object under diverse background complexities,
surrounding objects, distance, lighting and camera motion. In parallel to the
data collection, we conducted data annotation by developing a multi-stage
federated annotation process to accommodate the growing nature of the dataset.
To bootstrap research on EgoObjects, we present a suite of 4 benchmark tasks
around egocentric object understanding, including novel instance-level and
classical category-level object detection. We also introduce 2 novel continual
learning object detection tasks. The dataset and API are available at
https://github.com/facebookresearch/EgoObjects.
Comment: ICCV 2023 final version and supplement. See more details on the project
page: https://github.com/facebookresearch/EgoObjects
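To illustrate the difference between the category-level and instance-level detection targets, the sketch below groups annotations both ways, assuming a COCO-style JSON layout with an extra instance identifier field. The file name and field names are hypothetical; the actual schema and loading API are defined in the linked repository.

```python
# Hedged sketch: grouping annotations by category versus by instance identifier,
# assuming a COCO-style layout with an "instance_id" field. File and field names
# are assumptions; see the EgoObjects repository for the real schema and API.
import json
from collections import defaultdict

with open("egoobjects_annotations.json") as f:  # hypothetical file name
    data = json.load(f)

by_category = defaultdict(list)
by_instance = defaultdict(list)
for ann in data["annotations"]:
    by_category[ann["category_id"]].append(ann)   # category-level detection target
    by_instance[ann["instance_id"]].append(ann)   # instance-level detection target

print(f"{len(by_category)} categories, {len(by_instance)} unique instances")
```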