IoT2Vec: Identification of Similar IoT Devices via Activity Footprints
We consider a smart home or smart office environment with a number of IoT
devices connected and passing data between one another. The footprints of the
data transferred can provide valuable information about the devices, which can
be used to (a) identify the IoT devices and (b) identify the correct replacements
for these devices in case of failure. In this paper, we generate the
embeddings for IoT devices in a smart home using Word2Vec, and explore the
possibility of having a similar concept for IoT devices, aka IoT2Vec. These
embeddings can be used in a number of ways, such as to find similar devices in
an IoT device store, or as a signature of each type of IoT device. We show
results of a feasibility study on the CASAS dataset of IoT device activity
logs, using our method to identify the patterns in embeddings of various types
of IoT devices in a household.
Comment: 5 pages, 4 figures
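A minimal sketch of the general idea, assuming gensim's Word2Vec and a toy corpus in which each "sentence" is one session of device activations from a CASAS-style log; the device IDs, session construction, and hyperparameters here are illustrative assumptions rather than the paper's exact setup.

from gensim.models import Word2Vec

# Each inner list is one hypothetical session of device activations,
# e.g. the order in which motion (M), door (D) and light (L) sensors fired.
sessions = [
    ["M003", "M004", "D001", "L005"],
    ["M003", "L005", "M004"],
    ["D001", "M010", "M003", "L005"],
]

# Skip-gram Word2Vec over device-ID sequences: devices that occur in similar
# activity contexts receive similar embedding vectors.
model = Word2Vec(sentences=sessions, vector_size=32, window=3, min_count=1, sg=1)

# The resulting vectors can act as device signatures, e.g. to retrieve the
# devices whose activity footprint is most similar to a given device.
print(model.wv.most_similar("M003", topn=3))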
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Generating captions for images is a task that has recently received
considerable attention. In this work we focus on caption generation for
abstract scenes, or object layouts where the only information provided is a set
of objects and their locations. We propose OBJ2TEXT, a sequence-to-sequence
model that encodes a set of objects and their locations as an input sequence
using an LSTM network, and decodes this representation using an LSTM language
model. We show that our model, despite encoding object layouts as a sequence,
can represent spatial relationships between objects, and generate descriptions
that are globally coherent and semantically relevant. We test our approach on the
task of object-layout captioning, using only object annotations as inputs. We
additionally show that our model, combined with a state-of-the-art object
detector, improves an image captioning model from 0.863 to 0.950 (CIDEr score)
in the test benchmark of the standard MS-COCO Captioning task.
Comment: Accepted at EMNLP 2017
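As a rough illustration of the encoder-decoder described above, the sketch below encodes each object as the sum of a label embedding and a projection of its (x, y, w, h) location, reads the resulting sequence with an LSTM encoder, and decodes a caption with an LSTM conditioned on the final encoder state. The layer sizes, location encoding, and fusion scheme are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class Obj2TextSketch(nn.Module):
    def __init__(self, num_labels, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.obj_embed = nn.Embedding(num_labels, embed_dim)  # object category embedding
        self.loc_proj = nn.Linear(4, embed_dim)                # project (x, y, w, h) locations
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, obj_labels, obj_locs, caption_ids):
        # Encode the layout: one (label + location) vector per object, read as a sequence.
        enc_in = self.obj_embed(obj_labels) + self.loc_proj(obj_locs)
        _, state = self.encoder(enc_in)
        # Decode the caption with an LSTM language model conditioned on the layout encoding.
        dec_out, _ = self.decoder(self.word_embed(caption_ids), state)
        return self.out(dec_out)                               # per-step vocabulary logits

# Toy usage: 2 layouts with 3 objects each, captions of length 5.
model = Obj2TextSketch(num_labels=80, vocab_size=1000)
logits = model(torch.randint(0, 80, (2, 3)),
               torch.rand(2, 3, 4),
               torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])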
Video Storytelling: Textual Summaries for Events
Bridging vision and natural language is a longstanding goal in computer
vision and multimedia research. While earlier works focus on generating a
single-sentence description for visual content, recent works have studied
paragraph generation. In this work, we introduce the problem of video
storytelling, which aims at generating coherent and succinct stories for long
videos. Video storytelling introduces new challenges, mainly due to the
diversity of the story and the length and complexity of the video. We propose
novel methods to address the challenges. First, we propose a context-aware
framework for multimodal embedding learning, where we design a Residual
Bidirectional Recurrent Neural Network to leverage contextual information from
past and future. Second, we propose a Narrator model to discover the underlying
storyline. The Narrator is formulated as a reinforcement learning agent which
is trained by directly optimizing the textual metric of the generated story. We
evaluate our method on the Video Story dataset, a new dataset that we have
collected to enable the study. We compare our method with multiple
state-of-the-art baselines, and show that our method achieves better
performance in terms of both quantitative measures and a user study.
Comment: Published in IEEE Transactions on Multimedia
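The abstract only names the components, but the residual bidirectional embedding idea can be sketched as a recurrent layer that adds context from surrounding clips back onto each clip's own feature. The GRU cell, dimensions, and residual formulation below are assumptions for illustration; the paper's Residual Bidirectional RNN and its multimodal training objective are more involved.

import torch
import torch.nn as nn

class ResidualBiRNN(nn.Module):
    """Adds bidirectional context from neighbouring clips back onto each clip feature."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.birnn = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, feat_dim)

    def forward(self, clip_feats):
        # clip_feats: (batch, num_clips, feat_dim) visual features, one per video clip.
        ctx, _ = self.birnn(clip_feats)      # contextual features from past and future clips
        return clip_feats + self.proj(ctx)   # residual connection keeps the original feature

# Toy usage: a batch of 2 videos, each represented by 10 clip-level features.
feats = torch.rand(2, 10, 512)
print(ResidualBiRNN()(feats).shape)  # torch.Size([2, 10, 512])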