8,801 research outputs found
Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos
We propose a new zero-shot Event Detection method by Multi-modal
Distributional Semantic embedding of videos. Our model embeds object and action
concepts as well as other available modalities from videos into a
distributional semantic space. To our knowledge, this is the first Zero-Shot
event detection model that is built on top of distributional semantics and
extends it in the following directions: (a) semantic embedding of multimodal
information in videos (with focus on the visual modalities), (b) automatically
determining relevance of concepts/attributes to a free text query, which could
be useful for other applications, and (c) retrieving videos by free text event
query (e.g., "changing a vehicle tire") based on their content. We embed videos
into a distributional semantic space and then measure the similarity between
videos and the event query in a free text form. We validated our method on the
large TRECVID MED (Multimedia Event Detection) challenge. Using only the event
title as a query, our method outperformed the state of the art, which uses long
descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83.
It is also an order of magnitude faster. Comment: To appear in AAAI 201
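The retrieval step described in this abstract (embed videos and a free-text query into one space, then rank videos by similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the vectors are made up, the names `cosine` and `rank_videos` are hypothetical, and cosine similarity stands in for whatever similarity measure the paper actually uses, which the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_videos(query_vec, video_vecs):
    """Return video names ordered from most to least similar to the query."""
    return sorted(video_vecs,
                  key=lambda name: cosine(query_vec, video_vecs[name]),
                  reverse=True)

# Hypothetical embeddings; a real system would derive them from detected
# concepts in the video and from the words of the event query.
query = [1.0, 0.0, 0.5]
videos = {
    "tire_change_clip": [0.9, 0.1, 0.4],
    "birthday_clip": [0.0, 1.0, 0.1],
}
ranking = rank_videos(query, videos)
```

Because the query and the videos share one space, no training examples of the event are needed at query time, which is what makes the setting zero-shot.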
Topic modeling for conference analytics
This work presents our attempt to understand the research topics that characterize the papers submitted to a conference, using topic modeling and data visualization techniques. We infer the latent topics from the abstracts of all the papers submitted to Interspeech 2014 by means of Latent Dirichlet Allocation. The per-topic word distributions thus obtained are visualized through word clouds. We also compare the automatically inferred topics against the expert-defined topics (also known as tracks for Interspeech 2014). The comparison is based on an information retrieval framework, where we use each latent topic as a query and each track as a document. For each latent topic, we retrieve a ranked list of tracks scored by the degree of word overlap. Each latent topic is associated with the top-scoring track. This analytic procedure was applied to all submissions to Interspeech 2014 and sheds light on the conference's topic categorization, popular versus unpopular topics, emerging topics, and topic compositions. Such insights are potentially valuable for understanding the technical content of a field and planning the future development of its conference(s).
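The topic-to-track matching described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word lists are invented, the function names `rank_tracks` and `assign_topics` are hypothetical, and "degree of word overlap" is taken here to mean the count of shared words, which the abstract does not define precisely.

```python
def rank_tracks(topic_words, tracks):
    """Rank tracks by the number of words they share with a topic's top words."""
    topic_set = set(topic_words)
    scores = {name: len(topic_set & set(words)) for name, words in tracks.items()}
    # Highest overlap first; ties are broken alphabetically by track name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def assign_topics(topics, tracks):
    """Associate each latent topic with its top-scoring track."""
    return {topic_id: rank_tracks(words, tracks)[0][0]
            for topic_id, words in topics.items()}

# Hypothetical data: top words per latent topic (queries) and
# descriptive words per expert-defined track (documents).
topics = {
    "topic_0": ["speech", "recognition", "acoustic", "model"],
    "topic_1": ["prosody", "pitch", "emotion", "speaker"],
}
tracks = {
    "ASR": ["speech", "recognition", "acoustic", "decoder"],
    "Paralinguistics": ["emotion", "prosody", "speaker", "affect"],
}
assignment = assign_topics(topics, tracks)
```

In practice the topic word lists would come from a fitted LDA model, and a weighted overlap (for example, using per-topic word probabilities) could replace the raw count.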
The Lowlands team at TRECVID 2007
In this report we summarize our methods and results for the search tasks in
TRECVID 2007. We employ two different kinds of search: purely ASR-based and
purely concept-based search. However, there is no significant difference in
performance between the two systems. Using neighboring shots for the combination of
two concepts seems to be beneficial. General preprocessing of queries increased
the performance and choosing detector sources helped. However, for all automatic
search components we need to perform further investigations.
Conceptual spatial representations for indoor mobile robots
We present an approach for creating conceptual representations of human-made indoor environments using mobile
robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings
in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The
complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition.
The system also incorporates a linguistic framework that actively supports the map acquisition process, and which
is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.