24,159 research outputs found
Semantic bottleneck for computer vision tasks
This paper introduces a novel method for the representation of images that is
semantic by nature, addressing the question of computation intelligibility in
computer vision tasks. More specifically, our proposition is to introduce what
we call a semantic bottleneck in the processing pipeline, which is a crossing
point in which the representation of the image is entirely expressed with
natural language , while retaining the efficiency of numerical representations.
We show that our approach is able to generate semantic representations that
give state-of-the-art results on semantic content-based image retrieval and
also perform very well on image classification tasks. Intelligibility is
evaluated through user centered experiments for failure detection
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
Towards memory supporting personal information management tools
In this article we discuss re-retrieving personal information objects and relate the task to recovering from lapse(s) in memory. We propose that fundamentally it is lapses in memory that impede users from successfully re-finding the information they need. Our hypothesis is that by learning more about memory lapses in non-computing contexts and how people cope and recover from these lapses, we can better inform the design of PIM tools and improve the user's ability to re-access and re-use objects. We describe a diary study that investigates the everyday memory problems of 25 people from a wide range of backgrounds. Based on the findings, we present a series of principles that we hypothesize will improve the design of personal information management tools. This hypothesis is validated by an evaluation of a tool for managing personal photographs, which was designed with respect to our findings. The evaluation suggests that users' performance when re-finding objects can be improved by building personal information management tools to support characteristics of human memory
K-Space at TRECVid 2007
In this paper we describe K-Space participation in
TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance.
The first of the two systems was a ‘shot’ based interface,
where the results from a query were presented as a ranked
list of shots. The second interface was ‘broadcast’ based,
where results were presented as a ranked list of broadcasts.
Both systems made use of the outputs of our high-level feature submission as well as low-level visual features
Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization
Artificial autonomous agents and robots interacting in complex environments
are required to continually acquire and fine-tune knowledge over sustained
periods of time. The ability to learn from continuous streams of information is
referred to as lifelong learning and represents a long-standing challenge for
neural network models due to catastrophic forgetting. Computational models of
lifelong learning typically alleviate catastrophic forgetting in experimental
scenarios with given datasets of static images and limited complexity, thereby
differing significantly from the conditions artificial agents are exposed to.
In more natural settings, sequential information may become progressively
available over time and access to previous experience may be restricted. In
this paper, we propose a dual-memory self-organizing architecture for lifelong
learning scenarios. The architecture comprises two growing recurrent networks
with the complementary tasks of learning object instances (episodic memory) and
categories (semantic memory). Both growing networks can expand in response to
novel sensory experience: the episodic memory learns fine-grained
spatiotemporal representations of object instances in an unsupervised fashion
while the semantic memory uses task-relevant signals to regulate structural
plasticity levels and develop more compact representations from episodic
experience. For the consolidation of knowledge in the absence of external
sensory input, the episodic memory periodically replays trajectories of neural
reactivations. We evaluate the proposed model on the CORe50 benchmark dataset
for continuous object recognition, showing that we significantly outperform
current methods of lifelong learning in three different incremental learning
scenario
- …