89,812 research outputs found
Semantic Image Retrieval via Active Grounding of Visual Situations
We describe a novel architecture for semantic image retrieval---in
particular, retrieval of instances of visual situations. Visual situations are
concepts such as "a boxing match," "walking the dog," "a crowd waiting for a
bus," or "a game of ping-pong," whose instantiations in images are linked more
by their common spatial and semantic structure than by low-level visual
similarity. Given a query situation description, our architecture---called
Situate---learns models capturing the visual features of expected objects as
well as the expected spatial configuration of relationships among objects. Given a
new image, Situate uses these models in an attempt to ground (i.e., to create a
bounding box locating) each expected component of the situation in the image
via an active search procedure. Situate uses the resulting grounding to compute
a score indicating the degree to which the new image is judged to contain an
instance of the situation. Such scores can be used to rank images in a
collection as part of a retrieval system. In the preliminary study described
here, we demonstrate the promise of this system by comparing Situate's
performance with that of two baseline methods, as well as with a related
semantic image-retrieval system based on "scene graphs."
Deep Memory Networks for Attitude Identification
We consider the task of identifying attitudes towards a given set of entities
from text. Conventionally, this task is decomposed into two separate subtasks:
target detection that identifies whether each entity is mentioned in the text,
either explicitly or implicitly, and polarity classification that classifies
the exact sentiment towards an identified entity (the target) into positive,
negative, or neutral.
Instead, we show that attitude identification can be solved with an
end-to-end machine learning architecture, in which the two subtasks are
interleaved by a deep memory network. In this way, signals produced in target
detection provide clues for polarity classification, and conversely, the
predicted polarity provides feedback to the identification of targets.
Moreover, the treatments for the set of targets also influence each other --
the learned representations may share the same semantics for some targets but
vary for others. The proposed deep memory network, the AttNet, outperforms
methods that do not consider the interactions between the subtasks or those
among the targets, including conventional machine learning methods and the
state-of-the-art deep learning models.
Comment: Accepted to WSDM'1
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers on object detection in light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication
Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach
The ever-growing number of people using Twitter makes it a valuable source of
timely information. However, detecting events in Twitter is a difficult task,
because tweets that report interesting events are overwhelmed by a large volume
of tweets on unrelated topics. Existing methods focus on the textual content of
tweets and ignore the social aspect of Twitter. In this paper we propose MABED
(i.e. mention-anomaly-based event detection), a novel statistical method that
relies solely on tweets and leverages the creation frequency of dynamic links
(i.e. mentions) that users insert in tweets to detect significant events and
estimate the magnitude of their impact over the crowd. MABED also differs from
the literature in that it dynamically estimates the period of time during which
each event is discussed, rather than assuming a predefined fixed duration for
all events. The experiments we conducted on both English and French Twitter
data show that the mention-anomaly-based approach leads to more accurate event
detection and improved robustness in the presence of noisy Twitter content.
Qualitatively speaking, we find that MABED helps with the interpretation of
detected events by providing clear textual descriptions and precise temporal
descriptions. We also show how MABED can help in understanding users' interests.
Furthermore, we describe three visualizations designed to support efficient
exploration of the detected events.
Comment: 17 pages