123 research outputs found
Seeing What You're Told: Sentence-Guided Activity Recognition In Video
We present a system that demonstrates how the compositional structure of
events, in concert with the compositional structure of language, can interplay
with the underlying focusing mechanisms in video action recognition, thereby
providing a medium, not only for top-down and bottom-up integration, but also
for multi-modal integration between vision and language. We show how the roles
played by participants (nouns), their characteristics (adjectives), the actions
performed (verbs), the manner of such actions (adverbs), and changing spatial
relations between participants (prepositions) in the form of whole sentential
descriptions mediated by a grammar, guides the activity-recognition process.
Further, the utility and expressiveness of our framework is demonstrated by
performing three separate tasks in the domain of multi-activity videos:
sentence-guided focus of attention, generation of sentential descriptions of
video, and query-based video search, simply by leveraging the framework in
different manners.Comment: To appear in CVPR 201
Saying What You're Looking For: Linguistics Meets Video Search
We present an approach to searching large video corpora for video clips which
depict a natural-language query in the form of a sentence. This approach uses
compositional semantics to encode subtle meaning that is lost in other systems,
such as the difference between two sentences which have identical words but
entirely different meaning: "The person rode the horse} vs. \emph{The horse
rode the person". Given a video-sentence pair and a natural-language parser,
along with a grammar that describes the space of sentential queries, we produce
a score which indicates how well the video depicts the sentence. We produce
such a score for each video clip in a corpus and return a ranked list of clips.
Furthermore, this approach addresses two fundamental problems simultaneously:
detecting and tracking objects, and recognizing whether those tracks depict the
query. Because both tracking and object detection are unreliable, this uses
knowledge about the intended sentential query to focus the tracker on the
relevant participants and ensures that the resulting tracks are described by
the sentential query. While earlier work was limited to single-word queries
which correspond to either verbs or nouns, we show how one can search for
complex queries which contain multiple phrases, such as prepositional phrases,
and modifiers, such as adverbs. We demonstrate this approach by searching for
141 queries involving people and horses interacting with each other in 10
full-length Hollywood movies.Comment: 13 pages, 8 figure
AD in Fortran, Part 1: Design
We propose extensions to Fortran which integrate forward and reverse
Automatic Differentiation (AD) directly into the programming model.
Irrespective of implementation technology, embedding AD constructs directly
into the language extends the reach and convenience of AD while allowing
abstraction of concepts of interest to scientific-computing practice, such as
root finding, optimization, and finding equilibria of continuous games.
Multiple different subprograms for these tasks can share common interfaces,
regardless of whether and how they use AD internally. A programmer can maximize
a function F by calling a library maximizer, XSTAR=ARGMAX(F,X0), which
internally constructs derivatives of F by AD, without having to learn how to
use any particular AD tool. We illustrate the utility of these extensions by
example: programs become much more concise and closer to traditional
mathematical notation. A companion paper describes how these extensions can be
implemented by a program that generates input to existing Fortran-based AD
tools
AD in Fortran, Part 2: Implementation via Prepreprocessor
We describe an implementation of the Farfel Fortran AD extensions. These
extensions integrate forward and reverse AD directly into the programming
model, with attendant benefits to flexibility, modularity, and ease of use. The
implementation we describe is a "prepreprocessor" that generates input to
existing Fortran-based AD tools. In essence, blocks of code which are targeted
for AD by Farfel constructs are put into subprograms which capture their
lexical variable context, and these are closure-converted into top-level
subprograms and specialized to eliminate EXTERNAL arguments, rendering them
amenable to existing AD preprocessors, which are then invoked, possibly
repeatedly if the AD is nested
- …