Saying What You're Looking For: Linguistics Meets Video Search
We present an approach to searching large video corpora for video clips which
depict a natural-language query in the form of a sentence. This approach uses
compositional semantics to encode subtle meaning that is lost in other systems,
such as the difference between two sentences which have identical words but
entirely different meaning: "The person rode the horse" vs. "The horse
rode the person". Given a video-sentence pair and a natural-language parser,
along with a grammar that describes the space of sentential queries, we produce
a score which indicates how well the video depicts the sentence. We produce
such a score for each video clip in a corpus and return a ranked list of clips.
Furthermore, this approach addresses two fundamental problems simultaneously:
detecting and tracking objects, and recognizing whether those tracks depict the
query. Because both tracking and object detection are unreliable, this approach uses
knowledge about the intended sentential query to focus the tracker on the
relevant participants and ensures that the resulting tracks are described by
the sentential query. While earlier work was limited to single-word queries
which correspond to either verbs or nouns, we show how one can search for
complex queries which contain multiple phrases, such as prepositional phrases,
and modifiers, such as adverbs. We demonstrate this approach by searching for
141 queries involving people and horses interacting with each other in 10
full-length Hollywood movies. Comment: 13 pages, 8 figures
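To make the ranking step and the role-order distinction concrete, here is a toy sketch. It is not the sentence tracker described above: the box format, the `Clip` type, and the `rides` predicate (agent's centre above the patient's centre) are hypothetical stand-ins. It only illustrates that swapping the agent and patient roles changes the score, and that every clip in a corpus is scored and then sorted.

```python
from dataclasses import dataclass

# Hypothetical track format: one box (x1, y1, x2, y2) per frame,
# in image coordinates where y grows downward.
Box = tuple[float, float, float, float]


@dataclass
class Clip:
    name: str
    tracks: dict[str, list[Box]]  # object label -> per-frame boxes


def rides(agent: Box, patient: Box) -> bool:
    """Toy stand-in for the verb 'rode': the agent's centre lies above the
    patient's centre and horizontally within the patient's box."""
    ax = (agent[0] + agent[2]) / 2
    ay = (agent[1] + agent[3]) / 2
    py = (patient[1] + patient[3]) / 2
    return ay < py and patient[0] <= ax <= patient[2]


def score(clip: Clip, agent: str, patient: str) -> float:
    """Fraction of frames in which the (agent, patient) role assignment holds.
    Role order matters, so the two argument orders generally score differently."""
    a, p = clip.tracks[agent], clip.tracks[patient]
    frames = min(len(a), len(p))
    if frames == 0:
        return 0.0
    return sum(rides(a[t], p[t]) for t in range(frames)) / frames


def search(corpus: list[Clip], agent: str, patient: str) -> list[tuple[str, float]]:
    """Score every clip in the corpus and return (name, score) pairs, best first."""
    ranked = [(c.name, score(c, agent, patient)) for c in corpus]
    return sorted(ranked, key=lambda r: r[1], reverse=True)
```

For a clip whose person box sits on top of the horse box in every frame, `score(clip, "person", "horse")` is 1.0 while `score(clip, "horse", "person")` is 0.0, so the two sentences retrieve different clips.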
Recognition and localization of relevant human behavior in videos (SPIE)
Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially in military operations, this can be dangerous and is very resource-intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize the actions of a single human and the interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent-surveillance data from the DARPA Mind's Eye program, comprising hours of video of challenging scenes. The results show that our system is able to track people, detect and localize events, and discriminate between different behaviors, and that it performs 3.4 times better than our previous system.
Joint Tracking and Event Analysis for Carried Object Detection
This paper proposes a novel method for jointly estimating the track of a moving object and the events in which it participates. The method is intended for generic objects that are hard to localise and track given the performance of current detection algorithms; our focus is on events involving carried objects. The tracks of other objects with which the target object interacts (e.g. the carrying person) are assumed to be given. The method is posed as the maximisation of a posterior probability defined over event sequences and temporally disjoint subsets of the tracklets produced by an earlier tracking process. The probability function is a Hidden Markov Model coupled with a term that penalises non-smooth tracks and large gaps in the observed data. We evaluate the method using tracklets output by three state-of-the-art trackers on the newly created MINDSEYE2015 dataset and demonstrate improved performance.
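A minimal sketch of an objective with this shape, assuming a discrete event HMM, a squared second-difference smoothness penalty, and a linear gap penalty. The weights `smooth_w` and `gap_w`, the `log_emit` callback, and the tracklet layout are hypothetical choices, not the paper's exact formulation. The function scores one hypothesis (a set of temporally disjoint tracklets plus a per-frame event labelling); maximisation would then search over such hypotheses.

```python
def log_score(tracklets, events, log_init, log_trans, log_emit,
              smooth_w=1.0, gap_w=0.5):
    """Unnormalised log-posterior of one hypothesis (assumed form).

    tracklets: temporally disjoint (start_frame, [(x, y), ...]) pairs, sorted by start.
    events:    one event label per observed frame.
    log_init[e], log_trans[e1][e2]: event-HMM log-probabilities.
    log_emit(e, xy): hypothetical observation log-likelihood.
    """
    # Concatenate the chosen tracklets, counting unobserved frames between them.
    track, gaps, end = [], 0, None
    for start, pts in tracklets:
        if end is not None:
            if start < end:
                raise ValueError("tracklets must be temporally disjoint")
            gaps += start - end
        track.extend(pts)
        end = start + len(pts)
    if not track or len(events) != len(track):
        raise ValueError("need one event label per observed frame")
    # HMM term: log P(event sequence) + log P(observations | events).
    lp = log_init[events[0]] + log_emit(events[0], track[0])
    for t in range(1, len(track)):
        lp += log_trans[events[t - 1]][events[t]] + log_emit(events[t], track[t])
    # Smoothness penalty: squared second difference over the observed positions.
    rough = sum(
        (track[t + 1][0] - 2 * track[t][0] + track[t - 1][0]) ** 2
        + (track[t + 1][1] - 2 * track[t][1] + track[t - 1][1]) ** 2
        for t in range(1, len(track) - 1)
    )
    return lp - smooth_w * rough - gap_w * gaps
```

Small candidate sets can be compared directly, e.g. `max(hypotheses, key=lambda h: log_score(*h, log_init, log_trans, log_emit))`; larger ones would need a smarter search than enumeration.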
Language-as-skill Approach in Foreign Language Education: A Phenomenological Study
The purpose of this qualitative phenomenological study was to understand foreign language educators' lived experience of language-as-skill, an approach that focuses on language use. The central research question explored foreign language educators' experiences and perspectives on the concept of language acquisition as a type of skill acquisition. In addition, the researcher investigated foreign language educators' language-as-knowledge and language-as-skill methodologies. This study also aimed to discover how language-as-skill combined with advanced technology could address contemporary challenges in foreign language education and improve learners' communicative competence so that they can thrive in a diverse, globalized world. A transcendental phenomenological design was selected to explicate the essence of human understanding. At this stage in the research, the skill-acquisition view treats language learning like the development of other cognitive skills, such as learning to play the piano or drive a car. The theory guiding this study was DeKeyser's skill acquisition theory, which explains the relationship between skill development and language acquisition. In this study, 10 foreign language teachers from a local language training school participated in semi-structured interviews, classroom observations, and document analysis. Data collected from the interviews, documentation, and observations were reviewed, grouped, coded, and reported as faithfully as possible to the participants' experiences and perceptions.
Simultaneous object detection, tracking, and event recognition (arXiv:1204.2741)
Integrating information across modalities is a long-standing challenge for cognitive systems. The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitate a general approach to integrating these three components. This supports multidirectional information flow between the components, allowing object detection to influence tracking and event recognition, and event recognition to influence tracking and object detection. Judged by the quality of the object tracks produced, the combination can outperform the components in isolation. We demonstrate this qualitatively on a number of videos that show how failures in each component are resolved when the components are integrated. This integration can be performed with linear asymptotic complexity.
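One way to obtain this kind of joint inference with cost linear in video length is Viterbi decoding over the cross product of a tracker lattice (which detection to pick in each frame) and an event-HMM lattice (which event state holds in each frame). The sketch below assumes that formulation; the scoring callbacks `coherence` and `log_emit` are hypothetical placeholders. Because detection, motion-coherence, and event terms share one objective, each can override the others. The cost is O(T·(J·Q)²) for T frames, J detections per frame, and Q event states.

```python
def joint_viterbi(det_scores, coherence, log_emit, log_trans, log_init):
    """Jointly choose one detection and one event state per frame.

    det_scores[t][j]: detector score of box j in frame t (higher is better).
    coherence(t, i, j): score for linking box i at t-1 to box j at t.
    log_emit(t, j, q): log-likelihood of event state q given box j at frame t.
    log_trans[p][q], log_init[q]: event-HMM log-probabilities.
    Returns (best_score, [(box_index, event_state) for each frame]).
    """
    T, Q = len(det_scores), len(log_init)
    # delta[(j, q)]: best score of any path ending in box j and event state q.
    delta = {(j, q): det_scores[0][j] + log_init[q] + log_emit(0, j, q)
             for j in range(len(det_scores[0])) for q in range(Q)}
    back = []
    for t in range(1, T):
        new, ptr = {}, {}
        for j in range(len(det_scores[t])):
            for q in range(Q):
                # Best predecessor under the combined tracker + HMM transition score.
                prev = max(delta, key=lambda s: delta[s] + coherence(t, s[0], j)
                           + log_trans[s[1]][q])
                new[(j, q)] = (delta[prev] + coherence(t, prev[0], j)
                               + log_trans[prev[1]][q]
                               + det_scores[t][j] + log_emit(t, j, q))
                ptr[(j, q)] = prev
        delta = new
        back.append(ptr)
    # Trace back from the best final cross-product state.
    state = max(delta, key=delta.get)
    best = delta[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best, path
```

For example, with `det_scores = [[2.0, 0.0], [0.0, 1.0]]`, a coherence of 0 for staying on the same box index and -5 for switching, and a single event state, the best path stays on box 0 in both frames (score 2.0) even though the detector prefers box 1 in the second frame.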