10 research outputs found

    Saying What You're Looking For: Linguistics Meets Video Search

    We present an approach to searching large video corpora for video clips that depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences that have identical words but entirely different meanings: "The person rode the horse" vs. "The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score that indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, this approach uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries that correspond to either verbs or nouns, we show how one can search for complex queries that contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies. Comment: 13 pages, 8 figures

    Recognition and localization of relevant human behavior in videos, SPIE

    Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize actions of a human and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent surveillance data in the DARPA Mind's Eye program, with hours of videos of challenging scenes. The results show that our system is able to track the people, detect and localize events, and discriminate between different behaviors, and it performs 3.4 times better than our previous system.

    Joint Tracking and Event Analysis for Carried Object Detection

    This paper proposes a novel method for jointly estimating the track of a moving object and the events in which it participates. The method is intended for generic objects that are hard to localise and track given the performance of current detection algorithms; our focus is on events involving carried objects. The tracks for other objects with which the target object interacts (e.g. the carrying person) are assumed to be given. The method is posed as maximisation of a posterior probability defined over event sequences and temporally-disjoint subsets of the tracklets from an earlier tracking process. The probability function is a Hidden Markov Model coupled with a term that penalises non-smooth tracks and large gaps in the observed data. We evaluate the method using tracklets output by three state-of-the-art trackers on the newly created MINDSEYE2015 dataset and demonstrate improved performance.

    Language-as-skill Approach in Foreign Language Education: A Phenomenological Study

    The purpose of this qualitative phenomenological study was to understand foreign language educators' lived experience of language-as-skill, an approach that focuses on language use. The central research question explored the foreign language educators' experiences and perspectives on the concept of language acquisition as a type of skill acquisition. In addition, the researcher investigated foreign language educators' language-as-knowledge and language-as-skill methodologies. This study also aimed to discover how the language-as-skill approach, combined with advanced technology, could address the contemporary challenges in foreign language education for learners and improve learners' communicative competence to thrive in a diverse, globalized world. A transcendental phenomenological study design was selected to explicate the essence of human understanding. At this stage in the research, skill acquisition views language learning like the development of other cognitive skills, such as how people learn to play the piano or drive a car. The theory guiding this study was DeKeyser's skill acquisition theory, which explained the relationship between skill development and language acquisition. In this study, 10 foreign language teachers from a local language training school participated in semi-structured interviews, classroom observations, and document analysis. Data collected from the interviews, documentation, and observations were reviewed, grouped, coded, and reported as faithfully as possible to the participants' experiences and perceptions.

    Simultaneous object detection, tracking, and event recognition, arXiv:1204.2741

    Integrating information across modalities is a long-standing challenge for cognitive systems. The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitate a general approach to integrating these three components. This supports multidirectional information flow between these components, allowing object detection to influence tracking and event recognition, and event recognition to influence tracking and object detection. The performance of the combination can exceed the performance of the components in isolation when inspecting the quality of the object tracks produced. We demonstrate this qualitatively on a number of videos which show how failures in each of the components are resolved when they are integrated together. This can be done with linear asymptotic complexity.
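The abstract above does not say how detection-based tracking reaches linear complexity. One common way is Viterbi-style dynamic programming over per-frame candidate detections, and the sketch below illustrates only that general idea. The box layout, the `motion_weight` parameter, and the distance-based coherence term are illustrative assumptions, not the authors' formulation. The running time is linear in the number of frames (and quadratic in the number of candidates per frame).

```python
from typing import List, Tuple

# Hypothetical detection format: (x, y, width, height, detector_score).
Box = Tuple[float, float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    return (box[0] + box[2] / 2.0, box[1] + box[3] / 2.0)


def motion_score(prev: Box, cur: Box) -> float:
    """Assumed coherence term: negative distance between box centers."""
    (px, py), (cx, cy) = center(prev), center(cur)
    return -(((px - cx) ** 2 + (py - cy) ** 2) ** 0.5)


def best_track(frames: List[List[Box]], motion_weight: float = 0.1) -> List[int]:
    """Pick one detection index per frame, maximizing the sum of detector
    scores plus weighted motion coherence between consecutive frames."""
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one candidate detection")

    scores = [box[4] for box in frames[0]]  # best score ending at each box
    backpointers: List[List[int]] = []

    for prev_boxes, cur_boxes in zip(frames, frames[1:]):
        new_scores, pointers = [], []
        for cur in cur_boxes:
            candidates = [
                scores[i] + motion_weight * motion_score(prev, cur)
                for i, prev in enumerate(prev_boxes)
            ]
            best_i = max(range(len(candidates)), key=candidates.__getitem__)
            new_scores.append(candidates[best_i] + cur[4])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path back from the last frame to the first.
    j = max(range(len(scores)), key=scores.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


frames = [
    [(0, 0, 10, 10, 1.0), (100, 100, 10, 10, 0.9)],
    [(100, 100, 10, 10, 0.5), (2, 0, 10, 10, 0.4)],
    [(4, 0, 10, 10, 1.0), (100, 100, 10, 10, 0.2)],
]
track = best_track(frames)
```

In this toy input, picking the highest-scoring detection in each frame independently would jump across the image in the middle frame. The dynamic program instead selects the slightly weaker but spatially consistent detection, which is the kind of failure correction the abstract describes when components share information.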