
    K-Space at TRECVid 2007

    In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a 'shot' based interface, where the results from a query were presented as a ranked list of shots. The second interface was 'broadcast' based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
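    The distinction between early and late fusion mentioned in this abstract can be illustrated with a minimal sketch. It is not the K-Space implementation: the function names, the logistic scoring, and the weighted-average combination rule are all assumptions chosen for illustration. Early fusion concatenates the per-modality features before a single classifier; late fusion combines the scores of separate per-modality classifiers.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def early_fusion_score(visual, audio, weights, bias):
    """Early fusion (illustrative): concatenate the modality features,
    then apply one logistic classifier to the joint vector."""
    fused = list(visual) + list(audio)
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(z)


def late_fusion_score(modality_scores, modality_weights):
    """Late fusion (illustrative): each modality has already produced its
    own detector score; combine them with a weighted average."""
    if len(modality_scores) != len(modality_weights):
        raise ValueError("one weight is required per modality score")
    total = sum(modality_weights)
    if total == 0:
        raise ValueError("modality weights must not sum to zero")
    return sum(w * s for w, s in zip(modality_weights, modality_scores)) / total


if __name__ == "__main__":
    # Hypothetical feature values and weights, for demonstration only.
    print(early_fusion_score([0.2, 0.7], [0.1], [1.0, -0.5, 2.0], 0.0))
    print(late_fusion_score([0.8, 0.4], [1.0, 1.0]))
```

    In this sketch the early-fusion classifier can learn interactions across modalities, while the late-fusion rule keeps the per-modality detectors independent and only tunes how much each one is trusted.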

    Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements

    Emotion evoked by an advertisement plays a key role in influencing brand recall and eventual consumer choices. Automatic ad affect recognition has several useful applications. However, the use of content-based feature representations does not give insights into how affect is modulated by aspects such as the ad scene setting, salient object attributes and their interactions. Neither do such approaches inform us on how humans prioritize visual information for ad understanding. Our work addresses these lacunae by decomposing video content into detected objects, coarse scene structure, object statistics and actively attended objects identified via eye-gaze. We measure the importance of each of these information channels by systematically incorporating related information into ad affect prediction models. Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure better encode affective information as compared to individual scene objects or conspicuous background elements. Comment: Accepted for publication in the Proceedings of 20th ACM International Conference on Multimodal Interaction, Boulder, CO, US

    Deformable Prototypes for Encoding Shape Categories in Image Databases

    We describe a method for shape-based image database search that uses deformable prototypes to represent categories. Rather than directly comparing a candidate shape with all shape entries in the database, shapes are compared in terms of the types of nonrigid deformations (differences) that relate them to a small subset of representative prototypes. To solve the shape correspondence and alignment problem, we employ the technique of modal matching, an information-preserving shape decomposition for matching, describing, and comparing shapes despite sensor variations and nonrigid deformations. In modal matching, shape is decomposed into an ordered basis of orthogonal principal components. We demonstrate the utility of this approach for shape comparison in 2-D image databases. Office of Naval Research (Young Investigator Award N00014-06-1-0661

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained we identified two types of gaps, namely core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges.

    Dance-the-music : an educational platform for the modeling, recognition and audiovisual monitoring of dance steps using spatiotemporal motion templates

    In this article, a computational platform is presented, entitled "Dance-the-Music", that can be used in a dance educational context to explore and learn the basics of dance steps. By introducing a method based on spatiotemporal motion templates, the platform makes it possible to train basic step models from sequentially repeated dance figures performed by a dance teacher. Movements are captured with an optical motion capture system. The teachers' models can be visualized from a first-person perspective to instruct students how to perform the specific dance steps in the correct manner. Moreover, recognition algorithms based on a template matching method can determine the quality of a student's performance in real time by means of multimodal monitoring techniques. The results of an evaluation study suggest that Dance-the-Music is effective in helping dance students to master the basics of dance figures.
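    The idea of training a step model from repeated figures and then scoring a student by template matching can be sketched as follows. This is a simplified illustration, not the platform's algorithm: it assumes each repetition is a one-dimensional, equal-length sequence of samples (for example one coordinate of one marker), builds the template by averaging, and uses Pearson correlation as the match score. The function names are hypothetical.

```python
import math


def build_template(repetitions):
    """Average several equal-length repetitions sample by sample
    (an assumed, simplified stand-in for a motion template)."""
    if not repetitions:
        raise ValueError("at least one repetition is required")
    n = len(repetitions[0])
    if any(len(r) != n for r in repetitions):
        raise ValueError("all repetitions must have the same length")
    count = len(repetitions)
    return [sum(r[i] for r in repetitions) / count for i in range(n)]


def match_score(template, performance):
    """Pearson correlation between template and performance, in [-1, 1].
    Returns 0.0 when either sequence is constant."""
    if len(template) != len(performance):
        raise ValueError("sequences must have the same length")
    n = len(template)
    mean_t = sum(template) / n
    mean_p = sum(performance) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(template, performance))
    var_t = sum((t - mean_t) ** 2 for t in template)
    var_p = sum((p - mean_p) ** 2 for p in performance)
    if var_t == 0 or var_p == 0:
        return 0.0
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Hypothetical teacher repetitions and student attempt.
    teacher = build_template([[0, 1, 0, -1], [0, 3, 0, -3]])
    print(match_score(teacher, [0, 1, 0, -1]))
```

    Correlation ignores overall amplitude, so a student performing the same figure with smaller movements still scores highly in this sketch; a real system would also need temporal alignment and multiple body markers.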

    Geometric reasoning via internet crowdsourcing

    The ability to interpret and reason about shapes is a peculiarly human capability that has proven difficult to reproduce algorithmically. So despite the fact that geometric modeling technology has made significant advances in the representation, display and modification of shapes, there have only been incremental advances in geometric reasoning. For example, although today's CAD systems can confidently identify isolated cylindrical holes, they struggle with more ambiguous tasks such as the identification of partial symmetries or similarities in arbitrary geometries. Even well-defined problems such as 2D shape nesting or 3D packing generally resist elegant solution and rely instead on brute-force explorations of a subset of the many possible solutions. Identifying economic ways of solving such problems would result in significant productivity gains across a wide range of industrial applications. The authors hypothesize that Internet Crowdsourcing might provide a pragmatic way of removing many geometric reasoning bottlenecks. This paper reports the results of experiments conducted with Amazon's mTurk site and designed to determine the feasibility of using Internet Crowdsourcing to carry out geometric reasoning tasks as well as establish some benchmark data for the quality, speed and costs of using this approach. After describing the general architecture and terminology of the mTurk Crowdsourcing system, the paper details the implementation and results of the following three investigations: 1) the identification of "Canonical" viewpoints for individual shapes, 2) the quantification of "similarity" relationships within collections of 3D models and 3) the efficient packing of 2D Strips into rectangular areas. The paper concludes with a discussion of the possibilities and limitations of the approach.
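    For context on the third investigation, the kind of automated heuristic that 2D packing usually relies on can be sketched with a first-fit decreasing-height shelf packer. This is a standard textbook heuristic, not the method or benchmark used in the paper; the function name and the tuple layout are assumptions made for illustration.

```python
def shelf_pack(rects, bin_width):
    """First-fit decreasing-height shelf packing (illustrative heuristic).

    rects: list of (width, height) tuples.
    Returns (placements, total_height), where each placement is
    (x, y, width, height) and total_height is the height used.
    """
    for w, h in rects:
        if w > bin_width:
            raise ValueError("a rectangle is wider than the bin")

    shelves = []      # each shelf is [y_offset, shelf_height, used_width]
    placements = []
    total_height = 0

    # Tallest first, so every later rectangle fits the height of any open shelf.
    for w, h in sorted(rects, key=lambda r: r[1], reverse=True):
        for shelf in shelves:
            if shelf[2] + w <= bin_width:
                placements.append((shelf[2], shelf[0], w, h))
                shelf[2] += w
                break
        else:
            # No existing shelf has room: open a new shelf on top.
            shelves.append([total_height, h, w])
            placements.append((0, total_height, w, h))
            total_height += h

    return placements, total_height


if __name__ == "__main__":
    # Hypothetical rectangles packed into a strip of width 7.
    layout, height = shelf_pack([(4, 2), (3, 2), (3, 1), (2, 1)], 7)
    print(layout, height)
```

    Heuristics like this are fast but can leave gaps above short rectangles on a shelf, which is the sort of inefficiency a human solver may spot and avoid.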

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.