
    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, a set of representative use-case descriptions with a related discussion of the requirements they place on technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to coordinators of EU projects and of national initiatives. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on the progress of innovation. New socio-economic trends are presented, as well as emerging legal challenges.

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method for a given application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Fast algorithm for the 3-D DCT-II

    Recently, many applications for three-dimensional (3-D) image and video compression have been proposed using 3-D discrete cosine transforms (3-D DCTs). Among the different types of DCT, the type-II DCT (DCT-II) is the most widely used. For 3-D DCTs to be usable in practical applications, fast 3-D algorithms are essential. This paper therefore introduces the 3-D vector-radix decimation-in-frequency (3-D VR DIF) algorithm, which calculates the 3-D DCT-II directly. The mathematical analysis and the implementation of the developed algorithm are presented, showing that the algorithm possesses a regular structure, can be implemented in-place for efficient use of memory, and is faster than the conventional row-column-frame (RCF) approach. Furthermore, an application of 3-D video compression based on the 3-D DCT-II is implemented using the new 3-D algorithm. This has led to a substantial speed improvement for 3-D DCT-II-based compression systems and confirmed the validity of the developed algorithm.
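
    For orientation, the conventional row-column-frame (RCF) approach that the abstract uses as its baseline can be sketched as three passes of a 1-D DCT-II, one along each axis. This is a minimal illustration under stated assumptions, not the paper's 3-D VR DIF algorithm: the function names are hypothetical, the transform is left unnormalized, and the 1-D DCT is the plain O(N^2) definition rather than a fast algorithm.

```python
import math


def dct2_1d(v):
    """Unnormalized 1-D DCT-II: X[k] = sum_i v[i] * cos(pi * (2i + 1) * k / (2N))."""
    n = len(v)
    return [
        sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def dct3d_rcf(x):
    """3-D DCT-II by the row-column-frame approach (hypothetical helper).

    x is a nested list indexed as x[frame][column][row].
    """
    n1, n2, n3 = len(x), len(x[0]), len(x[0][0])
    out = [[[float(x[a][b][c]) for c in range(n3)] for b in range(n2)] for a in range(n1)]
    # Pass 1 (rows): transform along the last axis.
    for a in range(n1):
        for b in range(n2):
            out[a][b] = dct2_1d(out[a][b])
    # Pass 2 (columns): transform along the middle axis.
    for a in range(n1):
        for c in range(n3):
            col = dct2_1d([out[a][b][c] for b in range(n2)])
            for b in range(n2):
                out[a][b][c] = col[b]
    # Pass 3 (frames): transform along the first axis.
    for b in range(n2):
        for c in range(n3):
            frame = dct2_1d([out[a][b][c] for a in range(n1)])
            for a in range(n1):
                out[a][b][c] = frame[a]
    return out


def dct3d_direct(x):
    """Reference 3-D DCT-II evaluated straight from the triple-sum definition."""
    n1, n2, n3 = len(x), len(x[0]), len(x[0][0])

    def basis(i, k, n):
        return math.cos(math.pi * (2 * i + 1) * k / (2 * n))

    return [
        [
            [
                sum(
                    x[a][b][c] * basis(a, k1, n1) * basis(b, k2, n2) * basis(c, k3, n3)
                    for a in range(n1)
                    for b in range(n2)
                    for c in range(n3)
                )
                for k3 in range(n3)
            ]
            for k2 in range(n2)
        ]
        for k1 in range(n1)
    ]
```

    Because the 3-D DCT-II kernel is separable, the three 1-D passes give the same coefficients as the triple sum. A vector-radix algorithm such as the one the abstract describes instead works on the 3-D data directly rather than one axis at a time.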

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines. From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.

    Data Modeling and Hybrid Query for Video Database

    Video data management is important because the effective use of video in multimedia applications is often impeded by the difficulty of cataloguing and managing video data. Major aspects of video data management include data modelling, indexing and querying. Modelling is concerned with representing the structural properties of video as well as its content. A video data model should be expressive enough to capture several characteristics inherent to video. Depending on the underlying data model, video can be indexed by text that describes its semantics or by its low-level visual features, such as colour. It is not reasonable to assume that all types of multimedia data can be described sufficiently with words alone. Although query by text annotation complements query by low-level features, query formulation in existing systems is still done separately for each. Existing systems do not support the combination of these two types of query, since there are essential differences between querying multimedia data and querying traditional databases. These differences lead us to consider new types of query. The purpose of this research is to model video data in a way that allows users to formulate queries using a hybrid query mechanism. We define a video data model that captures the hierarchical structure and contents of video. Based on this data model, we design and develop a Video Database System (VDBS). We compared query formulation using single query types against a hybrid query type; the results of the hybrid query type are better than those of the single query types. We extend the Structured Query Language (SQL) to support video functions and design a visual query interface for supporting hybrid queries, which combine exact and similarity-based queries.
    Our research contributions include a video data model that captures the hierarchical structure of video (sequence, scene, shot and key frame), as well as high-level concepts (object, activity, event) and low-level visual features (colour, texture, shape and location). By introducing video functions, the extended SQL supports queries on video segments and on semantic as well as low-level visual features. The hybrid query formulation allows query by text and query by example to be combined in a single query statement. We have designed a visual query interface that facilitates hybrid query formulation. In addition, we have proposed a video database system architecture that includes shot detection, annotation and query formulation modules. Future work will consider the implementation and integration of these modules with other attributes of video data, such as spatio-temporal information and object motion.
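
    The idea of a hybrid query, an exact match on text annotations combined with a similarity match on a low-level visual feature, can be sketched as follows. This is only an illustration under stated assumptions: it is not the authors' extended SQL or their VDBS, and the `Shot` record, the `hybrid_query` function, the `min_similarity` threshold and the choice of histogram intersection as the similarity measure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical shot record: text annotations plus a normalized colour histogram."""
    shot_id: str
    annotations: set
    colour_hist: list


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def hybrid_query(shots, keyword, example_hist, min_similarity=0.5):
    """Return shot ids that match the keyword exactly, ranked by colour similarity.

    The exact part filters on the annotation; the similarity part scores the
    surviving shots against the example histogram (query by example).
    """
    scored = [
        (histogram_intersection(s.colour_hist, example_hist), s.shot_id)
        for s in shots
        if keyword in s.annotations
    ]
    ranked = sorted((item for item in scored if item[0] >= min_similarity), reverse=True)
    return [shot_id for _, shot_id in ranked]
```

    A caller would pass a list of `Shot` records, a keyword such as "goal", and the histogram of an example key frame; only shots that satisfy both conditions are returned, most similar first.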