    Text Extraction in Video

    The detection and extraction of scene and caption text from unconstrained, general-purpose video is an important research problem in the context of content-based retrieval and summarization of visual information. The current state of the art for extracting text from video either makes simplistic assumptions as to the nature of the text to be found, or restricts itself to a subclass of the wide variety of text that can occur in broadcast video. Most published methods only work on artificial text (captions) that is composited on the video frame. Also, these methods were developed for extracting text from images and have simply been applied to video frames; they do not use the additional temporal information in video to good effect. This thesis presents a reliable system for detecting, localizing, extracting, tracking and binarizing text from unconstrained, general-purpose video. In developing methods for extraction of text from video it was observed that no single algorithm could detect all forms of text. The strategy is therefore a multi-pronged approach to the problem, one that involves multiple methods and algorithms operating in functional parallelism. The system utilizes the temporal information available in video. The system can operate on JPEG images, MPEG-1 bit streams, as well as live video feeds. It is also possible to operate the methods individually and independently.

    Retrieval of high-dimensional visual data: current state, trends and challenges ahead

    Information retrieval algorithms have changed the way we manage and use various data sources, such as images, music or multimedia collections. First, free text information of documents from varying sources became accessible in addition to structured data in databases, initially for exact search and then for more probabilistic models. Novel approaches enable content-based visual search of images using computerized image analysis, making visual image content searchable without requiring high quality manual annotations. Other multimedia data followed, such as video and music retrieval, sometimes based on techniques such as extracting objects and classifying genre. 3D (surface) objects and solid textures have also been produced in quickly increasing quantities, for example in medical tomographic imaging. For these two types of 3D information sources, systems have become available to characterize the objects or textures and search for similar visual content in large databases. With 3D moving sequences (i.e., 4D), in particular medical imaging, even higher-dimensional data have become available for analysis and retrieval and currently present many multimedia retrieval challenges. This article systematically reviews current techniques in various fields of 3D and 4D visual information retrieval and analyses the currently dominating application areas. The employed techniques are analysed and regrouped to highlight similarities and complementarities among them in order to guide the choice of optimal approaches for new 3D and 4D retrieval problems. Opportunities for future applications conclude the article. 3D or higher-dimensional visual information retrieval is expected to grow quickly in the coming years and in this respect this article can serve as a basis for designing new applications.

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Measuring concept similarities in multimedia ontologies: analysis and evaluations

    The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing.

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.