8 research outputs found

    Using video objects and relevance feedback in video retrieval

    Video retrieval is mostly based on using text from dialogue, and this remains the most significant component despite progress in other aspects. One problem with this arises when a searcher wants to locate video based on what is appearing in the video rather than what is being spoken about. Alternatives such as automatically detected features and image-based keyframe matching can be used, though these still need further improvement in quality. One other modality for video retrieval is based on segmenting objects from video and allowing end users to use these as part of querying. This uses similarity between query objects and objects from video, and in theory allows retrieval based on what is actually appearing on-screen. The main hurdles to greater use of this are the overhead of object segmentation on large amounts of video and the question of whether we can actually achieve effective object-based retrieval. We describe a system to support object-based video retrieval in which a user selects example video objects as part of the query. During a search, a user builds up a set of these, which are matched against objects previously segmented from a video library. This match is based on the MPEG-7 Dominant Colour, Shape Compaction and Texture Browsing descriptors. We use a user-driven, semi-automated segmentation process to segment the video archive; this is very accurate and faster than conventional video annotation.
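
    The abstract names the descriptors but not the matching rule. Below is a minimal sketch of descriptor-based object-to-object ranking, assuming precomputed fixed-length vectors per descriptor; the vector sizes, weights and the simple Euclidean metric are illustrative assumptions, not the system's actual implementation.

    import numpy as np

    # Hypothetical descriptor channels mirroring the MPEG-7 features named above.
    DESCRIPTORS = ("dominant_colour", "shape_compaction", "texture_browsing")
    WEIGHTS = {"dominant_colour": 0.5, "shape_compaction": 0.3, "texture_browsing": 0.2}

    def object_distance(query, candidate):
        # Weighted sum of per-descriptor Euclidean distances (assumed metric).
        return sum(WEIGHTS[d] * np.linalg.norm(query[d] - candidate[d])
                   for d in DESCRIPTORS)

    def rank_library(query, library, top_k=10):
        # Rank previously segmented library objects against one query object.
        return sorted(library, key=lambda obj: object_distance(query, obj["desc"]))[:top_k]

    # Toy usage with random 8-D descriptor vectors.
    rng = np.random.default_rng(0)
    make = lambda: {d: rng.random(8) for d in DESCRIPTORS}
    library = [{"id": i, "desc": make()} for i in range(100)]
    print([o["id"] for o in rank_library(make(), library, top_k=5)])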

    Video scene retrieval based on local region features


    Guiding object recognition: a shape model with co-activation networks

    The goal of image understanding research is to develop techniques to automatically extract meaningful information from a population of images. This abstract goal manifests itself in a variety of application domains. Video understanding is a natural extension of image understanding. Many video understanding algorithms apply static-image algorithms to successive frames to identify patterns of consistency. This consumes a significant amount of irrelevant computation and may produce erroneous results, because static algorithms are not designed to indicate corresponding pixel locations between frames. Video is more than a collection of images: it is an ordered collection of images that exhibits temporal coherence, which is an additional feature alongside edges, colors, and textures. Motion information provides another level of visual information that cannot be obtained from an isolated image. Leveraging motion cues prevents an algorithm from "starting fresh" at each frame by focusing the region of attention. This approach is analogous to the attentional system of the human visual system. Relying on motion information alone is insufficient due to the aperture problem, whereby local motion information is ambiguous in at least one direction. Consequently, motion cues provide only leading and trailing motion edges, and bottom-up approaches using gradient or region properties to complete moving regions are limited. Object recognition facilitates higher-level processing and is an integral component of image understanding. We present a components-based object detection and localization algorithm for static images. We show how this same system provides top-down segmentation for the detected object. We present a detailed analysis of the model dynamics during the localization process. This analysis shows consistent behavior in response to a variety of input, permitting model reduction and a substantial speed increase with little or no performance degradation. We present four specific enhancements to reduce false positives when instances of the target category are not present. First, a one-shot rule is used to discount coincident secondary hypotheses. Next, we demonstrate that the use of an entire shape model is inappropriate for localizing any single instance and introduce the use of co-activation networks to represent the appropriate component relations for a particular recognition context. Next, we describe how the co-activation network can be combined with motion cues to overcome the aperture problem by providing context-specific, top-down shape information to achieve detection and segmentation in video. Finally, we present discriminating features arising from these enhancements and apply supervised learning techniques to embody the informational contribution of each approach, associating a confidence measure with each detection.
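
    The abstract describes co-activation networks only at a high level. The sketch below illustrates the general idea under stated assumptions: detection support is accumulated only over pairs of components linked for the current recognition context, rather than over the entire shape model. The component names, scores and pairwise product rule are invented for illustration and are not the thesis's actual model.

    import itertools

    # Component detections: name -> activation score in [0, 1] (hypothetical).
    activations = {"wheel_front": 0.9, "wheel_rear": 0.8, "roof": 0.2, "door": 0.7}

    # Co-activation network for one context: component pairs expected to fire together.
    co_activation_edges = {("wheel_front", "wheel_rear"), ("wheel_rear", "door")}

    def detection_support(activations, edges):
        # Accumulate pairwise support over linked components only, ignoring
        # components that are irrelevant in the current recognition context.
        support = 0.0
        for a, b in itertools.combinations(sorted(activations), 2):
            if (a, b) in edges or (b, a) in edges:
                support += activations[a] * activations[b]
        return support

    print(detection_support(activations, co_activation_edges))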

    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) in improving the robustness of a number of computer vision applications. The CPD approach includes two methods for registering two point sets, rigid and non-rigid, distinguished by the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations, such as affine transforms, allow registration under non-uniform scaling and skew. The idea is to move one point set coherently to align with the second point set. The CPD method finds both the non-rigid transformation and the correspondence between two point sets at the same time, without requiring an a priori declaration of the transformation model. The first part of this thesis is focused on speaker identification in video conferencing. A real-time, audio-coupled, video-based approach is presented, which focuses more on the video analysis side rather than the audio analysis that is known to be prone to errors. CPD is effectively utilised for lip movement detection, and a temporal face detection approach is used to minimise false positives when the face detection algorithm fails to perform. The second part of the thesis is focused on multi-exposure and multi-focus image fusion with compensation for camera shake. The Scale Invariant Feature Transform (SIFT) is first used to detect keypoints in the images being fused. Subsequently, this point set is reduced to remove outliers using RANSAC (RANdom SAmple Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet-based image fusion algorithm that makes use of a novel alpha blending and filtering technique to minimise artefacts. The thesis evaluates the performance of the algorithm in comparison to a number of state-of-the-art approaches, including the key commercial products available in the market at present, showing significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make and Model Recognition (VMMR) in CCTV video footage. CPD is used to effectively remove the skew of detected vehicles, since CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approach angles. A LESH (Local Energy Shape Histogram) feature-based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results are provided to show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.
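
    A minimal sketch of the registration stage described for the image fusion work is given below: SIFT keypoints, a RANSAC inlier filter, then non-rigid CPD alignment. OpenCV supplies SIFT and RANSAC; the third-party pycpd package is assumed here as the CPD implementation, and the thresholds are illustrative, so this is a sketch of the pipeline rather than the thesis's own code.

    import cv2
    import numpy as np
    from pycpd import DeformableRegistration  # assumed CPD implementation

    def register_points(img_a, img_b):
        # Detect SIFT keypoints and descriptors in both greyscale images.
        sift = cv2.SIFT_create()
        ka, da = sift.detectAndCompute(img_a, None)
        kb, db = sift.detectAndCompute(img_b, None)

        # Lowe's ratio test keeps only distinctive matches.
        matches = cv2.BFMatcher().knnMatch(da, db, k=2)
        good = [m for m, n in matches if m.distance < 0.75 * n.distance]
        pts_a = np.float32([ka[m.queryIdx].pt for m in good])
        pts_b = np.float32([kb[m.trainIdx].pt for m in good])

        # RANSAC prunes gross outliers before CPD sees the point sets.
        _, mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)
        inliers = mask.ravel().astype(bool)

        # CPD moves one point set coherently onto the other, with no
        # a priori commitment to a parametric motion model.
        reg = DeformableRegistration(X=pts_b[inliers], Y=pts_a[inliers])
        moved, _ = reg.register()
        return moved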

    Object Discovery with a Mobile Robot

    The world is full of objects: cups, phones, computers, books, and countless other things. For many tasks, robots need to understand that this object is a stapler, that object is a textbook, and this other object is a gallon of milk. The classic approach to this problem is object recognition, which classifies each observation into one of several previously-defined classes. While modern object recognition algorithms perform well, they require extensive supervised training: in a standard benchmark, the training data average more than four hundred images of each object class. The cost of manually labeling the training data prohibits these techniques from scaling to general environments. Homes and workplaces can contain hundreds of unique objects, and the objects in one environment may not appear in another. We propose a different approach: object discovery. Rather than rely on manual labeling, we describe unsupervised algorithms that leverage the unique capabilities of a mobile robot to discover the objects (and classes of objects) in an environment. Because our algorithms are unsupervised, they scale gracefully to large, general environments over long periods of time. To validate our results, we collected 67 robotic runs through a large office environment. This dataset, which we have made available to the community, is the largest of its kind. At each step, we treat the problem as one of robotics, not disembodied computer vision. The scale and quality of our results demonstrate the merit of this perspective, and prove the practicality of long-term, large-scale object discovery.
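
    The abstract does not specify the discovery algorithm itself. As a hedged illustration of the unsupervised grouping step at the heart of object discovery, the sketch below clusters appearance features of segments gathered across many runs so that recurring objects emerge without labels; DBSCAN and the synthetic 16-D features are illustrative choices, not the dissertation's actual pipeline.

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(1)
    # Synthetic appearance features for segments from many runs: three
    # recurring objects plus scattered one-off observations.
    recurring = [rng.normal(c, 0.05, size=(30, 16)) for c in (0.2, 0.5, 0.8)]
    one_offs = rng.random((20, 16))
    features = np.vstack(recurring + [one_offs])

    # Density-based clustering needs no predefined number of classes;
    # label -1 marks observations left unexplained.
    labels = DBSCAN(eps=0.4, min_samples=5).fit_predict(features)
    discovered = set(labels) - {-1}
    print(f"discovered {len(discovered)} candidate object classes")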

    Large Scale Pattern Detection in Videos and Images from the Wild

    Pattern detection is a well-studied area of computer vision, but current methods remain unstable on images of poor quality. This thesis describes improvements over contemporary methods in the fast detection of unseen patterns in a large corpus of videos that vary tremendously in colour and texture definition, captured “in the wild” by mobile devices and surveillance cameras. We focus on three key areas of this broad subject. First, we identify consistency weaknesses in existing techniques for processing an image and its horizontally reflected (mirror) image. This is important in police investigations, where subjects change their appearance to try to avoid recognition, and we propose that invariance to horizontal reflection should be more widely considered in image description and recognition tasks. We observe the behaviour of online Deep Learning systems in this respect, and provide a comprehensive assessment of 10 popular low-level feature detectors. Second, we develop simple and fast algorithms that combine to provide memory- and processing-efficient feature matching. These involve static scene elimination in the presence of noise and on-screen time indicators, a blur-sensitive feature detection that finds a greater number of corresponding features in images of varying sharpness, and a combinatorial texture and colour feature matching algorithm that matches features when either attribute may be poorly defined. A comprehensive evaluation is given, showing some improvements over existing feature correspondence methods. Finally, we study random decision forests for pattern detection. A new method of indexing patterns in video sequences is devised and evaluated. We automatically label positive and negative image training data, reducing a task of unsupervised learning to one of supervised learning, and devise a node split function that is invariant to mirror reflection and rotation through 90-degree angles. A high-dimensional vote accumulator encodes the hypothesis support, yielding implicit back-projection for pattern detection. This work was supported by the European Union’s Seventh Framework Programme, specific topic “framework and tools for (semi-)automated exploitation of massive amounts of digital data for forensic purposes”, under grant agreement number 607480 (the LASIE IP project).
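
    The abstract names a split function invariant to mirror reflection and 90-degree rotation without giving its form. One standard way to obtain such invariance, sketched below under stated assumptions, is to evaluate a raw split test on all eight dihedral transforms of a patch and split on an order-invariant reduction such as the maximum; the pixel-difference test and threshold are illustrative, not the thesis's actual design.

    import numpy as np

    def dihedral_variants(patch):
        # All eight combinations of 90-degree rotation and horizontal flip.
        for k in range(4):
            r = np.rot90(patch, k)
            yield r
            yield np.fliplr(r)

    def invariant_split(patch, p1, p2, threshold):
        # A pixel-difference test, reduced over the dihedral group: the max
        # response is unchanged if the input patch is mirrored or rotated.
        responses = [float(v[p1] - v[p2]) for v in dihedral_variants(patch)]
        return max(responses) > threshold

    patch = np.random.default_rng(2).random((16, 16))
    print(invariant_split(patch, (3, 4), (10, 12), 0.25))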

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project, and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of audiovisual content in order to effectively support the annotation process. Considering that multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each class of techniques discussed in the deliverable, we present evaluation results from applying one representative method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary.

    An object-based approach to retrieval of image and video content

    Promising new directions have been opened up for content-based visual retrieval in recent years. Object-based retrieval, which allows users to manipulate video objects as part of their searching and browsing interaction, is one of these. This thesis forms part of a larger stream of research that investigates visual objects as a possible approach to advancing the use of semantics in content-based visual retrieval. The notion of using objects in video retrieval has been seen as desirable for some years, but only very recently has technology started to allow even very basic object-location functions on video. The main hurdles to greater use of objects in video retrieval are the overhead of object segmentation on large amounts of video and the question of whether objects can actually be used efficiently for multimedia retrieval. Despite this, there are already some examples of work which support retrieval based on video objects. This thesis investigates an object-based approach to content-based visual retrieval. The main research contributions of this work are a study of shot boundary detection on compressed-domain video, in which a fast detection approach is proposed and evaluated, and a study of the use of objects in interactive image retrieval. An object-based retrieval framework is developed in order to investigate object-based retrieval on a corpus of natural images and video. This framework contains the entire processing chain required to analyse, index and interactively retrieve images and video via object-to-object matching. The experimental results indicate that object-based searching consistently outperforms image-based searching using low-level features. This result goes some way towards validating the approach of allowing users to select objects as a basis for searching video archives when the information need makes this appropriate.
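
    The shot boundary work operates in the compressed domain, whose details are not given here; as a hedged stand-in, the sketch below shows the generic histogram-difference baseline that such fast detectors are typically compared against. The bin count and threshold are illustrative.

    import cv2
    import numpy as np

    def shot_boundaries(video_path, threshold=0.5, bins=64):
        # Flag frames whose greyscale histogram jumps sharply from the
        # previous frame; large L1 differences suggest a shot cut.
        cap = cv2.VideoCapture(video_path)
        prev_hist, boundaries, idx = None, [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hist = cv2.calcHist([grey], [0], None, [bins], [0, 256])
            hist = cv2.normalize(hist, None).flatten()
            if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(idx)
            prev_hist, idx = hist, idx + 1
        cap.release()
        return boundaries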