    Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project

    The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the Semantic Web, the MPEG-7 and MPEG-21 standards, user interfaces, and human factors. This paper presents recent advances within the SCHEMA Network in content-based analysis, indexing and retrieval of digital media. These advances will be integrated into the modular, expandable SCHEMA reference system.

    Highly accurate texture-based vehicle segmentation method

    In modern traffic surveillance, computer vision methods are often employed to detect vehicles of interest because of the rich information content of an image. Segmentation of moving vehicles using image processing and analysis algorithms has been an important research topic over the past decade. However, segmentation results are strongly affected by two issues, moving cast shadows and reflective regions, both of which reduce accuracy and require postprocessing to alleviate the degradation. We propose an efficient, highly accurate texture-based method, unaffected by moving cast shadows and reflective regions, for extracting vehicle boundaries from the stationary background. The method exploits differences in textural properties between the road, the vehicle's cast shadow, reflections on the vehicle, and the vehicle itself, rather than relying only on intensity differences between them. By further combining luminance and chrominance properties into an OR map, a set of foreground vehicle masks is constructed through a series of morphological operations, each mask describing the outline of one moving vehicle. The proposed method was tested on real-world traffic image sequences and achieved an average error rate of 3.44% over 50 tested vehicle images. © 2004 Society of Photo-Optical Instrumentation Engineers.
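    The idea of OR-combining a texture cue with luminance and chrominance cues and then cleaning the result morphologically can be sketched as below. This is not the authors' method: it assumes local variance as the texture measure, a single chrominance channel, and hypothetical thresholds (`t_tex`, `t_lum`, `t_chr`); the function names are illustrative only.

```python
import numpy as np

def local_variance(img, k=3):
    """Per-pixel variance over a k x k window (edge-padded): a simple stand-in texture measure."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    h, w = img.shape
    # Stack the k*k shifted views of the padded image and take the variance across them.
    windows = np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])
    return windows.var(axis=0)

def dilate(mask):
    """3x3 binary dilation: a pixel is set if any neighbour is set."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for i in range(3):
        for j in range(3):
            out |= p[i:i + h, j:j + w]
    return out

def erode(mask):
    """3x3 binary erosion: a pixel stays set only if all neighbours are set."""
    p = np.pad(mask, 1, mode="constant", constant_values=True)
    h, w = mask.shape
    out = np.ones_like(mask)
    for i in range(3):
        for j in range(3):
            out &= p[i:i + h, j:j + w]
    return out

def vehicle_mask(frame_y, frame_c, bg_y, bg_c, t_tex=20.0, t_lum=30.0, t_chr=15.0):
    """OR-combine texture, luminance and chrominance change cues (hypothetical thresholds)."""
    tex = np.abs(local_variance(frame_y) - local_variance(bg_y)) > t_tex
    lum = np.abs(frame_y.astype(float) - bg_y) > t_lum
    chrom = np.abs(frame_c.astype(float) - bg_c) > t_chr
    or_map = tex | lum | chrom
    # Morphological closing (dilate, then erode) fills pinholes inside each blob.
    return erode(dilate(or_map))
```

    Splitting the returned mask into connected components (for example with `scipy.ndimage.label`) would then give one mask per vehicle.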

    A practical vision system for the detection of moving objects

    The main goal of this thesis is to review and offer robust, efficient algorithms for detecting (segmenting) foreground objects in indoor and outdoor scenes from colour image sequences captured by a stationary camera. To this end, Chapter 2 presents the block diagram of a simple vision system. First, the diagram specifies the order of the blocks and the task each one performs in detecting moving foreground objects. Second, a check mark in the top right corner of a block indicates that this thesis reviews the most recent algorithms for that block and/or contains relevant research on it. In many computer vision applications, segmenting and extracting moving objects in video sequences is an essential task, and background subtraction is widely used as the first step. This work reviews the efficiency of a number of important background subtraction and modelling algorithms, along with their major features. In addition, two background approaches are proposed: the first is pixel-based, whereas the second works at the object level. For each approach, three algorithms are presented, called Selective Update Using Non-Foreground Pixels of the Input Image, Selective Update Using Temporal Averaging, and Selective Update Using Temporal Median. The first approach has some deficiencies that make it incapable of producing a correct dynamic background. The three methods of the second approach use an invariant colour filter and a suitable motion tracking technique to selectively exclude foreground objects (or blobs) from the background frames; they differ in how the background pixels are updated. The Selective Update Using Temporal Median method is shown to produce the correct background image for each input frame.
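    A minimal sketch of the selective temporal-median idea follows. It assumes a plain absolute-difference threshold as the foreground test (the thesis instead uses an invariant colour filter plus motion tracking), and the class name, buffer length `n` and `threshold` are hypothetical. Pixels flagged as foreground contribute the current background value to the history, so moving objects never leak into the median.

```python
from collections import deque

import numpy as np

class SelectiveMedianBackground:
    """Per-pixel temporal median over the last n samples, with foreground pixels
    replaced by the current background before they enter the history."""

    def __init__(self, first_frame, n=5, threshold=25.0):
        f = first_frame.astype(float)
        # Seed the history with copies of the first frame (assumed to show an empty scene).
        self.history = deque([f.copy() for _ in range(n)], maxlen=n)
        self.background = f.copy()
        self.threshold = threshold

    def update(self, frame):
        f = frame.astype(float)
        # Hypothetical foreground test: absolute difference against the background.
        foreground = np.abs(f - self.background) > self.threshold
        # Selective update: foreground pixels re-insert the old background value.
        self.history.append(np.where(foreground, self.background, f))
        self.background = np.median(np.stack(list(self.history)), axis=0)
        return foreground
```

    Slow changes below the threshold (such as gradual illumination drift) still enter the history and take over the median once they form a majority of the buffer.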
Representing foreground regions by their boundaries is also an important task, so an appropriate run-length encoding (RLE) contour tracing algorithm has been implemented for this purpose. However, after thresholding, the boundaries of foreground regions often appear jagged, and these corrupted boundaries can prevent foreground regions from being recognised reliably. Chapter 7 therefore proposes a very efficient boundary smoothing method based on the RLE data. It smooths only the external and internal boundaries of foreground objects and does not distort their silhouettes; as a result, it is very fast and does not blur the image. Finally, the goal of this thesis has been to present simple, practical and efficient algorithms with few constraints that can run in real time.
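    To illustrate why smoothing on RLE data touches only boundaries, the sketch below encodes each mask row as `(start, end)` runs and applies a median-of-3 filter to run endpoints across neighbouring rows. This is a hypothetical simplification, not the thesis's Chapter 7 algorithm: it only handles rows containing exactly one run (external boundaries of hole-free objects), and the function names are illustrative.

```python
def rle_rows(mask):
    """Encode each row of a 0/1 mask as a list of (start, end) runs, end exclusive."""
    runs = []
    for row in mask:
        row_runs, start = [], None
        for x, v in enumerate(row):
            if v and start is None:
                start = x
            elif not v and start is not None:
                row_runs.append((start, x))
                start = None
        if start is not None:
            row_runs.append((start, len(row)))
        runs.append(row_runs)
    return runs

def smooth_single_run_edges(runs):
    """Median-of-3 filter on left/right run edges across adjacent rows.

    Only rows whose neighbours all hold exactly one run are modified; interior
    pixels are never touched, so nothing is blurred."""
    out = [list(r) for r in runs]
    for y in range(1, len(runs) - 1):
        trio = runs[y - 1:y + 2]
        if all(len(r) == 1 for r in trio):
            starts = sorted(r[0][0] for r in trio)
            ends = sorted(r[0][1] for r in trio)
            out[y] = [(starts[1], ends[1])]
    return out
```

    A one-row spike sticking out of an otherwise straight edge is removed, while a genuine change in width spanning two or more rows survives the median.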

    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges with a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network within a batch-based pose estimation strategy. Jointly recovering the motion of each batch allows the ambiguities of the monocular reconstruction problem to be resolved using a low-dimensional trajectory subspace. In addition, we propose refining the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications, such as video editing and free-viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and the scene complexity it can handle.
    Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
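    A low-dimensional trajectory subspace constrains each coordinate's motion over a batch of frames to a linear combination of a few smooth basis vectors. The sketch below assumes a DCT-II basis (a common choice for trajectory subspaces; the abstract does not name the basis) and fits a trajectory by least squares. The batch length, `k` and the function names are hypothetical.

```python
import numpy as np

def dct_basis(n_frames, k):
    """First k DCT-II basis vectors as unit-norm columns of an (n_frames x k) matrix."""
    t = np.arange(n_frames)
    basis = np.stack(
        [np.cos(np.pi * (2 * t + 1) * j / (2 * n_frames)) for j in range(k)], axis=1
    )
    return basis / np.linalg.norm(basis, axis=0)

def fit_trajectory(traj, k):
    """Least-squares projection of an (n_frames x d) trajectory onto the k-dim subspace.

    Returns the (k x d) coefficients and the reconstructed, smoothed trajectory."""
    basis = dct_basis(traj.shape[0], k)
    coeffs, *_ = np.linalg.lstsq(basis, traj, rcond=None)
    return coeffs, basis @ coeffs
```

    Because only `k` coefficients per coordinate are estimated instead of one value per frame, high-frequency jitter, which is orthogonal to the low-frequency basis, is discarded; this is one way a per-batch formulation can regularise an ill-posed per-frame problem.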

    Vision-Based Production of Personalized Video

    In this paper we present a novel vision-based system for the automated production of personalized video souvenirs for visitors to leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network. At the end of a visitor's stay, the system produces a personalized DVD souvenir that allows the visitor to relive the experience. We analyze how we identify visitors by fusing facial and body features, how we track them, how the tracker recovers from failures caused by occlusions, and how we annotate and compile the final product. Our experiments demonstrate the feasibility of the proposed approach.
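    One simple way to fuse facial and body features for identification is weighted score-level fusion of cosine similarities, falling back to body features alone when the face is occluded. The sketch below is a hypothetical illustration, not the paper's fusion scheme: the weight `w_face`, the acceptance threshold `min_score`, the gallery layout and the function names are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def identify(query_face, query_body, gallery, w_face=0.6, min_score=0.5):
    """Return the best-matching visitor id, or None if no fused score exceeds min_score.

    gallery maps visitor id -> (face_vector, body_vector); query_face may be None
    when the face is not visible, in which case only body similarity is used."""
    best_id, best_score = None, min_score
    for visitor_id, (face, body) in gallery.items():
        body_sim = cosine(query_body, body)
        if query_face is None:
            score = body_sim
        else:
            score = w_face * cosine(query_face, face) + (1 - w_face) * body_sim
        if score > best_score:
            best_id, best_score = visitor_id, score
    return best_id
```

    Returning `None` below the threshold lets a tracker treat the detection as an unknown person rather than forcing a match to an enrolled visitor.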