VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS
This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interest in the fields of computer vision and data mining.
Structural semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, location and orientation.
First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations.
New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models.
Deliverable D1.1 State of the art and requirements analysis for hypervideo
This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project and, through the analysis of the distinctive needs and demands of each scenario, we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each of the different classes of techniques discussed in the deliverable, we present the evaluation results from the application of one such method from the literature to a dataset well-suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary.
Damage detection and monitoring for tunnel inspection based on computer vision
The deterioration of the underground infrastructure of the major cities around the world, due to ageing, has become a topic of great concern among engineers. Visual inspection, as part of the routine maintenance procedures, is a common practice used in the condition assessment of infrastructure to ensure its safety and serviceability. This practice, however, is labour-intensive, costly and inaccurate and, therefore, a new system based on computer vision technology is presented in this thesis, aiming to tackle these inadequacies.
This thesis proposes a novel mosaicing system for inspection reporting, which can create an almost distortion-free mosaic of tunnels, thus allowing a large area of tunnels to be visualised. The system relies on Structure from Motion (SFM), which enables the system to cope with images with a general camera motion, in contrast to standard mosaicing software that can cope only with a strict camera motion. The system involves the automatic robust estimation of a 3D cylindrical surface, using a Support Vector Machine to classify 3D points in order to improve the accuracy of the estimation. It is shown that curvature is observed in the mosaics when an inaccurate surface is used for mosaicing, while the mosaics from a surface estimated using the proposed method are almost distortion-free.
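To illustrate why an accurate cylindrical surface matters for mosaicing, the following is a minimal sketch of unrolling points on a cylinder into flat mosaic coordinates. It is not the thesis's method: it assumes the cylinder axis and radius are already known and that the axis coincides with the z axis, and the function name `unroll_cylinder` is hypothetical.

```python
import math

def unroll_cylinder(points, radius):
    """Map 3D points near a cylinder to 2D mosaic coordinates.

    Assumption (not from the thesis): the cylinder axis is the z axis
    and `radius` has already been estimated.
    """
    flat = []
    for x, y, z in points:
        theta = math.atan2(y, x)           # angle around the axis, in (-pi, pi]
        flat.append((radius * theta, z))   # (arc length around the wall, distance along the axis)
    return flat
```

If the assumed radius or axis is wrong, the arc-length coordinate is distorted, which is consistent with the curvature the abstract reports for mosaics built on an inaccurate surface.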
New feature matching algorithms aiming to improve the performance of SFM systems are proposed. These algorithms apply a spatial consistency constraint to match features with a similar topography, in contrast to other matching algorithms that rely on matching based on the similar appearance of local image patches. The Shape Context and Random Forest algorithms are combined in the proposed algorithm, revealing promising results.
The final contribution is a new change detection system for monitoring cracks in multi-temporal images. The system can cope with images with a general camera motion, achieved by geometrical registration using SFM, unlike other systems that assume fixed or controlled cameras. The system performs photometric normalisation to cope with illumination variation in the images, and applies a motion-invariant change detection algorithm to handle deformable objects. It is shown that the results from the proposed change detection system are still impractical for use with tunnel images from a real environment, and further study is required.
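The abstract does not say which photometric normalisation is used. As a rough sketch under that caveat, one simple option is to rescale each registered image to zero mean and unit variance before differencing, which cancels a global gain and offset in illumination. The names `normalise` and `changed_pixels` and the threshold value are hypothetical, and images are represented as flat lists of grey levels for brevity.

```python
from statistics import mean, pstdev

def normalise(pixels):
    """Rescale grey levels to zero mean and unit variance (assumed scheme)."""
    m = mean(pixels)
    s = pstdev(pixels)
    if s == 0:
        return [0.0 for _ in pixels]   # a flat image carries no contrast
    return [(p - m) / s for p in pixels]

def changed_pixels(before, after, threshold=1.0):
    """Indices where two registered images differ after normalisation."""
    a = normalise(before)
    b = normalise(after)
    return [i for i, (p, q) in enumerate(zip(a, b)) if abs(p - q) > threshold]
```

With this scheme, an image and a brightened copy of it (every pixel multiplied by a positive constant and shifted) produce no detected changes, whereas a genuine rearrangement of intensities does. A global normalisation like this is sensitive to large local changes, so it should be read as an illustration only.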
Flexible spatial configuration of local image features
Local image features have been designed to be informative and repeatable under rigid transformations and illumination deformations. Even though current state-of-the-art local image features present a high degree of repeatability, their local appearance alone usually does not bring enough discriminative power to support reliable matching, resulting in a relatively high number of mismatches in the correspondence set formed during the data association procedure. As a result, geometric filters, commonly based on global spatial configuration, have been used to reduce this number of mismatches. However, this approach presents a trade-off between the effectiveness in rejecting mismatches and the robustness to non-rigid deformations. In this paper, we propose two geometric filters, based on semi-local spatial configuration of local features, that are designed to be robust to non-rigid deformations and to rigid transformations, without compromising their efficacy in rejecting mismatches. We compare our methods to the Hough transform, which is an efficient and effective mismatch rejection step based on global spatial configuration of features. In these comparisons, our methods are shown to be more effective in the task of rejecting mismatches for rigid transformations and non-rigid deformations at comparable time complexity figures. Finally, we demonstrate how to integrate these methods in a probabilistic recognition system such that the final verification step uses not only the similarity between features, but also their semi-local configuration. Gustavo Carneiro and Allan D. Jepso
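For readers unfamiliar with the Hough-transform baseline mentioned above, the following is a simplified sketch of one common form of it: each correspondence votes for the similarity transform (rotation, scale, translation) it implies, and only the matches in the most popular bin are kept. The function name `hough_filter`, the bin sizes, and the feature tuple layout `(x, y, scale, orientation)` are assumptions for illustration; this is not the paper's proposed semi-local filter, and it omits refinements such as voting into neighbouring bins.

```python
import math
from collections import defaultdict

def hough_filter(matches, angle_bin=math.radians(30), scale_bin=1.0, shift_bin=20.0):
    """Keep the matches that agree on one global similarity transform.

    matches: list of ((x1, y1, s1, a1), (x2, y2, s2, a2)) feature pairs,
    where s is the feature scale and a its orientation in radians.
    Returns the indices of the matches in the largest pose bin.
    """
    votes = defaultdict(list)
    for idx, ((x1, y1, s1, a1), (x2, y2, s2, a2)) in enumerate(matches):
        rot = (a2 - a1) % (2 * math.pi)    # rotation implied by this match
        scale = s2 / s1                    # scale change implied by this match
        # Translation left over after rotating and scaling the first point.
        tx = x2 - scale * (x1 * math.cos(rot) - y1 * math.sin(rot))
        ty = y2 - scale * (x1 * math.sin(rot) + y1 * math.cos(rot))
        key = (math.floor(rot / angle_bin),
               math.floor(math.log2(scale) / scale_bin),
               math.floor(tx / shift_bin),
               math.floor(ty / shift_bin))
        votes[key].append(idx)
    return max(votes.values(), key=len) if votes else []
```

Because every match is tested against a single global transform, a filter of this kind tends to discard correct matches on a surface that bends or stretches, which is the trade-off the abstract describes between rejecting mismatches and tolerating non-rigid deformations.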