1,287 research outputs found
On morphological hierarchical representations for image processing and spatial data clustering
Hierarchical data representations in the context of classi cation and data
clustering were put forward during the fties. Recently, hierarchical image
representations have gained renewed interest for segmentation purposes. In this
paper, we briefly survey fundamental results on hierarchical clustering and
then detail recent paradigms developed for the hierarchical representation of
images in the framework of mathematical morphology: constrained connectivity
and ultrametric watersheds. Constrained connectivity can be viewed as a way to
constrain an initial hierarchy in such a way that a set of desired constraints
are satis ed. The framework of ultrametric watersheds provides a generic scheme
for computing any hierarchical connected clustering, in particular when such a
hierarchy is constrained. The suitability of this framework for solving
practical problems is illustrated with applications in remote sensing
Automatic Main Road Extraction from High Resolution Satellite Imagery
Road information is essential for automatic GIS (geographical information system) data acquisition, transportation and urban planning. Automatic road (network) detection from high resolution satellite imagery will hold great potential for significant reduction of database development/updating cost and turnaround time. From so called low level feature detection to high level context supported grouping, so many algorithms and methodologies have been presented for this purpose. There is not any practical system that can fully automatically extract road network from space imagery for the purpose of automatic mapping. This paper presents the methodology of automatic main road detection from high resolution satellite IKONOS imagery. The strategies include multiresolution or image pyramid method, Gaussian blurring and the line finder using 1-dimemsional template correlation filter, line segment grouping and multi-layer result integration. Multi-layer or multi-resolution method for road extraction is a very effective strategy to save processing time and improve robustness. To realize the strategy, the original IKONOS image is compressed into different corresponding image resolution so that an image pyramid is generated; after that the line finder of 1-dimemsional template correlation filter after Gaussian blurring filtering is applied to detect the road centerline. Extracted centerline segments belong to or do not belong to roads. There are two ways to identify the attributes of the segments, the one is using segment grouping to form longer line segments and assign a possibility to the segment depending on the length and other geometric and photometric attribute of the segment, for example the longer segment means bigger possibility of being road. Perceptual-grouping based method is used for road segment linking by a possibility model that takes multi-information into account; here the clues existing in the gaps are considered. Another way to identify the segments is feature detection back-to-higher resolution layer from the image pyramid
Research on robust salient object extraction in image
制度:新 ; 文部省報告番号:甲2641号 ; 学位の種類:博士(工学) ; 授与年月日:2008/3/15 ; 早大学位記番号:新480
Text Detection Using Transformation Scaling Extension Algorithm in Natural Scene Images
In recent study efforts, the importance of text identification and recognition in images of natural scenes has been stressed more and more. Natural scene text contains an enormous amount of useful semantic data that can be applied in a variety of vision-related applications. The detection of shape-robust text confronts two major challenges: 1. A large number of traditional quadrangular bounding box-based detectors failed to identify text with irregular forms, making it difficult to include such text within perfect rectangles.2. Pixel-wise segmentation-based detectors sometimes struggle to identify closely positioned text examples from one another. Understanding the surroundings and extracting information from images of natural scenes depends heavily on the ability to detect and recognise text. Scene text can be aligned in a variety of ways, including vertical, curved, random, and horizontal alignments. This paper has created a novel method, the Transformation Scaling Extention Algorithm (TSEA), for text detection using a mask-scoring R-ConvNN (Region Convolutional Neural Network). This method works exceptionally well at accurately identifying text that is curved and text that has multiple orientations inside real-world input images. This study incorporates a mask-scoring R-ConvNN network framework to enhance the model's ability to score masks correctly for the observed occurrences. By providing more weight to accurate mask predictions, our scoring system eliminates inconsistencies between mask quality and score and enhances the effectiveness of instance segmentation. This paper also incorporates a Pyramid-based Text Proposal Network (PBTPN) and a Transformation Component Network (TCN) to enhance the feature extraction capabilities of the mask-scoring R-ConvNN for text identification and segmentation with the TSEA. Studies show that Pyramid Networks are especially effective in reducing false alarms caused by images with backgrounds that mimic text. On benchmark datasets ICDAR 2015, SCUT-CTW1500 containing multi-oriented and curved text, this method outperforms existing methods by conducting extensive testing across several scales and utilizing a single model. This study expands the field of vision-oriented applications by highlighting the growing significance of effectively locating and detecting text in natural situations
Image Understanding by Hierarchical Symbolic Representation and Inexact Matching of Attributed Graphs
We study the symbolic representation of imagery information by a powerful global representation scheme in the form of Attributed Relational Graph (ARG), and propose new techniques for the extraction of such representation from spatial-domain images, and for performing the task of image understanding through the analysis of the extracted ARG representation. To achieve practical image understanding tasks, the system needs to comprehend the imagery information in a global form. Therefore, we propose a multi-layer hierarchical scheme for the extraction of global symbolic representation from spatial-domain images. The proposed scheme produces a symbolic mapping of the input data in terms of an output alphabet, whose elements are defined over global subimages. The proposed scheme uses a combination of model-driven and data-driven concepts. The model- driven principle is represented by a graph transducer, which is used to specify the alphabet at each layer in the scheme. A symbolic mapping is driven by the input data to map the input local alphabet into the output global alphabet. Through the iterative application of the symbolic transformational mapping at different levels of hierarchy, the system extracts a global representation from the image in the form of attributed relational graphs. Further processing and interpretation of the imagery information can, then, be performed on their ARG representation. We also propose an efficient approach for calculating a distance measure and finding the best inexact matching configuration between attributed relational graphs. For two ARGs, we define sequences of weighted error-transformations which when performed on one ARG (or a subgraph of it), will produce the other ARG. A distance measure between two ARGs is defined as the weight of the sequence which possesses minimum total-weight. Moreover, this minimum-total weight sequence defines the best inexact matching configuration between the two ARGs. The global minimization over the possible sequences is performed by a dynamic programming technique, the approach shows good results for ARGs of practical sizes. The proposed system possesses the capability to inference the alphabets of the ARG representation which it uses. In the inference phase, the hierarchical scheme is usually driven by the input data only, which normally consist of images of model objects. It extracts the global alphabet of the ARG representation of the models. The extracted model representation is then used in the operation phase of the system to: perform the mapping in the multi-layer scheme. We present our experimental results for utilizing the proposed system for locating objects in complex scenes
Patch-based semantic labelling of images.
PhDThe work presented in this thesis is focused at associating a semantics
to the content of an image, linking the content to high level
semantic categories. The process can take place at two levels: either
at image level, towards image categorisation, or at pixel level, in se-
mantic segmentation or semantic labelling. To this end, an analysis
framework is proposed, and the different steps of part (or patch) extraction,
description and probabilistic modelling are detailed. Parts of
different nature are used, and one of the contributions is a method to
complement information associated to them. Context for parts has to
be considered at different scales. Short range pixel dependences are accounted
by associating pixels to larger patches. A Conditional Random
Field, that is, a probabilistic discriminative graphical model, is used
to model medium range dependences between neighbouring patches.
Another contribution is an efficient method to consider rich neighbourhoods
without having loops in the inference graph. To this end, weak
neighbours are introduced, that is, neighbours whose label probability
distribution is pre-estimated rather than mutable during the inference.
Longer range dependences, that tend to make the inference problem
intractable, are addressed as well. A novel descriptor based on local
histograms of visual words has been proposed, meant to both complement
the feature descriptor of the patches and augment the context
awareness in the patch labelling process. Finally, an alternative approach
to consider multiple scales in a hierarchical framework based
on image pyramids is proposed. An image pyramid is a compositional
representation of the image based on hierarchical clustering. All the
presented contributions are extensively detailed throughout the thesis,
and experimental results performed on publicly available datasets are
reported to assess their validity. A critical comparison with the state
of the art in this research area is also presented, and the advantage in
adopting the proposed improvements are clearly highlighted
Spott : on-the-spot e-commerce for television using deep learning-based video analysis techniques
Spott is an innovative second screen mobile multimedia application which offers viewers relevant information on objects (e.g., clothing, furniture, food) they see and like on their television screens. The application enables interaction between TV audiences and brands, so producers and advertisers can offer potential consumers tailored promotions, e-shop items, and/or free samples. In line with the current views on innovation management, the technological excellence of the Spott application is coupled with iterative user involvement throughout the entire development process. This article discusses both of these aspects and how they impact each other. First, we focus on the technological building blocks that facilitate the (semi-) automatic interactive tagging process of objects in the video streams. The majority of these building blocks extensively make use of novel and state-of-the-art deep learning concepts and methodologies. We show how these deep learning based video analysis techniques facilitate video summarization, semantic keyframe clustering, and (similar) object retrieval. Secondly, we provide insights in user tests that have been performed to evaluate and optimize the application's user experience. The lessons learned from these open field tests have already been an essential input in the technology development and will further shape the future modifications to the Spott application
- …