280 research outputs found

    Robust Object Detection with Interleaved Categorization and Segmentation

    This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach treats object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between these two processes allows them to benefit from each other and improves the combined performance. The core of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is in turn used to improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information about where in the image a hypothesis draws its support is employed in an MDL-based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and to factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance from training sets that are one to two orders of magnitude smaller than those used in comparable systems.
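    The voting scheme behind a probabilistic Generalized Hough Transform can be illustrated with a minimal sketch (the data below is hypothetical; the actual system learns patch appearances and vote weights from training examples): each matched codebook entry casts a weighted vote for the object center, and the detection hypothesis is the accumulator peak.

```python
from collections import defaultdict

def hough_vote(matches, bin_size=10):
    """Accumulate weighted votes for object centers in a coarse grid.

    matches: list of (feature_xy, offset_xy, weight) -- each matched
    codebook entry votes for a center at feature position + stored offset.
    """
    acc = defaultdict(float)
    for (fx, fy), (ox, oy), w in matches:
        cx, cy = fx + ox, fy + oy
        acc[(int(cx // bin_size), int(cy // bin_size))] += w
    # Detection hypothesis = accumulator cell with the highest vote mass.
    return max(acc.items(), key=lambda kv: kv[1])

# Three patches: two vote consistently for a center near (100, 50),
# one votes for a slightly different cell.
matches = [((90, 40), (10, 10), 0.5),
           ((120, 60), (-20, -10), 0.8),
           ((95, 55), (8, -6), 0.4)]
print(hough_vote(matches))  # -> ((10, 5), 1.3)
```

    In the full system each vote would carry a probability learned from training data, and back-projecting the winning votes yields the per-pixel support used for the segmentation and MDL verification stages.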

    A Review of Codebook Models in Patch-Based Visual Object Recognition

    The codebook model-based approach, while ignoring structural aspects of vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of the visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, usually done by cluster analysis. However, clustering retains regions of high density in a distribution, so the resulting codebook need not have discriminant properties; clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook that constructs a discriminant codebook in a one-pass design procedure, slightly outperforming more traditional approaches at drastically reduced computing times. In this review we survey approaches proposed over the last decade, covering their feature detectors, descriptors, codebook construction schemes, choice of classifiers for recognising objects, and the datasets used to evaluate the proposed methods.
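    The histogram mapping described above can be sketched in a few lines (toy codebook and descriptors; a real system would obtain the codebook by clustering thousands of local descriptors, e.g. with k-means):

```python
def quantize(feature, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((f - c) ** 2
                                 for f, c in zip(feature, codebook[i])))

def bag_of_words(features, codebook):
    """Map a variable-size set of local descriptors to a fixed-length,
    normalized histogram that a standard classifier can consume."""
    hist = [0] * len(codebook)
    for f in features:
        hist[quantize(f, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # toy 3-word codebook
features = [(0.1, 0.2), (0.9, 1.1), (0.0, 0.9), (0.2, 0.1)]
print(bag_of_words(features, codebook))  # -> [0.5, 0.25, 0.25]
```

    The codebook size trades model complexity against discriminative power, exactly the tension the review discusses: more codewords give finer histograms but costlier quantization.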

    Energy Based Multi-Model Fitting and Matching Problems

    Feature matching and model fitting are fundamental problems in multi-view geometry. They are chicken-and-egg problems: if the models are known it is easier to find matches, and vice versa. Standard multi-view geometry techniques solve feature matching and model fitting sequentially, as two independent problems, after making fairly restrictive assumptions. For example, matching methods rely on the strong discriminative power of feature descriptors, which fails for stereo images with repetitive textures or a wide baseline; model fitting methods assume given feature matches, which are not known a priori. Moreover, when the data supports multiple models, the fitting problem becomes challenging even with known matches, and current methods commonly resort to heuristics. One of the main contributions of this thesis is a joint formulation of the fitting and matching problems. We are the first to introduce an objective function combining matching and multi-model estimation. We also propose an approximation algorithm for the corresponding NP-hard optimization problem using block-coordinate descent with respect to the matching and model fitting variables. For fixed models, our method uses a min-cost-max-flow algorithm to solve a generalization of the linear assignment problem with label costs (a sparsity constraint). The fixed-matching case reduces to a multi-model fitting subproblem, which is interesting in its own right. In contrast to standard heuristic approaches, we introduce global objective functions for multi-model fitting using various forms of regularization (spatial smoothness and sparsity) and propose a graph-cut based optimization algorithm, PEaRL. Experimental results show that the proposed mathematical formulations and optimization algorithms improve the accuracy and robustness of model estimation over the state of the art in computer vision.
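    The block-coordinate descent idea can be illustrated on a toy multi-line fitting problem (a deliberately simplified sketch: it omits the label costs, spatial smoothness, and min-cost-max-flow matching step of the thesis): alternate between assigning points to the current models and refitting each model to its assigned points.

```python
def fit_line(pts):
    """Least-squares fit of y = a*x + b to a list of (x, y) points."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def alternate_fit(points, models, iters=10):
    """Block-coordinate descent: fix models -> assign points;
    fix assignment -> refit each model to its inliers."""
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(models)),
                      key=lambda k: (y - models[k][0] * x - models[k][1]) ** 2)
                  for x, y in points]
        models = [fit_line([p for p, l in zip(points, labels) if l == k]
                           or points)  # fall back if a model loses all points
                  for k in range(len(models))]
    return models, labels

# Two noisy lines, y ~ 0 and y ~ 5, starting from rough model guesses.
points = [(0, 0.0), (1, 0.1), (2, -0.1), (3, 0.0),
          (0, 5.0), (1, 5.1), (2, 4.9), (3, 5.0)]
models, labels = alternate_fit(points, [(0.0, 1.0), (0.0, 4.0)])
print(labels)  # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

    Each half-step can only decrease the total residual, which is why the alternation converges; the thesis replaces the greedy assignment with a globally regularized one.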

    Human detection, tracking and segmentation from low-level to high-level vision

    The goal of this research is to detect, segment and track a human body, as well as estimate its limb configuration, against cluttered backgrounds. These are fundamental research issues that have attracted intensive attention in the computer vision community because of their wide applications. They also remain among the most challenging research issues, largely due to the ubiquitous visual ambiguities in images and videos; another challenging factor is the ill-posed nature of the problems. Inspired by recent findings in cognitive psychology, we adopt several biologically plausible approaches to attack these challenging problems. This dissertation provides a comprehensive study of human detection, tracking and segmentation that covers research issues ranging from low- to middle- to high-level vision. In low-level vision, we investigate video segmentation, where the main challenge is the non-convex classification problem, and we develop a cascaded multi-layer segmentation framework in which non-convex classification problems are addressed in a split-and-merge paradigm combining the merits of statistical modeling and graph theory. In middle-level vision, we propose a segmentation-based hypothesis-and-test paradigm to achieve joint localization and segmentation that exploits the complementary nature of region-based and edge-based shape priors. In addition, we integrate both priors into a graph-cut framework to improve the segmentation results. In high-level vision, our research has two related parts. First, we propose a hybrid body representation that embraces part-whole shape priors and a part-based spatial prior for integrated pose recognition, localization and segmentation in a given image. Second, we further combine spatial and temporal priors in an integrated online learning and inference framework, where body parts can be detected, localized and segmented simultaneously from a video sequence. Both parts are supported by the preceding low-level and mid-level vision tasks. Experimental results show that the proposed algorithms achieve accurate and robust tracking, localization and segmentation for different walking subjects with significant appearance and motion variability, against cluttered backgrounds.

    Two and three dimensional segmentation of multimodal imagery

    The role of segmentation in image understanding and analysis, computer vision, pattern recognition, remote sensing and medical imaging has grown significantly in recent years due to accelerated advances in the acquisition of image data. This low-level analysis step is critical to numerous applications; its primary goal is to expedite and improve the effectiveness of subsequent high-level operations by providing a condensed and pertinent representation of image information. In this research, we propose a novel unsupervised segmentation framework for meaningfully segregating 2-D/3-D image data across multiple modalities (color, remote-sensing and biomedical imaging) into non-overlapping partitions using several spatial-spectral attributes. Initially, our framework exploits the information obtained from detecting edges inherent in the data. Using a vector gradient detection technique, pixels without edges are grouped and individually labeled to partition an initial portion of the input image content. Pixels with higher gradient densities are included by dynamically generating segments as the algorithm progresses, yielding an initial region map. Subsequently, texture modeling is performed, and the obtained gradient, texture and intensity information, together with the initial partition map, is used in a multivariate refinement procedure that fuses groups with similar characteristics to yield the final segmentation. Experimental results, compared against published and state-of-the-art segmentation techniques for color as well as multi/hyperspectral imagery, demonstrate the advantages of the proposed method. Furthermore, to achieve improved computational efficiency, we propose an extension of this methodology in a multi-resolution framework, demonstrated on color images. Finally, this research also encompasses a 3-D extension of the algorithm, demonstrated on medical (Magnetic Resonance Imaging / Computed Tomography) volumes.
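    The refinement step of such a pipeline can be sketched as union-find merging over the initial region map (a hypothetical single-feature similarity test is used here; the method above fuses regions using gradient, texture and intensity jointly):

```python
def merge_regions(means, adjacency, tol=10.0):
    """Fuse adjacent regions whose mean intensities differ by less than
    `tol` -- a one-feature stand-in for the multivariate refinement test.
    Union-find with path compression tracks merged labels."""
    parent = list(range(len(means)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in adjacency:
        ra, rb = find(a), find(b)
        if ra != rb and abs(means[ra] - means[rb]) < tol:
            parent[rb] = ra  # note: means are not re-averaged in this sketch
    return [find(i) for i in range(len(means))]

# Four initial regions with mean gray levels; 0-1 and 2-3 are similar pairs.
means = [50.0, 55.0, 200.0, 204.0]
adjacency = [(0, 1), (1, 2), (2, 3)]
print(merge_regions(means, adjacency))  # -> [0, 0, 2, 2]
```

    A full implementation would re-estimate each merged region's statistics after every fusion; the sketch keeps the original means to stay short.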

    Using contour information and segmentation for object registration, modeling and retrieval

    This thesis considers different aspects of the use of contour information and of syntactic and semantic image segmentation for object registration, modeling and retrieval, in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic word recognition in handwritten historical manuscripts, and shape registration. The thesis also explores the feasibility of contour-based syntactic features for improving the correspondence between the output of bottom-up segmentation and the semantic objects present in the scene, and discusses different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models, or semi-automatic segmentation in selected application scenarios. There are three contributions in this thesis. The first considers structure analysis based on the shape and spatial configuration of image regions (so-called syntactic visual features) and their utilization for automatic image segmentation. The second is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis, providing the basis for the third contribution, which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach has been analyzed through extensive, rigorous experimentation using the largest available test collections.
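    A simple contour-based shape feature of the kind such work builds on is the centroid-distance signature, matched under circular shifts for start-point invariance (an illustrative sketch, not the specific features proposed in the thesis):

```python
import math

def signature(contour, n=16):
    """Centroid-distance signature: distances from the centroid to n evenly
    sampled contour points, normalized by the maximum for scale invariance."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    step = len(contour) / n
    d = [math.hypot(contour[int(i * step)][0] - cx,
                    contour[int(i * step)][1] - cy) for i in range(n)]
    m = max(d)
    return [v / m for v in d]

def shape_distance(sig_a, sig_b):
    """Smallest L1 distance over all circular shifts of sig_b,
    making the match invariant to the contour's start point."""
    n = len(sig_a)
    return min(sum(abs(sig_a[i] - sig_b[(i + s) % n]) for i in range(n))
               for s in range(n))

# An ellipse matched against a rotated-start copy of itself: distance ~0.
ellipse = [(2 * math.cos(2 * math.pi * t / 64), math.sin(2 * math.pi * t / 64))
           for t in range(64)]
sig = signature(ellipse)
print(shape_distance(sig, sig[8:] + sig[:8]) < 1e-9)  # -> True
```

    Signatures like this suit closed silhouettes (one of the target applications above); open contours and occlusions need the richer matching algorithms the thesis studies.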

    Combination of multiple image segmentations

    The thesis concerns the combination of multiple image segmentations in the domains of contour detection and region-based image segmentation. The goal is to combine multiple segmentations into an improved final result. In the case of region-based segmentation combination, a generalized median concept is proposed to automatically determine the final number of regions. Extensive experiments demonstrate that our combination method outperforms a ground-truth-based training approach. In addition, an experimental investigation of existing segmentation evaluation measures is presented, covering their metric properties and evaluation behavior. This study is intended as a guideline for appropriately choosing evaluation measures.
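    A common computable stand-in for the generalized median is the set median: the input segmentation with the smallest summed distance to all others, under a label-permutation-invariant distance such as a Rand-style pair-counting distance (a sketch under that simplification; a true generalized median is searched over the whole combination space):

```python
from itertools import combinations

def rand_distance(seg_a, seg_b):
    """Fraction of pixel pairs on which two label maps disagree about
    'same region vs different region' (invariant to label permutation)."""
    pairs = list(combinations(range(len(seg_a)), 2))
    disagree = sum((seg_a[i] == seg_a[j]) != (seg_b[i] == seg_b[j])
                   for i, j in pairs)
    return disagree / len(pairs)

def set_median(segmentations):
    """Set-median approximation of the generalized median: pick the
    input segmentation minimizing the summed distance to all others."""
    return min(segmentations,
               key=lambda s: sum(rand_distance(s, t) for t in segmentations))

# Five pixels, three candidate segmentations; the third is an outlier.
segs = [[0, 0, 1, 1, 1],
        [1, 1, 0, 0, 0],   # same partition as the first, just relabeled
        [0, 1, 0, 1, 0]]
print(set_median(segs))  # -> [0, 0, 1, 1, 1]
```

    Note that the relabeled copy has distance zero to the first candidate, which is exactly why a pair-counting distance, rather than per-pixel label agreement, is the right tool here.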

    Man-made Surface Structures from Triangulated Point Clouds

    Photogrammetry aims at reconstructing the shape and dimensions of objects captured with cameras, 3D laser scanners or other spatial acquisition systems. While many acquisition techniques deliver triangulated point clouds with millions of vertices within seconds, the interpretation is usually left to the user. Especially when reconstructing man-made objects, one is interested in the underlying surface structure, which is not inherently present in the data. This includes the geometric shape of the object, e.g. cubical or cylindrical, as well as the corresponding surface parameters, e.g. width, height and radius. Applications are manifold and range from industrial production control to architectural on-site measurements to large-scale city models. The goal of this thesis is to automatically derive such surface structures from triangulated 3D point clouds of man-made objects. Surface structures are defined as a compound of planar or curved geometric primitives; model knowledge about typical primitives and the relations between adjacent pairs of them should benefit the reconstruction. After formulating a parametrized model for man-made surface structures, we develop a reconstruction framework with three processing steps: During a fast pre-segmentation exploiting local surface properties, we divide the given surface mesh into planar regions. Using a model selection scheme based on minimizing the description length, this surface segmentation is free of control parameters and automatically yields an optimal number of segments. A subsequent refinement introduces a set of planar or curved geometric primitives and hierarchically merges adjacent regions based on their joint description length. A global classification and constrained parameter estimation combine the data-driven segmentation with high-level model knowledge.
    To this end, we represent the surface structure with a graphical model and formulate factors based on the likelihood as well as on prior knowledge about parameter distributions and class probabilities. We infer the most probable setting of surface and relation classes with belief propagation and estimate an optimal surface parametrization with constraints induced by inter-regional relations. The process is specifically designed to work on noisy data with outliers and a few exceptional freeform regions not describable with geometric primitives. It yields full 3D surface structures with watertightly connected surface primitives of different types. The performance of the proposed framework is experimentally evaluated on various data sets. On small synthetically generated meshes, we analyze the accuracy of the estimated surface parameters, the sensitivity with respect to various properties of the input data and to model assumptions, as well as the computational complexity. Additionally, we demonstrate the flexibility with respect to different acquisition techniques on real data sets. The proposed method turns out to be accurate, reasonably fast, and relatively insensitive to defects in the data or imprecise model assumptions.
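    The description-length criterion for merging adjacent regions can be sketched with a two-part code: merging pays for one set of primitive parameters instead of two, but may increase the residual coding cost (the constants below are illustrative assumptions, not the exact code lengths derived in the thesis):

```python
import math

def description_length(residuals, n_params, bits_per_param=32.0):
    """Two-part code length for one region: parameter cost plus a data cost
    from the residual variance (Gaussian code length, up to constants)."""
    n = len(residuals)
    var = max(sum(r * r for r in residuals) / n, 1e-12)  # guard zero variance
    return n_params * bits_per_param + 0.5 * n * math.log2(var * 2 * math.pi * math.e)

def should_merge(res_a, res_b, res_joint, n_params=3):
    """Merge two regions if one primitive describing both is cheaper than
    two separate primitives -- the hierarchical merging criterion."""
    separate = (description_length(res_a, n_params)
                + description_length(res_b, n_params))
    return description_length(res_joint, n_params) < separate

# Two nearly coplanar patches: the joint fit is barely worse, so saving
# one parameter set wins. A badly fitting joint plane is rejected.
print(should_merge([0.01] * 50, [0.01] * 50, [0.012] * 100))  # -> True
print(should_merge([0.01] * 50, [0.01] * 50, [0.5] * 100))    # -> False
```

    Because both the segmentation and the merge decisions minimize the same code length, the pipeline needs no hand-tuned thresholds, which is the point of the MDL-based model selection described above.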