
    Nuclei/Cell Detection in Microscopic Skeletal Muscle Fiber Images and Histopathological Brain Tumor Images Using Sparse Optimizations

    Nuclei/cell detection is usually a prerequisite procedure in many computer-aided biomedical image analysis tasks. In this thesis we propose two automatic nuclei/cell detection frameworks: one for nuclei detection in skeletal muscle fiber images and the other for brain tumor histopathological images. For skeletal muscle fiber images, the major challenges include: i) shape and size variations of the nuclei, ii) overlapping nuclear clumps, and iii) a series of z-stack images with out-of-focus regions. We propose a novel automatic detection algorithm consisting of the following components: 1) The original z-stack images are first converted into one all-in-focus image. 2) A sufficient number of hypothetical ellipses are then generated for each nucleus contour. 3) Next, a set of representative training samples and discriminative features is selected by a two-stage sparse model. 4) A classifier is trained using the refined training data. 5) Final nuclei detection is obtained by mean-shift clustering based on the inner distance. The proposed method was tested on a set of images containing over 1500 nuclei, and the results outperform current state-of-the-art approaches. For brain tumor histopathological images, the major challenges are to handle significant variations in cell appearance and to split touching cells. The proposed automatic cell detection consists of: 1) sparse reconstruction for splitting touching cells and 2) adaptive dictionary learning for handling cell appearance variations. The method was extensively tested on a data set with over 2000 cells and outperforms other state-of-the-art algorithms with an F1 score of 0.96.
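
    As an illustration of the first pipeline step, converting a z-stack into a single all-in-focus image, the sketch below uses a standard focus-stacking heuristic: per pixel, keep the value from the slice with the highest local sharpness. This is a minimal, assumed example; the smoothed absolute Laplacian focus measure and the function name all_in_focus are illustrative choices, not necessarily the thesis's exact method.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def all_in_focus(z_stack):
    """Fuse a z-stack of 2D grayscale slices into one all-in-focus image by
    picking, per pixel, the slice with the highest local focus measure."""
    stack = np.asarray(z_stack, dtype=np.float64)            # shape (Z, H, W)
    # Per-slice sharpness: smoothed absolute Laplacian (high where in focus).
    focus = np.stack([uniform_filter(np.abs(laplace(s)), size=9) for s in stack])
    best = np.argmax(focus, axis=0)                          # sharpest slice per pixel
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols]
```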

    High-Level Facade Image Interpretation using Marked Point Processes

    In this thesis, we address facade image interpretation as one essential ingredient for the generation of highly detailed, semantically meaningful, three-dimensional city models. Given a single rectified facade image, we detect relevant facade objects such as windows, entrances, and balconies, yielding a description of the image in terms of the accurate position and size of these objects. Urban digital three-dimensional reconstruction and documentation is an active area of research with several potential applications, e.g., in digital mapping for navigation, urban planning, emergency management, disaster control, or the entertainment industry. A detailed building model, which is not just a geometric object enriched with texture, allows for semantic queries such as the number of floors or the location of balconies and entrances. Facade image interpretation is one essential step towards such models. In this thesis, we propose interpreting facade images by combining evidence for the occurrence of individual object classes, which we derive from data, with prior knowledge that guides the image interpretation in its entirety. We present a three-step procedure which generates features suited to describe the relevant objects, learns a representation suited for object detection, and enables the image interpretation using the results of object detection while incorporating prior knowledge about typical configurations of facade objects, which we learn from training data. According to these three sub-tasks, our major achievements are: We propose a novel method for facade image interpretation based on a marked point process. To this end, we develop a model for the description of typical configurations of facade objects and propose an image interpretation system which combines evidence derived from data with prior knowledge about typical facade configurations. In order to generate evidence from data, we propose a feature type which we call shapelets. They are scale invariant and highly distinctive for facade objects. Segments of lines, arcs, and ellipses serve as basic features for the generation of shapelets. To this end, we propose a novel line simplification approach which approximates given pixel chains by a sequence of straight line segments and circular and elliptical arcs. Among others, it is based on an adaptation of the Douglas-Peucker algorithm that uses arc segments instead of straight line segments as basic geometric elements. We evaluate each step separately. We show the effects of polyline segmentation and simplification on several images, with results comparable to or even better than a state-of-the-art algorithm. Using shapelets, we demonstrate their high distinctiveness for facade objects and achieve a reasonable classification performance on a challenging dataset including intra-class variations, clutter, and scale changes. Finally, we show promising results for the facade interpretation system on several datasets and provide a qualitative evaluation which demonstrates its capability of complete and accurate detection of facade objects.
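
    The line simplification step builds on the Douglas-Peucker algorithm, which the thesis adapts to use arc segments as basic geometric elements. For reference, a minimal sketch of the classic straight-line Douglas-Peucker recursion (not the arc-based adaptation described above) could look like this:

```python
import numpy as np

def douglas_peucker(points, tol):
    """Classic Douglas-Peucker simplification of a pixel chain given as an
    (N, 2) array: keep the point farthest from the chord if its distance
    exceeds tol and recurse on both halves, else keep only the endpoints."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts
    start, end = pts[0], pts[-1]
    chord = end - start
    norm = np.hypot(chord[0], chord[1])
    if norm == 0.0:                                   # degenerate chord: use point distances
        dists = np.hypot(*(pts - start).T)
    else:                                             # perpendicular distance to the chord
        dists = np.abs(chord[0] * (pts[:, 1] - start[1])
                       - chord[1] * (pts[:, 0] - start[0])) / norm
    idx = int(np.argmax(dists))
    if dists[idx] <= tol:
        return np.vstack([start, end])
    left = douglas_peucker(pts[:idx + 1], tol)
    right = douglas_peucker(pts[idx:], tol)
    return np.vstack([left[:-1], right])              # keep the split point only once
```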

    Local, Semi-Local and Global Models for Texture, Object and Scene Recognition

    This dissertation addresses the problems of recognizing textures, objects, and scenes in photographs. We present approaches to these recognition tasks that combine salient local image features with spatial relations and effective discriminative learning techniques. First, we introduce a bag-of-features image model for recognizing textured surfaces under a wide range of transformations, including viewpoint changes and non-rigid deformations. We present results of a large-scale comparative evaluation indicating that bags of features can be effective not only for texture recognition but also for object categorization, even in the presence of substantial clutter and intra-class variation. We also show how to augment the purely local image representation with statistical co-occurrence relations between pairs of nearby features, and develop a learning and classification framework for the task of classifying individual features in a multi-texture image. Next, we present a more structured alternative to bags of features for object recognition, namely an image representation based on semi-local parts, or groups of features characterized by stable appearance and geometric layout. Semi-local parts are automatically learned from small sets of unsegmented, cluttered images. Finally, we present a global method for recognizing scene categories that works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting spatial pyramid representation demonstrates significantly improved performance on challenging scene categorization tasks.
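
    The spatial pyramid idea, partitioning the image into increasingly fine sub-regions and histogramming local features per sub-region, can be sketched as below. This is a simplified, assumed illustration: it concatenates per-cell histograms of quantized features and omits the level-dependent weighting used in the full spatial pyramid matching kernel.

```python
import numpy as np

def spatial_pyramid(keypoints, codes, img_w, img_h, vocab_size, levels=3):
    """Concatenate per-cell histograms of visual-word indices over a pyramid
    of 1x1, 2x2, 4x4, ... grids.
    keypoints: (N, 2) array of (x, y) feature locations
    codes:     (N,)   array of visual-word indices in [0, vocab_size)"""
    kp = np.asarray(keypoints, dtype=float)
    codes = np.asarray(codes, dtype=int)
    histograms = []
    for level in range(levels):
        cells = 2 ** level
        # Cell index of every feature at this pyramid level.
        cx = np.minimum((kp[:, 0] / img_w * cells).astype(int), cells - 1)
        cy = np.minimum((kp[:, 1] / img_h * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                histograms.append(np.bincount(codes[in_cell], minlength=vocab_size))
    desc = np.concatenate(histograms).astype(float)
    return desc / max(desc.sum(), 1.0)                 # L1-normalized descriptor
```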

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and, thus, the constraints imposed on the type of video that each technique is able to address. Making these hypotheses and constraints explicit renders the framework particularly useful for selecting a method for a given application. Another advantage of the proposed organization is that it allows newer approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective of the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables

    Model-driven and Data-driven Approaches for some Object Recognition Problems

    Recognizing objects from images and videos has been a long-standing problem in computer vision. The recent surge in the prevalence of visual cameras has given rise to two main challenges: (i) it is important to understand different sources of object variation in more unconstrained scenarios, and (ii) rather than describing an object in isolation, efficient learning methods for modeling object-scene 'contextual' relations are required to resolve visual ambiguities. This dissertation addresses some aspects of these challenges and consists of two parts. The first part of the work focuses on obtaining object descriptors that are largely preserved across certain sources of variation, by utilizing models for image formation and local image features. Given a single instance of an object, we investigate the following three problems. (i) Representing a 2D projection of a 3D non-planar shape invariant to articulations, when there are no self-occlusions. We propose an articulation-invariant distance that is preserved across piecewise affine transformations of a non-rigid object's parts under a weak perspective imaging model, and then obtain a shape-context-like descriptor to perform recognition. (ii) Understanding the space of arbitrarily blurred images of an object, by representing an unknown blur kernel of a known maximum size using a complete set of orthonormal basis functions spanning that space, and showing that the subspaces resulting from convolving a clean object and its blurred versions with these basis functions are equal under some assumptions. We then view the invariant subspaces as points on a Grassmann manifold and use statistical tools that account for the underlying non-Euclidean nature of the space of these invariants to perform recognition across blur. (iii) Analyzing the robustness of local feature descriptors to different illumination conditions. We perform an empirical study of these descriptors for the problem of face recognition under lighting change, and show that the direction of the image gradient largely preserves object properties across varying lighting conditions. The second part of the dissertation utilizes the information conveyed by large quantities of data to learn contextual information shared by an object (or an entity) with its surroundings. (i) We first consider a supervised two-class problem of detecting lane markings from road video sequences, where we learn relevant feature-level contextual information through a machine learning algorithm based on boosting. We then focus on unsupervised object classification scenarios where (ii) we perform clustering using maximum-margin principles, by deriving basic properties on the affinity of a pair of points belonging to the same cluster using the information conveyed by all points in the system, and (iii) we consider correspondence-free adaptation of statistical classifiers across domain-shifting transformations, by generating meaningful intermediate domains that incrementally convey potential information about the domain change.
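
    The blur-invariant subspaces mentioned in (ii) are compared as points on a Grassmann manifold. A common way to measure distance between two subspaces there, sketched below as an assumed illustration (the dissertation's statistical tools may go beyond this), is the geodesic distance induced by the principal angles:

```python
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance between the column spaces of A and B (both d x k),
    viewed as points on a Grassmann manifold: the 2-norm of the principal
    angles obtained from the SVD of the product of orthonormal bases."""
    Qa, _ = np.linalg.qr(A)                  # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)                  # orthonormal basis of span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)                # guard against round-off outside [-1, 1]
    theta = np.arccos(s)                     # principal angles between the subspaces
    return float(np.linalg.norm(theta))
```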

    Do-It-Yourself Single Camera 3D Pointer Input Device

    We present a new algorithm for single camera 3D reconstruction, or 3D input for human-computer interfaces, based on precise tracking of an elongated object, such as a pen, having a pattern of colored bands. To configure the system, the user provides no more than one labelled image of a handmade pointer, measurements of its colored bands, and the camera's pinhole projection matrix. Other systems are of much higher cost and complexity, requiring combinations of multiple cameras, stereo cameras, and pointers with sensors and lights. Instead of relying on information from multiple devices, we examine our single view more closely, integrating geometric and appearance constraints to robustly track the pointer in the presence of occlusion and distractor objects. By probing objects of known geometry with the pointer, we demonstrate acceptable accuracy of 3D localization. Comment: 8 pages, 6 figures, 2018 15th Conference on Computer and Robot Vision
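
    To give a flavor of the geometry involved, the sketch below shows two standard pinhole-camera computations: back-projecting a pixel to a viewing ray, and a rough depth estimate for a band of known physical length. Both are assumed, simplified illustrations; the fronto-parallel approximation in particular is not the paper's method, which integrates geometric and appearance constraints instead.

```python
import numpy as np

def pixel_to_ray(K, u, v):
    """Back-project pixel (u, v) to a unit viewing ray in the camera frame,
    given the 3x3 pinhole intrinsic matrix K."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

def rough_depth_from_band(K, p0, p1, band_length):
    """Crude depth estimate for a colored band of known physical length whose
    endpoints project to pixels p0 and p1, assuming the band lies roughly
    parallel to the image plane (similar-triangles approximation)."""
    f = 0.5 * (K[0, 0] + K[1, 1])                     # average focal length in pixels
    pixel_len = np.linalg.norm(np.asarray(p1, float) - np.asarray(p0, float))
    return f * band_length / pixel_len
```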

    Active occlusion-handling for appearance-based object recognition models

    Struwe M. Active occlusion-handling for appearance-based object recognition models. Bielefeld: Universität Bielefeld; 2017. Despite extensive research, visual detection of objects in natural scenes is still not robustly solved. The reason for this is the large variation in appearance with which objects or classes occur. A particularly challenging variation is occlusion, which is caused by the constellation of objects in a scene. Occlusion reduces the number of visible features of an object, but also causes accidental features. Current object representations yield acceptable results for low to medium levels of occlusion, but fail for stronger occlusions. This thesis addresses single-image-based object recognition under occlusion and proposes different occlusion-handling strategies. It first describes a holistic discriminative car detection framework, which several chapters use as a reference system. Motivated by a label analysis of hand-annotated video traffic scenes, it then presents a car detector that takes car-car constellations into account. The following chapter illustrates a modification of the reference system to cope with more general occlusion constellations. Inspired by the fact that parts-based detection approaches are more robust against occlusion, the next chapter discusses a parts-based car detector with active occlusion-handling at the detection step. At first, this exploits a strategy that uses the mask of the occluding object to re-weight the score of possible car hypotheses. This is followed by the presentation of an extended version which especially targets strongly occluded cars. Because hand-annotated video streams do not provide pixel-level information about the object instances, this thesis presents a rendered benchmark data set to resolve this issue. The pixel-level information permits intensive evaluation of occlusion-handling strategies. An eye-tracker study also uses this rendered data set to explore how humans cope with the absence of visual object features, and which information they use to deal with occlusion.
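
    One simple way to realize the mask-based re-weighting of car hypotheses mentioned above is to scale each detection score by the unoccluded fraction of its bounding box. The sketch below is an assumed illustration of that idea, not the thesis's exact scheme; the function name, box format, and thresholds are hypothetical.

```python
import numpy as np

def reweight_score(score, box, occluder_mask, min_visible=0.2):
    """Scale a detection score by the fraction of the hypothesis box that is
    not covered by the occluder mask.  box = (x0, y0, x1, y1) in pixels;
    occluder_mask is a boolean (H, W) array marking occluding objects."""
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    patch = occluder_mask[y0:y1, x0:x1]
    if patch.size == 0:                      # box outside the image or empty
        return score
    visible = 1.0 - patch.mean()             # fraction of unoccluded pixels
    if visible < min_visible:                # too little visible evidence left
        return 0.0
    return score * visible
```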

    Adaptive online performance evaluation of video trackers

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. C. SanMiguel, A. Cavallaro, and J. M. Martínez, "Adaptive Online Performance Evaluation of Video Trackers", IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2812-2823, May 2012. We propose an adaptive framework to estimate the quality of video tracking algorithms without ground-truth data. The framework is divided into two main stages, namely, the estimation of the tracker condition to identify temporal segments during which a target is lost, and the measurement of the quality of the estimated track when the tracker is successful. A key novelty of the proposed framework is the capability of evaluating video trackers with multiple failures and recoveries over long sequences. Successful tracking is identified by analyzing the uncertainty of the tracker, whereas track recovery from errors is determined based on the time-reversibility constraint. The proposed approach is demonstrated on a particle filter tracker over a heterogeneous data set. Experimental results show the effectiveness and robustness of the proposed framework, which improves on state-of-the-art approaches in the presence of tracking challenges such as occlusions, illumination changes, and clutter, and on sequences containing multiple tracking errors and recoveries. This work was partially supported by the Spanish Government (TEC2007-65400 SemanticVideo), Cátedra Infoglobal-UAM for "Nuevas Tecnologías de video aplicadas a la seguridad", Consejería de Educación of the Comunidad de Madrid, and the European Social Fund.
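
    For the particle filter tracker used in the experiments, one common proxy for the tracker uncertainty analyzed in the first stage is the effective sample size of the particle weights, optionally combined with the spatial spread of the particles. The sketch below is an assumed illustration of such a tracker-condition signal, not the paper's exact measure; thresholds and function names are hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """Effective sample size of particle weights; low values indicate a
    degenerate, highly uncertain particle set."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def tracker_uncertain(weights, particles, ess_ratio=0.25, spread_thresh=30.0):
    """Flag the tracker as possibly lost when the effective sample size drops
    below a fraction of the particle count or the particle spread (in pixels)
    exceeds a threshold."""
    ess = effective_sample_size(weights)
    spread = np.std(np.asarray(particles, dtype=float), axis=0).max()
    return ess < ess_ratio * len(weights) or spread > spread_thresh
```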

    A Methodology for Extracting Human Bodies from Still Images

    Monitoring and surveillance of humans is one of the most prominent applications today, and it is expected to be part of many future aspects of our life, for safety reasons, assisted living, and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and still remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject, and propose a maturity metric to evaluate them. Image segmentation is one of the most popular image processing algorithms in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity in local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin, and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
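
    Skin detection, one of the building blocks mentioned for locating faces and hands, is often bootstrapped with simple color rules. The sketch below shows a widely used rule-based RGB skin classifier as an assumed illustration; it is not the dissertation's method, which relies on its own face, skin, and hand detection research.

```python
import numpy as np

def skin_mask_rgb(img):
    """Rule-based skin detection on an (H, W, 3) uint8 RGB image using
    common per-channel thresholds; returns a boolean mask."""
    rgb = img.astype(np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return ((r > 95) & (g > 40) & (b > 20) &
            (r - np.minimum(g, b) > 15) &       # sufficient color spread
            (np.abs(r - g) > 15) &
            (r > g) & (r > b))
```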
