63 research outputs found

    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets a new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online. Comment: Accepted by TPAMI.
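    Atrous convolution and ASPP map directly onto standard deep-learning primitives. The following is a minimal sketch in PyTorch, not the authors' released code; the channel sizes, dilation rates, and the sum-fusion of branches are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel 3x3 convolutions whose
    dilation (atrous) rates probe the feature map at several effective
    fields of view."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates  # padding = rate keeps the spatial resolution
        )

    def forward(self, x):
        # Fuse the multi-rate responses (here by summation) to capture
        # objects and context at multiple scales.
        return sum(branch(x) for branch in self.branches)

features = torch.randn(1, 512, 64, 64)    # dummy backbone feature map
logits = ASPP(512, 21)(features)          # 21 classes as in PASCAL VOC
print(logits.shape)                       # torch.Size([1, 21, 64, 64])
```

    Because each branch keeps a 3x3 kernel while the dilation rate grows, the field of view widens without adding parameters or computation, which is the trade-off the abstract highlights.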

    Brain Tumor Detection and Segmentation in Multisequence MRI

    This work deals with brain tumor detection and segmentation in multisequence MR images, with particular focus on high- and low-grade gliomas. Three methods are proposed for this purpose. The first method detects the presence of brain tumor structures in axial and coronal slices. It is based on multi-resolution symmetry analysis and was tested on T1, T2, T1C and FLAIR images. The second method extracts the whole brain tumor region, including the tumor core and edema, from FLAIR and T2 images, and is able to do so in both 2D and 3D. It also uses the symmetry-analysis approach, followed by automatic determination of an intensity threshold from the most asymmetric parts. The third method is based on local structure prediction and is able to segment the whole tumor region as well as the tumor core and the active tumor. It takes advantage of the fact that most medical images feature a high similarity in the intensities of nearby pixels and a strong correlation of intensity profiles across different image modalities. One way of dealing with, and even exploiting, this correlation is the use of local image patches. In the same way, there is a high correlation between nearby labels in image annotations, a feature that is used in the 'local structure prediction' of local label patches. A convolutional neural network is chosen as the learning algorithm, as it is known to be well suited to dealing with correlation between features.
    All three methods were evaluated on a public data set of 254 multisequence MR volumes, reaching results comparable to state-of-the-art methods in much shorter computing time (on the order of seconds on a CPU), which provides the means, for example, to perform online updates when aiming at interactive segmentation.
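    As an illustration of the symmetry-analysis idea (blockwise left/right comparison of a slice followed by an intensity threshold derived from the most asymmetric blocks), here is a minimal numpy sketch; the block size, quantile, and array shapes are assumptions for the example, not the thesis' exact procedure:

```python
import numpy as np

def asymmetry_map(slice_2d, block=8):
    """Blockwise left/right asymmetry: mirror the slice about its vertical
    midline and average the absolute differences inside each block."""
    diff = np.abs(slice_2d.astype(float) - slice_2d[:, ::-1])
    h, w = (d - d % block for d in diff.shape)
    blocks = diff[:h, :w].reshape(h // block, block, w // block, block)
    return blocks.mean(axis=(1, 3))

def tumor_threshold(slice_2d, block=8, top_fraction=0.05):
    """Derive an intensity threshold from the most asymmetric blocks,
    e.g. their median intensity in a FLAIR slice."""
    scores = asymmetry_map(slice_2d, block)
    cutoff = np.quantile(scores, 1.0 - top_fraction)
    mask = np.repeat(np.repeat(scores >= cutoff, block, axis=0), block, axis=1)
    h, w = mask.shape
    return np.median(slice_2d[:h, :w][mask])

flair = np.random.rand(240, 240)                    # placeholder FLAIR slice
rough_tumor_mask = flair > tumor_threshold(flair)   # crude whole-tumor region
```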

    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well suited to modeling local interactions among adjacent image regions. However, the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos for use in semantic labeling. In order to model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object, and the CRBM is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models on the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos.
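    For readers unfamiliar with the RBM building block used as a shape prior above, the toy numpy sketch below trains a generic binary RBM with one step of contrastive divergence (CD-1) on small binary "shape" masks; it is a stand-in illustration of the component, not the thesis' hybrid CRF+RBM/CRBM model, and the layer sizes and learning rate are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

class RBM:
    """Minimal binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)      # visible biases
        self.c = np.zeros(n_hidden)       # hidden biases
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0):
        """One CD-1 step on a batch of binary masks v0 (batch, n_visible)."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)          # reconstruction
        h1 = self.hidden_probs(v1)
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Toy usage: learn a prior over 8x8 binary "shape" masks.
masks = (rng.random((100, 64)) < 0.3).astype(float)
rbm = RBM(n_visible=64, n_hidden=32)
for _ in range(50):
    rbm.cd1_update(masks)
```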

    Integrative Levels of Knowing

    This dissertation is concerned with a systematic organization of the epistemological dimension of human knowledge in terms of viewpoints and methods. In particular, it will be explored to what extent the well-known organizing principle of integrative levels, which presents a developmental hierarchy of complexity and integration, can be applied to a basic classification of viewpoints or epistemic outlooks. The central thesis pursued in this investigation is that an adequate analysis of such epistemic contexts requires tools that make it possible to compare and evaluate divergent or even conflicting frames of reference according to context-transcending standards and criteria. This task demands a theoretical and methodological foundation that avoids the limitations of radical contextualism, in particular its inherent threat of a fragmentation of knowledge due to the alleged incommensurability of the underlying frames of reference. Based on Jürgen Habermas's Theory of Communicative Action and his methodology of hermeneutic reconstructionism, it will be argued that epistemic pluralism does not necessarily imply epistemic relativism and that a systematic organization of the multiplicity of perspectives can benefit from already existing models of cognitive development as reconstructed in research fields such as psychology, the social sciences, and the humanities.
    The proposed cognitive-developmental approach aims to contribute to a multi-perspective knowledge organization by offering both analytical tools for cross-cultural comparisons of knowledge organization systems (e.g., the Seven Epitomes and the Dewey Decimal Classification) and organizing principles for context representation that help to improve the expressiveness of existing documentary languages (e.g., the Integrative Levels Classification). Additionally, the appendix includes an extensive compilation of conceptions and models of Integrative Levels of Knowing from a broad multidisciplinary field.

    ImageNet Large Scale Visual Recognition Challenge

    The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to the present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge and propose future directions and improvements. Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig. 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references.

    Patch-based models for visual object classes

    This thesis concerns models for visual object classes that exhibit a reasonable amount of regularity, such as faces, pedestrians, cells and human brains. Such models are useful for making “within-object” inferences such as determining their individual characteristics and establishing their identity. For example, the model could be used to predict the identity of a face, the pose of a pedestrian or the phenotype of a cell, or to segment parts of a human brain. Existing object modelling techniques have several limitations. First, most current methods have targeted the above tasks individually using object-specific representations; therefore, they cannot be applied to other problems without major alterations. Second, most methods have been designed to work with small databases which do not contain the variations in pose, illumination, occlusion and background clutter seen in ‘real world’ images. Consequently, many existing algorithms fail when tested on unconstrained databases. Finally, the complexity of the training procedure in these methods makes it impractical to use large datasets. In this thesis, we investigate patch-based models for object classes. Our models are capable of exploiting very large databases of objects captured in uncontrolled environments. We represent the test image with a regular grid of patches from a library of images of the same object. All the domain-specific information is held in this library: we use one set of images of the object to help draw inferences about others. In each experimental chapter we investigate a different within-object inference task. In particular, we develop models for classification, regression, semantic segmentation and identity recognition. In each task, we achieve results that are comparable to or better than the state of the art. We conclude that patch-based representations can be successfully used for the above tasks and show promise for other applications such as generation and localization.
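    The core idea of representing a test image by a regular grid of patches and drawing inferences from a library of images of the same object class can be sketched as a simple k-nearest-neighbour label transfer; the numpy toy below is illustrative only (patch size, k, and the random stand-in data are assumptions, and the thesis' actual models are more elaborate):

```python
import numpy as np

def extract_grid_patches(image, patch=8):
    """Cut an image into a regular, non-overlapping grid of flattened patches."""
    h, w = (d - d % patch for d in image.shape)
    grid = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def knn_label_transfer(test_image, library_images, library_labels, patch=8, k=5):
    """For every grid patch of the test image, find the k most similar
    patches in a library of images of the same object class and average
    their corresponding label patches."""
    test_patches = extract_grid_patches(test_image, patch)
    lib_patches = np.concatenate([extract_grid_patches(im, patch) for im in library_images])
    lib_labels = np.concatenate([extract_grid_patches(lb, patch) for lb in library_labels])
    out = np.empty_like(test_patches, dtype=float)
    for i, p in enumerate(test_patches):
        d = np.sum((lib_patches - p) ** 2, axis=1)   # SSD distance to library
        nn = np.argsort(d)[:k]
        out[i] = lib_labels[nn].mean(axis=0)         # soft label estimate
    return out

# Toy usage with random stand-ins for images and binary segmentations.
rng = np.random.default_rng(0)
library = [rng.random((64, 64)) for _ in range(10)]
labels = [(im > 0.5).astype(float) for im in library]
soft_labels = knn_label_transfer(rng.random((64, 64)), library, labels)
```

    In this sketch all domain-specific information sits in the library of image/label pairs, mirroring the abstract's point that one set of images of the object is used to draw inferences about others.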

    Video modeling via implicit motion representations

    Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on such an analytical representation, we can develop algorithms for accomplishing particular video-related tasks; video modeling therefore provides a foundation that bridges video data and the related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.
    Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially for complex motion or non-ideally observed video data. In this thesis, we propose to investigate video modeling without explicit motion representation: motion information is implicitly embedded into the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors.
    Firstly, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the STALL framework, we can develop video processing algorithms for a variety of applications by adjusting the model parameters (i.e., the size and topology of the model support and the training window). We apply STALL to three video processing problems. The simulation results show that motion information can be efficiently exploited by our implicit motion representation and that the resampling and fusion do help to enhance the modeling capability of STALL.
    Secondly, we propose a nonparametric video modeling approach, which does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce a sparsity constraint on a higher-dimensional data array formed by packing the patches of the similar-patch set. We then solve the inference problem by updating the kNN array and the desired signal iteratively. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate the modeling capability of this approach.
    Finally, we summarize the two proposed video modeling approaches and point out the prospects of implicit motion representations in applications ranging from low-level to high-level problems.
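    To make the parametric (STALL) idea more concrete, the sketch below fits, by ordinary least squares (an LMMSE-style estimate), the coefficients of a linear predictor that maps a pixel's 3x3 neighbourhood in the preceding frame to its value in the current frame, training within a local window where both are known. The window sizes and the purely causal support are assumptions for the example rather than the thesis' settings:

```python
import numpy as np

def stall_predict(frames, t, y, x, support=1, train_radius=4):
    """Toy STALL-style prediction of pixel (t, y, x): learn linear
    regression coefficients on the (t-2 -> t-1) frame pair inside a local
    window, then apply them to the neighbourhood in frame t-1."""
    def patch(frame, yy, xx):
        return frame[yy - support:yy + support + 1,
                     xx - support:xx + support + 1].ravel()

    A, b = [], []
    for dy in range(-train_radius, train_radius + 1):
        for dx in range(-train_radius, train_radius + 1):
            yy, xx = y + dy, x + dx
            A.append(patch(frames[t - 2], yy, xx))   # regressors: frame t-2
            b.append(frames[t - 1, yy, xx])          # known targets: frame t-1
    coef, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return patch(frames[t - 1], y, x) @ coef         # prediction for frame t

frames = np.random.rand(3, 32, 32)                   # toy video: 3 frames
estimate = stall_predict(frames, t=2, y=16, x=16)
print(estimate)
```

    Adjusting the support (the spatio-temporal neighbourhood) and the training window corresponds to the model parameters the abstract mentions.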

    Baroque Worlds of the 21st Century

    This dissertation furnishes an analysis of the unfolding twenty-first-century neobaroque phenomenon. To that end, this disquisition delves into assorted cultural artifacts of the current digital era as well as manifestations of neobaroque motifs; these virtual baroques of the twenty-first century range from memento mori, video games, and social networking sites to the sponsors of these neobaroque manifestations, the corporations, themselves a baroque legacy. This thesis seeks parallels with, as opposed to replications of, the first modern global culture, the historical baroque of the seventeenth century; it also provides extensive etymological research into the international and evolving meaning of 'baroque', taking into consideration its political instrumentality as well. The study likewise treats the current neobaroque as a global phenomenon. Therefore, to unveil the baroque resonance of these artifacts, global and multidisciplinary scholarship and theories from traditional and nontraditional baroque bastions will be applied.