DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields of view, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but takes a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets a new state of the art on the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU on the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online.
Comment: Accepted by TPAMI.
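The atrous convolution highlighted above can be illustrated with a minimal 1-D NumPy sketch (an illustration only, not the paper's implementation): the kernel taps are spread `rate` samples apart, so the field of view grows while the number of weights and the amount of computation per output stay fixed.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, enlarging the field of view without adding parameters."""
    k = len(kernel)
    span = rate * (k - 1) + 1                 # effective field of view
    out = np.empty(len(signal) - span + 1)    # 'valid' output length
    for i in range(len(out)):
        taps = signal[i : i + span : rate]    # sampling "with holes" (trous)
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])                # the same 3 weights in both cases

dense  = atrous_conv1d(x, w, rate=1)          # field of view: 3 samples
atrous = atrous_conv1d(x, w, rate=2)          # field of view: 5 samples
```

With a dilation rate of 2 the same three weights span five input samples, which is exactly the ASPP idea of probing one feature layer at several sampling rates.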
Brain Tumor Detection and Segmentation in Multisequence MRI
This work deals with brain tumor detection and segmentation in multisequence MR images, with a particular focus on high- and low-grade gliomas.
Three methods are proposed for this purpose. The first method deals with detecting the presence of brain tumor structures in axial and coronal slices. It is based on multi-resolution symmetry analysis and was tested on T1, T2, T1C, and FLAIR images. The second method extracts the whole brain tumor region, including the tumor core and edema, in FLAIR and T2 images, and is suitable for extracting the whole tumor region from both 2D and 3D data. It also uses the symmetry analysis approach, followed by automatic determination of an intensity threshold from the most asymmetric parts. The third method is based on local structure prediction and is able to segment the whole tumor region as well as the tumor core and the active tumor. It takes advantage of the fact that most medical images feature a high similarity in the intensities of nearby pixels and a strong correlation of intensity profiles across different image modalities. One way of dealing with -- and even exploiting -- this correlation is the use of local image patches. In the same way, there is a high correlation between nearby labels in image annotation, a feature that has been used in the “local structure prediction” of local label patches. A convolutional neural network is chosen as the learning algorithm, as it is known to be well suited to dealing with correlations between features. All three methods were evaluated on a public data set of 254 multisequence MR volumes, reaching results comparable to state-of-the-art methods in a much shorter computing time (on the order of seconds on a CPU), which provides the means, for example, to do online updates when aiming at interactive segmentation.
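The symmetry-based extraction described for the second method can be sketched roughly as follows. The flip-and-compare step follows the abstract; the specific quantile (`top_frac`) and the median-based threshold are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np

def asymmetry_map(slice2d):
    """Absolute intensity difference between a slice and its left-right
    mirror; a tumor typically breaks the brain's approximate symmetry."""
    return np.abs(slice2d - slice2d[:, ::-1])

def tumor_threshold(slice2d, top_frac=0.02):
    """Derive an intensity threshold from the most asymmetric pixels:
    keep the top `top_frac` of the asymmetry map and threshold at the
    median intensity observed there (both choices are illustrative)."""
    asym = asymmetry_map(slice2d)
    cutoff = np.quantile(asym, 1.0 - top_frac)
    return np.median(slice2d[asym >= cutoff])

# toy FLAIR-like slice: symmetric background with one bright lesion
img = np.ones((64, 64))
img[20:30, 40:50] = 5.0                 # hyperintense "tumor"
t = tumor_threshold(img)
segmented = img >= t                    # binary whole-tumor mask
```

On this toy slice the most asymmetric pixels cover both the lesion and its mirror location, and the median of their intensities lands between background and lesion, isolating the bright region.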
Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos
Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well-suited to modeling local interactions among adjacent image regions. However, the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling.
In order to model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object, and the conditional RBM (CRBM) is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models for the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos.
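A minimal binary RBM trained with one-step contrastive divergence (CD-1) gives a feel for the shape prior discussed above; the layer sizes, learning rate, and toy training data here are illustrative stand-ins, not the models used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch of visible vectors."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)          # one-step reconstruction
        h1 = self.hidden_probs(v1)
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)             # reconstruction error

# toy "shape" data: copies of one 16-pixel binary template
base = np.array([1] * 8 + [0] * 8, dtype=float)
data = np.tile(base, (100, 1))
rbm = RBM(n_visible=16, n_hidden=4)
errs = [rbm.cd1_step(data) for _ in range(200)]
```

After training, the visible biases and weights capture the template, which is the sense in which an RBM can "learn the shape" of its training objects.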
Integrative Levels of Knowing
This dissertation is concerned with a systematic organization of the epistemological dimension of human knowledge in terms of viewpoints and methods. In particular, it explores to what extent the well-known organizing principle of integrative levels, which presents a developmental hierarchy of complexity and integration, can be applied to a basic classification of viewpoints or epistemic outlooks. The central thesis pursued in this investigation is that an adequate analysis of such epistemic contexts requires tools that allow divergent or even conflicting frames of reference to be compared and evaluated according to context-transcending standards and criteria. This task demands a theoretical and methodological foundation that avoids the limitations of radical contextualism and its inherent threat of a fragmentation of knowledge due to the alleged incommensurability of the underlying frames of reference. Based on Jürgen Habermas's Theory of Communicative Action and his methodology of hermeneutic reconstructionism, it is argued that epistemic pluralism does not necessarily imply epistemic relativism and that a systematic organization of the multiplicity of perspectives can benefit from already existing models of cognitive development as reconstructed in research fields such as psychology, the social sciences, and the humanities. The proposed cognitive-developmental approach aims to contribute to a multi-perspective knowledge organization by offering both analytical tools for cross-cultural comparisons of knowledge organization systems (e.g., the Seven Epitomes and the Dewey Decimal Classification) and organizing principles for context representation that help to improve the expressiveness of existing documentary languages (e.g., the Integrative Levels Classification).
Additionally, the appendix includes an extensive compilation of conceptions and models of Integrative Levels of Knowing from a broad multidisciplinary field.
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in
object category classification and detection on hundreds of object categories
and millions of images. The challenge has been run annually from 2010 to
the present, attracting participation from more than fifty institutions.
This paper describes the creation of this benchmark dataset and the advances
in object recognition that have been possible as a result. We discuss the
challenges of collecting large-scale ground truth annotation, highlight key
breakthroughs in categorical object recognition, provide a detailed analysis of
the current state of the field of large-scale image classification and object
detection, and compare the state-of-the-art computer vision accuracy with human
accuracy. We conclude with lessons learned in the five years of the challenge,
and propose future directions and improvements.
Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL
VOC (per-category comparisons in Table 3, distribution of localization
difficulty in Fig. 16), a list of queries used for obtaining object detection
images (Appendix C), and some additional references.
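ILSVRC classification entries are ranked by top-5 error, i.e., a prediction counts as correct if the true label appears among the five highest-scoring classes. A small sketch of the metric (the toy scores and labels below are made up for illustration):

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is absent from the k
    highest-scoring classes (the challenge uses k=5)."""
    topk = np.argsort(scores, axis=1)[:, -k:]       # k best class indices per row
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.6, 0.3],     # predicted class scores, 3 classes
                   [0.8, 0.1, 0.1],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 2, 0])

err1 = top_k_error(scores, labels, k=1)  # only the argmax counts
err2 = top_k_error(scores, labels, k=2)  # more forgiving, so error can only drop
```

The top-5 convention reflects label ambiguity in images containing several objects: a classifier is not penalized for ranking another plausible category first.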
Patch-based models for visual object classes
This thesis concerns models for visual object classes that exhibit a reasonable amount of regularity,
such as faces, pedestrians, cells and human brains. Such models are useful for making
“within-object” inferences such as determining their individual characteristics and establishing
their identity. For example, the model could be used to predict the identity of a face, the pose
of a pedestrian, or the phenotype of a cell, or to segment the parts of a human brain.
Existing object modelling techniques have several limitations. First, most current methods
have targeted the above tasks individually using object specific representations; therefore, they
cannot be applied to other problems without major alterations. Second, most methods have been
designed to work with small databases which do not contain the variations in pose, illumination,
occlusion and background clutter seen in “real world” images. Consequently, many existing
algorithms fail when tested on unconstrained databases. Finally, the complexity of the training
procedure in these methods makes it impractical to use large datasets.
In this thesis, we investigate patch-based models for object classes. Our models are capable
of exploiting very large databases of objects captured in uncontrolled environments. We
represent the test image with a regular grid of patches from a library of images of the same
object. All the domain specific information is held in this library: we use one set of images of
the object to help draw inferences about others. In each experimental chapter we investigate
a different within-object inference task. In particular we develop models for classification, regression,
semantic segmentation and identity recognition. In each task, we achieve results that
are comparable to or better than the state of the art. We conclude that patch-based representation
can be successfully used for the above tasks and shows promise for other applications such
as generation and localization.
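The grid-of-patches representation can be sketched as nearest-neighbor label transfer from a library of images of the same object class; the sum-of-squared-differences matching and the toy library below are illustrative assumptions, not the thesis's exact models.

```python
import numpy as np

def label_by_patch_library(test_img, lib_patches, lib_labels, patch=4):
    """Label a test image over a regular grid of non-overlapping patches:
    each grid patch receives the label of its nearest library patch
    (sum-of-squared-differences), transferring the library's annotations."""
    H, W = test_img.shape
    out = np.zeros((H // patch, W // patch), dtype=lib_labels.dtype)
    flat_lib = lib_patches.reshape(len(lib_patches), -1)
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            p = test_img[i:i+patch, j:j+patch].ravel()
            ssd = ((flat_lib - p) ** 2).sum(axis=1)
            out[i // patch, j // patch] = lib_labels[np.argmin(ssd)]
    return out

# toy library: dark patches labeled 0, bright patches labeled 1
lib_patches = np.stack([np.zeros((4, 4)), np.ones((4, 4))])
lib_labels = np.array([0, 1])
test = np.zeros((8, 8)); test[:, 4:] = 1.0      # left half dark, right half bright
labels = label_by_patch_library(test, lib_patches, lib_labels)
```

All domain-specific knowledge sits in the library, so the same machinery serves classification, regression, or segmentation simply by changing what the library's labels encode.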
Contour and texture for visual recognition of object categories
The recognition of categories of objects in images has become a central
topic in computer vision. Automatic visual recognition systems
are rapidly becoming central to applications such as image search,
robotics, vehicle safety systems, and image editing. This work addresses
three sub-problems of recognition: image classification, object
detection, and semantic segmentation. The task of classification
is to determine whether an object of a particular category is present
or not. Object detection aims to localize any objects of the category.
Semantic segmentation is a more complete image understanding,
whereby an image is partitioned into coherent regions that are assigned
meaningful class labels. This thesis proposes novel discriminative
learning approaches to these problems.
Our primary contributions are threefold. Firstly, we demonstrate
that the contours (the outline and interior edges) of an object are,
alone, sufficient for accurate visual recognition. Secondly, we propose
two powerful new feature types: (i) a learned codebook of contour
fragments matched with an improved oriented chamfer distance,
and (ii) a set of texture-based features that simultaneously exploit
local appearance, approximate shape, and appearance context.
The efficacy of these new feature types is evaluated on a wide variety
of datasets. Thirdly, we show how, in combination, these two
largely orthogonal feature types can substantially improve recognition
performance above that achieved by either alone.
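Chamfer matching, the basis of the improved oriented chamfer distance mentioned above, can be sketched in its unoriented form: the cost of placing a contour fragment is the mean distance from each of its edge points to the nearest image edge point. This brute-force version is an illustration; the thesis's oriented variant additionally penalizes edge-orientation mismatch, and practical systems use distance transforms instead of all-pairs distances.

```python
import numpy as np

def chamfer_distance(template_pts, image_pts):
    """Mean distance from each template edge point to its nearest
    image edge point (unoriented chamfer matching cost)."""
    t = np.asarray(template_pts, float)[:, None, :]   # shape (T, 1, 2)
    m = np.asarray(image_pts, float)[None, :, :]      # shape (1, I, 2)
    d = np.sqrt(((t - m) ** 2).sum(axis=2))           # all pairwise distances
    return d.min(axis=1).mean()

# toy contour fragments: identical point sets match perfectly,
# a translated copy scores poorly
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
shifted = [(x + 3, y) for x, y in square]

exact = chamfer_distance(square, square)   # perfect alignment
off   = chamfer_distance(square, shifted)  # misaligned fragment
```

A low cost at some image location is evidence that the contour fragment, and hence part of the object outline, is present there.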
Video modeling via implicit motion representations
Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on the analytical representation, we can develop algorithms for accomplishing particular video-related tasks; video modeling therefore provides a foundation for bridging video data and related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.

Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially for complex motion or non-ideal observed video data. In this thesis, we propose to investigate video modeling without explicit motion representation: motion information is implicitly embedded in the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors.

Firstly, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. Incorporating a spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the framework of STALL, we can develop video processing algorithms for a variety of applications by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. The simulation results show that motion information can be efficiently exploited by our implicit motion representation and that the resampling and fusion do help to enhance the modeling capability of STALL.

Secondly, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce a sparsity constraint on a higher-dimensional data array signal, generated by packing the patches of the similar-patch set. We then solve the inference problem by updating the kNN array and the desired signal iteratively. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate its modeling capability.

Finally, we summarize the two proposed video modeling approaches. We also point out the prospects of implicit motion representations in applications ranging from low-level to high-level problems.
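The core idea behind STALL, motion absorbed into locally learned regression coefficients rather than estimated as vectors, can be sketched with ordinary least squares on a toy shifted sequence; the window sizes and the data here are illustrative, not the thesis's settings.

```python
import numpy as np

def local_linear_predict(prev, cur, i, j, support=1, train=3):
    """Predict cur[i, j] from a (2*support+1)^2 neighborhood in the
    previous frame. The regression coefficients are fit by least squares
    over a local space-time training window, so motion is absorbed into
    the learned weights instead of being estimated explicitly."""
    X, y = [], []
    for di in range(-train, train + 1):
        for dj in range(-train, train + 1):
            r, c = i + di, j + dj
            patch = prev[r-support:r+support+1, c-support:c+support+1]
            X.append(patch.ravel())          # neighborhood in previous frame
            y.append(cur[r, c])              # observed value in current frame
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    target = prev[i-support:i+support+1, j-support:j+support+1].ravel()
    return float(target @ coef)

# toy sequence: the current frame is the previous one shifted right by 1 pixel
prev = np.add.outer(np.arange(16.0), np.arange(16.0))   # smooth gradient frame
cur = np.roll(prev, 1, axis=1)                          # global 1-pixel motion
pred = local_linear_predict(prev, cur, 8, 8)
```

For this uniform shift the least-squares fit places its weight on the displaced neighbor, so the prediction recovers the moved pixel without any motion vector being computed.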
Baroque Worlds of the 21st Century
This dissertation furnishes an analysis of the unfolding twenty-first-century neobaroque phenomenon. It delves into assorted cultural artifacts of the current digital era as well as manifestations of neobaroque motifs; these virtual baroques of the twenty-first century range from memento mori, video games, and social networking sites to the sponsors of these neobaroque manifestations--the corporations, themselves a baroque legacy. This thesis seeks parallels with, as opposed to replications of, the first modern global culture--the historical baroque of the seventeenth century; it also provides extensive etymological research into the international and evolving meaning of "baroque," taking its political instrumentality into consideration as well. The study likewise treats the current neobaroque as a global phenomenon; therefore, to unveil the baroque resonance of these artifacts, global and multidisciplinary scholars and theories from traditional and nontraditional baroque bastions are applied.