1,051 research outputs found

    Adaptive Nonparametric Image Parsing

    In this paper, we present an adaptive nonparametric solution to the image parsing task, namely annotating each image pixel with its corresponding category label. For a given test image, a locality-aware retrieval set is first extracted from the training data based on super-pixel matching similarities, which are augmented with feature extraction for better differentiation of local super-pixels. Then, the category of each super-pixel is initialized by the majority vote of its k-nearest-neighbor super-pixels in the retrieval set. Instead of fixing k as in traditional nonparametric approaches, we propose a novel adaptive nonparametric approach that determines a sample-specific k for each test image. In particular, k is adaptively set to the smallest number of nearest super-pixels with which the images in the retrieval set obtain the best category prediction. Finally, the initial super-pixel labels are further refined by contextual smoothing. Extensive experiments on challenging datasets demonstrate the superiority of the new solution over other state-of-the-art nonparametric solutions. (Comment: 11 pages)
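    As a concrete illustration, the following is a minimal sketch of the adaptive-k voting step in Python. It assumes super-pixel features and labels are precomputed; the confidence-based selection of k over the retrieval set is a simplified stand-in for the paper's criterion, and all function and variable names are hypothetical.

```python
import numpy as np

def adaptive_knn_vote(test_feats, retr_feats, retr_labels, k_max=20):
    """Adaptive-k nearest-neighbor voting over super-pixels (sketch).

    test_feats:  (n_test, d) features of the test image's super-pixels
    retr_feats:  (n_retr, d) features of super-pixels in the retrieval set
    retr_labels: (n_retr,)   integer category labels of those super-pixels
    Returns per-super-pixel labels and the k chosen for this image.
    """
    # Pairwise Euclidean distances between test and retrieval super-pixels.
    d = np.linalg.norm(test_feats[:, None, :] - retr_feats[None, :, :], axis=2)
    order = np.argsort(d, axis=1)  # neighbors sorted by increasing distance

    # Choose k per image: the smallest k whose majority votes are most
    # confident (a simplified proxy for the paper's selection rule).
    best_k, best_conf = 1, -1.0
    for k in range(1, k_max + 1):
        votes = retr_labels[order[:, :k]]
        conf = np.mean([np.bincount(v).max() / k for v in votes])
        if conf > best_conf:
            best_k, best_conf = k, conf

    votes = retr_labels[order[:, :best_k]]
    return np.array([np.bincount(v).argmax() for v in votes]), best_k
```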

    Supervised geodesic propagation for semantic label transfer

    In this paper we propose a novel semantic label transfer method using supervised geodesic propagation (SGP). We use supervised learning to guide both the seed selection and the label propagation. Given an input image, we first retrieve its similar image set from annotated databases. A JointBoost model is learned on the similar image set, and the recognition proposal map of the input image is inferred by this learned model. The initial distance map is defined by the proposal map: the higher the probability, the smaller the distance. In each iteration of the geodesic propagation, the seed is selected as the undetermined superpixel with the smallest distance. We also learn a classifier that indicates whether to propagate labels between two neighboring superpixels; its training samples are annotated neighboring pairs from the similar image set. The geodesic distances of the seed's neighbors are updated according to a combination of texture and boundary features and the indicator value. Experiments on three datasets show that our method outperforms traditional learning-based methods and a previous label transfer method for semantic segmentation.
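    The propagation loop amounts to a Dijkstra-style pass over the superpixel adjacency graph, gated by the learned indicator. Below is a minimal sketch under that reading, assuming the proposal-map distances (small for seeds, large for undetermined superpixels), the adjacency, the edge costs, and the indicator are all supplied; every name is hypothetical.

```python
import heapq

def geodesic_propagate(init_dist, init_label, neighbors, edge_cost, indicator):
    """Geodesic label propagation over a superpixel adjacency graph (sketch).

    init_dist:  {sp: d} initial distances from the recognition proposal map
                (higher class probability -> smaller distance)
    init_label: {sp: label} labels of the seed superpixels
    neighbors:  {sp: [sp, ...]} adjacency from the over-segmentation
    edge_cost:  edge_cost(u, v) -> float, from texture/boundary features
    indicator:  indicator(u, v) -> bool, learned gate: propagate across (u, v)?
    """
    dist = dict(init_dist)
    label = dict(init_label)
    heap = [(dist[sp], sp) for sp in label]  # seeds enter the queue first
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)  # next seed = smallest current distance
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in neighbors[u]:
            if not indicator(u, v):  # classifier vetoes this boundary
                continue
            nd = d + edge_cost(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd          # update geodesic distance
                label[v] = label[u]   # transfer the label to the neighbor
                heapq.heappush(heap, (nd, v))
    return label
```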

    The feasibility of using feature-flow and label transfer system to segment medical images with deformed anatomy in orthopedic surgery

    In computer-aided surgical systems, obtaining high-fidelity three-dimensional models requires accurate segmentation of medical images. State-of-the-art medical image segmentation methods have been used successfully in particular applications, but they have not been demonstrated to work well over a wide range of deformities. For this purpose, I studied and evaluated medical image segmentation using the feature-flow based Label Transfer System described by Liu and colleagues. This system has produced promising results in parsing images of natural scenes, and its ability to deal with variations in the shapes of objects is desirable. In this paper, we altered this system and assessed its feasibility for automatic segmentation. Experiments showed that the system achieved better recognition rates than those reported in natural-scene parsing applications, but the high recognition rates were not consistent across different images. Although the system is not yet clinically practical, it could be improved and incorporated with other medical segmentation tools.
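    For context, the core of such a label transfer system is warping annotations from retrieved training images onto the test image through a dense correspondence (feature-flow) field, SIFT Flow in Liu and colleagues' system. A minimal nearest-pixel sketch follows, assuming a precomputed flow field; the names are hypothetical and interpolation and confidence weighting are omitted.

```python
import numpy as np

def transfer_labels(src_labels, flow):
    """Warp a source annotation through a dense flow field (sketch).

    src_labels: (H, W) integer label map of a retrieved training image
    flow:       (h, w, 2) correspondence field; flow[y, x] = (dy, dx)
                points from each target pixel to its source match
    Returns an (h, w) label map for the target image.
    """
    h, w = flow.shape[:2]
    out = np.zeros((h, w), dtype=src_labels.dtype)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            # Clamp the matched coordinate inside the source image.
            sy = min(max(int(y + dy), 0), src_labels.shape[0] - 1)
            sx = min(max(int(x + dx), 0), src_labels.shape[1] - 1)
            out[y, x] = src_labels[sy, sx]
    return out
```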

    A computational framework for unsupervised analysis of everyday human activities

    In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed- and variable-length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality, and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
    Ph.D. Committee Chair: Aaron Bobick; Committee Members: Charles Isbell, David Hogg, Irfan Essa, James Rehg.
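    To make the representation concrete, here is a minimal sketch of fixed-length event-subsequence (n-gram) features and the pairwise similarities that would weight the completely connected activity graph in which maximally similar activity-cliques are sought. The toy event sequences and all names are hypothetical, and the clique-discovery step itself is omitted.

```python
from collections import Counter
from itertools import combinations

def ngram_features(events, n=3):
    """Histogram of fixed-length event subsequences (n-grams) for one
    activity, where an activity is a finite sequence of event symbols."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def cosine(f1, f2):
    """Cosine similarity between two n-gram histograms."""
    dot = sum(c * f2[g] for g, c in f1.items())
    norm = lambda f: sum(c * c for c in f.values()) ** 0.5
    return dot / ((norm(f1) * norm(f2)) or 1.0)

# Hypothetical toy activities as event sequences. Pairwise similarities
# weight the fully connected activity graph in which maximally similar
# activity-cliques would then be discovered.
activities = [list("abcabcabd"), list("abcabd"), list("xyzxyz")]
feats = [ngram_features(a) for a in activities]
weights = {(i, j): cosine(feats[i], feats[j])
           for i, j in combinations(range(len(activities)), 2)}
print(weights)
```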

    On the Role of Context at Different Scales in Scene Parsing

    Scene parsing can be formulated as a labeling problem in which each visual data element, e.g., each pixel of an image or each 3D point in a point cloud, is assigned a semantic class label. One can approach this problem by training a classifier and predicting a class label for each data element purely from its local properties. This approach, however, does not take into account any contextual information between different elements in the image or point cloud. For example, in an application where we are interested in labeling roadside objects, the fact that most utility poles are connected to power wires can be very helpful in disambiguating them from other similar-looking classes. Recurrence of certain class combinations can also be considered a good contextual hint, since such combinations are very likely to co-occur again. These forms of high-level contextual information are often formulated using pairwise and higher-order Conditional Random Fields (CRFs). A CRF is a probabilistic graphical model that encodes the contextual relationships between the data elements in a scene.

    In this thesis, we study the potential of contextual information at different scales (ranges) in scene parsing problems. First, we propose a model that utilizes the local context of the scene via a pairwise CRF. Our model acquires contextual interactions between different classes by assessing their misclassification rates using only the local properties of the data; in other words, no extra training is required to obtain the class interaction information. Next, we expand the context field of view from a local range to a longer range and make use of higher-order models to encode more complex contextual cues. More specifically, we introduce a new model that employs geometric higher-order terms in a CRF for semantic labeling of 3D point cloud data.

    Despite the potential of the above models at capturing contextual cues in the scene, there are higher-level context cues that cannot be encoded via pairwise and higher-order CRFs. For instance, a vehicle is very unlikely to appear in a sea scene, and buildings are frequently observed in street scenes. Such information is described by scene context and modeled using global image descriptors. In particular, through an image retrieval procedure, we find images whose content is similar to that of the query image and use them for scene parsing. Another limitation of the above methods is that they rely on a computationally expensive training process for classification from the local properties of data elements, which must be repeated every time the training data is modified. We address this issue by proposing a fast and efficient approach that exempts us from this cumbersome training task by transferring the ground-truth information directly from the training data to the test data.
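    To ground the formulation, here is a minimal sketch of a pairwise CRF energy over data elements, together with a simple iterated-conditional-modes minimizer. The class-interaction matrix is a generic stand-in for the thesis's misclassification-rate-based term, and all names are hypothetical.

```python
import numpy as np

def crf_energy(labels, unary, pairwise, edges):
    """E(y) = sum_i psi_u(y_i) + sum_{(i,j)} psi_p(y_i, y_j)  (sketch).

    labels:   (n,) current label assignment
    unary:    (n, L) negative log class scores from a local classifier
    pairwise: (L, L) class-interaction costs (e.g., derived from
              misclassification rates, standing in for the thesis model)
    edges:    list of (i, j) neighbor pairs in the image / point cloud
    """
    e = unary[np.arange(len(labels)), labels].sum()
    e += sum(pairwise[labels[i], labels[j]] for i, j in edges)
    return e

def icm(unary, pairwise, edges, iters=10):
    """Iterated conditional modes: a simple (non-global) minimizer."""
    n, L = unary.shape
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    y = unary.argmin(axis=1)  # initialize from the local classifier alone
    for _ in range(iters):
        for i in range(n):
            # Cost of every candidate label for i given its neighbors.
            costs = unary[i] + sum(pairwise[:, y[j]] for j in nbrs[i])
            y[i] = costs.argmin()
    return y
```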

    ์˜๋ฏธ๋ก ์  ์˜์ƒ ๋ถ„ํ• ์„ ์œ„ํ•œ ๋งฅ๋ฝ ์ธ์‹ ๊ธฐ๋ฐ˜ ํ‘œํ˜„ ํ•™์Šต

    Doctoral dissertation (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2017. Advisor: ์ด๊ฒฝ๋ฌด (Kyoung Mu Lee). Semantic segmentation, i.e., segmenting all the objects in an image and identifying their categories, is a fundamental and important problem in computer vision. Traditional approaches to semantic segmentation are based on two main elements: visual appearance features and semantic context. Visual appearance features such as color, edge, and shape are a primary source of information for reasoning about the objects in an image. However, image data are sometimes unable to fully capture the diversity of object classes, since the appearance of objects in real-world scenes is affected by imaging conditions such as illumination, texture, occlusion, and viewpoint. Therefore, semantic context, obtained from not only the presence but also the location of other objects, can help to disambiguate visual appearance in semantic segmentation tasks. Modern contextualized semantic segmentation systems have successfully improved segmentation performance by refining inconsistently labeled pixels via modeling of contextual interactions. However, they have considered semantic context and visual appearance features independently, owing to the absence of a suitable representation model. Motivated by this issue, this dissertation proposes a novel framework for learning semantic context-aware representations in which appearance features are enhanced and enriched by semantic context, and vice versa. The first part of the dissertation is devoted to semantic context-aware appearance modeling for semantic segmentation: an adaptive context aggregation network is studied to capture semantic context adequately through multiple steps of reasoning. Secondly, semantic context is reinforced using visual appearance: a graph- and example-based context model is presented for estimating contextual relationships according to the visual appearance of objects. Finally, we propose a multiscale Conditional Random Field (CRF) for integrating context-aware appearance and appearance-aware semantic context to produce accurate segmentations; a minimal sketch of such a multiscale energy follows the contents outline below.
    Experimental evaluations show the effectiveness of the proposed context-aware representations on various challenging datasets.

    Contents:
    1 Introduction: Backgrounds; Context Modeling for Semantic Segmentation Systems; Dissertation Goal and Contribution; Organization of Dissertation
    2 Adaptive Context Aggregation Network: Introduction; Related Works; Proposed Method (Embedding Network; Deeply Supervised Context Aggregation Network); Experiments (PASCAL VOC 2012 dataset; SIFT Flow dataset); Summary
    3 Second-order Semantic Relationships: Introduction; Related Work; Our Approach (Overview; Retrieval System; Graph Construction; Context Exemplar Description; Context Link Prediction); Inference; Experiments; Summary
    4 High-order Semantic Relationships: Introduction; Related Work; The High-order Semantic Relation Transfer Algorithm (Problem Statement; Objective Function; Approximate Algorithm); Semantic Segmentation through Semantic Relation Transfer (Scene Retrieval; Inference); Experiments; Summary
    5 Multiscale CRF Formulation: Introduction; Proposed Method (Multiscale Potentials; Non-convex Optimization); Experiments (SiftFlow dataset)
    6 Conclusion: Summary of the Dissertation; Future Works
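    As a rough illustration of the multiscale formulation, the following sketch combines per-scale unary (context-aware appearance) and pairwise (appearance-aware context) potentials into a single energy. It assumes labels live on a common finest-scale segmentation with coarser-scale neighborhoods pre-mapped onto it; the per-scale weights and all names are hypothetical rather than the dissertation's exact potentials.

```python
import numpy as np

def multiscale_energy(labels, unaries, pairwises, edges_per_scale, weights):
    """Weighted sum of CRF potentials over S scales (sketch).

    labels:          (n,) labels on the finest-scale segmentation
    unaries:         list of (n, L) context-aware appearance terms, per scale
    pairwises:       list of (L, L) appearance-aware context terms, per scale
    edges_per_scale: neighbor pairs at each scale, pre-mapped onto
                     finest-scale element indices
    weights:         per-scale mixing weights (hypothetical)
    """
    e = 0.0
    idx = np.arange(len(labels))
    for u, p, edges, w in zip(unaries, pairwises, edges_per_scale, weights):
        e += w * u[idx, labels].sum()                           # unary terms
        e += w * sum(p[labels[i], labels[j]] for i, j in edges)  # pairwise terms
    return e
```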
    • โ€ฆ
    corecore