140,224 research outputs found

    Semantic Object Parsing with Local-Global Long Short-Term Memory

    Full text link
    Semantic object parsing is a fundamental task for understanding objects in detail in computer vision community, where incorporating multi-level contextual information is critical for achieving such fine-grained pixel-level recognition. Prior methods often leverage the contextual information through post-processing predicted confidence maps. In this work, we propose a novel deep Local-Global Long Short-Term Memory (LG-LSTM) architecture to seamlessly incorporate short-distance and long-distance spatial dependencies into the feature learning over all pixel positions. In each LG-LSTM layer, local guidance from neighboring positions and global guidance from the whole image are imposed on each position to better exploit complex local and global contextual information. Individual LSTMs for distinct spatial dimensions are also utilized to intrinsically capture various spatial layouts of semantic parts in the images, yielding distinct hidden and memory cells of each position for each dimension. In our parsing approach, several LG-LSTM layers are stacked and appended to the intermediate convolutional layers to directly enhance visual features, allowing network parameters to be learned in an end-to-end way. The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference benefiting from the memorization of previous dependencies in all positions along all dimensions. Comprehensive evaluations on three public datasets well demonstrate the significant superiority of our LG-LSTM over other state-of-the-art methods.Comment: 10 page

    Multi-Level Visual Alphabets

    Get PDF
    A central debate in visual perception theory is the argument for indirect versus direct perception; i.e., the use of intermediate, abstract, and hierarchical representations versus direct semantic interpretation of images through interaction with the outside world. We present a content-based representation that combines both approaches. The previously developed Visual Alphabet method is extended with a hierarchy of representations, each level feeding into the next one, but based on features that are not abstract but directly relevant to the task at hand. Explorative benchmark experiments are carried out on face images to investigate and explain the impact of the key parameters such as pattern size, number of prototypes, and distance measures used. Results show that adding an additional middle layer improves results, by encoding the spatial co-occurrence of lower-level pattern prototypes

    A Relaxation Scheme for Mesh Locality in Computer Vision.

    Get PDF
    Parallel processing has been considered as the key to build computer systems of the future and has become a mainstream subject in Computer Science. Computer Vision applications are computationally intensive that require parallel approaches to exploit the intrinsic parallelism. This research addresses this problem for low-level and intermediate-level vision problems. The contributions of this dissertation are a unified scheme based on probabilistic relaxation labeling that captures localities of image data and the ability of using this scheme to develop efficient parallel algorithms for Computer Vision problems. We begin with investigating the problem of skeletonization. The technique of pattern match that exhausts all the possible interaction patterns between a pixel and its neighboring pixels captures the locality of this problem, and leads to an efficient One-pass Parallel Asymmetric Thinning Algorithm (OPATA\sb8). The use of 8-distance in this algorithm, or chessboard distance, not only improves the quality of the resulting skeletons, but also improves the efficiency of the computation. This new algorithm plays an important role in a hierarchical route planning system to extract high level typological information of cross-country mobility maps which greatly speeds up the route searching over large areas. We generalize the neighborhood interaction description method to include more complicated applications such as edge detection and image restoration. The proposed probabilistic relaxation labeling scheme exploit parallelism by discovering local interactions in neighboring areas and by describing them effectively. The proposed scheme consists of a transformation function and a dictionary construction method. The non-linear transformation function is derived from Markov Random Field theory. It efficiently combines evidences from neighborhood interactions. The dictionary construction method provides an efficient way to encode these localities. A case study applies the scheme to the problem of edge detection. The relaxation step of this edge-detection algorithm greatly reduces noise effects, gets better edge localization such as line ends and corners, and plays a crucial rule in refining edge outputs. The experiments on both synthetic and natural images show that our algorithm converges quickly, and is robust in noisy environment
    corecore