
    Learning Robust Object Recognition Using Composed Scenes from Generative Models

    Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of an internally synthesized representation with the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we propose that the synthesis machinery can compose new, unobserved images by imagination to train the network itself, increasing the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system deal with occlusion, which is challenging for current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios that are imagined, or self-generated, through a deep generator network. Trained on imagined occlusion scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to a significant improvement in object recognition under occlusion for our network relative to the original network trained only on un-occluded images. Beyond the practical benefits for object recognition under occlusion, this work demonstrates that self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data and become more flexible in dealing with novel situations. Comment: Accepted by the 14th Conference on Computer and Robot Vision
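    A minimal sketch of the self-generated occlusion idea, assuming toy grid "images" and hypothetical helper names (`compose_occluded`, `occlusion_augment`); this is an illustrative stand-in for the paper's deep generator network, not the authors' implementation:

```python
import random

def compose_occluded(image, occluder, top, left):
    """Paste an occluder patch onto a copy of `image` at (top, left),
    producing one self-generated occlusion scenario (hypothetical sketch)."""
    out = [row[:] for row in image]
    for i, row in enumerate(occluder):
        for j, v in enumerate(row):
            if 0 <= top + i < len(out) and 0 <= left + j < len(out[0]):
                out[top + i][left + j] = v
    return out

def occlusion_augment(image, occluder, n, seed=0):
    """Generate n occluded variants of `image`; under the object persistence
    constraint, every variant keeps the original object label."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    return [compose_occluded(image, occluder,
                             rng.randrange(h), rng.randrange(w))
            for _ in range(n)]
```

    A fine-tuning loop would then train the classifier on these variants with the original labels, which is what exercises features that survive occlusion.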

    Deep Regionlets for Object Detection

    In this paper, we propose a novel object detection framework named "Deep Regionlets", which establishes a bridge between deep neural networks and the conventional detection schema for accurate generic object detection. Motivated by the ability of regionlets to model object deformation and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select regions from which to learn features. The regionlet learning module focuses on local feature selection and transformation to alleviate local variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a "gating network" within the regionlet learning module to enable soft regionlet selection and pooling. The Deep Regionlets framework is trained end-to-end without additional effort. We perform ablation studies and conduct extensive experiments on the PASCAL VOC and Microsoft COCO datasets. The proposed framework outperforms state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels. Comment: Accepted to ECCV 201
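    A hedged sketch of what "soft regionlet selection and pooling" could look like: a gating network scores each regionlet, and the pooled feature is the gate-weighted sum. This is a toy stand-in for the learned end-to-end module; the function names are hypothetical:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def soft_regionlet_pool(regionlet_feats, gate_logits):
    """Soft regionlet selection: weight each regionlet feature vector by its
    softmax gate and sum, so selection is differentiable rather than hard."""
    gates = softmax(gate_logits)
    dim = len(regionlet_feats[0])
    return [sum(g * f[d] for g, f in zip(gates, regionlet_feats))
            for d in range(dim)]
```

    Because the gates are a softmax rather than an argmax, gradients flow to every regionlet, which is the point of making selection "soft".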

    Rectangular Layouts and Contact Graphs

    Contact graphs of isothetic rectangles unify many concepts from applications including VLSI and architectural design, computational geometry, and GIS. Minimizing the area of their corresponding rectangular layouts is a key problem. We study the area-optimization problem and show that it is NP-hard to find a minimum-area rectangular layout of a given contact graph. We present O(n)-time algorithms that construct O(n^2)-area rectangular layouts for general contact graphs and O(n log n)-area rectangular layouts for trees. (For trees, this is an O(log n)-approximation algorithm.) We also present an infinite family of graphs (resp., trees) that require Ω(n^2) (resp., Ω(n log n)) area. We derive these results by presenting a new characterization of graphs that admit rectangular layouts, using the related concept of rectangular duals. A corollary to our results relates the class of graphs that admit rectangular layouts to rectangle-of-influence drawings. Comment: 28 pages, 13 figures, 55 references, 1 appendix
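    The contact-graph notion itself is easy to make concrete. A small sketch, assuming interior-disjoint isothetic rectangles given as (x1, y1, x2, y2), that derives the contact graph from a layout (two rectangles are adjacent iff they share a boundary segment of positive length):

```python
def contacts(r1, r2):
    """True iff two interior-disjoint axis-aligned rectangles share a
    boundary segment of positive length (corner touches do not count)."""
    ax1, ay1, ax2, ay2 = r1
    bx1, by1, bx2, by2 = r2
    # vertical contact: abutting left/right edges with overlapping y-extent
    if ax2 == bx1 or bx2 == ax1:
        return min(ay2, by2) > max(ay1, by1)
    # horizontal contact: abutting top/bottom edges with overlapping x-extent
    if ay2 == by1 or by2 == ay1:
        return min(ax2, bx2) > max(ax1, bx1)
    return False

def contact_graph(rects):
    """Edge set (as index pairs) of the contact graph of a layout."""
    return {(i, j) for i in range(len(rects))
            for j in range(i + 1, len(rects)) if contacts(rects[i], rects[j])}
```

    The hard direction studied in the paper is the inverse problem: given the graph, construct a small-area layout realizing it.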

    The Partial Visibility Representation Extension Problem

    For a graph G, a function ψ is called a bar visibility representation of G when, for each vertex v ∈ V(G), ψ(v) is a horizontal line segment (bar) and uv ∈ E(G) iff there is an unobstructed, vertical, ε-wide line of sight between ψ(u) and ψ(v). Graphs admitting such representations are well understood (via simple characterizations) and recognizable in linear time. For a directed graph G, a bar visibility representation ψ of G additionally puts the bar ψ(u) strictly below the bar ψ(v) for each directed edge (u, v) of G. We study a generalization of the recognition problem in which a function ψ′ defined on a subset V′ of V(G) is given, and the question is whether there is a bar visibility representation ψ of G with ψ(v) = ψ′(v) for every v ∈ V′. We show that for undirected graphs this problem, together with closely related problems, is NP-complete, but for certain cases involving directed graphs it is solvable in polynomial time. Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016)
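    A minimal sketch of the ε-visibility test between two horizontal bars, assuming each bar is given as (y, x1, x2). This is an illustrative O(n log n) check of the definition, not an algorithm from the paper:

```python
def sees(bars, u, v, eps=1e-9):
    """True iff bars u and v have an unobstructed vertical line of sight of
    width > eps, i.e. some vertical strip within their common x-extent is
    not covered by any bar lying strictly between their y-levels."""
    yu, ux1, ux2 = bars[u]
    yv, vx1, vx2 = bars[v]
    lo, hi = max(ux1, vx1), min(ux2, vx2)   # common x-extent
    if hi - lo <= eps:
        return False
    ylo, yhi = min(yu, yv), max(yu, yv)
    # blockers: bars strictly between the two y-levels, clipped to [lo, hi]
    spans = sorted((max(b[1], lo), min(b[2], hi)) for b in bars
                   if ylo < b[0] < yhi and min(b[2], hi) > max(b[1], lo))
    free = lo
    for x1, x2 in spans:
        if x1 - free > eps:      # uncovered strip before this blocker
            return True
        free = max(free, x2)
    return hi - free > eps       # uncovered strip after the last blocker
```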

    Cascaded Segmentation-Detection Networks for Word-Level Text Spotting

    We introduce an algorithm for word-level text spotting that is able to accurately and reliably determine the bounding regions of individual words of text "in the wild". Our system is formed by the cascade of two convolutional neural networks. The first network is fully convolutional and is in charge of detecting areas containing text. This results in a very reliable but possibly inaccurate segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage and predicts oriented rectangular regions containing individual words. No post-processing (e.g. text line grouping) is necessary. With an execution time of 450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the highest score to date among published algorithms on the ICDAR 2015 Incidental Scene Text dataset benchmark. Comment: 7 pages, 8 figures
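    A toy stand-in for the first, segmentation stage of such a cascade: connected components of a binary text/non-text mask, returned as coarse bounding boxes that a second-stage word detector would then refine. The real stage is a fully convolutional network; the name `text_regions` is hypothetical:

```python
def text_regions(mask):
    """Coarse stage-1 proposals: 4-connected components of a binary mask,
    each returned as a bounding box (top, left, bottom, right)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                stack, t, l, b, r = [(i, j)], i, j, i, j
                seen[i][j] = True
                while stack:                      # iterative flood fill
                    y, x = stack.pop()
                    t, l = min(t, y), min(l, x)
                    b, r = max(b, y), max(r, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((t, l, b, r))
    return boxes
```

    The cascade design lets stage 1 stay high-recall and imprecise, since stage 2 only needs to localize words within each proposed region.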

    Note on y-truncation: a simple approach to generating bounded distributions for environmental applications

    It may sometimes be desirable to introduce bounds into probability distributions to formalise the presence of upper or lower physical limits to data to which the distribution has been applied. For example, an upper bound in raindrop sizes might be represented by introducing an upper bound to an exponential drop-size distribution. However, the standard method of truncating unbounded probability distributions yields distributions with non-zero probability density at the resulting bounds. In reality it is likely that physical bounding processes in nature increase in intensity as the bound is approached, causing a progressive decline in observation relative frequency to zero at the bound. Truncation below a y-axis point is proposed as a simple alternative means of creating more natural truncated probability distributions for application to data of this type. The resulting "y-truncated" distributions have similarities with the traditional truncated distributions but probability densities have the desirable feature of always declining to zero at the bounds. In addition, the y-truncation approach can also serve in its own right as a mean
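    One plausible reading of y-truncation, sketched for the exponential case: the new density is proportional to max(f(x) − y0, 0), so it vanishes exactly where f crosses the cut-off level y0. This is an illustrative assumption about the construction, not necessarily the note's exact formulation:

```python
import math

def y_truncated_exp_pdf(x, lam, y0):
    """y-truncated exponential density: proportional to max(f(x) - y0, 0)
    with f(x) = lam * exp(-lam * x), renormalised to integrate to 1.
    The density declines smoothly to zero at the upper bound x_b where
    f(x_b) = y0, unlike standard x-truncation.  Assumes 0 < y0 < lam."""
    x_b = math.log(lam / y0) / lam                 # f(x_b) = y0
    # normalising constant: integral of (lam*e^(-lam*x) - y0) over [0, x_b]
    Z = (1.0 - math.exp(-lam * x_b)) - y0 * x_b
    if x < 0 or x > x_b:
        return 0.0
    return (lam * math.exp(-lam * x) - y0) / Z
```

    Contrast with the standard x-truncated exponential, whose density jumps from y0-like values straight to zero at the bound instead of declining to it.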

    Epitaxial Frustration in Deposited Packings of Rigid Disks and Spheres

    We use numerical simulation to investigate and analyze the way that rigid disks and spheres arrange themselves when compressed next to incommensurate substrates. For disks, a movable set is pressed into a jammed state against an ordered fixed line of larger disks, where the diameter ratio of movable to fixed disks is 0.8. The corresponding diameter ratio for the sphere simulations is 0.7, where the fixed substrate has the structure of a (001) plane of a face-centered cubic array. Results obtained for both disks and spheres exhibit various forms of density-reducing packing frustration next to the incommensurate substrate, including some cases displaying disorder that extends far from the substrate. The disk system calculations strongly suggest that the most efficient (highest density) packings involve configurations that are periodic in the lateral direction parallel to the substrate, with substantial geometric disruption only occurring near the substrate. Some evidence has also emerged suggesting that for the sphere systems a corresponding structure doubly periodic in the lateral directions would yield the highest packing density; however, all of the sphere simulations completed thus far produced some residual "bulk" disorder not obviously resulting from substrate mismatch. In view of the fact that the cases studied here represent only a small subset of all that eventually deserve attention, we end with a discussion of the directions in which first extensions of the present simulations might profitably be pursued. Comment: 28 pages, 14 figures; typos fixed; a sentence added to the 4th paragraph of Sect. 5 in response to a referee's comment
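    The substrate geometry behind the disk case can be sketched with elementary Pythagoras: the height at which a movable disk rests in the valley between two fixed substrate disks. This is illustrative only (parameters follow the diameter ratio quoted above), not the paper's simulation method:

```python
import math

def valley_height(R, r, pitch):
    """Height of the centre of a movable disk of radius r seated in the
    valley between two adjacent fixed substrate disks of radius R whose
    centres lie `pitch` apart on a line.  Requires pitch/2 < R + r, or the
    movable disk falls through the gap."""
    half = pitch / 2.0
    if half >= R + r:
        raise ValueError("disk falls through the gap")
    return math.sqrt((R + r) ** 2 - half ** 2)

# Diameter ratio 0.8 from the disk simulations: fixed R = 0.5, movable
# r = 0.4.  A close-packed layer of movable disks has natural period
# 2r = 0.8, incommensurate with the substrate period 1.0, so not every
# movable disk can occupy a valley -- the source of packing frustration.
```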