18,168 research outputs found

    Fourteenth Biennial Status Report: März 2017 - February 2019

    No full text

    The Sound Manifesto

    Full text link
    Computing practice today depends on visual output to drive almost all user interaction. Other senses, such as audition, may be totally neglected, or used tangentially, or used in highly restricted specialized ways. We have excellent audio rendering through D-A conversion, but we lack rich general facilities for modeling and manipulating sound comparable in quality and flexibility to graphics. We need co-ordinated research in several disciplines to improve the use of sound as an interactive information channel. Incremental and separate improvements in synthesis, analysis, speech processing, audiology, acoustics, music, etc. will not alone produce the radical progress that we seek in sonic practice. We also need to create a new central topic of study in digital audio research. The new topic will assimilate the contributions of different disciplines on a common foundation. The key central concept that we lack is sound as a general-purpose information channel. We must investigate the structure of this information channel, which is driven by the co-operative development of auditory perception and physical sound production. Particular audible encodings, such as speech and music, illuminate sonic information by example, but they are no more sufficient for a characterization than typography is sufficient for a characterization of visual information.Comment: To appear in the conference on Critical Technologies for the Future of Computing, part of SPIE's International Symposium on Optical Science and Technology, 30 July to 4 August 2000, San Diego, C

    SAVOIAS: A Diverse, Multi-Category Visual Complexity Dataset

    Full text link
    Visual complexity identifies the level of intricacy and details in an image or the level of difficulty to describe the image. It is an important concept in a variety of areas such as cognitive psychology, computer vision and visualization, and advertisement. Yet, efforts to create large, downloadable image datasets with diverse content and unbiased groundtruthing are lacking. In this work, we introduce Savoias, a visual complexity dataset that compromises of more than 1,400 images from seven image categories relevant to the above research areas, namely Scenes, Advertisements, Visualization and infographics, Objects, Interior design, Art, and Suprematism. The images in each category portray diverse characteristics including various low-level and high-level features, objects, backgrounds, textures and patterns, text, and graphics. The ground truth for Savoias is obtained by crowdsourcing more than 37,000 pairwise comparisons of images using the forced-choice methodology and with more than 1,600 contributors. The resulting relative scores are then converted to absolute visual complexity scores using the Bradley-Terry method and matrix completion. When applying five state-of-the-art algorithms to analyze the visual complexity of the images in the Savoias dataset, we found that the scores obtained from these baseline tools only correlate well with crowdsourced labels for abstract patterns in the Suprematism category (Pearson correlation r=0.84). For the other categories, in particular, the objects and advertisement categories, low correlation coefficients were revealed (r=0.3 and 0.56, respectively). These findings suggest that (1) state-of-the-art approaches are mostly insufficient and (2) Savoias enables category-specific method development, which is likely to improve the impact of visual complexity analysis on specific application areas, including computer vision.Comment: 10 pages, 4 figures, 4 table

    MeshAdv: Adversarial Meshes for Visual Recognition

    Full text link
    Highly expressive models such as deep neural networks (DNNs) have been widely applied to various applications. However, recent studies show that DNNs are vulnerable to adversarial examples, which are carefully crafted inputs aiming to mislead the predictions. Currently, the majority of these studies have focused on perturbation added to image pixels, while such manipulation is not physically realistic. Some works have tried to overcome this limitation by attaching printable 2D patches or painting patterns onto surfaces, but can be potentially defended because 3D shape features are intact. In this paper, we propose meshAdv to generate "adversarial 3D meshes" from objects that have rich shape features but minimal textural variation. To manipulate the shape or texture of the objects, we make use of a differentiable renderer to compute accurate shading on the shape and propagate the gradient. Extensive experiments show that the generated 3D meshes are effective in attacking both classifiers and object detectors. We evaluate the attack under different viewpoints. In addition, we design a pipeline to perform black-box attack on a photorealistic renderer with unknown rendering parameters.Comment: Published in IEEE CVPR201

    The Laminar Architecture of Visual Cortex and Image Processing Technology

    Full text link
    The mammalian neocortex is organized into layers which include circuits that form functional columns in cortical maps. A major unsolved problem concerns how bottom-up, top-down, and horizontal interactions are organized within cortical layers to generate adaptive behaviors. This article summarizes a model, called the LAMINART model, of how these interactions help visual cortex to realize: (1) the binding process whereby cortex groups distributed data into coherent object representations; (2) the attentional process whereby cortex selectively processes important events; and (3) the developmental and learning processes whereby cortex stably grows and tunes its circuits to match environmental constraints. Such Laminar Computing completes perceptual groupings that realize the property of Analog Coherence, whereby winning groupings bind together their inducing features without losing their ability to represent analog values of these features. Laminar Computing also efficiently unifies the computational requirements of preattentive filtering and grouping with those of attentional selection. It hereby shows how Adaptive Resonance Theory (ART) principles may be realized within the laminar circuits of neocortex. Applications include boundary segmentation and surface filling-in algorithms for processing Synthetic Aperture Radar images.Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-95-1-0409); Office of Naval Research (N00014-95-1-0657

    The effect of transparency on recognition of overlapping objects

    Get PDF
    Are overlapping objects easier to recognize when the objects are transparent or opaque? It is important to know whether the transparency of X-ray images of luggage contributes to the difficulty in searching those images for targets. Transparency provides extra information about objects that would normally be occluded but creates potentially ambiguous depth relations at the region of overlap. Two experiments investigated the threshold durations at which adult participants could accurately name pairs of overlapping objects that were opaque or transparent. In Experiment 1, the transparent displays included monocular cues to relative depth. Recognition of the back object was possible at shorter durations for transparent displays than for opaque displays. In Experiment 2, the transparent displays had no monocular depth cues. There was no difference in the duration at which the back object was recognized across transparent and opaque displays. The results of the two experiments suggest that transparent displays, even though less familiar than opaque displays, do not make object recognition more difficult, and possibly show a benefit. These findings call into question the importance of edge junctions in object recognitio

    Efficient Analysis of Complex Diagrams using Constraint-Based Parsing

    Full text link
    This paper describes substantial advances in the analysis (parsing) of diagrams using constraint grammars. The addition of set types to the grammar and spatial indexing of the data make it possible to efficiently parse real diagrams of substantial complexity. The system is probably the first to demonstrate efficient diagram parsing using grammars that easily be retargeted to other domains. The work assumes that the diagrams are available as a flat collection of graphics primitives: lines, polygons, circles, Bezier curves and text. This is appropriate for future electronic documents or for vectorized diagrams converted from scanned images. The classes of diagrams that we have analyzed include x,y data graphs and genetic diagrams drawn from the biological literature, as well as finite state automata diagrams (states and arcs). As an example, parsing a four-part data graph composed of 133 primitives required 35 sec using Macintosh Common Lisp on a Macintosh Quadra 700.Comment: 9 pages, Postscript, no fonts, compressed, uuencoded. Composed in MSWord 5.1a for the Mac. To appear in ICDAR '95. Other versions at ftp://ftp.ccs.neu.edu/pub/people/futrell

    How does the Cerebral Cortex Work? Learning, Attention, and Grouping by the Laminar Circuits of Visual Cortex

    Full text link
    The organization of neocortex into layers is one of its most salient anatomical features. These layers include circuits that form functional columns in cortical maps. A major unsolved problem concerns how bottom-up, top-down, and horizontal interactions are organized within cortical layers to generate adaptive behaviors. This article models how these interactions help visual co1tex to realize: (I) the binding process whereby cortex groups distributed data into coherent object representations; (2) the attentional process whereby cortex selectively processes important events; and (3) the developmental and learning processes whereby cortex shapes its circuits to match environmental constraints. New computational ideas about feedback systems suggest how neocortex develops and learns in a stable way, and why top-down attention requires converging bottom-up inputs to fully activate cortical cells, whereas perceptual groupings do not.Defense Advanced Research Projects Agency; National Science Foundation (IRI-97-20333); Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657

    Deformable Prototypes for Encoding Shape Categories in Image Databases

    Full text link
    We describe a method for shape-based image database search that uses deformable prototypes to represent categories. Rather than directly comparing a candidate shape with all shape entries in the database, shapes are compared in terms of the types of nonrigid deformations (differences) that relate them to a small subset of representative prototypes. To solve the shape correspondence and alignment problem, we employ the technique of modal matching, an information-preserving shape decomposition for matching, describing, and comparing shapes despite sensor variations and nonrigid deformations. In modal matching, shape is decomposed into an ordered basis of orthogonal principal components. We demonstrate the utility of this approach for shape comparison in 2-D image databases.Office of Naval Research (Young Investigator Award N00014-06-1-0661
    • …
    corecore