135,095 research outputs found

    Deformable Prototypes for Encoding Shape Categories in Image Databases

    We describe a method for shape-based image database search that uses deformable prototypes to represent categories. Rather than directly comparing a candidate shape with all shape entries in the database, shapes are compared in terms of the types of nonrigid deformations (differences) that relate them to a small subset of representative prototypes. To solve the shape correspondence and alignment problem, we employ the technique of modal matching, an information-preserving shape decomposition for matching, describing, and comparing shapes despite sensor variations and nonrigid deformations. In modal matching, shape is decomposed into an ordered basis of orthogonal principal components. We demonstrate the utility of this approach for shape comparison in 2-D image databases. Office of Naval Research (Young Investigator Award N00014-06-1-0661)

    Iterative graph cuts for image segmentation with a nonlinear statistical shape prior

    Shape-based regularization has proven to be a useful method for delineating objects within noisy images where one has prior knowledge of the shape of the targeted object. When a collection of possible shapes is available, the specification of a shape prior using kernel density estimation is a natural technique. Unfortunately, energy functionals arising from kernel density estimation are of a form that makes them impossible to directly minimize using efficient optimization algorithms such as graph cuts. Our main contribution is to show how one may recast the energy functional into a form that is minimizable iteratively and efficiently using graph cuts. Comment: Revision submitted to JMIV (02/24/13)

    The nature of the animacy organization in human ventral temporal cortex

    The principles underlying the animacy organization of the ventral temporal cortex (VTC) remain hotly debated, with recent evidence pointing to an animacy continuum rather than a dichotomy. What drives this continuum? According to the visual categorization hypothesis, the continuum reflects the degree to which animals contain animal-diagnostic features. By contrast, the agency hypothesis posits that the continuum reflects the degree to which animals are perceived as (social) agents. Here, we tested both hypotheses with a stimulus set in which visual categorizability and agency were dissociated based on representations in convolutional neural networks and behavioral experiments. Using fMRI, we found that visual categorizability and agency explained independent components of the animacy continuum in VTC. Modeled together, they fully explained the animacy continuum. Finally, clusters explained by visual categorizability were localized posterior to clusters explained by agency. These results show that multiple organizing principles, including agency, underlie the animacy continuum in VTC. Comment: 16 pages, 5 figures; code and data at https://doi.org/10.17605/OSF.IO/VXWG9. Update: added supplementary results and edited abstract.

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. Published or submitted for publication.

    Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors

    The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space. However, recent work has challenged this belief, showing that complex encoder-decoder architectures perform similarly to nearest-neighbor baselines or simple linear decoder models that exploit large amounts of per-category data in standard benchmarks. On the other hand, settings where 3D shape must be inferred for new categories with few examples are more natural and require models that generalize about shapes. In this work we demonstrate experimentally that naive baselines do not apply when the goal is to learn to reconstruct novel objects using very few examples, and that in a few-shot learning setting, the network must learn concepts that can be applied to new categories, avoiding rote memorization. To address deficiencies in existing approaches to this problem, we propose three approaches that efficiently integrate a class prior into a 3D reconstruction model, allowing it to account for intra-class variability and imposing an implicit compositional structure that the model should learn. Experiments on the popular ShapeNet database demonstrate that our method significantly outperforms existing baselines on this task in the few-shot setting.

    Straight to Shapes: Real-time Detection of Encoded Shapes

    Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to the ones in the training set, something that most existing approaches fail to handle. Comment: 16 pages including appendix; Published at CVPR 201

    Stereoscopic Sketchpad: 3D Digital Ink

    --Context-- This project looked at the development of a stereoscopic 3D environment in which a user is able to draw freely in all three dimensions. The main focus was on the storage and manipulation of the ‘digital ink’ with which the user draws. For a drawing and sketching package to be effective it must not only have an easy-to-use user interface, it must also handle all input data quickly and efficiently so that the user is able to focus fully on their drawing.
    --Background-- When it comes to sketching in three dimensions, the majority of applications currently available rely on vector-based drawing methods. This is primarily because the applications are designed to take a user's two-dimensional input and transform it into a three-dimensional model. Having the sketch represented as vectors makes it simpler for the program to act upon its geometry and thus convert it to a model. There are a number of methods to achieve this aim, including Gesture Based Modelling, Reconstruction and Blobby Inflation. Other vector-based applications focus on the creation of curves, allowing the user to draw within or on existing 3D models. They also allow the user to create wire-frame models. These stroke-based applications bring the user closer to traditional sketching than the more structured modelling methods detailed. While at present the field is inundated with vector-based applications mainly focused upon sketch-based modelling, there are significantly fewer voxel-based applications. The majority of these applications focus on the deformation and sculpting of voxmaps, almost the opposite of drawing and sketching, and on the creation of three-dimensional voxmaps from standard two-dimensional pixmaps. How to actually sketch freely within a scene represented by a voxmap has rarely been explored. This comes as a surprise when so many of the standard 2D drawing programs in use today are pixel based.
    --Method-- As part of this project a simple three-dimensional drawing program was designed and implemented using C and C++. This tool is known as Sketch3D and was created using a Model View Controller (MVC) architecture. Due to the modular nature of Sketch3D's system architecture, it is possible to plug a range of different data structures into the program to represent the ink in a variety of ways. A series of data structures was implemented and tested for efficiency: a simple list, a 3D array, and an octree. They were tested for the time it takes to insert or remove points from the structure, how easy it is to manipulate points once they are stored, and how the number of points stored affects the draw and rendering times. One of the key issues brought up by this project was devising a means by which a user is able to draw in three dimensions while using only two-dimensional input devices. The method settled upon and implemented involves using the mouse or a digital pen to sketch as one would in a standard 2D drawing package, while also linking the up and down keyboard keys to the current depth. This allows the user to move in and out of the scene as they draw. Two user interface tools were also developed to assist the user: a 3D cursor, and a toggle which, when on, highlights all of the points intersecting the depth plane on which the cursor currently resides. These tools allow the user to see exactly where they are drawing in relation to previously drawn lines.
    --Results-- The tests conducted on the data structures clearly revealed that the octree was the most effective data structure. While not the most efficient in every area, it manages to avoid the major pitfalls of the other structures. The list was extremely quick to render and draw to the screen but suffered severely when it came to finding and manipulating points already stored. In contrast, the three-dimensional array was able to erase or manipulate points effectively, but its draw time rendered the structure effectively useless, taking huge amounts of time to draw each frame. The focus of this research was on how a 3D sketching package would go about storing and accessing the digital ink. This is just a basis for further research in this area, and many issues touched upon in this paper will require a more in-depth analysis. The primary area of this future research would be the creation of an effective user interface and the introduction of regular sketching package features such as the saving and loading of images.
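
    As a rough illustration of why an octree can sit between the list and the 3D array described in the abstract above, the sketch below stores integer voxel points sparsely and supports insertion, removal, lookup, and a depth-plane query of the kind the highlighting toggle would need. It is a minimal sketch under stated assumptions, not the Sketch3D implementation: it is written in Python rather than C/C++, and the class name `Octree`, its methods, and the power-of-two volume size are all hypothetical.

```python
class Octree:
    """Sparse octree over integer points in [0, size)^3 (hypothetical sketch)."""

    def __init__(self, size=256):
        # Assumption: the volume is a cube whose side is a power of two >= 2.
        if size < 2 or size & (size - 1):
            raise ValueError("size must be a power of two >= 2")
        self.size = size
        self.root = {}   # nested dicts: child index (0-7) -> subtree; leaf -> True
        self.count = 0

    def _path(self, x, y, z):
        """Return the child index chosen at each level, root to leaf."""
        if not all(0 <= c < self.size for c in (x, y, z)):
            raise ValueError("point outside the volume")
        path, half = [], self.size >> 1
        while half:
            path.append((4 if x & half else 0) | (2 if y & half else 0)
                        | (1 if z & half else 0))
            half >>= 1
        return path

    def insert(self, x, y, z):
        """Store a point; return False if it was already present."""
        path = self._path(x, y, z)
        node = self.root
        for idx in path[:-1]:
            node = node.setdefault(idx, {})
        if path[-1] in node:
            return False
        node[path[-1]] = True
        self.count += 1
        return True

    def contains(self, x, y, z):
        node = self.root
        for idx in self._path(x, y, z):
            node = node.get(idx)
            if node is None:
                return False
        return True

    def remove(self, x, y, z):
        """Erase a point and prune branches that become empty."""
        trail, node = [], self.root
        for idx in self._path(x, y, z):
            child = node.get(idx)
            if child is None:
                return False
            trail.append((node, idx))
            node = child
        for parent, idx in reversed(trail):
            del parent[idx]
            if parent:          # stop pruning once a branch still has children
                break
        self.count -= 1
        return True

    def points_on_plane(self, z):
        """Return all stored points with the given depth, visiting only
        the octants that can intersect that plane."""
        if not 0 <= z < self.size:
            raise ValueError("plane outside the volume")
        found = []

        def walk(node, half, x0, y0):
            zbit = 1 if z & half else 0
            for idx, child in node.items():
                if (idx & 1) != zbit:
                    continue    # this octant lies on the other side of the plane
                x = x0 + (half if idx & 4 else 0)
                y = y0 + (half if idx & 2 else 0)
                if half == 1:
                    found.append((x, y, z))
                else:
                    walk(child, half >> 1, x, y)

        walk(self.root, self.size >> 1, 0, 0)
        return sorted(found)
```

    In this sketch, memory grows with the number of stored points rather than with the volume, unlike a dense 3D array, and lookup or erasure costs a number of steps equal to the tree depth rather than a scan of every point, unlike a flat list. Whether those trade-offs match the measurements reported in the abstract is not something the sketch can confirm.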