    What do we perceive in a glance of a real-world scene?

    What do we see when we glance at a natural scene, and how does it change as the glance becomes longer? We asked naive subjects to report in a free-form format what they saw when looking at briefly presented real-life photographs. Our subjects received no specific information as to the content of each stimulus. Thus, our paradigm differs from previous studies in which subjects were cued before a picture was presented and/or were probed with multiple-choice questions. In the first stage, 90 novel grayscale photographs were foveally shown to a group of 22 native-English-speaking subjects. The presentation time was chosen at random from a set of seven possible times (from 27 to 500 ms). A perceptual mask followed each photograph immediately. After each presentation, subjects reported what they had just seen as completely and truthfully as possible. In the second stage, another group of naive individuals was instructed to score each of the descriptions produced by the subjects in the first stage. Individual scores were assigned to more than a hundred different attributes. We show that within a single glance, much object- and scene-level information is perceived by human subjects. The richness of our perception, though, seems asymmetrical. Subjects show a propensity toward perceiving natural scenes as outdoor rather than indoor. The reporting of sensory- or feature-level information of a scene (such as shading and shape) consistently precedes the reporting of semantic-level information. But once subjects recognize more semantic-level components of a scene, there is little evidence of any bias toward either scene-level or object-level recognition.
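    A toy illustration of the second-stage analysis described above, in which scored attributes are aggregated by presentation time to compare sensory- and semantic-level reporting; the records, attribute levels, and scores below are invented placeholders, not the study's data.

```python
# Minimal sketch (not the authors' analysis code): aggregate attribute scores
# by presentation time to compare sensory- vs. semantic-level reporting.
# All records below are invented placeholders.
from collections import defaultdict
from statistics import mean

# Each record: (presentation_time_ms, attribute_level, score)
records = [
    (27, "sensory", 0.6), (27, "semantic", 0.1),
    (107, "sensory", 0.8), (107, "semantic", 0.5),
    (500, "sensory", 0.9), (500, "semantic", 0.85),
]

by_time_level = defaultdict(list)
for t, level, score in records:
    by_time_level[(t, level)].append(score)

for (t, level), scores in sorted(by_time_level.items()):
    print(f"{t:>4} ms  {level:<8} mean score = {mean(scores):.2f}")
```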

    Observed methods of cuneiform tablet reconstruction in virtual and real world environments

    The reconstruction of fragmented artefacts is a tedious process that consumes many valuable hours of scholars' time. We believe that such work can be made more efficient via new techniques in interactive virtual environments. The purpose of this research is to explore approaches to the reconstruction of cuneiform tablets in real and virtual environments, and to address the potential barriers to virtual reconstruction of fragments. In this paper we present the results of an experiment exploring the reconstruction strategies employed by individual users working with tablet fragments in real and virtual environments. Our findings identify physical factors that users consider important to the reconstruction process and explore the subjective usefulness of stereoscopic 3D in that process. Our results, presented as dynamic graphs of interaction, compare the precise order of movement and rotation interactions, and the frequency of interaction achieved by successful and unsuccessful participants, with some surprising insights. We present evidence that certain interaction styles and behaviours characterise success in the reconstruction process.
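    To make the kind of data behind those dynamic interaction graphs concrete, here is a rough sketch of tallying movement and rotation events per participant from an event log; the participant IDs, event names, and log format are assumptions, not the study's actual instrumentation.

```python
# Illustrative sketch only: count movement and rotation interactions per
# participant from a hypothetical event log of the virtual environment.
from collections import Counter, defaultdict

# (participant_id, event_type) pairs streamed from a reconstruction session
event_log = [
    ("P01", "move"), ("P01", "rotate"), ("P01", "move"),
    ("P02", "rotate"), ("P02", "rotate"), ("P02", "move"),
]

counts = defaultdict(Counter)
for participant, event in event_log:
    counts[participant][event] += 1

for participant, c in sorted(counts.items()):
    total = sum(c.values())
    print(participant, dict(c), f"rotation share = {c['rotate'] / total:.2f}")
```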

    Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis

    We introduce a data-driven approach to complete partial 3D shapes through a combination of volumetric deep neural networks and 3D shape synthesis. From a partially-scanned input shape, our method first infers a low-resolution -- but complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network (3D-EPN) which is composed of 3D convolutional layers. The network is trained to predict and fill in missing data, and operates on an implicit surface representation that encodes both known and unknown space. This allows us to predict global structure in unknown areas with high accuracy. We then correlate these intermediary results with 3D geometry from a shape database at test time. In a final pass, we propose a patch-based 3D shape synthesis method that imposes the 3D geometry from these retrieved shapes as constraints on the coarsely-completed mesh. This synthesis process enables us to reconstruct fine-scale detail and generate high-resolution output while respecting the global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms state-of-the-art completion methods, the main contribution of our work lies in the combination of a data-driven shape predictor and analytic 3D shape synthesis. In our results, we show extensive evaluations on a newly-introduced shape completion benchmark for both real-world and synthetic data.
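    As a rough illustration of the encoder-predictor idea (not the authors' released architecture), a small PyTorch network over a voxel grid with a value channel and a known/unknown mask channel might look like this; the layer sizes and the 32^3 resolution are placeholders.

```python
# Minimal sketch of a 3D encoder-predictor network over a voxel grid, in the
# spirit of the 3D-EPN described above. Layer sizes are illustrative only.
# Input: 2 channels (implicit surface value + known/unknown mask).
import torch
import torch.nn as nn

class TinyEPN(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=4, stride=2, padding=1),   # 32^3 -> 16^3
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 32, kernel_size=4, stride=2, padding=1),  # 16^3 -> 8^3
            nn.ReLU(inplace=True),
        )
        self.predictor = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1),  # 8^3 -> 16^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),   # 16^3 -> 32^3
        )

    def forward(self, x):
        return self.predictor(self.encoder(x))

net = TinyEPN()
partial_scan = torch.randn(1, 2, 32, 32, 32)  # one 32^3 partial volume
completed = net(partial_scan)                 # (1, 1, 32, 32, 32) coarse completion
```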

    Cumulative object categorization in clutter

    In this paper we present an approach based on scene- or part-graphs for geometrically categorizing touching and occluded objects. We use additive RGBD feature descriptors and hashing of graph configuration parameters to describe the spatial arrangement of constituent parts. The presented experiments show quantitatively that this method outperforms our earlier part-voting and sliding-window classification. We evaluated our approach on cluttered scenes, using a 3D dataset containing over 15,000 Kinect scans of more than 100 objects grouped into general geometric categories. Additionally, color, geometric, and combined features were compared for categorization tasks.
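    A rough sketch of the hashing idea, in which quantized part-pair configurations serve as keys so that category votes can be accumulated additively; the features, bin sizes, and votes below are placeholders rather than the paper's actual descriptors.

```python
# Rough sketch (not the authors' code): hash quantized part-pair configurations
# so spatial arrangements of parts can be looked up and votes accumulated.
from collections import defaultdict

def config_key(distance, angle, dist_bin=0.05, angle_bin=15.0):
    """Quantize a pairwise part configuration into a hashable key."""
    return (round(distance / dist_bin), round(angle / angle_bin))

votes = defaultdict(lambda: defaultdict(float))  # key -> category -> accumulated vote

# Training: accumulate votes for configurations observed in labelled scans
votes[config_key(0.12, 30.0)]["mug"] += 1.0
votes[config_key(0.40, 85.0)]["chair"] += 1.0

# Test time: sum votes over all part pairs in the scene, pick the best category
def categorize(pairs):
    tally = defaultdict(float)
    for d, a in pairs:
        for category, v in votes[config_key(d, a)].items():
            tally[category] += v
    return max(tally, key=tally.get) if tally else None

print(categorize([(0.11, 29.0), (0.13, 31.0)]))  # -> "mug"
```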

    Web App for Tools Inventory Management with Predictive Categorization

    This project, ‘Web App for Tools Inventory Management with Predictive Categorization’, was proposed by Mr. Muhamad Hamzah bin Razali. Its main purpose is to develop a web application, named ‘Drillclinic’, that digitizes the inventory management process for tools by providing predictive tool categorization and assigning a Data Matrix code to each real-world tool.
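    One plausible (but assumed) way to implement the predictive categorization component is a small text classifier over tool names; the tool names, categories, and scikit-learn model below are illustrative only, not the project's actual design.

```python
# Illustrative sketch only: predict a tool category from its name with a
# simple scikit-learn text classifier. Data and model choice are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tool_names = ["6mm drill bit", "10mm drill bit", "torque wrench", "socket wrench"]
categories = ["drilling", "drilling", "fastening", "fastening"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(tool_names, categories)

print(model.predict(["8mm drill bit"])[0])  # expected: "drilling"
```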

    Matterport3D: Learning from RGB-D Data in Indoor Environments

    Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
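    A minimal sketch of iterating over per-frame color, depth, and pose data for a dataset organized like Matterport3D; the directory layout and file naming below are hypothetical placeholders rather than the actual release format.

```python
# Minimal sketch: iterate over RGB-D frames with camera poses. The directory
# layout and file names here are hypothetical, not the Matterport3D format.
from pathlib import Path
import numpy as np
from PIL import Image

def load_frames(scene_dir):
    scene = Path(scene_dir)
    for color_path in sorted(scene.glob("color/*.jpg")):
        frame_id = color_path.stem
        color = np.asarray(Image.open(color_path))                           # H x W x 3 uint8
        depth = np.asarray(Image.open(scene / "depth" / f"{frame_id}.png"))  # H x W uint16 depth
        pose = np.loadtxt(scene / "pose" / f"{frame_id}.txt")                # 4 x 4 camera-to-world
        yield frame_id, color, depth, pose

for frame_id, color, depth, pose in load_frames("scans/scene_0001"):
    print(frame_id, color.shape, depth.shape, pose.shape)
```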