1,674 research outputs found

    Improving Bags-of-Words model for object categorization

    In the past decade, Bags-of-Words (BOW) models have become popular for the task of object recognition, owing to their good performance and simplicity. Some of the most effective recent methods for computer-based object recognition work by detecting and extracting local image features, quantizing them according to a codebook rule such as k-means clustering, and classifying the results with conventional classifiers such as Support Vector Machines and Naive Bayes. In this thesis, a Spatial Object Recognition Framework is presented that consists of the four main contributions of the research. The first contribution, frequent keypoint pattern discovery, works by combining pairs and triplets of frequent keypoints in order to discover intermediate representations for object classes. Based on the same frequent keypoints principle, algorithms for locating the region-of-interest in training images are then discussed. Extensions to the successful Spatial Pyramid Matching scheme, in order to better capture spatial relationships, are then proposed. The pairs frequency histogram and shapes frequency histogram work by capturing more refined spatial information between local image features. Finally, alternative techniques to Spatial Pyramid Matching for capturing spatial information are presented. The proposed techniques, variations of binned log-polar histograms, divide the image into grids of different scale and different orientation, and thus explicitly capture the distribution of image features in both distance and orientation. Evaluations of the framework focus on several recent and popular datasets, covering image retrieval, object recognition, and object categorization. Overall, while the effectiveness of the framework is limited on some of the datasets, the proposed contributions are nevertheless powerful improvements to the BOW model.
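The quantization step described in the abstract above can be illustrated with a minimal sketch. It assumes a codebook has already been learned (for example by k-means) and shows only the generic BOW step of assigning each local descriptor to its nearest codeword and counting occurrences. The function names and toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import dist


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Quantize local descriptors and return a normalized word-frequency histogram."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    total = len(descriptors) or 1  # avoid division by zero for an empty image
    return [counts[i] / total for i in range(len(codebook))]


# Hypothetical toy data: a 3-word codebook and four 2-D descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = bow_histogram(descriptors, codebook)
```

The resulting histogram is the fixed-length vector that a conventional classifier would consume. It discards where each feature occurred in the image, which is the limitation the spatial extensions in the abstract aim to address.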

    Data Management Challenges for Internet-scale 3D Search Engines

    This paper describes the most significant data-related challenges involved in building internet-scale 3D search engines. The discussion centers on the most pressing data management issues in this domain, including model acquisition, support for multiple file formats, asset versioning, data integrity errors, the data lifecycle, intellectual property, and the legality of web crawling. The paper also discusses numerous issues that fall under the rubric of trustworthy computing, including privacy, security, inappropriate content, and copying/remixing of assets. The goal of the paper is to provide an overview of these general issues, illustrated by empirical data drawn from the internet's largest operational search engine. While numerous works have been published on 3D information retrieval, this paper is the first to discuss the real-world challenges that arise in building practical search engines at scale. Comment: Second version, distributed by SIGIR Forum.

    Vehicle make and model recognition for intelligent transportation monitoring and surveillance.

    Vehicle Make and Model Recognition (VMMR) has evolved into a significant subject of study due to its importance in numerous Intelligent Transportation Systems (ITS), such as autonomous navigation, traffic analysis, traffic surveillance and security systems. A highly accurate and real-time VMMR system significantly reduces the overhead cost of resources otherwise required. The VMMR problem is a multi-class classification task with a peculiar set of issues and challenges, such as multiplicity and inter- and intra-make ambiguity among various vehicle makes and models, which need to be solved in an efficient and reliable manner to achieve a highly robust VMMR system. In this dissertation, facing the growing importance of make and model recognition of vehicles, we present a VMMR system that provides very high accuracy rates and is robust to several challenges. We demonstrate that the VMMR problem can be addressed by locating discriminative parts where the most significant appearance variations occur in each category, and learning expressive appearance descriptors. Given these insights, we consider two data-driven frameworks: a Multiple-Instance Learning-based (MIL) system using hand-crafted features and an extended application of deep neural networks using MIL. Our approach requires only image-level class labels, and the discriminative parts of each target class are selected in a fully unsupervised manner without any use of part annotations or segmentation masks, which may be costly to obtain. This advantage makes our system more intelligent, scalable, and applicable to other fine-grained recognition tasks. We constructed a dataset with 291,752 images representing 9,170 different vehicles to validate and evaluate our approach. Experimental results demonstrate that localizing parts and distinguishing their discriminative powers for categorization improves the performance of fine-grained categorization.
Extensive experiments conducted using our approaches yield superior results for images that are occluded, under low illumination, taken from partial camera views, or even taken from non-frontal views, all of which are available in our real-world VMMR dataset. The approaches presented here provide a highly accurate VMMR system for real-time applications in realistic environments. We also validate our system with a significant application of VMMR to ITS that involves automated vehicular surveillance. We show that our application can provide law enforcement agencies with efficient tools to search for a specific vehicle type, make, or model, and to track the path of a given vehicle using the position of multiple cameras.

    Perceptual Cues and Subjective Organization in a Virtual Information Workspace

    The key to effectively using the immense body of data on the Internet is an efficient method of organizing relevant information. Researchers and designers are beginning to promote the advantages of three-dimensional (3D) models of information storage and retrieval; however, the potential benefits of perceptual depth cues have not been systematically studied. The present study used a computer task to examine the effectiveness of three types of virtual desktops. A two-dimensional (2D) virtual desktop display, lacking in the cues that give the illusion of depth, was compared to two different 3D virtual desktops, both of which used perceptual cues to convey a sense of depth. One of the 3D desktop conditions conveyed motion parallax through an automatic rotation. It was expected that performance would increase as the number of perceptual cues increased. The present study also examined the potential benefits of organizing and retrieving documents from a subjectively organized versus a preconstructed, or fixed, information space. An organization that individuals create for their own use may be difficult for others to use. Thus, subjective organization of documents was expected to promote better performance than a fixed organization scheme, which is exactly what the data showed. There was a very strong performance benefit to those who organized their own desktops. Contrary to the other hypothesis, the 2D arrangement was more beneficial to users than either the 3D or 3D with motion arrangements. The 2D advantage may be the result of a number of factors. First, although people live in a 3D world they navigate more on 2D planes. Also, people may naturally encode spatial information in a descriptive or symbolic manner, as opposed to creating a spatial analog in the mind's eye. Designers should not blindly attempt to create interfaces that mimic the real world.
The choice between a 2D and 3D interface should be based upon the type of task to which the interface will be applied. Information storage/recall tasks, including the present task, will most likely benefit from a 2D interface. Other tasks that make greater use of navigation in 3D space may be better suited to 3D displays.

    Combining perceptual features with diffusion distance for face recognition
