
    Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

    Surrogate Text Representation (STR) is an effective approach to efficient similarity search in metric spaces using conventional text search engines, such as Apache Lucene. The technique is based on comparing the permutations of a set of reference objects in place of the original metric distance. However, the Achilles' heel of the STR approach is the need to reorder the search result set according to the metric distance. This forces the use of a support database storing the original objects, which requires efficient random I/O on fast secondary memory (such as flash-based storage). In this paper, we propose to extend the Surrogate Text Representation to specifically address a class of visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD). The approach represents the individual sub-vectors forming the VLAD vector with STR, providing a finer-grained representation of the vector and allowing the reordering phase to be eliminated. Experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving satisfactory performance close to that obtained with the original VLAD vectors. Comment: In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016) - Volume 4: VISAPP, p. 383-39
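
    The permutation-based encoding behind STR can be sketched briefly. The snippet below is a minimal, illustrative Python sketch, not the authors' implementation: each reference object gets a synthetic term that is repeated more often the closer that reference is to the encoded vector, so that term-frequency scoring in a text engine such as Lucene approximates the permutation-based similarity; the VLAD extension applies the same encoding to each sub-vector separately. The function and parameter names are assumptions for illustration.

```python
import numpy as np

def surrogate_text(vec, references, k=None):
    """Encode a vector as a surrogate text document (illustrative sketch).

    The closer a reference object is to the vector, the more times its
    synthetic term ("R0", "R1", ...) is repeated, so term-frequency scoring
    in a conventional text engine approximates permutation similarity.
    """
    dists = np.linalg.norm(references - vec, axis=1)  # metric distance to each reference
    order = np.argsort(dists)                         # permutation of reference ids
    k = k or len(order)                               # optionally truncate the permutation
    terms = []
    for rank, ref_id in enumerate(order[:k]):
        terms.extend([f"R{ref_id}"] * (k - rank))     # closest reference repeated most
    return " ".join(terms)

def vlad_surrogate_text(vlad, n_subvectors, references_per_sub):
    """Sketch of the proposed extension: encode each VLAD sub-vector separately."""
    subs = np.split(np.asarray(vlad), n_subvectors)
    return {f"sub{i}": surrogate_text(s, references_per_sub[i])
            for i, s in enumerate(subs)}
```

    A Lucene index would then hold each sub-vector document in its own field, so that ordinary term-frequency scoring over all fields approximates the finer-grained similarity without a metric reordering step.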

    A Vision System for Automating Municipal Waste Collection

    This thesis describes an industry need to make municipal waste collection more efficient. To address this need, Waterloo Controls Inc. and a research team at UWO are exploring the idea of combining a vision system and a robotic arm to automate the waste collection process. The system as a whole is described in the introduction of this report, but the specific goal of this thesis was the development of the vision system component. This component is the main contribution of the thesis and consists of a candidate selection step followed by a verification step.

    Electronic Image Stabilization for Mobile Robotic Vision Systems

    When a camera is affixed to a dynamic mobile robot, image stabilization is the first step towards more complex analysis of the video feed. This thesis presents a novel electronic image stabilization (EIS) algorithm for small, inexpensive, highly dynamic mobile robotic platforms with onboard camera systems. The algorithm combines optical flow motion parameter estimation with angular rate data provided by a strapdown inertial measurement unit (IMU). A discrete Kalman filter in feedforward configuration is used for optimal fusion of the two data sources. Performance evaluations are conducted using a simulated video truth model (capturing the effects of image translation, rotation, blurring, and moving objects) and live test data. Live data were collected from a camera and IMU affixed to the DAGSI Whegs™ mobile robotic platform as it navigated through a hallway. Template matching, feature detection, optical flow, and inertial measurement techniques are compared and analyzed to determine the most suitable algorithm for this specific type of image stabilization. Pyramidal Lucas-Kanade optical flow using Shi-Tomasi good features, in combination with inertial measurement, is found to be the superior EIS algorithm. In the presence of moving objects, fusion of inertial measurements reduces the root-mean-squared (RMS) error of the optical flow motion parameter estimates by 40%. No previous image stabilization algorithm directly fuses optical flow estimation with inertial measurement by way of Kalman filtering.
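
    As a rough illustration of the fusion described above, the sketch below estimates inter-frame rotation with pyramidal Lucas-Kanade optical flow over Shi-Tomasi features and blends it with gyro angular-rate data through a scalar discrete Kalman filter. It assumes OpenCV and NumPy; the noise parameters and the class and function names are illustrative assumptions, not the thesis implementation.

```python
import cv2
import numpy as np

class AngleKalman:
    """Scalar discrete Kalman filter over the camera rotation angle (sketch)."""
    def __init__(self, q=1e-4, r=1e-2):
        self.angle, self.p = 0.0, 1.0   # state estimate and its variance
        self.q, self.r = q, r           # process / measurement noise (assumed values)

    def predict(self, gyro_rate, dt):
        self.angle += gyro_rate * dt    # propagate with the IMU angular rate
        self.p += self.q

    def update(self, measured_angle):
        k = self.p / (self.p + self.r)  # Kalman gain
        self.angle += k * (measured_angle - self.angle)
        self.p *= 1.0 - k
        return self.angle

def flow_rotation(prev_gray, gray):
    """Inter-frame rotation (radians) from sparse optical flow."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    ok = status.ravel() == 1
    m, _ = cv2.estimateAffinePartial2D(pts[ok], nxt[ok])  # similarity fit to the flow
    return float(np.arctan2(m[1, 0], m[0, 0]))            # its rotation component
```

    Per frame, predict() would be driven by the gyro rate and update() by the optical-flow measurement; the stabilizing correction is the inverse of the filtered angle, applied for example with cv2.getRotationMatrix2D and cv2.warpAffine.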

    Face Classification for Authentication Approach by Using Wavelet Transform and Statistical Features Selection

    This thesis consists of three parts: face localization, feature selection, and classification. Three methods were proposed to locate the face region in the input image: two based on a pattern (template) matching approach, and one based on a clustering approach. Five face datasets, namely the YALE, MIT-CBCL, Indian, BioID, and Caltech databases, were used to evaluate the proposed methods. In the first method, the template image is prepared in advance from a set of faces; the input image is then enhanced by applying an n-means kernel to reduce image noise, and Normalized Correlation (NC) is used to measure the correlation coefficients between the template image and regions of the input image. In the second method, instead of an n-means kernel, optimized metrics are used to measure the difference between the template image and the input image regions. In the last method, a Modified K-Means Algorithm was used to remove non-face regions from the input image. The three methods achieved localization accuracy between 98% and 100% compared with existing methods. In the second part of the thesis, the Discrete Wavelet Transform (DWT) is utilized to transform the input image into a set of wavelet coefficients; coefficients with statistical energy below a certain threshold are then removed, reducing the number of wavelet coefficients by up to 98%. From the remaining high-energy features, only 40% of the statistical features are extracted using the modified variance metric. The ORL dataset was used to test the proposed statistical method. Finally, a Cluster-K-Nearest Neighbor (C-K-NN) classifier was proposed to classify the input face based on the training face images. The results showed a significant improvement, with classification accuracy of 99.39% on the ORL dataset and 100% on the Face94 dataset. Moreover, new metrics were introduced to quantify the exactness of the classification, allowing some classification errors to be corrected. All experiments were implemented in the MATLAB environment.
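
    Two steps of the pipeline above, face localization by normalized correlation and wavelet-coefficient thresholding, can be sketched roughly as follows. OpenCV, NumPy, and PyWavelets are assumed; the wavelet choice, decomposition level, and keep ratio are illustrative assumptions, not the settings used in the thesis.

```python
import cv2
import numpy as np
import pywt

def locate_face(image, template):
    """Face localization by normalized correlation against a face template."""
    scores = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)
    _, _, _, top_left = cv2.minMaxLoc(scores)             # best-matching position
    h, w = template.shape[:2]
    return top_left, (top_left[0] + w, top_left[1] + h)   # bounding box corners

def dwt_features(face, level=2, keep_ratio=0.02):
    """DWT of the face region, keeping only the highest-energy coefficients."""
    coeffs = pywt.wavedec2(face.astype(float), 'haar', level=level)
    flat, _ = pywt.coeffs_to_array(coeffs)
    flat = flat.ravel()
    threshold = np.quantile(np.abs(flat), 1.0 - keep_ratio)
    return flat[np.abs(flat) >= threshold]                 # strongest ~2% of coefficients
```

    The retained coefficients would then be reduced further by the variance-based feature selection before classification with the proposed C-K-NN classifier.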

    Vision-Based Production of Personalized Video

    In this paper we present a novel vision-based system for the automated production of personalized video souvenirs for visitors in leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network, and the system produces a personalized DVD souvenir at the end of a visitor's stay, allowing visitors to relive their experiences. We analyze how we identify visitors by fusing facial and body features, how we track visitors, how the tracker recovers from failures due to occlusions, and how we annotate and compile the final product. Our experiments demonstrate the feasibility of the proposed approach.