
    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

    This work presents a first evaluation of using spatio-temporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, binary versions of our dynamic texture descriptors are shown to perform better than a large range of similar methods that use different primitives, either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives. Comment: 29 pages, 16 figures
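The time-recursive formulation can be illustrated on a single pixel's intensity over time. Below is a minimal sketch, assuming a cascade of first-order recursive filters of the kind used in time-causal scale-space theory, with hand-picked time constants `mus` (a hypothetical parameter) and a zero initial state. It covers only temporal smoothing, not the spatial smoothing, derivatives or histogramming of the actual descriptors, and it is not the authors' implementation.

```python
def recursive_smooth(signal, mus):
    """Time-causal smoothing by a cascade of first-order recursive filters.

    Each stage with time constant mu updates its state as
        out[t] = out[t-1] + (in[t] - out[t-1]) / (1 + mu)
    so only the previous output has to be kept in memory.
    """
    current = [float(v) for v in signal]
    for mu in mus:
        state = 0.0  # assumed zero state before the first frame
        smoothed = []
        for value in current:
            state += (value - state) / (1.0 + mu)
            smoothed.append(state)
        current = smoothed
    return current


# One pixel's intensity over time: an impulse at t = 0.
print(recursive_smooth([1.0, 0.0, 0.0, 0.0], mus=[1.0]))
# -> [0.5, 0.25, 0.125, 0.0625]
```

Because each stage keeps a single state value per pixel, memory does not grow with the length of the video, which is what makes online, time-causal processing cheap.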

    An evaluation of recent local image descriptors for real-world applications of image matching

    This paper discusses and compares the best and most recent local descriptors, evaluating them on increasingly complex image matching tasks that encompass planar and non-planar scenarios under severe viewpoint changes. This evaluation, aimed at assessing descriptor suitability for real-world applications, leverages the concept of approximated overlap error as a means to extend the standard metric used for planar scenes naturally to non-planar scenes. According to the evaluation results, most descriptors exhibit a gradual performance degradation in the transition from planar to non-planar scenes. The best descriptors are those that capture well not only the local image context but also the global scene structure. Data-driven approaches are shown to have reached the matching robustness and accuracy of the best hand-crafted descriptors.
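The standard planar-scene metric that the approximated overlap error extends can be sketched as one minus the intersection-over-union of two regions. This is a minimal sketch, assuming regions are given as sets of pixel coordinates and that the second region has already been mapped into the first image (for planar scenes, by a ground-truth homography, not shown). The paper's approximated version for non-planar scenes is not reproduced here, and the `disk` helper is a hypothetical toy region.

```python
def overlap_error(region_a, region_b):
    """Return 1 - |A intersect B| / |A union B| for two sets of pixel coordinates.

    region_b is assumed to be already mapped into the frame of region_a
    (e.g. by a known ground-truth homography for a planar scene).
    """
    a, b = set(region_a), set(region_b)
    union = a | b
    if not union:
        raise ValueError("both regions are empty")
    return 1.0 - len(a & b) / len(union)


def disk(cx, cy, r):
    """Integer pixel coordinates of a discrete disk, used as a toy keypoint region."""
    return {
        (x, y)
        for x in range(cx - r, cx + r + 1)
        for y in range(cy - r, cy + r + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r
    }


print(overlap_error(disk(0, 0, 5), disk(0, 0, 5)))  # identical regions -> 0.0
print(overlap_error(disk(0, 0, 5), disk(3, 0, 5)))  # shifted region: strictly between 0 and 1
```

A correspondence is then typically counted as correct when its overlap error falls below a chosen threshold.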

    Rethinking the sGLOH descriptor

    sGLOH (shifting GLOH) is a histogram-based keypoint descriptor that can be associated with multiple quantized rotations of the keypoint patch without any recomputation. This property can be exploited to define the best distance between two descriptor vectors, thus avoiding computing the dominant orientation. In addition, sGLOH can reject incongruous correspondences by adding a global constraint on the rotations, either as a priori knowledge or based on the data. This paper thoroughly reconsiders sGLOH and improves it in terms of robustness, speed and descriptor dimension. The revised sGLOH embeds more quantized rotations, thus yielding more correct matches. A novel fast matching scheme is also designed, which significantly reduces both computation time and memory usage. In addition, a new binarization technique based on comparisons inside each descriptor histogram is defined, yielding a more compact, faster, yet robust alternative. Results of an exhaustive comparative experimental evaluation show that the revised sGLOH descriptor, incorporating the above ideas and combining them according to task requirements, improves on the state of the art in most cases in both image matching and object recognition.
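The "best distance" over quantized rotations can be sketched by treating a rotation as a cyclic shift of the descriptor. This is a minimal sketch, assuming a layout in which each ring of the patch has n sectors and every sector holds an n-bin orientation histogram, so that rotating the patch by 2π/n shifts both the sector index and the orientation-bin index by one. The L1 distance, the nested-list layout and the `allowed_shifts` parameter (standing in for the global rotation constraint) are illustrative assumptions. The revised descriptor's extra rotations, fast matching scheme and binarization are not reproduced.

```python
def rotate(desc, k):
    """Simulate rotating the patch by k * 2*pi/n.

    desc[ring][sector][bin]: each ring has n sectors and each sector an n-bin
    orientation histogram, so one rotation step cyclically shifts both the
    sector index and the orientation-bin index.
    """
    n = len(desc[0])
    return [
        [[ring[(s - k) % n][(b - k) % n] for b in range(n)] for s in range(n)]
        for ring in desc
    ]


def l1(d1, d2):
    """L1 distance between two descriptors with the same ring/sector/bin layout."""
    return sum(
        abs(x - y)
        for r1, r2 in zip(d1, d2)
        for s1, s2 in zip(r1, r2)
        for x, y in zip(s1, s2)
    )


def shifted_distance(d1, d2, allowed_shifts=None):
    """Best distance over the quantized rotations of d2, without recomputing it.

    allowed_shifts restricts the candidate rotations, e.g. to encode a global
    prior on the rotation between two images. Returns (distance, shift).
    """
    n = len(d1[0])
    shifts = range(n) if allowed_shifts is None else allowed_shifts
    return min((l1(d1, rotate(d2, k)), k) for k in shifts)


# Toy descriptor: 1 ring, n = 4 sectors, 4 orientation bins per sector.
d = [[[4 * s + b for b in range(4)] for s in range(4)]]
print(shifted_distance(rotate(d, 1), d))  # -> (0, 1)
```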

    Joint Adaptive Median Binary Patterns for texture classification

    This paper addresses the challenging problem of the recognition and classification of textured surfaces given a single instance acquired under unknown pose, scale and illumination conditions. We propose a novel texture descriptor, the Adaptive Median Binary Pattern (AMBP), based on an adaptive analysis window of local patterns. The principal idea of the AMBP is to convert a small local image patch to a binary pattern using adaptive threshold selection that switches between the central pixel value, as used in the Local Binary Pattern (LBP), and the median, as in the Median Binary Pattern (MBP), but within a variable-sized analysis window that depends on the local microstructure of the texture. The variability of the local adaptive window is included as joint information to increase the discriminative properties. A new multiscale scheme is also proposed in this paper to handle the texture resolution problem. AMBP is evaluated in relation to other recent binary pattern techniques and many other texture analysis methods on three large texture corpora (CUReT, Outex_TC00012 and KTH_TIPS2), with and without added noise. Generally, the proposed method performs better than the best state-of-the-art techniques in the noiseless case and significantly outperforms all of them in the presence of impulse noise.
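The two thresholds that AMBP switches between can be compared on a fixed 3x3 patch. This is a minimal sketch of plain LBP (8 neighbours against the centre) and MBP (all nine pixels against their median). The neighbour ordering, the bit order and the "greater than or equal" convention are assumptions, and AMBP's adaptive window size and switching rule are not shown.

```python
# Clockwise neighbour offsets (dy, dx) starting at the top-left pixel (assumed order).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(patch):
    """8-bit LBP: threshold the 8 neighbours of a 3x3 patch against the centre pixel."""
    centre = patch[1][1]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        if patch[1 + dy][1 + dx] >= centre:
            code |= 1 << bit
    return code


def mbp_code(patch):
    """9-bit MBP: threshold all nine pixels (centre included) against their median."""
    values = [v for row in patch for v in row]
    median = sorted(values)[4]
    code = 0
    for bit, v in enumerate(values):
        if v >= median:
            code |= 1 << bit
    return code


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_code(patch), mbp_code(patch))  # -> 120 496
```

The LBP code changes whenever the centre pixel is corrupted, whereas the median is less sensitive to a single outlier, which is one reason a median threshold can help under impulse noise.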

    Registration and Recognition in 3D

    The simplest computer vision algorithm can tell you what color it sees when you point it at an object, but asking that computer what it is looking at is a much harder problem. Camera and LiDAR (Light Detection And Ranging) sensors generally provide streams of pixel values, and sophisticated algorithms must be engineered to recognize objects or the environment. There has been significant effort expended by the computer vision community on recognizing objects in color images; however, LiDAR sensors, which sense depth values for pixels instead of color, have been studied less. Recently we have seen a renewed interest in depth data with the democratization provided by consumer depth cameras. Detecting objects in depth data is more challenging in some ways because of the lack of texture and the increased complexity of processing unordered point sets. We present three systems that contribute to solving the object recognition problem from the LiDAR perspective: calibration, registration, and object recognition systems. We propose a novel calibration system that works with both line-based and raster-based LiDAR sensors and calibrates them with respect to image cameras. Our system can be extended to calibrate LiDAR sensors that do not give intensity information. We demonstrate a novel system that produces registrations between different LiDAR scans by transforming the input point cloud into a Constellation Extended Gaussian Image (CEGI) and then using this CEGI to estimate the rotational alignment of the scans independently. Finally, we present a method for object recognition that uses local (Spin Images) and global (CEGI) information to recognize cars in a large urban dataset. We present real-world results from these three systems. Compelling experiments show that object recognition systems can gain much information using only 3D geometry.
There are many object recognition and navigation algorithms that work on images; the work we propose in this thesis is more complementary to those image-based methods than competitive. This is an important step along the way to more intelligent robots.
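The local descriptor mentioned above, the spin image, can be sketched with the standard construction: for an oriented point (p, n), every other point x is described by its signed height β along n and its radial distance α from the line through p along n, and the (α, β) pairs are accumulated into a 2D histogram. This is a minimal sketch, assuming nearest-bin accumulation (rather than interpolation), square bins of size `bin_size`, and β bins centred on p. The bin counts and sizes are hypothetical parameters, not the thesis's settings, and the CEGI is not shown.

```python
import math


def spin_image(points, p, n, bin_size, alpha_bins, beta_bins):
    """Accumulate a spin image for the oriented point (p, n).

    alpha: radial distance from the line through p along n.
    beta:  signed height of the point along n.
    Rows index beta (top row = largest beta), columns index alpha.
    """
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]
    image = [[0] * alpha_bins for _ in range(beta_bins)]
    half_height = beta_bins * bin_size / 2.0  # beta range is centred on p
    for x in points:
        d = [xi - pi for xi, pi in zip(x, p)]
        beta = sum(di * ni for di, ni in zip(d, n))
        alpha = math.sqrt(max(sum(di * di for di in d) - beta * beta, 0.0))
        row = int((half_height - beta) // bin_size)
        col = int(alpha // bin_size)
        if 0 <= row < beta_bins and 0 <= col < alpha_bins:
            image[row][col] += 1  # points outside the support are ignored
    return image


pts = [(0, 0, 0), (1, 0, 0), (0, 0, 1)]
print(spin_image(pts, p=(0, 0, 0), n=(0, 0, 1), bin_size=1.0, alpha_bins=2, beta_bins=4))
# -> [[0, 0], [1, 0], [1, 1], [0, 0]]
```

Because α and β do not depend on rotation about the normal, the descriptor is invariant to that rotation, which is why it suits unordered point sets without texture.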

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were once common, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold. First, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM, namely illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.

    Local Binary Patterns in Focal-Plane Processing. Analysis and Applications

    Feature extraction is the part of pattern recognition where the sensor data is transformed into a form more suitable for the machine to interpret. The purpose of this step is also to reduce the amount of information passed to the next stages of the system and to preserve the information essential for discriminating the data into different classes. For instance, in image analysis the actual image intensities are vulnerable to various environmental effects, such as lighting changes, and feature extraction can be used as a means of detecting features that are invariant to certain types of illumination changes. Finally, classification tries to make decisions based on the previously transformed data. The main focus of this thesis is on developing new methods for embedded feature extraction based on local non-parametric image descriptors. Feature analysis is also carried out for the selected image features. Low-level Local Binary Pattern (LBP) based features play a central role in the analysis. In the embedded domain, the pattern recognition system must usually meet strict performance constraints, such as high speed, compact size and low power consumption. The characteristics of the final system can be seen as a trade-off between these metrics, which is largely affected by the decisions made during the implementation phase. The implementation alternatives of LBP-based feature extraction are explored in the embedded domain in the context of focal-plane vision processors. In particular, the thesis demonstrates LBP extraction with the MIPA4k massively parallel focal-plane processor IC. Higher-level processing is also incorporated into this framework by means of a design for implementing a single-chip face recognition system. Furthermore, a new method for determining optical flow based on LBPs, designed specifically for the embedded domain, is presented.
Inspired by some of the principles observed through the feature analysis of the Local Binary Patterns, an extension to the well-known non-parametric rank transform is proposed, and its performance is evaluated in face recognition experiments with a standard dataset. Finally, an a priori model in which LBPs are seen as combinations of n-tuples is also presented.
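The non-parametric rank transform that the thesis extends replaces each pixel by the number of pixels in its neighbourhood that are smaller than it. This is a minimal sketch of the standard rank transform only, since the proposed extension is not described here in enough detail to reproduce. It assumes a square window of hypothetical radius `radius` and leaves border pixels without a full window as `None`.

```python
def rank_transform(image, radius=1):
    """Replace each pixel by the count of pixels in its (2r+1)x(2r+1) window
    that are strictly smaller than it. Border pixels are left as None."""
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            centre = image[y][x]
            out[y][x] = sum(
                1
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
                if image[y + dy][x + dx] < centre
            )
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rank_transform(img)[1][1])  # -> 4
```

Like LBP, the rank transform depends only on the ordering of intensities, so any strictly increasing change of brightness leaves it unchanged. Unlike LBP, it discards which neighbours are smaller and keeps only how many are.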