    A survey of visual preprocessing and shape representation techniques

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention)

    A unified framework for detecting groups and application to shape recognition

    A unified a contrario detection method is proposed to solve three classical problems in clustering analysis. The first is to evaluate the validity of a cluster candidate. The second is that meaningful clusters can contain or be contained in other meaningful clusters, so a rule is needed to define locally optimal clusters by inclusion. The third is the definition of a correct merging rule between meaningful clusters, which makes it possible to decide whether they should stay separate or unite. The motivation for this theory is shape recognition. Matching algorithms usually compute correspondences between more or less local features (called shape elements) in the images to be compared. This paper aims to group matching shape elements into spatially coherent shapes. Each pair of matching shape elements leads to a unique transformation (a similarity or affine map). As an application, the present theory on the choice of the right clusters is used to group these shape elements into shapes by detecting clusters in the transformation space.
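
The statement that one pair of matching shape elements fixes a unique similarity can be illustrated with a minimal sketch. It assumes, purely for illustration, that each shape element carries a reference segment given by two 2D points; the function names `similarity_from_segments` and `similarity_parameters` are hypothetical and are not taken from the paper. Each match then yields one point (scale, rotation, translation) in transformation space, which is where clusters would be sought.

```python
import cmath

def similarity_from_segments(src, dst):
    """Return complex (a, b) such that z -> a*z + b maps the two endpoints
    of the source segment onto the two endpoints of the target segment."""
    s0, s1 = complex(*src[0]), complex(*src[1])
    d0, d1 = complex(*dst[0]), complex(*dst[1])
    if s1 == s0:
        raise ValueError("source segment is degenerate")
    a = (d1 - d0) / (s1 - s0)   # encodes scale and rotation
    b = d0 - a * s0             # encodes translation
    return a, b

def similarity_parameters(a, b):
    """Convert (a, b) into (scale, rotation in radians, tx, ty)."""
    return abs(a), cmath.phase(a), b.real, b.imag

# One hypothetical match: segment (0,0)-(1,0) is matched to (2,3)-(2,5).
a, b = similarity_from_segments(((0, 0), (1, 0)), ((2, 3), (2, 5)))
scale, rotation, tx, ty = similarity_parameters(a, b)
```

Several matches belonging to the same shape would produce nearby parameter tuples, so grouping them reduces to finding clusters among these tuples.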

    3D Motion Estimation By Evidence Gathering

    In this paper we introduce an algorithm for 3D motion estimation in point clouds that is based on Chasles’ kinematic theorem. The proposed algorithm estimates 3D motion parameters directly from the data by exploiting the geometry of rigid transformation using an evidence gathering technique in a Hough-voting-like approach. The algorithm provides an alternative to the feature description and matching pipelines commonly used by numerous 3D object recognition and registration algorithms, as it does not involve keypoint detection and feature descriptor computation and matching. To the best of our knowledge, this is the first research to use kinematics theorems in an evidence gathering framework for motion estimation and surface matching without the use of any given correspondences. Moreover, we propose a method for voting for 3D motion parameters using a one-dimensional accumulator space, which enables voting for motion parameters more efficiently than other methods that use up to 7-dimensional accumulator spaces

    Hough Transform Implementation For Event-Based Systems: Concepts and Challenges

    The Hough transform (HT) is one of the best-known techniques in computer vision and has been the basis of many practical image processing algorithms. HT, however, is designed to work for frame-based systems such as conventional digital cameras. Recently, event-based systems such as Dynamic Vision Sensor (DVS) cameras have become popular among researchers. Event-based cameras have a very high temporal resolution (1 μs), but each pixel can only detect change and not color. As such, conventional image processing algorithms cannot be readily applied to event-based output streams, and they must be adapted for event-based cameras. This paper provides a systematic explanation, starting from extending the conventional HT to a 3D HT, through adaptation to event-based systems, to the implementation of the 3D HT using Spiking Neural Networks (SNNs). Using SNNs enables the proposed solution to be easily realized in hardware on an FPGA, without requiring a CPU or additional memory. In addition, we discuss techniques for optimal SNN-based implementation using an efficient number of neurons for the required accuracy and resolution along each dimension, without increasing the overall computational complexity. We hope that this will help to reduce the gap between event-based and frame-based systems.
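
For context, the conventional frame-based HT that the abstract takes as its starting point can be sketched in a few lines. This is only the classical 2D line-voting scheme, not the paper's 3D, event-based, or SNN formulation; the function names and the parameters `n_theta` and `rho_step` are illustrative assumptions.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta).

    Keys of the returned Counter are (theta_index, rho_index) bins."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote_count) for the most-voted bin."""
    votes = hough_lines(points, n_theta, rho_step)
    (t, r), count = votes.most_common(1)[0]
    return math.pi * t / n_theta, r * rho_step, count

# Twenty points on the horizontal line y = 5.
pts = [(x, 5) for x in range(0, 100, 5)]
theta, rho, count = strongest_line(pts)
```

Every point votes once per angle bin, so the whole frame must be available before the peak can be read off; an event-based variant would instead have to update the accumulator as individual events arrive.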

    Model-driven and Data-driven Approaches for some Object Recognition Problems

    Recognizing objects from images and videos has been a long-standing problem in computer vision. The recent surge in the prevalence of visual cameras has given rise to two main challenges: (i) it is important to understand different sources of object variation in more unconstrained scenarios, and (ii) rather than describing an object in isolation, efficient learning methods for modeling object-scene 'contextual' relations are required to resolve visual ambiguities. This dissertation addresses some aspects of these challenges and consists of two parts. The first part of the work focuses on obtaining object descriptors that are largely preserved across certain sources of variation, by utilizing models for image formation and local image features. Given a single instance of an object, we investigate the following three problems. (i) Representing a 2D projection of a 3D non-planar shape invariant to articulations, when there are no self-occlusions. We propose an articulation-invariant distance that is preserved across piecewise affine transformations of the 'parts' of a non-rigid object, under a weak perspective imaging model, and then obtain a shape-context-like descriptor to perform recognition. (ii) Understanding the space of 'arbitrary' blurred images of an object, by representing an unknown blur kernel of a known maximum size using a complete set of orthonormal basis functions spanning that space, and showing that the subspaces resulting from convolving a clean object and its blurred versions with these basis functions are equal under some assumptions. We then view the invariant subspaces as points on a Grassmann manifold, and use statistical tools that account for the underlying non-Euclidean nature of the space of these invariants to perform recognition across blur. (iii) Analyzing the robustness of local feature descriptors to different illumination conditions. We perform an empirical study of these descriptors for the problem of face recognition under lighting change, and show that the direction of the image gradient largely preserves object properties across varying lighting conditions. The second part of the dissertation utilizes information conveyed by a large quantity of data to learn contextual information shared by an object (or an entity) with its surroundings. (i) We first consider a supervised two-class problem of detecting lane markings from road video sequences, where we learn relevant feature-level contextual information through a machine learning algorithm based on boosting. We then focus on unsupervised object classification scenarios, where (ii) we perform clustering using maximum margin principles, by deriving some basic properties on the affinity of 'a pair of points' belonging to the same cluster using the information conveyed by 'all' points in the system, and (iii) we consider correspondence-free adaptation of statistical classifiers across domain-shifting transformations, by generating meaningful 'intermediate domains' that incrementally convey potential information about the domain change.

    Automatic visual recognition using parallel machines

    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduce the size of an established model database, and the latter shorten the computation time. This dissertation discusses both line invariants under perspective projection and the parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. Our algorithms are implemented on AP1000 MIMD parallel machines. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed. In contrast to the approach that uses epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, a step called transfer.

    Faithful completion of images of scenic landmarks using internet images

    Previous works on image completion typically aim to produce visually plausible results rather than factually correct ones. In this paper, we propose an approach to faithfully complete the missing regions of an image. We assume that the input image is taken at a well-known landmark, so similar images taken at the same location can be easily found on the Internet. We first download thousands of images from the Internet using a text label provided by the user. Next, we apply two-step filtering to reduce them to a small set of candidate images for use as source images for completion. For each candidate image, a co-matching algorithm is used to find correspondences of both points and lines between the candidate image and the input image. These are used to find an optimal warp relating the two images. A completion result is obtained by blending the warped candidate image into the missing region of the input image. The completion results are ranked according to a combination score, which considers both warping and blending energy, and the highest-ranked ones are shown to the user. Experiments and results demonstrate that our method can faithfully complete images.

    Use of Consumer-grade Depth Cameras in Mobile Robot Navigation

    Simultaneous Localization And Mapping (SLAM) stands as one of the core techniques used by robots for autonomous navigation. Cameras combining Red-Green-Blue (RGB) color information and depth (D) information are called RGB-D cameras or depth cameras. RGB-D cameras can provide rich information for indoor mobile robot navigation. Microsoft’s Kinect device, a representative low-cost RGB-D camera product, has attracted tremendous attention from researchers in recent years for its relatively high quality of depth measurement. By analyzing the multi-data stream of both color and depth, better 3D plane detectors and local shape registration techniques can be designed to improve the quality of mobile robot navigation. In the first part of this work, models of the Kinect’s cameras and projector are established, which can be applied for calibration and characterization of the Kinect device. Experiments reveal both the variable depth resolution and the Kinect’s own optical noise in the calculation of depth values. Based on the Kinect’s models and characterization, this project implements an optimized 3D matching system for SLAM, from the processing of RGB-D data to further algorithm design. The developed system includes the following parts: (1) raw data pre-processing and de-noising, improving the quality of integrated environment depth maps; (2) detection and fitting of 3D planar surfaces with RANSAC algorithms, along with applications and illustrative examples of multi-scale, multi-plane detection algorithms designed for common indoor environments. The proposed approach is validated on scene and object reconstruction. RGB-D feature matching under uncertainty and noise in large-scale data forms the basis of future applications in mobile robot navigation. Experimental results show that the improvement in system performance is valid and feasible.
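
The RANSAC plane-fitting step mentioned in part (2) can be sketched as follows. This is a generic single-plane RANSAC written for illustration, not the multi-scale, multi-plane system described in the abstract; the function names, the distance `threshold`, the `iterations` count, and the fixed `seed` are all assumptions.

```python
import random

def plane_from_points(p, q, r):
    """Return (a, b, c, d) with unit normal for the plane a*x + b*y + c*z + d = 0
    through three points, or None if the points are (nearly) collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # Normal vector is the cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    return nx, ny, nz, -(nx * p[0] + ny * p[1] + nz * p[2])

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return (plane, inliers) for the candidate plane with the most inliers."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c * pt[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Synthetic scene: a 10x10 patch of the plane z = 1 plus ten off-plane points.
scene = [(float(x), float(y), 1.0) for x in range(10) for y in range(10)]
scene += [(float(i), float(i), 5.0 + i) for i in range(10)]
plane, inliers = ransac_plane(scene)
```

On real depth data the threshold would need to reflect the sensor's depth noise, which the abstract notes varies with distance, and a final least-squares refit over the inliers is a common refinement.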