3,185 research outputs found

    Human-Centric Machine Vision

    Get PDF
    Recently, the algorithms for the processing of the visual information have greatly evolved, providing efficient and effective solutions to cope with the variability and the complexity of real-world environments. These achievements yield to the development of Machine Vision systems that overcome the typical industrial applications, where the environments are controlled and the tasks are very specific, towards the use of innovative solutions to face with everyday needs of people. The Human-Centric Machine Vision can help to solve the problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human machine interface. In such applications it is necessary to handle changing, unpredictable and complex situations, and to take care of the presence of humans

    Latent Constrained Correlation Filter

    Get PDF
    Correlation filters are special classifiers designed for shift-invariant object recognition, which are robust to pattern distortions. The recent literature shows that combining a set of sub-filters trained based on a single or a small group of images obtains the best performance. The idea is equivalent to estimating variable distribution based on the data sampling (bagging), which can be interpreted as finding solutions (variable distribution approximation) directly from sampled data space. However, this methodology fails to account for the variations existed in the data. In this paper, we introduce an intermediate step—solution sampling—after the data sampling step to form a subspace, in which an optimal solution can be estimated. More specifically, we propose a new method, named latent constrained correlation filters (LCCF), by mapping the correlation filters to a given latent subspace, and develop a new learning framework in the latent subspace that embeds distribution-related constraints into the original problem. To solve the optimization problem, we introduce a subspace-based alternating direction method of multipliers, which is proven to converge at the saddle point. Our approach is successfully applied to three different tasks, including eye localization, car detection, and object tracking. Extensive experiments demonstrate that LCCF outperforms the state-of-the-art methods.1 1 The source code will be publicly available. https://github.com/bczhangbczhang

    Use of Coherent Point Drift in computer vision applications

    Get PDF
    This thesis presents the novel use of Coherent Point Drift in improving the robustness of a number of computer vision applications. CPD approach includes two methods for registering two images - rigid and non-rigid point set approaches which are based on the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations - or affine transforms - provide the opportunity of registering under non-uniform scaling and skew. The idea is to move one point set coherently to align with the second point set. The CPD method finds both the non-rigid transformation and the correspondence distance between two point sets at the same time without having to use a-priori declaration of the transformation model used. The first part of this thesis is focused on speaker identification in video conferencing. A real-time, audio-coupled video based approach is presented, which focuses more on the video analysis side, rather than the audio analysis that is known to be prone to errors. CPD is effectively utilised for lip movement detection and a temporal face detection approach is used to minimise false positives if face detection algorithm fails to perform. The second part of the thesis is focused on multi-exposure and multi-focus image fusion with compensation for camera shake. Scale Invariant Feature Transforms (SIFT) are first used to detect keypoints in images being fused. Subsequently this point set is reduced to remove outliers, using RANSAC (RANdom Sample Consensus) and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet based image fusion algorithm that makes use of a novel alpha blending and filtering technique to minimise artefacts. The thesis evaluates the performance of the algorithm in comparison to a number of state-of-the-art approaches, including the key commercial products available in the market at present, showing significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make & Model Recognition in CCTV video footage. CPD is used to effectively remove skew of vehicles detected as CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approaching angles. A LESH (Local Energy Shape Histogram) feature based approach is used for vehicle make and model recognition with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results are provided to prove that the proposed system demonstrates an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration

    3D Shape Descriptor-Based Facial Landmark Detection: A Machine Learning Approach

    Get PDF
    Facial landmark detection on 3D human faces has had numerous applications in the literature such as establishing point-to-point correspondence between 3D face models which is itself a key step for a wide range of applications like 3D face detection and authentication, matching, reconstruction, and retrieval, to name a few. Two groups of approaches, namely knowledge-driven and data-driven approaches, have been employed for facial landmarking in the literature. Knowledge-driven techniques are the traditional approaches that have been widely used to locate landmarks on human faces. In these approaches, a user with sucient knowledge and experience usually denes features to be extracted as the landmarks. Data-driven techniques, on the other hand, take advantage of machine learning algorithms to detect prominent features on 3D face models. Besides the key advantages, each category of these techniques has limitations that prevent it from generating the most reliable results. In this work we propose to combine the strengths of the two approaches to detect facial landmarks in a more ecient and precise way. The suggested approach consists of two phases. First, some salient features of the faces are extracted using expert systems. Afterwards, these points are used as the initial control points in the well-known Thin Plate Spline (TPS) technique to deform the input face towards a reference face model. Second, by exploring and utilizing multiple machine learning algorithms another group of landmarks are extracted. The data-driven landmark detection step is performed in a supervised manner providing an information-rich set of training data in which a set of local descriptors are computed and used to train the algorithm. We then, use the detected landmarks for establishing point-to-point correspondence between the 3D human faces mainly using an improved version of Iterative Closest Point (ICP) algorithms. Furthermore, we propose to use the detected landmarks for 3D face matching applications

    A Review: Person Identification using Retinal Fundus Images

    Get PDF
    In this paper a review on biometric person identification has been discussed using features from retinal fundus image. Retina recognition is claimed to be the best person identification method among the biometric recognition systems as the retina is practically impossible to forge. It is found to be most stable, reliable and most secure among all other biometric systems. Retina inherits the property of uniqueness and stability. The features used in the recognition process are either blood vessel features or non-blood vessel features. But the vascular pattern is the most prominent feature utilized by most of the researchers for retina based person identification. Processes involved in this authentication system include pre-processing, feature extraction and feature matching. Bifurcation and crossover points are widely used features among the blood vessel features. Non-blood vessel features include luminance, contrast, and corner points etc. This paper summarizes and compares the different retina based authentication system. Researchers have used publicly available databases such as DRIVE, STARE, VARIA, RIDB, ARIA, AFIO, DRIDB, and SiMES for testing their methods. Various quantitative measures such as accuracy, recognition rate, false rejection rate, false acceptance rate, and equal error rate are used to evaluate the performance of different algorithms. DRIVE database provides 100\% recognition for most of the methods. Rest of the database the accuracy of recognition is more than 90\%

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task
    • …
    corecore