
    Precise eye localization using HOG descriptors

    In this paper, we present a novel algorithm for precise eye detection. First, a couple of AdaBoost classifiers trained with Haar-like features are used to preselect possible eye locations. Then, a Support Vector Machine that uses Histograms of Oriented Gradients (HOG) descriptors selects the best pair of eyes among all possible combinations of the preselected candidates. Finally, we compare the eye detection results with three state-of-the-art works and a commercial software package, which is the most complete comparison presented so far. The results show that our algorithm achieves the highest accuracy on the FERET and FRGCv1 databases. © Springer-Verlag 2010.
    This work has been partially supported by the grant TEC2009-09146 of the Spanish Government.
    Monzó Ferrer, D.; Albiol Colomer, A.; Sastre, J.; Albiol Colomer, AJ. (2011). Precise eye localization using HOG descriptors. Machine Vision and Applications. 22(3):471-480. https://doi.org/10.1007/s00138-010-0273-0
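
    The pair-selection stage described in this abstract can be sketched as an exhaustive search over candidate combinations. This is only an illustration, not the authors' implementation: the separate left and right candidate lists, the function names, and `toy_score` (a purely geometric stand-in for the decision value of the SVM over HOG descriptors) are all assumptions.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def best_eye_pair(
    left_candidates: List[Point],
    right_candidates: List[Point],
    score: Callable[[Point, Point], float],
) -> Optional[Tuple[Point, Point]]:
    """Return the (left, right) combination with the highest score, or None."""
    best, best_score = None, float("-inf")
    # Enumerate every combination of preselected candidates.
    for left, right in product(left_candidates, right_candidates):
        s = score(left, right)
        if s > best_score:
            best, best_score = (left, right), s
    return best


def toy_score(left: Point, right: Point) -> float:
    """Hypothetical scorer: prefers level pairs about 60 px apart.

    In the paper this role is played by an SVM over HOG descriptors.
    """
    dx, dy = right[0] - left[0], right[1] - left[1]
    return -abs(dx - 60.0) - 2.0 * abs(dy)


left = [(100.0, 100.0), (105.0, 130.0)]
right = [(160.0, 102.0), (150.0, 180.0)]
pair = best_eye_pair(left, right, toy_score)
```

    The search cost grows with the product of the two candidate counts, which is why a cheap preselection stage that keeps those lists short matters.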

    Camera distortion self-calibration using the plumb-line constraint and minimal Hough entropy

    In this paper we present a simple and robust method for self-correction of camera distortion using single images of scenes which contain straight lines. Since the most common distortion can be modelled as radial distortion, we illustrate the method using the Harris radial distortion model, but the method is applicable to any distortion model. The method is based on transforming the edgels of the distorted image to a 1-D angular Hough space, and optimizing the distortion correction parameters to minimize the entropy of the corresponding normalized histogram. Properly corrected imagery will have fewer curved lines, and therefore less spread in Hough space. Since the method does not rely on any image structure beyond the existence of edgels sharing some common orientations and does not use edge fitting, it is applicable to a wide variety of image types. For instance, it can be applied equally well to images of texture with weak but dominant orientations, or images with strong vanishing points. Finally, the method is evaluated on both synthetic and real data, revealing that it is particularly robust to noise.
    Comment: 9 pages, 5 figures. Corrected errors in equation 1.
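
    The objective described above, the entropy of a normalized 1-D angular histogram, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the 36-bin resolution, and the synthetic angle lists are hypothetical, and the distortion model and the parameter optimization are left out.

```python
import math
from typing import Sequence


def hough_entropy(angles: Sequence[float], bins: int = 36) -> float:
    """Shannon entropy (bits) of the normalized histogram of edgel
    orientations, with each angle folded into [0, pi)."""
    n = len(angles)
    if n == 0:
        return 0.0
    hist = [0] * bins
    for a in angles:
        idx = int((a % math.pi) / math.pi * bins) % bins
        hist[idx] += 1
    # Empty bins contribute nothing to the entropy.
    return -sum((c / n) * math.log2(c / n) for c in hist if c)


# Edgels from two straight lines share two orientations.
straight = [0.3] * 50 + [1.2] * 50
# Edgels from two curved lines smear across many orientations.
curved = [0.3 + 0.02 * i for i in range(50)] + [1.2 + 0.02 * i for i in range(50)]

h_straight = hough_entropy(straight)
h_curved = hough_entropy(curved)
```

    A correction procedure in this spirit would recompute the edgel orientations for each candidate parameter value and keep the value that gives the lowest entropy.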

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively, by visually inspecting the output of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.
    Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.

    Computer Analysis of Architecture Using Automatic Image Understanding

    In the past few years, computer vision and pattern recognition systems have become increasingly powerful, expanding the range of automatic tasks enabled by machine vision. Here we show that computer analysis of building images can quantify similarities between the architectural styles of cities. Images of buildings from 18 cities and three countries were acquired using Google StreetView, and were used to train a machine vision system to identify the location of the imaged building from the visual content of the image. Experimental results show that the computer analysis can automatically identify the geographical location of the StreetView image. More importantly, the algorithm was able to group the cities and countries and provide a phylogeny of the similarities between architectural styles as captured by StreetView images. These results demonstrate that computer vision and pattern recognition algorithms can perform the complex cognitive task of analyzing images of buildings, and can be used to measure and quantify visual similarities and differences between architectural styles. This experiment provides a new paradigm for studying architecture, based on a quantitative approach that can enhance the traditional manual observation and analysis. The source code used for the analysis is open and publicly available.

    Improved depth recovery in consumer depth cameras via disparity space fusion within cross-spectral stereo.

    We address the issue of improving depth coverage in consumer depth cameras based on the combined use of cross-spectral stereo and near infra-red structured light sensing. Specifically, we show that fusion of disparity over these modalities, within the disparity space image, prior to disparity optimization facilitates the recovery of scene depth information in regions where structured light sensing fails. We show that this joint approach, leveraging disparity information from both structured light and cross-spectral sensing, facilitates the joint recovery of global scene depth comprising both texture-less object depth, where conventional stereo otherwise fails, and highly reflective object depth, where structured light (and similar) active sensing commonly fails. The proposed solution is illustrated using dense gradient feature matching and shown to outperform prior approaches that use late-stage fused cross-spectral stereo depth as a facet of improved sensing for consumer depth cameras.
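
    The idea of fusing the two modalities inside the disparity space, before a disparity is chosen, can be sketched for a single pixel as below. This is a simplified illustration, not the paper's method: the per-pixel cost vectors, the equal weighting, and the winner-take-all selection (swapped in for the disparity optimization step) are all assumptions.

```python
from typing import List, Optional


def fuse_costs(
    stereo: List[float],
    structured_light: Optional[List[float]],
    weight: float = 0.5,
) -> List[float]:
    """Blend per-disparity matching costs for one pixel.

    When structured light gives no measurement (None), the stereo
    costs are used unchanged.
    """
    if structured_light is None:
        return list(stereo)
    return [(1.0 - weight) * s + weight * a
            for s, a in zip(stereo, structured_light)]


def winner_take_all(costs: List[float]) -> int:
    """Index of the lowest-cost disparity (first one on ties)."""
    return min(range(len(costs)), key=lambda d: costs[d])


# Texture-less pixel: stereo is ambiguous, structured light decides.
d_textureless = winner_take_all(fuse_costs([0.5, 0.5, 0.5], [0.9, 0.1, 0.9]))
# Reflective pixel: structured light fails, stereo decides.
d_reflective = winner_take_all(fuse_costs([0.2, 0.8, 0.9], None))
```

    Because the costs are fused before the selection, each modality can fill in where the other is uninformative, rather than two finished depth maps being merged at a late stage.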

    A robust and efficient video representation for action recognition

    This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove outlier matches from the human body, as human motion is not constrained by the camera. Trajectories consistent with the homography are attributed to camera motion, and thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in significant improvement on motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding approach to the standard bag-of-words histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to bag-of-words encodings for video recognition tasks. In all three tasks, we show substantial improvements over the state-of-the-art results.
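
    Cancelling camera motion with a homography can be sketched for a single point as below. This is a simplified approximation, not the authors' pipeline: it assumes a 3x3 homography `H` that maps frame t coordinates to frame t+1 coordinates is already available (the RANSAC estimation is omitted), and the function names and the 1-pixel threshold are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def apply_homography(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through H using homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def compensate_flow(H: Matrix, x: float, y: float,
                    flow: Tuple[float, float]) -> Tuple[float, float]:
    """Subtract the displacement that H induces at (x, y) from the flow."""
    xc, yc = apply_homography(H, x, y)
    return (flow[0] - (xc - x), flow[1] - (yc - y))


def is_camera_motion(residual: Tuple[float, float],
                     threshold: float = 1.0) -> bool:
    """True when the residual motion is small enough to be explained by
    the camera alone (the threshold value is an assumption)."""
    return math.hypot(residual[0], residual[1]) < threshold


# Camera pans by (3, -2) pixels between frames.
H = [[1.0, 0.0, 3.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
background = compensate_flow(H, 10.0, 20.0, (3.0, -2.0))
actor = compensate_flow(H, 40.0, 50.0, (5.0, -2.0))
```

    Here the background point ends up with zero residual flow and would be discarded as camera motion, while the actor keeps a residual of 2 pixels along x.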

    Object Level Deep Feature Pooling for Compact Image Representation

    Convolutional Neural Network (CNN) features have been successfully employed in recent works as an image descriptor for various vision tasks. But the inability of the deep CNN features to exhibit invariance to geometric transformations and object compositions poses a great challenge for image search. In this work, we demonstrate the effectiveness of the objectness prior over the deep CNN features of image regions for obtaining an invariant image representation. The proposed approach represents the image as a vector of pooled CNN features describing the underlying objects. This representation provides robustness to the spatial layout of the objects in the scene and achieves invariance to general geometric transformations, such as translation, rotation and scaling. The proposed approach also leads to a compact representation of the scene, so that each image occupies a smaller memory footprint. Experiments show that the proposed representation achieves state-of-the-art retrieval results on a set of challenging benchmark image datasets, while maintaining a compact representation.
    Comment: Deep Vision 201
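
    Pooling region-level features into one compact image vector can be sketched as below. This is only an illustration under stated assumptions: the abstract does not specify the pooling operator or how regions are chosen, so the element-wise max, the top-k selection by objectness score, the L2 normalization, and the function name are all hypothetical choices.

```python
import math
from typing import List


def pool_region_features(features: List[List[float]],
                         objectness: List[float],
                         top_k: int = 2) -> List[float]:
    """Max-pool the features of the top_k most object-like regions and
    L2-normalize the result."""
    # Rank regions by objectness score, highest first.
    order = sorted(range(len(features)),
                   key=lambda i: objectness[i], reverse=True)[:top_k]
    dim = len(features[0])
    pooled = [max(features[i][d] for i in order) for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


feats = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
scores = [0.9, 0.8, 0.1]
descriptor = pool_region_features(feats, scores)

# Listing the same regions in a different order gives the same vector.
shuffled = pool_region_features([feats[1], feats[2], feats[0]],
                                [scores[1], scores[2], scores[0]])
```

    The output has the dimensionality of a single region feature regardless of how many regions were pooled, and it does not depend on the order in which the regions are listed.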

    A new framework for sign language recognition based on 3D handshape identification and linguistic modeling

    Current approaches to sign recognition by computer generally have at least some of the following limitations: they rely on laboratory conditions for sign production, are limited to a small vocabulary, rely on 2D modeling (and therefore cannot deal with occlusions and off-plane rotations), and/or achieve limited success. Here we propose a new framework that (1) provides a new tracking method less dependent than others on laboratory conditions and able to deal with variations in background and skin regions (such as the face, forearms, or other hands); (2) allows for identification of 3D hand configurations that are linguistically important in American Sign Language (ASL); and (3) incorporates statistical information reflecting linguistic constraints in sign production. For purposes of large-scale computer-based sign language recognition from video, the ability to distinguish hand configurations accurately is critical. Our current method estimates the 3D hand configuration to distinguish among 77 hand configurations linguistically relevant for ASL. Constraining the problem in this way makes recognition of 3D hand configuration more tractable and provides the information specifically needed for sign recognition. Further improvements are obtained by incorporating statistical information about linguistic dependencies among handshapes within a sign, derived from an annotated corpus of almost 10,000 sign tokens.