
    Improving surveillance-camera-based human monitoring through image resolution enhancement and face recognition

    Given the importance of security in society, monitoring activity and recognizing specific people through surveillance video cameras plays an important role. One of the main issues in such activity arises from the fact that the cameras do not meet the resolution requirements of many face recognition algorithms. To address this issue, in this work we propose a new system that super-resolves the images. First, we use sparse representation with a specific dictionary containing many natural and facial images to super-resolve the input. As a second method, we use a deep convolutional network. Image super-resolution is followed by face recognition based on Hidden Markov Models and Singular Value Decomposition. The proposed system has been tested on several well-known face databases, such as the FERET, HeadPose, and Essex University databases, as well as our recently introduced iCV Face Recognition database (iCV-F). The experimental results show that the recognition rate increases considerably after applying super-resolution with the facial and natural image dictionary. In addition, we propose a system for analysing people's movement in surveillance video. People, including their faces, are detected using Histogram of Oriented Gradients features and the Viola-Jones algorithm. A multi-target tracking system based on discrete-continuous energy minimization is then used to track people, and the tracking data is in turn used to obtain information about visited and passed locations, together with face recognition results for the tracked people.
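
    As an aside, the detection stage named above (HOG features plus the Viola-Jones algorithm) maps directly onto OpenCV's built-in detectors; the following is a minimal sketch under that assumption, with the video path as a placeholder.

        # Minimal sketch: HOG people detector + Viola-Jones (Haar cascade) face
        # detector, as in the detection stage described above.
        import cv2

        hog = cv2.HOGDescriptor()
        hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
        face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        cap = cv2.VideoCapture("surveillance.mp4")   # placeholder path
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            people, _ = hog.detectMultiScale(frame, winStride=(8, 8))
            faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            for (x, y, w, h) in list(people) + list(faces):
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)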

    Learning Convolutional Neural Network For Face Verification

    Convolutional neural networks (ConvNets) have improved the state of the art in many applications. Face recognition tasks, for example, have seen significantly improved performance due to ConvNets. However, less attention has been given to video-based face recognition. Here, we make three contributions along these lines. First, we propose a ConvNet-based system for long-term face tracking in videos. By taking advantage of deep learning models pre-trained on big data, we developed a novel system for accurate video face tracking in unconstrained environments depicting various people and objects moving in and out of the frame. In the proposed system, we present a Detection-Verification-Tracking (DVT) method which accomplishes the long-term face tracking task through the collaboration of face detection, face verification, and (short-term) face tracking. An online-trained detector based on cascaded convolutional neural networks localizes all faces that appear in the frames, and an online-trained face verifier based on deep convolutional neural networks and similarity metric learning decides whether any face, and which face, corresponds to the query person. An online-trained tracker follows the face from frame to frame. When validated on a sitcom episode and a TV show, the DVT method outperforms tracking-learning-detection (TLD) and face-TLD in terms of recall and precision. The proposed system has been tested on many other types of videos and shows very promising results. Second, as the availability of a large-scale training dataset has a significant effect on the performance of ConvNet-based recognition methods, we present a successful automatic video collection approach for generating a large-scale video training dataset. We designed a procedure for generating a face verification dataset from videos based on the long-term face tracking algorithm, DVT. In this procedure, streams can be collected from videos and labeled automatically, without human annotation. Using this procedure, we assembled a widely scalable dataset, FaceSequence, which includes 1.5M streams capturing ~500K individuals. A key distinction between this dataset and existing video datasets is that FaceSequence is generated from publicly available videos and labeled automatically, hence widely scalable at no annotation cost. Lastly, we introduce a stream-based ConvNet architecture for the video face verification task. The proposed network is designed to optimize a differentiable error function, referred to as stream loss, using unlabeled temporal face sequences. Using the unlabeled video dataset FaceSequence, we trained our network to minimize the stream loss. The network achieves verification accuracy comparable to the state of the art on the LFW and YTF datasets with much smaller model complexity. In comparison to VGG, our method demonstrates a significant improvement in TAR/FAR, considering that the VGG dataset is highly purified and contains only a small amount of label noise. We also fine-tuned the network using the IJB-A dataset. The validation results show competitive verification accuracy compared with the best previous video face verification results.
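
    The Detection-Verification-Tracking collaboration described above can be summarized as a short control loop. In this sketch, detect_faces, verify_identity, and ShortTermTracker are hypothetical stand-ins for the online-trained cascaded-CNN detector, the CNN verifier, and the short-term tracker; this illustrates the control flow only, not the authors' implementation.

        def dvt_track(frames, query_emb, detect_faces, verify_identity,
                      ShortTermTracker, sim_threshold=0.7):
            """Return one box (or None) per frame for the query person."""
            tracker, track = None, []
            for frame in frames:
                if tracker is None:
                    box = None
                    # Detection + Verification: find the face matching the query.
                    for cand in detect_faces(frame):
                        if verify_identity(frame, cand, query_emb) > sim_threshold:
                            tracker, box = ShortTermTracker(frame, cand), cand
                            break
                else:
                    # Short-term Tracking: follow the face; fall back to
                    # re-detection when tracking or verification fails.
                    box = tracker.update(frame)
                    if box is None or verify_identity(frame, box, query_emb) < sim_threshold:
                        tracker, box = None, None
                track.append(box)
            return track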

    Efficient online subspace learning with an indefinite kernel for visual tracking and recognition

    We propose an exact framework for online learning with a family of indefinite (not necessarily positive definite) kernels. As we study the case of nonpositive kernels, we first show how to extend kernel principal component analysis (KPCA) from a reproducing kernel Hilbert space to Krein space. We then formulate an incremental KPCA in Krein space that does not require the calculation of pre-images and is therefore both efficient and exact. Our approach is motivated by the application of visual tracking, for which we wish to employ a robust gradient-based kernel. We use the proposed nonlinear appearance model, learned online via KPCA in Krein space, for visual tracking in many popular and difficult tracking scenarios. We also show applications of our kernel framework to the problem of face recognition.
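
    For intuition, the following is a hedged numpy sketch of batch (not incremental) kernel PCA in which eigenpairs are kept by eigenvalue magnitude, so the negative directions contributed by an indefinite kernel survive. It is a simplification of the Krein-space construction, not the paper's exact incremental algorithm.

        import numpy as np

        def indefinite_kpca(K, n_components):
            """K: (n, n) symmetric, possibly indefinite, kernel matrix."""
            n = K.shape[0]
            H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
            Kc = H @ K @ H                           # double-centered kernel
            eigvals, eigvecs = np.linalg.eigh(Kc)    # real, possibly negative
            order = np.argsort(-np.abs(eigvals))[:n_components]
            lam, V = eigvals[order], eigvecs[:, order]
            # Scale by sqrt(|lambda|); sign(lambda) records the Krein signature.
            return V * np.sqrt(np.abs(lam)), np.sign(lam)

        # Toy usage with a tanh kernel, which need not be positive definite:
        X = np.random.randn(50, 5)
        Z, signature = indefinite_kpca(np.tanh(X @ X.T), n_components=3)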

    Computing driver tiredness and fatigue in automobile via eye tracking and body movements

    The aim of this paper is to classify driver tiredness and fatigue in automobiles via eye tracking and body movements using a deep-learning-based Convolutional Neural Network (CNN) algorithm. Vehicle driver face localization serves as one of the most widely used real-world applications in fields such as toll control, traffic accident scene analysis, and suspect vehicle tracking. The research proposes a CNN classifier for simultaneously localizing the region of the human face and the eye positions. Rather than bounding rectangles, the classifier gives bounding quadrilaterals, which provide a more precise indication for vehicle driver face localization. The adjusted regions are preprocessed to remove noise and passed to the CNN classifier for real-time processing. The preprocessing of the face features extracts connected components, filters them by size, and groups them into face expressions. The employed CNN is a well-established technology for human face recognition. Once the facial landmarks are extracted from the frames, we leverage classification models and deep-learning-based convolutional neural networks to predict the state of the driver as 'Alert' or 'Drowsy' for each extracted frame. The CNN model can predict the output state labels (Alert/Drowsy) for each frame, but we also account for sequences of image frames, as that is extremely important when predicting the state of an individual. The process completes when all regions have a sufficiently high score or a fixed number of retries is exhausted. The output consists of the detected human face type and the list of regions, including the extracted mouth and eyes, with recognition reliability; the CNN achieves an accuracy of 98.57% with 100 epochs of training and testing.
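
    As a sketch of the frame-level classifier described above, here is a small PyTorch CNN mapping a grayscale face/eye crop to the two states 'Alert' and 'Drowsy'. The layer sizes and input resolution are illustrative assumptions, not the architecture reported in the paper.

        import torch
        import torch.nn as nn

        class DrowsinessCNN(nn.Module):
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.classifier = nn.Linear(64 * 8 * 8, 2)  # logits: [Alert, Drowsy]

            def forward(self, x):        # x: (batch, 1, 64, 64) grayscale crops
                return self.classifier(self.features(x).flatten(1))

        model = DrowsinessCNN()
        logits = model(torch.randn(4, 1, 64, 64))           # dummy batch
        states = [["Alert", "Drowsy"][i] for i in logits.argmax(1).tolist()]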

    Unconstrained Face Recognition

    Although face recognition has been actively studied over the past decade, state-of-the-art recognition systems yield satisfactory performance only under controlled scenarios, and recognition accuracy degrades significantly when confronted with unconstrained situations due to variations such as illumination, pose, etc. In this dissertation, we propose novel approaches that are able to recognize human faces under unconstrained situations. Part I presents algorithms for face recognition under illumination/pose variations. For face recognition across illuminations, we present a generalized photometric stereo approach by modeling all face appearances belonging to all humans under all lighting conditions. Using a linear generalization, we achieve a factorization of the observation matrix consisting of face appearances of different individuals, each under a different illumination. We resolve ambiguities in the factorization using surface integrability and symmetry constraints. In addition, an illumination-invariant identity descriptor is provided to perform face recognition across illuminations. We further extend the generalized photometric stereo approach to an illuminating light field approach, which is able to recognize faces under pose and illumination variations. Face appearance lies on a high-dimensional nonlinear manifold. In Part II, we introduce machine learning approaches based on reproducing kernel Hilbert space (RKHS) to capture higher-order statistical characteristics of the nonlinear appearance manifold. In particular, we analyze principal components of the RKHS in a probabilistic manner and compute distances such as the Chernoff distance and the Kullback-Leibler divergence between two Gaussian densities in RKHS. Part III is on face tracking and recognition from video. We first present an enhanced tracking algorithm that models online appearance changes in a video sequence using a mixture model and produces good tracking results in various challenging scenarios. For video-based face recognition, while conventional approaches treat tracking and recognition separately, we present a simultaneous tracking-and-recognition approach. This simultaneous approach, solved using the sequential importance sampling algorithm, improves accuracy in both tracking and recognition. Finally, we propose a unifying framework called probabilistic identity characterization that is able to perform face recognition under registration/illumination/pose variation and from a still image, a group of still images, or a video sequence.
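
    The factorization step in Part I can be illustrated with a plain rank factorization: the observation matrix is split into an identity/shape factor and an illumination factor, recoverable only up to an invertible ambiguity that the dissertation resolves with integrability and symmetry constraints (not reproduced in this hedged numpy sketch).

        import numpy as np

        def factorize_observations(W, rank):
            """W: (n_pixels, n_images) matrix of stacked face appearances."""
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            B = U[:, :rank] * np.sqrt(s[:rank])             # shape/albedo factor
            L = np.sqrt(s[:rank])[:, None] * Vt[:rank]      # illumination factor
            # Any invertible A yields another valid pair (B @ A, inv(A) @ L);
            # the ambiguity is fixed by the paper's extra constraints.
            return B, L

        W = np.random.rand(1024, 30)       # toy stand-in for vectorized faces
        B, L = factorize_observations(W, rank=9)
        print(np.linalg.norm(W - B @ L) / np.linalg.norm(W))  # low-rank residual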

    A Face Recognition Method Using Deep Learning To Identify Mask And Unmask Objects

    At present, the use of face masks is growing day by day, and it is mandated in most places across the world. People are encouraged to cover their faces in public areas to avoid the spread of infection, which, according to public health officials, can reduce the transmission of Covid-19 by 65 percent. It is therefore important to detect people who are not wearing face masks. Additionally, face recognition has been applied widely for security verification purposes, since its performance, accuracy, and reliability [15] are better than those of traditional techniques such as fingerprints, passwords, and PINs. In recent years, facial recognition has become a challenging task because of various occlusions and coverings, such as sunglasses, scarves, and hats, and the use of make-up or disguise ingredients, all of which affect the face recognition accuracy rate. Moreover, the use of face masks has made conventional facial recognition technology ineffective in many scenarios, such as face authentication, security checks, school attendance tracking, and unlocking phones and laptops. As a result, we propose Masked Facial Recognition (MFR), a solution that can identify both masked and unmasked people, so individuals wearing a face mask do not need to remove it to authenticate themselves. We used the deep learning model Inception ResNet V1 to train our models. The CASIA dataset [17] is used for training images, and the LFW (Labeled Faces in the Wild) dataset [18] with artificially masked faces is used for model evaluation. The masked training and testing datasets are created using a computer-vision-based approach (Dlib). We achieved an accuracy of around 96 percent for our three different trained models. As a result, the proposed work could be used effortlessly for both masked and unmasked face recognition and detection systems designed for safety and security verification purposes.
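
    The verification pipeline above (an Inception ResNet V1 embedder trained on CASIA) can be approximated with the facenet-pytorch package, which ships that backbone with CASIA-WebFace weights; the image paths and the similarity threshold below are illustrative assumptions, not the authors' released code.

        import torch
        from facenet_pytorch import MTCNN, InceptionResnetV1  # pip install facenet-pytorch
        from PIL import Image

        mtcnn = MTCNN(image_size=160)                         # detector/aligner
        resnet = InceptionResnetV1(pretrained='casia-webface').eval()

        def embed(path):
            face = mtcnn(Image.open(path).convert("RGB"))     # aligned crop (or None)
            return resnet(face.unsqueeze(0))                  # 512-d embedding

        e1 = embed("person_masked.jpg")                       # placeholder paths
        e2 = embed("person_enrolled.jpg")
        same_person = torch.cosine_similarity(e1, e2).item() > 0.6  # assumed threshold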

    A framework for sign language recognition using support vector machines and active learning for skin segmentation and boosted temporal sub-units

    This dissertation describes new techniques that can be used in a sign language recognition (SLR) system and, more generally, in human gesture systems. Any SLR system consists of three main components: a skin detector, a tracker, and a recognizer. The skin detector is responsible for segmenting skin objects, such as the face and hands, from video frames. The tracker keeps track of the hand location (more specifically, the bounding box) and detects any occlusions that might happen between skin objects. Finally, the recognizer tries to classify the performed sign into one of the sign classes in our vocabulary using the set of features and information provided by the tracker. In this work, we propose a new technique for skin segmentation using SVM (support vector machine) active learning combined with region segmentation information. Having segmented the face and hands, we need to track them across frames, so we developed a unified framework for segmenting and tracking skin objects and detecting occlusions, in which the segmentation and tracking components help each other: good tracking reduces the search space for skin objects, and accurate segmentation increases overall tracker accuracy. Instead of dealing with the whole sign for recognition, the sign can be broken down into elementary subunits, which are far fewer in number than the total number of signs in the vocabulary. This motivated us to propose a novel algorithm to model and segment these subunits and then learn the informative combinations of subunits/features using a boosting framework. Our results reach a recognition rate above 90% using very few training samples.
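
    The SVM active-learning idea for skin segmentation can be sketched as an uncertainty-sampling loop: train on a few labeled pixels, then repeatedly query labels for the pixels nearest the decision boundary. The data below is synthetic and the query budget is an arbitrary assumption; this is not the dissertation's full method.

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X_pool = rng.random((2000, 3))             # toy pixel colors in [0, 1]^3
        y_pool = (X_pool[:, 0] > 0.5).astype(int)  # toy "skin" oracle

        labeled = list(rng.choice(len(X_pool), 20, replace=False))
        for _ in range(10):                        # active-learning rounds
            clf = SVC(kernel="rbf", gamma="scale").fit(X_pool[labeled], y_pool[labeled])
            margin = np.abs(clf.decision_function(X_pool))
            margin[labeled] = np.inf               # skip already-labeled pixels
            labeled += list(np.argsort(margin)[:20])  # query most uncertain pixels
        print("pool accuracy:", clf.score(X_pool, y_pool))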

    Machine learning paradigms for modeling spatial and temporal information in multimedia data mining

    Multimedia data mining and knowledge discovery is a fast-emerging interdisciplinary applied research area. There is tremendous potential for effective use of multimedia data mining (MDM) through intelligent analysis. Diverse application areas increasingly rely on multimedia understanding systems. Advances in multimedia understanding are related directly to advances in signal processing, computer vision, machine learning, pattern recognition, multimedia databases, and smart sensors. The main mission of this special issue is to identify state-of-the-art machine learning paradigms that are particularly powerful and effective for modeling and combining temporal and spatial media cues such as audio, visual, and face information, and for accomplishing tasks of multimedia data mining and knowledge discovery. These models should be able to bridge the gap between low-level audiovisual features, which require signal processing, and high-level semantics. A number of papers were submitted to the special issue in the areas of imaging, artificial intelligence, and pattern recognition, and five contributions have been selected covering state-of-the-art algorithms and advanced related topics. The first contribution, by D. Xiang et al., “Evaluation of data quality and drought monitoring capability of FY-3A MERSI data”, describes some basic parameters and major technical indicators of the FY-3A and evaluates the data quality and drought monitoring capability of the Medium Resolution Spectral Imager (MERSI) onboard the FY-3A. The second contribution, by A. Belatreche et al., “Computing with biologically inspired neural oscillators: application to color image segmentation”, investigates the computing capabilities and potential applications of neural oscillators, a biologically inspired neural model, to grayscale and color image segmentation, an important task in image understanding and object recognition. The major contribution of this paper is the ability to use neural oscillators as a learning scheme for solving real-world engineering problems. The third paper, by A. Dargazany et al., entitled “Multibandwidth Kernel-based object tracking”, explores new methods for object tracking using the mean shift (MS). A bandwidth-handling MS technique is deployed in which the tracker reaches the global mode of the density function without requiring a specific starting point. Experiments show that the Gradual Multibandwidth Mean Shift tracking algorithm can converge faster than conventional kernel-based object tracking (known as mean shift). The fourth contribution, by S. Alzu’bi et al., entitled “3D medical volume segmentation using hybrid multi-resolution statistical approaches”, studies new 3D volume segmentation using multiresolution statistical approaches based on the discrete wavelet transform and hidden Markov models. This system consistently reduced the percentage error achieved by traditional 2D segmentation techniques by several percent. Finally, a contribution by G. Cabanes et al., entitled “Unsupervised topographic learning for spatiotemporal data mining”, proposes a new unsupervised algorithm suitable for the analysis of noisy spatiotemporal Radio Frequency Identification (RFID) data. The new unsupervised algorithm described in this article is an efficient data mining tool for behavioral studies based on RFID technology: it is able to discover and compare stable patterns in an RFID signal and is appropriate for continuous learning.
    Finally, we would like to thank all those who helped to make this special issue possible, especially the authors and the reviewers of the articles. Our thanks also go to the Hindawi staff and personnel, and to the Journal Manager, for bringing about the issue and for giving us the opportunity to edit this special issue.
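
    As background to the third contribution above, the following is a hedged OpenCV sketch of the classical single-bandwidth kernel-based (mean shift) tracker that the Gradual Multibandwidth variant builds on; the video path and initial window are placeholders, and this is not the paper's multibandwidth algorithm.

        # Classical mean-shift tracking on a back-projected hue histogram.
        import cv2

        cap = cv2.VideoCapture("video.mp4")          # placeholder path
        ok, frame = cap.read()
        x, y, w, h = 200, 150, 80, 80                # placeholder initial window
        roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([roi], [0], None, [180], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
            _, (x, y, w, h) = cv2.meanShift(backproj, (x, y, w, h), term)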

    CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search

    The success of deep-learning-based face recognition systems has given rise to serious privacy concerns due to their ability to enable unauthorized tracking of users in the digital world. Existing methods for enhancing privacy fail to generate naturalistic images that can protect facial privacy without compromising user experience. We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model. The first step inverts the given face image into the latent space and fine-tunes the generative model to achieve an accurate reconstruction of the given image from its latent code. This step produces a good initialization, aiding the generation of high-quality faces that resemble the given identity. Subsequently, user-defined makeup text prompts and identity-preserving regularization are used to guide the search for adversarial codes in the latent space. Extensive experiments demonstrate that faces generated by our approach have stronger black-box transferability, with an absolute gain of 12.06% over the state-of-the-art facial privacy protection approach under the face verification task. Finally, we demonstrate the effectiveness of the proposed approach for commercial face recognition systems. Our code is available at https://github.com/fahadshamshad/Clip2Protect. Comment: Accepted at CVPR 2023. Project page: https://fahadshamshad.github.io/Clip2Protect
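
    To make the second step concrete, here is a heavily hedged sketch of adversarial latent search guided by a makeup text prompt. Only the CLIP calls follow the real openai-clip API; generator (a StyleGAN-like model assumed to emit normalized 224x224 images), face_embedder, the loss weight, and the prompt are hypothetical stand-ins, not the released CLIP2Protect code.

        import torch
        import clip  # pip install git+https://github.com/openai/CLIP.git

        device = "cpu"
        clip_model, _ = clip.load("ViT-B/32", device=device)
        with torch.no_grad():  # fixed text target for the makeup prompt
            text_emb = clip_model.encode_text(
                clip.tokenize(["red lipstick makeup"]).to(device))

        def adversarial_latent_search(w, generator, face_embedder, true_emb,
                                      steps=100, lr=0.01, lam=0.5):
            # w: latent code from the inversion step; generator and
            # face_embedder are hypothetical placeholders.
            w = w.clone().requires_grad_(True)
            opt = torch.optim.Adam([w], lr=lr)
            for _ in range(steps):
                img = generator(w)                   # (1, 3, 224, 224), normalized
                makeup_loss = 1 - torch.cosine_similarity(
                    clip_model.encode_image(img), text_emb).mean()
                # Push the face-recognition embedding away from the true identity;
                # the paper's identity-preserving regularization is omitted here.
                adv_loss = torch.cosine_similarity(face_embedder(img), true_emb).mean()
                loss = makeup_loss + lam * adv_loss
                opt.zero_grad()
                loss.backward()
                opt.step()
            return w.detach()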

    Person annotation in video sequences

    In recent years, the demand for video tools to automatically annotate and classify large audiovisual datasets has increased considerably. One specific task in this field applies to TV broadcast videos: determining who appears in a video sequence, and when. This work starts from the ALBAYZIN evaluation series presented at IberSPEECH-RTVE 2018 in Barcelona, and the purpose of this thesis is to improve on the results obtained and to compare different face detection and tracking methods. We evaluate the performance of classic face detection techniques and of techniques based on machine learning on a closed dataset of 34 known people; all other people in the audiovisual document are labelled as "unknown". We work with short videos and images of each known character to build his or her model and, finally, evaluate the performance of the ALBAYZIN algorithm on a 2-hour video called "La noche en 24H", whose format is like that of a news program. We analyze the results, the types of errors and scenarios encountered, and the solutions we propose for each of them, where available. In this work, we focus only on monomodal face recognition and tracking.
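
    The closed-set-plus-"unknown" labeling described above reduces, in its simplest form, to thresholded matching against the enrolled models; the sketch below assumes a hypothetical upstream face-embedding step and a gallery of mean embeddings for the 34 known people, and the threshold is an arbitrary assumption.

        import numpy as np

        def annotate(face_embedding, gallery, threshold=0.6):
            """gallery: dict mapping a known person's name to a mean embedding."""
            names = list(gallery)
            sims = [float(np.dot(face_embedding, gallery[n]) /
                          (np.linalg.norm(face_embedding) * np.linalg.norm(gallery[n]) + 1e-9))
                    for n in names]
            best = int(np.argmax(sims))
            # Below-threshold matches fall back to the "unknown" label.
            return names[best] if sims[best] >= threshold else "unknown"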