31 research outputs found

    Learning Convolutional Neural Network For Face Verification

    Convolutional neural networks (ConvNets) have improved the state of the art in many applications. Face recognition tasks, for example, have seen significantly improved performance due to ConvNets. However, less attention has been given to video-based face recognition. Here, we make three contributions along these lines. First, we propose a ConvNet-based system for long-term face tracking in videos. By taking advantage of deep learning models pre-trained on big data, we developed a novel system for accurate video face tracking in unconstrained environments depicting various people and objects moving in and out of the frame. In the proposed system, we present a Detection-Verification-Tracking (DVT) method, which accomplishes the long-term face tracking task through the collaboration of face detection, face verification, and (short-term) face tracking. An online trained detector based on cascaded convolutional neural networks localizes all faces appearing in the frames, and an online trained face verifier based on deep convolutional neural networks and similarity metric learning decides whether any face, and if so which one, corresponds to the query person. An online trained tracker follows the face from frame to frame. When validated on a sitcom episode and a TV show, the DVT method outperforms tracking-learning-detection (TLD) and face-TLD in terms of recall and precision. The proposed system was tested on many other types of videos and shows very promising results. Second, as the availability of large-scale training datasets has a significant effect on the performance of ConvNet-based recognition methods, we present a successful automatic video collection approach to generate a large-scale video training dataset. We designed a procedure for generating a face verification dataset from videos based on the long-term face tracking algorithm, DVT.
In this procedure, the streams can be collected from videos and labeled automatically without human annotation. Using this procedure, we assembled a widely scalable dataset, FaceSequence. FaceSequence includes 1.5M streams capturing ~500K individuals. A key distinction between this dataset and existing video datasets is that FaceSequence is generated from publicly available videos and labeled automatically, and is hence widely scalable at no annotation cost. Lastly, we introduce a stream-based ConvNet architecture for the video face verification task. The proposed network is designed to optimize a differentiable error function, referred to as stream loss, using unlabeled temporal face sequences. Using the unlabeled video dataset, FaceSequence, we trained our network to minimize the stream loss. The network achieves verification accuracy comparable to the state of the art on the LFW and YTF datasets with much smaller model complexity. In comparison to VGG, our method demonstrates a significant improvement in TAR/FAR, considering the fact that the VGG dataset is highly purified and includes little label noise. We also fine-tuned the network using the IJB-A dataset. The validation results show competitive verification accuracy compared with the best previous video face verification results.
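
The abstract above reports verification results in terms of TAR/FAR. As a rough illustration only, the sketch below shows one common way to compute a true accept rate (TAR) at a fixed false accept rate (FAR) from similarity scores. The function name `tar_at_far`, the toy score lists, and the accept rule `score >= threshold` are assumptions made for this example; they are not taken from the authors' evaluation code.

```python
def tar_at_far(genuine_scores, impostor_scores, target_far):
    """Return (threshold, tar, far) at the lowest threshold whose FAR <= target_far.

    Assumption for this sketch: a pair is accepted when score >= threshold.
    genuine_scores  -- similarity scores of same-identity pairs
    impostor_scores -- similarity scores of different-identity pairs
    """
    # Candidate thresholds: every observed score, plus +inf (accept nothing).
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    candidates.append(float("inf"))

    # FAR can only fall as the threshold rises, so the first candidate that
    # meets the target is the lowest one and therefore gives the highest TAR.
    for threshold in candidates:
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        if far <= target_far:
            tar = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
            return threshold, tar, far


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    print(tar_at_far(genuine, impostor, target_far=0.25))
```

Scanning thresholds in ascending order keeps the logic easy to follow; for large score sets, a sorted sweep or a library ROC routine would be the more practical choice.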

    Recent Advances in Deep Learning Techniques for Face Recognition

    In recent years, researchers have proposed many deep learning (DL) methods for various tasks, and face recognition (FR) in particular has made an enormous leap using these techniques. Deep FR systems benefit from the hierarchical architecture of DL methods to learn discriminative face representations. DL techniques have therefore significantly improved state-of-the-art performance of FR systems and encouraged diverse and efficient real-world applications. In this paper, we present a comprehensive analysis of various FR systems that leverage different types of DL techniques, summarizing 168 recent contributions from this area. We discuss the papers in relation to different algorithms, architectures, loss functions, activation functions, datasets, challenges, improvement ideas, and current and future trends of DL-based FR systems. We provide a detailed discussion of various DL methods to understand the current state of the art, and then we discuss various activation and loss functions for these methods. Additionally, we summarize different datasets used widely for FR tasks and discuss challenges related to illumination, expression, pose variations, and occlusion. Finally, we discuss improvement ideas and current and future trends of FR tasks. Comment: 32 pages and citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613

    An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices

    In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, the higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially in deep learning. As the literature is progressing fast, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end holistic view of this area, from gaze capturing sensors, to gaze estimation workflows, to deep learning techniques, and to gaze interactive applications. Postprint. Peer reviewed.

    Visual Concept Detection in Images and Videos

    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed: First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and in particular for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and on the VOC Challenge. Second, an approach is presented that systematically utilizes results of object detectors. Novel object-based features are generated based on object detection results using different pooling strategies. For videos, detection results are assembled to object sequences and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. 
These features are used as additional input for the support vector machine (SVM)-based concept classifiers. Thus, other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenge show significant improvements in terms of retrieval performance not only for the object classes, but also in particular for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance for the image classification task of 63.8% mean average precision (AP). Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance. In these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images. Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component, called “random savanna”, a validation component, an updating component, and a scalability manager. 
Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components. Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the field of psychology and media science. The psychological research question addressed in the field of behavioral sciences is whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship of violent game events and the brain activity of a player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.

    MATLAB

    A well-known statement says that the PID controller is the "bread and butter" of the control engineer. This is indeed true from a scientific standpoint. However, nowadays, in the era of computer science, when paper and pencil have been replaced by the keyboard and the display of computers, one may equally say that MATLAB is the "bread" in the above statement. MATLAB has become a de facto tool for the modern system engineer. This book is written for both engineering students and practicing engineers. The wide range of applications in which MATLAB is the working framework shows that it is a powerful, comprehensive and easy-to-use environment for performing technical computations. The book includes various excellent applications in which MATLAB is employed: from pure algebraic computations to data acquisition in real-life experiments, from control strategies to image processing algorithms, from graphical user interface design for educational purposes to Simulink embedded systems.

    Intelligent Circuits and Systems

    ICICS-2020 is the third conference initiated by the School of Electronics and Electrical Engineering at Lovely Professional University. It explored recent innovations by researchers working on the development of smart and green technologies in the fields of Energy, Electronics, Communications, Computers, and Control. ICICS enables innovators to identify new opportunities for the social and economic benefit of society. The conference bridges the gap between academics and R&D institutions, social visionaries, and experts from all strata of society, allowing them to present their ongoing research activities and foster research relations. It provides opportunities for exchanging new ideas, applications, and experiences in the field of smart technologies and for finding global partners for future collaboration. ICICS-2020 was conducted in two broad categories, Intelligent Circuits & Intelligent Systems and Emerging Technologies in Electrical Engineering.

    Scientific Advances in STEM: From Professor to Students

    This book collects the publications of the special Topic "Scientific Advances in STEM: From Professor to Students". The aim is to contribute to the advancement of the Science and Engineering fields and their impact on the industrial sector, which requires a multidisciplinary approach. The university generates and transmits knowledge to serve society. Social demands continuously evolve, mainly because of cultural, scientific, and technological development. Researchers must relate the subjects they investigate to their application in local industry and community organizations, frequently from a multidisciplinary point of view, to enhance progress in a wide variety of fields (aeronautics, automotive, biomedical, electrical and renewable energy, communications, environmental, electronic components, etc.). Most investigations in the fields of science and engineering require the work of multidisciplinary teams, representing a stockpile of research projects in different stages (final year projects, master’s or doctoral studies). In this context, this Topic offers a framework for integrating interdisciplinary research, drawing together experimental and theoretical contributions in a wide variety of fields.

    Applications of Power Electronics: Volume 2
