
    FaceFilter: Audio-visual speech separation using still images

    The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movements in video clips or pre-enrolled speaker information as an auxiliary conditional feature, we use a single face image of the target speaker. The conditional feature is obtained from facial appearance in a cross-modal biometric task, where audio and visual identity representations are shared in a latent space. Identities learnt from facial images force the network to isolate the matched speaker and extract that voice from the mixed speech. This solves the permutation problem caused by swapped channel outputs, which frequently occurs in speech separation tasks. The proposed method is far more practical than video-based speech separation, since user profile images are readily available on many platforms. Also, unlike speaker-aware separation methods, it is applicable to separation with unseen speakers who have never been enrolled. We show strong qualitative and quantitative results on challenging real-world examples.
    Comment: Under submission as a conference paper. Video examples: https://youtu.be/ku9xoLh62
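The channel-selection idea in this abstract (match each separated output to the face-derived identity in the shared latent space) can be sketched with a toy example. This is a minimal illustration, not the authors' implementation: the embeddings and the cosine-similarity matching rule below are assumptions standing in for the paper's learned cross-modal representations.

```python
import numpy as np

def pick_matched_channel(face_emb, channel_embs):
    """Resolve the output-channel permutation by choosing the separated
    channel whose voice embedding is most similar (cosine similarity)
    to the face-derived identity embedding in the shared latent space."""
    face = face_emb / np.linalg.norm(face_emb)
    sims = [float(c @ face) / np.linalg.norm(c) for c in channel_embs]
    return int(np.argmax(sims))

# Toy shared-space embeddings: channel 1 belongs to the target speaker.
face = np.array([0.9, 0.1, 0.0])
channels = [np.array([0.0, 0.2, 0.9]), np.array([0.8, 0.2, 0.1])]
print(pick_matched_channel(face, channels))  # -> 1
```

Because the selection depends only on identity similarity, the same rule works for speakers never seen during enrolment, which is the property the abstract emphasises.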

    A review of content-based video retrieval techniques for person identification

    The rise of technology spurs advancement in the surveillance field. Many commercial spaces have reduced patrol guards in favour of Closed-Circuit Television (CCTV) installations, and some countries already use surveillance drones, which have greater mobility. In recent years, CCTV footage has also been used for crime investigation by law enforcement, such as in the 2013 Boston Marathon bombing incident. However, this has produced huge, unmanageable footage collections, a common issue of the Big Data era. While there is more information with which to identify a potential suspect, going over such a massive amount of data manually is a very laborious task. Therefore, some researchers have proposed using Content-Based Video Retrieval (CBVR) methods to enable querying for a specific feature of an object or a human. Due to limitations such as the visibility and quality of video footage, only certain features are selected for recognition, based on Chicago Police Department guidelines. This paper presents a comprehensive review of CBVR techniques used for clothing, gender and ethnicity recognition of a person of interest, and discusses how they can be applied in crime investigation. From the findings, the three recognition types can be combined to create a Content-Based Video Retrieval system for person identification.

    First impressions: A survey on vision-based apparent personality trait analysis

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Personality analysis has been widely studied in psychology, neuropsychology and signal processing, among other fields. In the past few years, it has also become an attractive research area in visual computing. From a computational point of view, speech and text have by far been the most considered cues for analysing personality. However, there has recently been increasing interest from the computer vision community in analysing personality from visual data. Recent computer vision approaches are able to accurately analyse human faces, body postures and behaviours, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that such methods could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of subjectivity in data labelling/evaluation, as well as current datasets and challenges organised to push research in the field, are reviewed.

    Real-time face analysis for gender recognition on video sequences

    2016 - 2017. This research work was produced with the aim of performing real-time gender recognition on face images extracted from real video sequences. The task may appear easy for a human, but it is not so simple for a computer vision algorithm. Even on still images, gender recognition classifiers have to deal with challenging problems, mainly due to possible face variations in terms of age, ethnicity, pose, scale, occlusions and so on. Additional challenges have to be taken into account when face analysis is performed on images acquired in real scenarios with traditional surveillance cameras. Indeed, people are unaware of the presence of the camera, and their sudden movements, together with the low quality of the images, further stress the noise on the faces, which are affected by motion blur, different orientations and various scales. Moreover, the need to provide a single classification per person (and not per face image) in real time requires the design of a fast gender recognition algorithm, able to track a person across different frames and to provide the gender information quickly. The real-time constraint acquires even more relevance considering that one of the goals of this research work is to design an algorithm suitable for an embedded vision architecture. Finally, the task becomes even more challenging since there are no standard benchmarks and protocols for the evaluation of gender recognition algorithms. In this thesis, attention was first concentrated on the analysis of still images, in order to understand which features are most effective for gender recognition. To this end, a face alignment algorithm was applied to the face images so as to normalise the pose and optimise the performance of the subsequent processing steps. Two methods were then proposed for gender recognition on still images.
    First, a multi-expert system which combines the decisions of classifiers fed with handcrafted features was evaluated. The pixel intensity values of the face images (the raw features), LBP histograms and HOG features were used to train three experts which take their decisions based, respectively, on the colour, texture and shape information of a human face. The decisions of the single linear SVMs were combined with a weighted voting rule, which was shown to be the most effective for the problem at hand. Second, an SVM classifier with a chi-squared kernel based on trainable COSFIRE filters was fused with an expert which relies on SURF features extracted at certain facial landmarks. The complementarity of the two experts was demonstrated, and their decisions were combined with a stacked classification scheme. An experimental evaluation of all the methods was carried out on the GENDER-FERET and LFW datasets with a standard protocol, allowing a fair comparison of the results. This evaluation showed that the COSFIRE-SURF pair achieves the best accuracy in all cases (94.7% on GENDER-FERET and 99.4% on LFW), even compared with other state-of-the-art methods. Nevertheless, the performance achieved by the multi-expert system based on the fusion of the raw, LBP and HOG classifiers can also be considered very satisfying (93.0% on GENDER-FERET and 98.4% on LFW). [edited by Author]
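The weighted voting rule used to fuse the three experts can be sketched as follows. This is a toy illustration, not the thesis implementation: the signed scores stand in for linear-SVM decision margins, and the weight values are assumed for the example.

```python
import numpy as np

def weighted_vote(expert_scores, weights):
    """Fuse signed decision scores (e.g. linear-SVM margins) from the
    raw, LBP and HOG experts with a weighted voting rule.
    A positive fused score maps to class +1, a negative one to -1."""
    fused = float(np.dot(expert_scores, weights))
    return 1 if fused >= 0 else -1

# Illustrative margins from the three experts and assumed weights.
scores = np.array([0.4, -0.1, 0.7])   # raw, LBP, HOG
weights = np.array([0.2, 0.3, 0.5])   # assumed to sum to 1
print(weighted_vote(scores, weights))  # -> 1
```

The benefit of weighting is visible here: the HOG expert's confident score outvotes the weakly negative LBP expert, which a simple majority vote over hard decisions would handle less gracefully.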

    A Gender Recognition System Using Facial Images with High Dimensional Data

    Gender recognition has been seen as an interesting research area that plays important roles in many fields of study. Studies from MIT and Microsoft clearly showed that the female gender is poorly recognised, especially among dark-skinned nationals. The focus of this paper is to present a technique that categorises gender among dark-skinned people. The classification was done using an SVM on sets of images gathered locally and publicly. The analysis includes face detection using the Viola-Jones algorithm, extraction of Histogram of Oriented Gradients (HOG) and Rotation-Invariant LBP (RILBP) features, and training with an SVM classifier. PCA was performed on both the HOG and RILBP descriptors to extract features from the high-dimensional data. Various success rates were recorded; however, PCA on RILBP performed best, with accuracies of 99.6% and 99.8% on the public and local datasets respectively. This system will be of immense benefit in application areas like social interaction and targeted advertisement.
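The PCA step applied to the RILBP descriptors can be sketched via the SVD. This is a generic sketch under assumed toy data (random 59-bin histograms), not the paper's pipeline; the feature dimensions and component count are illustrative.

```python
import numpy as np

def pca_project(X, k):
    """Project the feature rows of X (e.g. RILBP histograms) onto the
    top-k principal components of the centred data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 59))  # 20 faces, 59-dim toy RILBP descriptors
Z = pca_project(X, 5)
print(Z.shape)  # -> (20, 5)
```

The reduced vectors Z would then be fed to the SVM classifier; since singular values are sorted in decreasing order, earlier components carry more of the descriptor variance.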

    Convolutional Neural Networks Exploiting Attributes of Biological Neurons

    In this era of artificial intelligence, deep neural networks like Convolutional Neural Networks (CNNs) have emerged as front-runners, often surpassing human capabilities. These deep networks are often perceived as the panacea for all challenges. Unfortunately, a common downside of these networks is their ''black-box'' character, which does not necessarily mirror the operation of biological neural systems. Some even have millions or billions of learnable (tunable) parameters, and their training demands extensive data and time. Here, we integrate the principles of biological neurons into certain layer(s) of CNNs. Specifically, we explore the use of neuroscience-inspired computational models of the Lateral Geniculate Nucleus (LGN) and simple cells of the primary visual cortex. By leveraging such models, we aim to extract image features to use as input to CNNs, hoping to enhance training efficiency and achieve better accuracy. We aspire to enable shallow networks, with a Push-Pull Combination of Receptive Fields (PP-CORF) model of simple cells as the foundation layer of CNNs, to enhance their learning process and performance. To achieve this, we propose a two-tower CNN: one shallow tower and the other a ResNet-18. Rather than extracting features blindly, it seeks to mimic how the brain perceives and extracts features. The proposed system exhibits a noticeable improvement in performance (on average 5%-10%) on the CIFAR-10, CIFAR-100 and ImageNet-100 datasets compared to ResNet-18. We also examine the efficiency of using only the Push-Pull tower of the network.
    Comment: 20 pages, 6 figures
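The push-pull idea behind the PP-CORF layer can be sketched numerically: an excitatory (push) response is suppressed by a fraction of the response of an opposite-polarity (pull) unit. This is a simplified sketch of the general push-pull mechanism, not the paper's CORF model; the inhibition strength k and the toy responses are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def push_pull(push_resp, pull_resp, k=0.5):
    """Push-pull combination: the rectified push activation is reduced
    by a fraction k of the rectified pull activation, then rectified
    again, so responses to the non-preferred polarity are suppressed."""
    return relu(relu(push_resp) - k * relu(pull_resp))

push = np.array([1.0, 0.2, -0.5])  # toy filter responses, preferred polarity
pull = np.array([0.0, 0.8, 1.0])   # responses of the opposite-polarity unit
print(push_pull(push, pull))       # only the first unit survives
```

The effect is a sharper, noise-suppressed feature map, which is what the shallow PP-CORF tower is intended to feed into the rest of the network alongside the ResNet-18 tower.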