
    Face Recognition from Weakly Labeled Data

    Recognizing the identity of a face or a person in the media usually requires a large amount of training data to design robust classifiers, which demands a great deal of human effort for annotation. Alternatively, weakly labeled data is publicly available, but its labels can be ambiguous or noisy. For instance, names in the caption of a news photo provide possible candidates for faces appearing in the image. Names in screenplays are only weakly associated with faces in the videos. Since weakly labeled data is not explicitly labeled by humans, robust learning methods that use it should suppress the impact of noisy instances or automatically resolve the ambiguities in noisy labels. We propose a method for character identification in a TV series. The proposed method uses labels extracted automatically by associating faces with names in the transcripts. Such weakly labeled data often has erroneous labels resulting from errors in face detection and synchronization. Our approach achieves robustness to noisy labeling by utilizing several features. We construct track nodes from face and person tracks and utilize information from facial and clothing appearances. We discover the video structure for effective inference by constructing a minimum-distance spanning tree (MST) from the track nodes. Hence, track nodes of similar appearance become adjacent to each other and are likely to have the same identity. The non-local cost aggregation step thus serves as a noise suppression step to reliably recognize the identity of the characters in the video. Another type of weakly labeled data results from labeling ambiguities. In other words, a training sample can have more than one label, and typically one of the labels is the true label. For instance, a news photo is usually accompanied by captions, and the names provided in the captions can be used as the candidate labels for the faces appearing in the photo.
Learning an effective subject classifier from ambiguously labeled data is called ambiguously labeled learning. We propose a matrix completion framework for predicting the actual labels from the ambiguously labeled instances, and a standard supervised classifier that subsequently learns from the disambiguated labels to classify new data. We generalize this matrix completion framework to handle labeling imbalance, so that dominant labels do not overwhelm the rest. In addition, an iterative candidate elimination step is integrated with the proposed approach to improve the ambiguity resolution. Recently, video-based face recognition techniques have received significant attention since faces in a video provide diverse exemplars for constructing a robust representation of the target (i.e., subject of interest). Nevertheless, the target face in the video is usually annotated with minimal human effort (i.e., a single bounding box in a video frame). Although face tracking techniques can be utilized to associate faces in a single video shot, they are ineffective for associating faces across multiple video shots. To fully utilize faces of a target in multi-shot videos, we propose a target face association (TFA) method to obtain a set of images of the target face, and these associated images are then utilized to construct a robust representation of the target for improving the performance of the video-based face recognition task. One of the most important applications of video-based face recognition is outdoor video surveillance using a camera network. Face recognition in outdoor environments is a challenging task due to illumination changes, pose variations, and occlusions. We present a taxonomy of camera networks and discuss several techniques for continuous tracking of faces acquired by an outdoor camera network as well as a face matching algorithm.
Finally, we demonstrate a real-time video surveillance system that uses pan-tilt-zoom (PTZ) cameras to perform pedestrian tracking, localization, face detection, and face recognition.
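
The spanning-tree idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, the Euclidean distance, the exponential edge weight, and the names `build_mst` and `aggregate_scores` are all assumptions made for illustration. The sketch builds a minimum spanning tree over track-node features with Prim's algorithm, then smooths each node's noisy label scores with those of its tree neighbors.

```python
import math


def build_mst(features):
    """Prim's algorithm on the complete graph of track nodes.

    `features` is a list of equal-length numeric tuples (hypothetical
    appearance descriptors). Returns a list of edges (i, j, distance).
    """
    n = len(features)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(features[u], features[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def aggregate_scores(scores, edges, sigma=1.0):
    """One round of score smoothing along tree edges.

    Each node keeps its own scores (weight 1) and adds its neighbors'
    scores weighted by exp(-distance / sigma), an assumed weighting.
    """
    if not scores:
        return []
    n, k = len(scores), len(scores[0])
    neighbors = [[] for _ in range(n)]
    for i, j, d in edges:
        w = math.exp(-d / sigma)
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    smoothed = []
    for i in range(n):
        total = list(scores[i])
        norm = 1.0
        for j, w in neighbors[i]:
            for c in range(k):
                total[c] += w * scores[j][c]
            norm += w
        smoothed.append([t / norm for t in total])
    return smoothed


# Two appearance clusters; node 1 carries a weak, wrong label score.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
scores = [[1.0, 0.0], [0.4, 0.6], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
smoothed = aggregate_scores(scores, build_mst(features))
print([max(range(2), key=lambda c: s[c]) for s in smoothed])  # prints [0, 0, 0, 1, 1]
```

In this toy example the mislabeled node is pulled toward the identity of its nearest tree neighbor, which is the kind of noise suppression the abstract attributes to aggregation over the tree.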

    A facial expression for anxiety.

    Anxiety and fear are often confounded in discussions of human emotions. However, studies of rodent defensive reactions under naturalistic conditions suggest anxiety is functionally distinct from fear. Unambiguous threats, such as predators, elicit flight from rodents (if an escape route is available), whereas ambiguous threats (e.g., the odor of a predator) elicit risk assessment behavior, which is associated with anxiety as it is preferentially modulated by anti-anxiety drugs. However, without human evidence, it would be premature to assume that rodent-based psychological models are valid for humans. We tested the human validity of the risk assessment explanation for anxiety by presenting 8 volunteers with emotive scenarios and asking them to pose facial expressions. Photographs and videos of these expressions were shown to 40 participants who matched them to the scenarios and labeled each expression. Scenarios describing ambiguous threats were preferentially matched to the facial expression posed in response to the same scenario type. This expression consisted of two plausible environmental-scanning behaviors (eye darts and head swivels) and was labeled as anxiety, not fear. The facial expression elicited by unambiguous threat scenarios was labeled as fear. The emotion labels generated were then presented to another 18 participants who matched them back to photographs of the facial expressions. This back-matching of labels to faces also linked anxiety to the environmental-scanning face rather than the fear face. Results therefore suggest that anxiety produces a distinct facial expression and that it has adaptive value in situations that are ambiguously threatening, supporting a functional, risk-assessing explanation for human anxiety.


    In recent years, the theory of sparse representation has emerged as a powerful tool for efficient processing of data in non-traditional ways. This is mainly because most signals and images of interest tend to be sparse or compressible in some dictionary. In other words, they can be well approximated by a linear combination of a few elements (also known as atoms) of a dictionary. This dictionary can either be an analytic dictionary composed of wavelets or a Fourier basis, or it can be trained directly from data. It has been observed that dictionaries learned directly from data provide better representation and hence can improve the performance of many practical applications such as restoration and classification. In this dissertation, we study dictionary learning and recognition under supervised, unsupervised, and semi-supervised settings. In the supervised case, we propose an approach to recognize humans in unconstrained videos, where the main challenge is exploiting the identity information in multiple frames and the accompanying dynamic signature. These identity cues include face, body, and motion. Our approach is based on video-dictionaries for face and body. We design video-dictionaries to implicitly encode temporal, pose, and illumination information. Next, we propose a novel multivariate sparse representation method that jointly represents all the video data by a sparse linear combination of training data. To increase the ability of our algorithm to learn nonlinearities, we apply kernel methods to learn the dictionaries. Next, we address the problem of matching faces across changes in pose in unconstrained videos. Our approach consists of two methods based on 3D rotation and sparse representation that compensate for changes in pose. We demonstrate the superior performance of our approach over several state-of-the-art algorithms through extensive experiments on unconstrained video datasets.
In the unsupervised case, we present an approach that simultaneously clusters images and learns dictionaries from the clusters. The method learns dictionaries in the Radon transform domain. The main feature of the proposed approach is that it provides in-plane rotation and scale invariant clustering, which is useful in many applications such as Content-Based Image Retrieval (CBIR). We demonstrate through experiments that the proposed rotation and scale invariant clustering provides not only good retrieval performance but also substantial improvements and robustness compared to traditional Gabor-based and several state-of-the-art shape-based methods. We then extend the dictionary learning problem to a generalized semi-supervised formulation, where each training sample is provided with a set of possible labels and only one label among them is the true one. Such applications can be found in image and video collections where one often has only partially labeled data. For instance, given an image with multiple faces and a caption specifying the names, we can be sure that each of the faces belongs to one of the names specified, while the exact identity of each face is not known. Labeling involves a significant amount of human effort and is expensive. This has motivated researchers to develop learning algorithms from partially labeled training data. In this work, we develop dictionary learning algorithms that utilize such partially labeled data. The proposed method aims to solve the problem of ambiguously labeled multiclass classification using an iterative algorithm. The dictionaries are updated using either soft (EM-based) or hard decision rules. Extensive evaluations on existing datasets demonstrate that the proposed method performs significantly better than state-of-the-art approaches for learning from ambiguously labeled data.
As sparsity plays a major role in our research, we further present a sparse representation-based approach to find the salient views of 3D objects. The salient views are categorized into two groups. The first group consists of boundary representative views that have several visible sides and object surfaces that may be attractive to humans. The second group consists of side representative views that best represent side views of the approximating convex shape. The side representative views are class-specific views and possess the most representative power compared to other within-class views. Using the concept of characteristic view class, we first present a sparse representation-based approach for estimating the boundary representative views. With the estimated boundaries, we determine the side representative views based on a minimum reconstruction error criterion. Furthermore, to evaluate our method, we introduce the notion of geometric dictionaries built from salient views for applications in 3D object recognition, retrieval and sparse-to-full reconstruction. Through a series of experiments on four publicly available 3D object datasets, we demonstrate the effectiveness of our approach over state-of-the-art algorithms and baseline methods.
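
The statement in this abstract that a signal can be approximated by a linear combination of a few dictionary atoms can be made concrete with a small sketch. It uses plain matching pursuit, a simple greedy sparse-coding method chosen here for brevity; the dissertation's own algorithms are not described in enough detail to reproduce, and the function names and toy dictionary below are assumptions for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matching_pursuit(signal, atoms, n_steps):
    """Greedy sparse coding over unit-norm atoms.

    At each step, pick the atom most correlated with the residual,
    add that correlation to the atom's coefficient, and subtract the
    atom's contribution from the residual.
    Returns (coefficients, residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_steps):
        inner = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(inner[i]))
        coeffs[k] += inner[k]
        residual = [r - inner[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Toy overcomplete dictionary: the standard basis of R^3 plus one extra atom.
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    normalize([1.0, 1.0, 0.0]),
]
coeffs, residual = matching_pursuit([2.0, 0.0, 3.0], atoms, n_steps=2)
print(coeffs)    # prints [2.0, 0.0, 3.0, 0.0]
print(residual)  # prints [0.0, 0.0, 0.0]
```

Two atoms suffice to reconstruct this signal exactly, so the code is sparse. A learned dictionary would replace the fixed `atoms` list with atoms fitted to training data.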

    Learning from Partial Labels

    We address the problem of partially-labeled multiclass classification, where instead of a single label per instance, the algorithm is given a candidate set of labels, only one of which is correct. Our setting is motivated by a common scenario in many image and video collections, where only partial access to labels is available. The goal is to learn a classifier that can disambiguate the partially-labeled training instances, and generalize to unseen data. We define an intuitive property of the data distribution that sharply characterizes the ability to learn in this setting and show that effective learning is possible even when all the data is only partially labeled. Exploiting this property of the data, we propose a convex learning formulation based on minimization of a loss function appropriate for the partial label setting. We analyze the conditions under which our loss function is asymptotically consistent, as well as its generalization and transductive performance. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies; in particular, we annotated and experimented on a very large video data set and achieved 6% error for character naming on 16 episodes of the TV series Lost.
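
To make the partial-label setting in this abstract concrete, here is a sketch of one convex surrogate loss of the kind it describes. The exact form is an assumption for illustration, not taken from the abstract: the mean score over the candidate set is treated as a single positive example, each non-candidate label as a negative example, and a hinge function is used as the convex surrogate. The names `hinge` and `partial_label_loss` are hypothetical.

```python
def hinge(z):
    """Convex surrogate: zero once the margin z reaches 1."""
    return max(0.0, 1.0 - z)


def partial_label_loss(scores, candidates):
    """Loss for one instance given per-label scores and a candidate set.

    `scores[a]` is the classifier score for label a; `candidates` holds
    the label indices offered for this instance, one of which is assumed
    to be correct.
    """
    cand = set(candidates)
    if not cand:
        raise ValueError("candidate set must not be empty")
    # Reward a high average score over the candidate labels.
    mean_candidate_score = sum(scores[a] for a in cand) / len(cand)
    loss = hinge(mean_candidate_score)
    # Penalize positive scores on labels outside the candidate set.
    loss += sum(hinge(-s) for a, s in enumerate(scores) if a not in cand)
    return loss


print(partial_label_loss([2.0, 2.0, -2.0], {0, 1}))   # prints 0.0
print(partial_label_loss([-1.0, -1.0, 1.0], {0, 1}))  # prints 4.0
```

The loss is zero when the candidate labels score well and the excluded label scores poorly, and it grows when the classifier favors a label that the candidate set rules out. Because it is a sum of hinge functions of linear combinations of the scores, it stays convex in the scores.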

    Deep Feature Representation and Similarity Matrix based Noise Label Refinement Method for Efficient Face Annotation

    Face annotation is a naming procedure that assigns the correct name to a person appearing in an image. Faces that are manually annotated by people in online applications include incorrect labels, giving rise to the issue of label ambiguity. This may lead to mislabelling in face annotation. Consequently, an efficient method is still needed to enhance the reliability of face annotation. Hence, in this work, a novel method named Similarity Matrix-based Noise Label Refinement (SMNLR) is proposed, which predicts the accurate label from noisily labelled facial images. To enhance the performance of the proposed method, Convolutional Neural Networks (CNNs) are used for feature representation. Several experiments are conducted to evaluate the effectiveness of the proposed face annotation method using the LFW, IMFDB and Yahoo datasets. The experimental results illustrate the robustness of the proposed SMNLR method in dealing with noisily labelled faces.