3,994 research outputs found

    Unsupervised Network Pretraining via Encoding Human Design

    Over the years, computer vision researchers have spent an immense amount of effort on designing image features for the visual object recognition task. We propose to incorporate this valuable experience to guide the task of training deep neural networks. Our idea is to pretrain the network through the task of replicating the process of hand-designed feature extraction. By learning to replicate the process, the neural network integrates previous research knowledge and learns to model visual objects in a way similar to the hand-designed features. In the succeeding finetuning step, it further learns object-specific representations from labeled data, which boosts its classification power. We pretrain two convolutional neural networks, where one replicates the process of histogram of oriented gradients feature extraction, and the other replicates the process of region covariance feature extraction. After finetuning, we achieve substantially better performance than the baseline methods.
    Comment: 9 pages, 11 figures, WACV 2016: IEEE Conference on Applications of Computer Vision
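    As a rough illustration of the pretraining idea, the sketch below (PyTorch plus scikit-image) trains a small CNN to regress HOG descriptors of unlabeled images before any supervised finetuning. The architecture, 32x32 image size, and hyperparameters are placeholder assumptions, not the paper's exact setup.

```python
# Minimal sketch: pretrain a CNN to replicate HOG feature extraction.
# Network shape and training schedule are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from skimage.feature import hog

def hog_target(img):
    """HOG descriptor of a 32x32 grayscale image (the pretraining target)."""
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

images = np.random.rand(256, 32, 32).astype(np.float32)  # stand-in unlabeled data
targets = torch.tensor(np.stack([hog_target(im) for im in images]),
                       dtype=torch.float32)
hog_dim = targets.shape[1]

net = nn.Sequential(                       # small CNN; illustrative only
    nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, hog_dim))

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.tensor(images).unsqueeze(1)      # NCHW layout
for _ in range(10):                        # pretraining: regress HOG targets
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), targets)
    loss.backward()
    opt.step()
# After pretraining, replace the regression head and finetune on labeled data.
```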

    Automatic attendance capturing using histogram of oriented gradients on facial images

    Abstract: Humans mostly use faces to identify and recognise individuals, and recent improvements in computing capability now allow recognition and detection to be automated. However, quite a number of problems still exist in the automatic recognition of facial images. The Histogram of Oriented Gradients (HOG) has recently been adopted and is seen as a standard for efficient face recognition and object detection in general. In this paper, we investigate and discuss a simple but effective approach to capturing a student attendance register in a lecture hall, using HOG features to detect and recognise students' faces across different moods, orientations, and illuminations. Our detection and recognition experiments show good performance on a facial image database obtained from the University of Johannesburg; this performance is due to the HOG descriptor's robustness to changes in rotation and illumination. Our system helps save lecturers' time by eliminating the manual calling of students' names and also helps monitor students.
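    The sketch below shows one way such a pipeline could be wired up, using dlib's HOG-based frontal face detector and plain HOG descriptors for matching. The function names, 64x64 crop size, and nearest-neighbour matching rule are illustrative assumptions, not the paper's exact system.

```python
# Hypothetical HOG-based attendance pipeline sketch (dlib + scikit-image).
import dlib
import numpy as np
from skimage import io
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize

detector = dlib.get_frontal_face_detector()   # HOG + linear SVM detector

def face_hog(image_path):
    """Detect the first face in an RGB image and return its HOG descriptor."""
    img = io.imread(image_path)
    dets = detector(img, 1)                   # upsample once for small faces
    if not dets:
        return None
    d = dets[0]
    top, left = max(d.top(), 0), max(d.left(), 0)
    face = rgb2gray(img[top:d.bottom(), left:d.right()])
    face = resize(face, (64, 64))              # normalize scale before HOG
    return hog(face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def mark_attendance(probe_path, enrolled):
    """enrolled: {student_name: descriptor}; returns best-matching name."""
    probe = face_hog(probe_path)
    if probe is None:
        return None
    # nearest neighbour in HOG space marks the student present
    return min(enrolled, key=lambda n: np.linalg.norm(enrolled[n] - probe))
```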

    Mineral texture identification using local binary patterns equipped with a Classification and Recognition Updating System (CARUS)

    In this paper, a rotation-invariant local binary pattern operator equipped with a local contrast measure (riLBPc) is employed to characterize the type of mineral twinning by inspecting the texture properties of crystals. The proposed method uses photomicrographs of minerals and produces LBP histograms, which can be compared with those included in a predefined database using a Kullback–Leibler divergence-based metric. The paper proposes a new LBP-based scheme for concurrent classification and recognition tasks, followed by a novel online updating routine to enhance the locally developed mineral LBP database. The discriminatory power of the proposed Classification and Recognition Updating System (CARUS) for texture identification is verified for plagioclase, orthoclase, microcline, and quartz minerals with sensitivity (TPR) near 99.9%, 87%, 99.9%, and 96%, and accuracy (ACC) equal to about 99%, 97%, 99%, and 99%, respectively. According to the results, the introduced CARUS system is a promising approach that can be applied in a variety of different fields dealing with classification and feature recognition tasks.
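    A minimal sketch of the matching step described above, using scikit-image's rotation-invariant uniform LBP and a symmetrized Kullback–Leibler divergence. The local contrast channel (the "c" in riLBPc) and the CARUS updating routine are omitted, and the neighbourhood parameters are illustrative.

```python
# Rotation-invariant LBP histogram matching via symmetrized KL divergence.
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1                                    # neighbourhood size and radius

def lbp_hist(gray):
    """Normalized rotation-invariant uniform LBP histogram of a gray image."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist + 1e-12                        # avoid log(0) in the divergence

def kl(p, q):
    """Kullback–Leibler divergence between two histograms."""
    return float(np.sum(p * np.log(p / q)))

def classify(photo, database):
    """database: {mineral_name: reference histogram}; returns best match."""
    h = lbp_hist(photo)
    # symmetrizing KL keeps the comparison order-independent
    return min(database, key=lambda m: kl(h, database[m]) + kl(database[m], h))
```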

    Towards Realistic Facial Expression Recognition

    Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expression in realistic environments remains largely unsolved. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. As one major problem faced by the literature is the lack of realistic training and testing data, this thesis presents a web search based framework to collect a realistic facial expression dataset from the Web. By adopting an active learning based method to remove noisy images from text based image search results, the proposed approach minimizes human effort during dataset construction and maximizes scalability for future research. Various novel facial expression features are then proposed to address the challenges imposed by the newly collected dataset. Finally, a spectral embedding based feature fusion framework is presented to combine the proposed facial expression features into a more descriptive representation. This thesis also systematically investigates how the number of frames of a facial expression sequence affects the performance of facial expression recognition algorithms, since facial expression sequences may be captured at different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed based on a keypoint based frame representation. Comprehensive experiments have been performed to demonstrate the effectiveness of the presented methods.
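    As one plausible reading of the keyframe-selection idea, the sketch below represents each frame by an aggregated ORB keypoint descriptor, clusters the frames with k-means, and keeps the frame nearest each centroid. The thesis's actual frame representation and selection criterion are not public here, so every choice below is an assumption.

```python
# Hypothetical keypoint-based keyframe selection sketch (OpenCV + scikit-learn).
import cv2
import numpy as np
from sklearn.cluster import KMeans

ORB = cv2.ORB_create(nfeatures=200)

def frame_descriptor(frame):
    """Mean ORB descriptor as a compact keypoint-based frame representation."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, desc = ORB.detectAndCompute(gray, None)
    return np.zeros(32) if desc is None else desc.mean(axis=0)

def select_keyframes(frames, k=5):
    """Return sorted indices of k representative frames from a BGR frame list."""
    X = np.stack([frame_descriptor(f) for f in frames])
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    keep = {int(np.argmin(np.linalg.norm(X - c, axis=1)))
            for c in km.cluster_centers_}
    return sorted(keep)
```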

    Social network extraction and analysis based on multimodal dyadic interaction

    Social interactions are a very important component in people's lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to the New York Times' Blogging Heads opinion blog. The social network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links' weights measure the "influence" one person has over the other. The states of the Influence Model encode audio/visual features automatically extracted from our videos using state-of-the-art algorithms. Our results are reported in terms of the accuracy of audio/visual data fusion for speaker segmentation and the centrality measures used to characterize the extracted social network.
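    A minimal sketch of the final analysis step, assuming networkx and made-up edge weights standing in for the Influence Model's outputs:

```python
# Build the directed "influence" graph and rank participants by centrality.
import networkx as nx

# hypothetical pairwise weights: (influencer, influenced) -> influence strength
influence = {("A", "B"): 0.7, ("B", "A"): 0.3,
             ("A", "C"): 0.6, ("C", "B"): 0.5}

G = nx.DiGraph()
for (src, dst), w in influence.items():
    G.add_edge(src, dst, weight=w)

# two simple centrality views: total outgoing influence, and PageRank on the
# reversed graph (a node is central if it influences influential nodes)
out_influence = dict(G.out_degree(weight="weight"))
rank = nx.pagerank(G.reverse(), weight="weight")
print(out_influence)
print(rank)
```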