Unsupervised Network Pretraining via Encoding Human Design
Over the years, computer vision researchers have spent an immense amount of
effort on designing image features for the visual object recognition task. We
propose to incorporate this valuable experience to guide the task of training
deep neural networks. Our idea is to pretrain the network through the task of
replicating the process of hand-designed feature extraction. By learning to
replicate the process, the neural network integrates previous research
knowledge and learns to model visual objects in a way similar to the
hand-designed features. In the subsequent finetuning step, it further learns
object-specific representations from labeled data, which boosts its
classification power. We pretrain two convolutional neural networks where one
replicates the process of histogram of oriented gradients feature extraction,
and the other replicates the process of region covariance feature extraction.
After finetuning, we achieve substantially better performance than the baseline
methods.
Comment: 9 pages, 11 figures, WACV 2016: IEEE Winter Conference on Applications of
Computer Vision
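The hand-designed pipeline such a network learns to replicate can be illustrated compactly. Below is a minimal sketch of the orientation-histogram step at the heart of HOG, for a single cell of a grayscale image; real HOG additionally tiles the image into cells, groups cells into overlapping blocks, and normalizes per block:

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram for one HOG cell (simplified sketch).

    cell: 2-D float array of grayscale intensities.
    Real HOG adds cell tiling, block grouping, and block normalization.
    """
    # Finite-difference gradients along rows (gy) and columns (gx).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in Dalal-Triggs HOG.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # Accumulate gradient magnitude into evenly spaced orientation bins.
    bin_idx = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bin_idx.ravel(), magnitude.ravel())
    # L2-normalize so the descriptor is robust to illumination scaling.
    return hist / (np.linalg.norm(hist) + 1e-6)
```

A purely horizontal intensity ramp, for instance, concentrates all of its gradient energy in the 0-degree bin.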
Automatic attendance capturing using histogram of oriented gradients on facial images
Abstract: Humans mostly use faces to identify and recognise individuals, and recent improvements in computing capability now allow detection and recognition to be performed automatically. However, a number of problems still exist in the automatic recognition of facial images. The Histogram of Oriented Gradients (HOG) has recently been adopted as a standard for efficient face recognition and object detection in general. In this paper, we investigate and discuss a simple but effective approach to capturing a student attendance register in a lecture hall, using HOG features to detect and recognise students' faces across different moods, orientations, and illuminations. Our detection and recognition experiments show good performance on a facial image database obtained from the University of Johannesburg; this performance is due to the HOG descriptor's robustness to changes in rotation and illumination. Our system saves instructional staff/lecturer time by eliminating the manual calling of students' names and also helps monitor students
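The recognition step on top of HOG descriptors can be sketched as nearest-neighbour matching against enrolled students. This is an illustrative assumption, not the paper's exact pipeline: the `gallery` structure, the Euclidean distance, and the acceptance threshold are all stand-ins for whatever matcher the system actually uses:

```python
import numpy as np

def identify(probe_desc, gallery):
    """Match a probe HOG descriptor to the closest enrolled student.

    probe_desc: 1-D feature vector for a detected face.
    gallery: dict mapping student ID -> enrolled descriptor.
    Returns the best-matching ID and its Euclidean distance.
    """
    best_id, best_dist = None, float("inf")
    for student_id, desc in gallery.items():
        dist = float(np.linalg.norm(probe_desc - np.asarray(desc)))
        if dist < best_dist:
            best_id, best_dist = student_id, dist
    return best_id, best_dist

def mark_attendance(probe_descs, gallery, threshold=0.5):
    """Mark students present whose probe faces match within a threshold."""
    present = set()
    for desc in probe_descs:
        student_id, dist = identify(np.asarray(desc), gallery)
        if dist <= threshold:
            present.add(student_id)
    return present
```

The threshold trades false accepts against missed students and would be tuned on a held-out set of enrolled faces.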
Mineral texture identification using local binary patterns equipped with a Classification and Recognition Updating System (CARUS)
In this paper, a rotation-invariant local binary pattern operator equipped with a local contrast measure (riLBPc) is employed to characterize the type of mineral twinning by inspecting the texture properties of crystals. The proposed method uses photomicrographs of minerals and produces LBP histograms, which might be compared with those included in a predefined database using the Kullback–Leibler divergence-based metric. The paper proposes a new LBP-based scheme for concurrent classification and recognition tasks, followed by a novel online updating routine to enhance the locally developed mineral LBP database. The discriminatory power of the proposed Classification and Recognition Updating System (CARUS) for texture identification scheme is verified for plagioclase, orthoclase, microcline, and quartz minerals with sensitivity (TPR) near 99.9%, 87%, 99.9%, and 96%, and accuracy (ACC) equal to about 99%, 97%, 99%, and 99%, respectively. According to the results, the introduced CARUS system is a promising approach that can be applied in a variety of different fields dealing with classification and feature recognition tasks. © 2022 by the authors
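The two building blocks of this pipeline, a rotation-invariant LBP histogram and the Kullback-Leibler comparison against database entries, can be sketched as follows. This sketch omits the local contrast term (the "c" in riLBPc) and uses the plain 3x3 neighbourhood rather than interpolated circular sampling, so it is a simplified stand-in for the operator the paper uses:

```python
import numpy as np

def rotation_invariant_lbp_hist(img):
    """Rotation-invariant 8-neighbour LBP histogram (simplified sketch)."""
    center = img[1:-1, 1:-1]
    # The 8 neighbours in circular order around each interior pixel.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = [(img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx] >= center).astype(np.uint8)
            for dy, dx in shifts]
    patterns = np.stack(bits, axis=0)          # shape (8, H-2, W-2)
    # Rotation invariance: take the minimum code over all 8 bit rotations.
    codes = np.full(center.shape, 255, dtype=np.uint16)
    for r in range(8):
        rotated = np.roll(patterns, r, axis=0)
        value = sum(rotated[i].astype(np.uint16) << i for i in range(8))
        codes = np.minimum(codes, value)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence between two LBP histograms."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))
```

A query photomicrograph's histogram would then be compared against each stored mineral histogram, with the smallest divergence deciding the class.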
Towards Realistic Facial Expression Recognition
Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expression in realistic environments remains unsolved for the most part. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. As one major problem faced by the literature is the lack of realistic training and testing data, this thesis presents a web search based framework to collect realistic facial expression dataset from the Web. By adopting an active learning based method to remove noisy images from text based image search results, the proposed approach minimizes the human efforts during the dataset construction and maximizes the scalability for future research. Various novel facial expression features are then proposed to address the challenges imposed by the newly collected dataset. Finally, a spectral embedding based feature fusion framework is presented to combine the proposed facial expression features to form a more descriptive representation. This thesis also systematically investigates how the number of frames of a facial expression sequence can affect the performance of facial expression recognition algorithms, since facial expression sequences may be captured under different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed based on keypoint based frame representation. Comprehensive experiments have been performed to demonstrate the effectiveness of the presented methods
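The spectral embedding idea behind the fusion framework can be illustrated with a generic Laplacian-eigenmap-style embedding: build a similarity graph over samples from their concatenated features, then embed with the low eigenvectors of the normalized graph Laplacian. This is a textbook sketch, not the thesis's exact fusion method, and the Gaussian affinity and bandwidth are assumptions:

```python
import numpy as np

def spectral_embed(features, n_dims=2, sigma=1.0):
    """Laplacian-eigenmap-style embedding of stacked features (sketch).

    features: (n_samples, n_features) array, e.g. several facial
    expression descriptors concatenated per sample.
    """
    # Gaussian affinity between all pairs of samples.
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    W = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    # The smallest nontrivial eigenvectors give the embedding coordinates.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:1 + n_dims]
```

The embedded coordinates then serve as the fused representation fed to a downstream classifier.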
Social network extraction and analysis based on multimodal dyadic interaction
Social interactions are a very important component in people's lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to the New York Times' "Blogging Heads" opinion blog. The social network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links' weights are a measure of the "influence" a person has over the other. The states of the Influence Model encode audio/visual features automatically extracted from our videos using state-of-the-art algorithms. Our results are reported in terms of accuracy of audio/visual data fusion for speaker segmentation and centrality measures used to characterize the extracted social network
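Once the Influence Model has produced weighted directed links, a centrality measure over them is straightforward. A minimal sketch, using normalized weighted in-degree as one illustrative centrality (the Influence Model estimation itself, and the specific centrality measures the paper reports, are outside this sketch):

```python
def influence_centrality(edges):
    """Weighted in-degree centrality for a directed influence graph.

    edges: list of (source, target, weight) triples, where weight is the
    estimated influence the source exerts over the target. Centrality
    here is each node's share of the total influence received.
    """
    received = {}
    nodes = set()
    for src, dst, w in edges:
        nodes.update((src, dst))
        received[dst] = received.get(dst, 0.0) + w
    total = sum(received.values()) or 1.0
    return {n: received.get(n, 0.0) / total for n in sorted(nodes)}
```

On a dyadic-interaction graph this surfaces the participants who are most influenced-upon; other measures (eigenvector or betweenness centrality) weight indirect influence paths as well.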