93,770 research outputs found

    HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition

    Full text link
    We present an algorithm for simultaneous face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNN). The proposed method called, HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks which boosts up their individual performances. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace that uses a high recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and performs significantly better than many competitive algorithms for each of these four tasks.Comment: Accepted in Transactions on Pattern Analysis and Machine Intelligence (TPAMI

    Modeling of Facial Aging and Kinship: A Survey

    Full text link
    Computational facial models that capture properties of facial cues related to aging and kinship increasingly attract the attention of the research community, enabling the development of reliable methods for age progression, age estimation, age-invariant facial characterization, and kinship verification from visual data. In this paper, we review recent advances in modeling of facial aging and kinship. In particular, we provide an up-to date, complete list of available annotated datasets and an in-depth analysis of geometric, hand-crafted, and learned facial representations that are used for facial aging and kinship characterization. Moreover, evaluation protocols and metrics are reviewed and notable experimental results for each surveyed task are analyzed. This survey allows us to identify challenges and discuss future research directions for the development of robust facial models in real-world conditions

    Error-Correcting Factorization

    Full text link
    Error Correcting Output Codes (ECOC) is a successful technique in multi-class classification, which is a core problem in Pattern Recognition and Machine Learning. A major advantage of ECOC over other methods is that the multi- class problem is decoupled into a set of binary problems that are solved independently. However, literature defines a general error-correcting capability for ECOCs without analyzing how it distributes among classes, hindering a deeper analysis of pair-wise error-correction. To address these limitations this paper proposes an Error-Correcting Factorization (ECF) method, our contribution is three fold: (I) We propose a novel representation of the error-correction capability, called the design matrix, that enables us to build an ECOC on the basis of allocating correction to pairs of classes. (II) We derive the optimal code length of an ECOC using rank properties of the design matrix. (III) ECF is formulated as a discrete optimization problem, and a relaxed solution is found using an efficient constrained block coordinate descent approach. (IV) Enabled by the flexibility introduced with the design matrix we propose to allocate the error-correction on classes that are prone to confusion. Experimental results in several databases show that when allocating the error-correction to confusable classes ECF outperforms state-of-the-art approaches.Comment: Under review at TPAM

    Regression-based Hypergraph Learning for Image Clustering and Classification

    Full text link
    Inspired by the recently remarkable successes of Sparse Representation (SR), Collaborative Representation (CR) and sparse graph, we present a novel hypergraph model named Regression-based Hypergraph (RH) which utilizes the regression models to construct the high quality hypergraphs. Moreover, we plug RH into two conventional hypergraph learning frameworks, namely hypergraph spectral clustering and hypergraph transduction, to present Regression-based Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph Transduction (RHT) models for addressing the image clustering and classification issues. Sparse Representation and Collaborative Representation are employed to instantiate two RH instances and their RHSC and RHT algorithms. The experimental results on six popular image databases demonstrate that the proposed RH learning algorithms achieve promising image clustering and classification performances, and also validate that RH can inherit the desirable properties from both hypergraph models and regression models.Comment: 11page

    Lifting Object Detection Datasets into 3D

    Full text link
    While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions such as 3D scanning or manual design, that scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from:(i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate cameras viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest on joint object recognition and 3D reconstruction from a single image

    Advances in Human Action Recognition: A Survey

    Full text link
    Human action recognition has been an important topic in computer vision due to its many applications such as video surveillance, human machine interaction and video retrieval. One core problem behind these applications is automatically recognizing low-level actions and high-level activities of interest. The former is usually the basis for the latter. This survey gives an overview of the most recent advances in human action recognition during the past several years, following a well-formed taxonomy proposed by a previous survey. From this state-of-the-art survey, researchers can view a panorama of progress in this area for future research

    How an Electrical Engineer Became an Artificial Intelligence Researcher, a Multiphase Active Contours Analysis

    Full text link
    This essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author's lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models

    Parameter Estimation in Finite Mixture Models by Regularized Optimal Transport: A Unified Framework for Hard and Soft Clustering

    Full text link
    In this short paper, we formulate parameter estimation for finite mixture models in the context of discrete optimal transportation with convex regularization. The proposed framework unifies hard and soft clustering methods for general mixture models. It also generalizes the celebrated kk\nobreakdash-means and expectation-maximization algorithms in relation to associated Bregman divergences when applied to exponential family mixture models

    Single Image Action Recognition by Predicting Space-Time Saliency

    Full text link
    We propose a novel approach based on deep Convolutional Neural Networks (CNN) to recognize human actions in still images by predicting the future motion, and detecting the shape and location of the salient parts of the image. We make the following major contributions to this important area of research: (i) We use the predicted future motion in the static image (Walker et al., 2015) as a means of compensating for the missing temporal information, while using the saliency map to represent the the spatial information in the form of location and shape of what is predicted as significant. (ii) We cast action classification in static images as a domain adaptation problem by transfer learning. We first map the input static image to a new domain that we refer to as the Predicted Optical Flow-Saliency Map domain (POF-SM), and then fine-tune the layers of a deep CNN model trained on classifying the ImageNet dataset to perform action classification in the POF-SM domain. (iii) We tested our method on the popular Willow dataset. But unlike existing methods, we also tested on a more realistic and challenging dataset of over 2M still images that we collected and labeled by taking random frames from the UCF-101 video dataset. We call our dataset the UCF Still Image dataset or UCFSI-101 in short. Our results outperform the state of the art

    Survey of state-of-the-art mixed data clustering algorithms

    Full text link
    Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Lastly, we present an in-depth analysis of the overall challenges in this field, highlight open research questions and discuss guidelines to make progress in the field.Comment: 20 Pages, 2 columns, 6 Tables, 209 Reference
    • …
    corecore