1,940 research outputs found

    Multi-region probabilistic histograms for robust and scalable identity inference

    We propose a scalable face matching algorithm capable of dealing with faces subject to several concurrent and uncontrolled factors, such as variations in pose, expression, illumination, and resolution, as well as scale and misalignment problems. Each face is described in terms of multi-region probabilistic histograms of visual words, followed by a normalised distance calculation between the histograms of two faces. We also propose a fast histogram approximation method which dramatically reduces the computational burden with minimal impact on discrimination performance. Experiments on the “Labeled Faces in the Wild” dataset (unconstrained environments) as well as FERET (controlled variations) show that the proposed algorithm obtains performance on par with a more complex method and displays a clear advantage over predecessor systems. Furthermore, the use of multiple regions (as opposed to a single overall region) improves accuracy in most cases, especially when dealing with illumination changes and very low resolution images. The experiments also show that normalised distances can noticeably improve robustness by partially counteracting the effects of image variations.
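    The description above (per-region probabilistic histograms of visual words, then a normalised distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian soft assignment, the `sigma` parameter, the L1 distance, the cohort-based normalisation, and all function names are assumptions chosen for illustration, and the fast histogram approximation mentioned in the abstract is not shown.

```python
import numpy as np

def soft_histogram(features, words, sigma=1.0):
    """Average soft assignment of local features (n, d) to visual words (k, d).

    Assumption: Gaussian-style responsibilities with a shared width `sigma`.
    """
    # Squared Euclidean distance between every feature and every word.
    d2 = ((features[:, None, :] - words[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)      # each row sums to 1
    return post.mean(axis=0)                     # histogram sums to 1

def face_descriptor(region_features, words):
    """Stack one probabilistic histogram per face region: shape (regions, k)."""
    return np.stack([soft_histogram(f, words) for f in region_features])

def raw_distance(a, b):
    """Mean over regions of the L1 distance between matching histograms."""
    return np.abs(a - b).sum(axis=1).mean()

def normalised_distance(a, b, cohort):
    """Raw distance divided by the average distance of both faces to a cohort.

    Assumption: `cohort` is a list of descriptors of unrelated reference faces.
    """
    da = np.mean([raw_distance(a, c) for c in cohort])
    db = np.mean([raw_distance(b, c) for c in cohort])
    return raw_distance(a, b) / (0.5 * (da + db))
```

    Dividing by the cohort distances is one way to make scores comparable across image conditions: a face degraded by poor illumination tends to be far from every other face, so its raw distances are scaled down accordingly.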

    Learning Multimodal Latent Attributes

    The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multi-modal content and complex and unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce a concept of semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces requirements for an exhaustive accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multimedia sparse data learning tasks including multi-task learning, learning with label noise, N-shot transfer learning and, importantly, zero-shot learning.

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Efficient Human Activity Recognition in Large Image and Video Databases

    Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions such as the use of motion cues to determine video descriptors, application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g., indoor surveillance), unconstrained videos (e.g., YouTube), depth or skeletal data (e.g., captured by Kinect), and person images (e.g., Flickr). In particular, we are interested in answering questions like (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of a high-dimension, low-sample-size (HDLSS) nature, how can human actions be efficiently recognized in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of underlying activities? and (d) can motion information from monocular videos be used for automatically determining saliency regions for recognizing actions in still images?

    Enhancing Face Recognition with Deep Learning Architectures: A Comprehensive Review

    Face recognition and the frameworks that support it have made remarkable strides in recent years. This progress has been particularly pronounced in identity verification, a practice prominently used by law enforcement agencies to advance forensic science. Many studies have applied deep learning techniques within machine learning models to extract distinctive features and then classify them, thereby improving the precision with which individuals are recognized. This paper focuses on deep learning methodologies tailored for facial recognition and subsequent face matching, with an emphasis on improving accuracy by training models on large datasets. It presents a comprehensive survey of the diverse strategies used in facial recognition and examines the intricacies and challenges that underlie facial recognition within imagery analysis.