
    Challenges in development of the American Sign Language Lexicon Video Dataset (ASLLVD) corpus

    The American Sign Language Lexicon Video Dataset (ASLLVD) consists of videos of >3,300 ASL signs in citation form, each produced by 1-6 native ASL signers, for a total of almost 9,800 tokens. This dataset, including multiple synchronized videos showing the signing from different angles, will be shared publicly once the linguistic annotations and verifications are complete. Linguistic annotations include gloss labels, sign start and end time codes, start and end handshape labels for both hands, and morphological and articulatory classifications of sign type. For compound signs, the dataset includes annotations for each morpheme. To facilitate computer vision-based sign language recognition, the dataset also includes numeric ID labels for sign variants, video sequences in uncompressed raw format, camera calibration sequences, and software for skin region extraction. We discuss here some of the challenges involved in the linguistic annotations and categorizations. We also report an example computer vision application that leverages the ASLLVD: the formulation employs a HandShapes Bayesian Network (HSBN), which models the transition probabilities between start and end handshapes in monomorphemic lexical signs. Further details and statistics for the ASLLVD dataset, as well as information about annotation conventions, are available from http://www.bu.edu/asllrp/lexicon
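    The idea of modelling transitions between start and end handshapes can be illustrated with a simple count-based estimate. The sketch below is not the HSBN itself (the abstract gives no details of its structure); it only shows how P(end handshape | start handshape) might be estimated from annotated (start, end) pairs with additive smoothing. The handshape labels, the `transition_probabilities` helper, and the smoothing constant are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(pairs, smoothing=1.0):
    """Estimate P(end | start) from (start, end) handshape pairs.

    Uses additive (Laplace) smoothing so unseen transitions keep a
    small non-zero probability. This is an illustrative estimator,
    not the HSBN described in the abstract.
    """
    labels = sorted({h for pair in pairs for h in pair})
    counts = defaultdict(Counter)
    for start, end in pairs:
        counts[start][end] += 1

    table = {}
    for start in labels:
        total = sum(counts[start].values()) + smoothing * len(labels)
        table[start] = {
            end: (counts[start][end] + smoothing) / total for end in labels
        }
    return table

# Hypothetical annotations: (start handshape, end handshape) per sign token.
annotated = [("B", "B"), ("B", "A"), ("5", "S"), ("5", "S"), ("A", "A")]
probs = transition_probabilities(annotated)
print(probs["5"])
```

    Each row of the resulting table sums to one, so it can be read as a conditional distribution over end handshapes given a start handshape.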

    Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling

    We frame the task of predicting a semantic labeling as a sparse reconstruction procedure that applies a target-specific learned transfer function to a generic deep sparse code representation of an image. This strategy partitions training into two distinct stages. First, in an unsupervised manner, we learn a set of generic dictionaries optimized for sparse coding of image patches. We train a multilayer representation via recursive sparse dictionary learning on pooled codes output by earlier layers. Second, we encode all training images with the generic dictionaries and learn a transfer function that optimizes reconstruction of patches extracted from annotated ground-truth given the sparse codes of their corresponding image patches. At test time, we encode a novel image using the generic dictionaries and then reconstruct using the transfer function. The output reconstruction is a semantic labeling of the test image. Applying this strategy to the task of contour detection, we demonstrate performance competitive with state-of-the-art systems. Unlike almost all prior work, our approach obviates the need for any form of hand-designed features or filters. To illustrate general applicability, we also show initial results on semantic part labeling of human faces. The effectiveness of our approach opens new avenues for research on deep sparse representations. Our classifiers utilize this representation in a novel manner. Rather than acting on nodes in the deepest layer, they attach to nodes along a slice through multiple layers of the network in order to make predictions about local patches. Our flexible combination of a generatively learned sparse representation with discriminatively trained transfer classifiers extends the notion of sparse reconstruction to encompass arbitrary semantic labeling tasks. Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
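    The two-stage structure (generic sparse encoding, then a learned transfer function that reconstructs label patches from codes) can be sketched in a heavily simplified, single-layer form. Everything below is an assumption made for illustration: the dictionary is random rather than learned, the encoder is a crude top-k thresholding of correlations rather than a proper sparse coding solver, the transfer function is a ridge-regularized linear map, and the patches are synthetic. It is not the paper's multilayer method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_encode(patches, dictionary, k=5):
    """Crude sparse code: keep the k largest-magnitude correlations per patch."""
    corr = patches @ dictionary.T                      # (n_patches, n_atoms)
    codes = np.zeros_like(corr)
    top = np.argsort(-np.abs(corr), axis=1)[:, :k]     # indices of the top-k atoms
    rows = np.arange(corr.shape[0])[:, None]
    codes[rows, top] = corr[rows, top]
    return codes

def fit_transfer(codes, label_patches, ridge=1e-2):
    """Ridge least squares: linear map from sparse codes to label patches."""
    gram = codes.T @ codes + ridge * np.eye(codes.shape[1])
    return np.linalg.solve(gram, codes.T @ label_patches)

# Stand-in for stage 1: a random unit-norm dictionary (hypothetical, not learned).
dictionary = rng.normal(size=(128, 64))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# Stage 2: encode training patches and fit the transfer function.
image_patches = rng.normal(size=(200, 64))             # synthetic 8x8 image patches
label_patches = (image_patches > 0.5).astype(float)    # synthetic "ground-truth" patches
train_codes = sparse_encode(image_patches, dictionary)
transfer = fit_transfer(train_codes, label_patches)
train_pred = train_codes @ transfer

# Test time: encode novel patches, then reconstruct a labeling from the codes.
test_patches = rng.normal(size=(10, 64))
test_pred = sparse_encode(test_patches, dictionary) @ transfer
print(test_pred.shape)
```

    The transfer step is the only part that sees labels here, which mirrors the separation between generic encoding and target-specific learning that the abstract describes.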

    FOCIS: A forest classification and inventory system using LANDSAT and digital terrain data

    Accurate, cost-effective stratification of forest vegetation and timber inventory is the primary goal of a Forest Classification and Inventory System (FOCIS). Conventional timber stratification using photointerpretation can be time-consuming, costly, and inconsistent from analyst to analyst. FOCIS was designed to overcome these problems by using machine processing techniques to extract and process tonal, textural, and terrain information from registered LANDSAT multispectral and digital terrain data. Comparison of samples from timber strata identified by FOCIS and by conventional procedures showed that both have about the same potential to reduce the variance of timber volume estimates over simple random sampling.
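    The claim that stratification reduces the variance of volume estimates relative to simple random sampling can be made concrete with a small worked example. The sketch below uses synthetic timber volumes and three hypothetical strata; it compares the textbook variance of the sample mean under simple random sampling with that under proportionally allocated stratified sampling, ignoring the finite population correction. None of the numbers come from the FOCIS study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-plot timber volumes for three hypothetical strata.
strata = {
    "sparse": rng.normal(50, 10, 400),
    "medium": rng.normal(150, 20, 400),
    "dense": rng.normal(300, 30, 200),
}
population = np.concatenate(list(strata.values()))
n = 50  # total number of sampled plots

# Simple random sampling: Var(mean) is approximately sigma^2 / n.
var_srs = population.var() / n

# Proportional stratified sampling: Var(mean) is approximately sum_h W_h * sigma_h^2 / n,
# where W_h is the share of the population in stratum h.
weights = {name: len(v) / len(population) for name, v in strata.items()}
var_stratified = sum(weights[name] * v.var() for name, v in strata.items()) / n

print(f"SRS variance of mean:        {var_srs:.2f}")
print(f"Stratified variance of mean: {var_stratified:.2f}")
```

    Because total variance splits into within-stratum and between-stratum parts, the stratified figure can never exceed the simple random sampling figure under these assumptions; the gain is largest when the strata means differ strongly.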

    Image Semantics in the Description and Categorization of Journalistic Photographs

    This paper reports a study on the description and categorization of images. The aim of the study was to evaluate existing indexing frameworks in the context of reportage photographs and to find out how the use of this particular image genre influences the results. The effect of different tasks on image description and categorization was also studied. Subjects performed keywording and free description tasks, and the elicited terms were classified using the most extensive of the reviewed frameworks. Differences were found in the terms used in constrained and unconstrained descriptions. Summarizing terms such as abstract concepts, themes, settings and emotions were used more frequently in keywording than in free description. Free descriptions included more terms referring to locations within the images, people and descriptive terms, owing to the narrative form the subjects used without prompting. The evaluated framework was found to lack some syntactic and semantic classes present in the data, and modifications were suggested. According to the results of this study, image categorization is based on high-level interpretive concepts, including affective and abstract themes. The results indicate that image genre influences categorization and that keywording modifies and truncates natural image description.

    A phenomenological approach to multisource data integration: Analysing infrared and visible data

    A new method is described for combining multisensory data for remote sensing applications. The approach uses phenomenological models which allow the specification of discriminatory features that are based on intrinsic physical properties of imaged surfaces. Thermal and visual images of scenes are analyzed to estimate surface heat fluxes. Such analysis makes available a discriminatory feature that is closely related to the thermal capacitance of the imaged objects. This feature provides a method for labelling image regions based on physical properties of imaged objects. This approach differs from existing approaches, which use the signal intensities in each channel (or an arbitrary linear or nonlinear combination of signal intensities) as features that are then classified by a statistical or evidential approach.