
    Much Ado About Time: Exhaustive Annotation of Temporal Data

    Large-scale annotated datasets allow AI systems to learn from and build upon the knowledge of the crowd. Many crowdsourcing techniques have been developed for collecting image annotations. These techniques often implicitly rely on the fact that a new input image takes a negligible amount of time to perceive. In contrast, we investigate and determine the most cost-effective way of obtaining high-quality multi-label annotations for temporal data such as videos. Watching even a short 30-second video clip requires a significant time investment from a crowd worker; thus, requesting multiple annotations following a single viewing is an important cost-saving strategy. But how many questions should we ask per video? We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments). We demonstrate that while workers may not correctly answer all questions, the cost-benefit analysis nevertheless favors consensus from multiple such cheap-yet-imperfect iterations over more complex alternatives. When compared with a one-question-per-video baseline, our method is able to achieve a 10% improvement in recall (76.7% ours versus 66.7% baseline) at comparable precision (83.8% ours versus 83.0% baseline) in about half the annotation time (3.8 minutes ours compared to 7.1 minutes baseline). We demonstrate the effectiveness of our method by collecting multi-label annotations of 157 human activities on 1,815 videos. Comment: HCOMP 2016 Camera Read
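The consensus step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain majority vote over binary answers, which is one simple way to combine several imperfect passes and is not necessarily the aggregation rule the authors used. The function name, the question strings, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter


def majority_consensus(worker_answers):
    """Aggregate binary answers per question by majority vote.

    worker_answers: list of dicts, one per worker, mapping a question
    to True/False. Questions a worker skipped are simply absent.
    Ties resolve to False (an assumed, conservative choice).
    """
    votes = {}
    for answers in worker_answers:
        for question, answer in answers.items():
            votes.setdefault(question, Counter())[bool(answer)] += 1
    return {q: c[True] > c[False] for q, c in votes.items()}


# Hypothetical answers from three workers who each watched the same clip once.
workers = [
    {"opening a door": True, "holding a cup": False},
    {"opening a door": True, "holding a cup": True},
    {"opening a door": False, "holding a cup": False},
]
consensus = majority_consensus(workers)
```

Each worker answers every question after a single viewing, so one wrong answer costs little and can be outvoted by the other passes.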

    Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

    Computer vision has great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most such scenes are not particularly exciting, they typically do not appear on YouTube, in movies or TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation from script writing to video recording and annotation. Following this procedure we collect a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities. The dataset is composed of 9,848 annotated videos with an average length of 30 seconds, showing activities of 267 people from three continents. Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects. In total, Charades provides 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes and 41,104 labels for 46 object classes. Using this rich data, we evaluate and provide baseline results for several tasks including action recognition and automatic description generation. We believe that the realism, diversity, and casual nature of this dataset will present unique challenges and new opportunities for the computer vision community.

    Beyond the Camera: Neural Networks in World Coordinates

    Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information. This fundamental system has been missing from video understanding with deep networks, typically limited to 224 by 224 pixel content locked to the camera frame. We propose a simple idea, WorldFeatures, where each feature at every layer has a spatial transformation, and the feature map is only transformed as needed. We show that a network built with these WorldFeatures can be used to model eye movements, such as saccades, fixation, and smooth pursuit, even in a batch setting on pre-recorded video. That is, the network can, for example, use all 224 by 224 pixels to look at a small detail one moment, and the whole scene the next. We show that typical building blocks, such as convolutions and pooling, can be adapted to support WorldFeatures using available tools. Experiments are presented on the Charades, Olympic Sports, and Caltech-UCSD Birds-200-2011 datasets, exploring action recognition, fine-grained recognition, and video stabilization.

    Characterizing Video Question Answering with Sparsified Inputs

    In Video Question Answering, videos are often processed as a full-length sequence of frames to ensure minimal loss of information. Recent works have demonstrated evidence that sparse video inputs are sufficient to maintain high performance. However, they usually discuss the case of single frame selection. In our work, we extend the setting to multiple inputs and other modalities. We characterize the task with different input sparsity and provide a tool for doing that. Specifically, we use a Gumbel-based learnable selection module to adaptively select the best inputs for the final task. In this way, we experiment over public VideoQA benchmarks and provide analysis on how sparsified inputs affect the performance. From our experiments, we have observed only 5.2%-5.8% loss of performance with only 10% of video lengths, which corresponds to 2-4 frames selected from each video. Meanwhile, we also observed the complementary behaviour between visual and textual inputs, even under highly sparsified settings, suggesting the potential of improving data efficiency for video-and-language tasks.
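To make the idea of Gumbel-based selection concrete, here is a minimal sketch of the Gumbel-top-k sampling trick applied to per-frame scores. It is an illustration only: the module described in the abstract is learnable and trained with the downstream task, whereas this sketch uses fixed, hypothetical scores and shows just the sampling step. The function name, the scores, and the 30-frame clip are assumptions.

```python
import math
import random


def gumbel_top_k(scores, k, rng=random):
    """Sample k distinct indices, favouring higher scores (Gumbel-top-k trick).

    Adding independent Gumbel(0, 1) noise to each score and keeping the k
    largest perturbed values draws k items without replacement.
    """
    perturbed = []
    for index, score in enumerate(scores):
        # Clamp the uniform draw away from 0 and 1 to avoid log(0).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    perturbed.sort(reverse=True)
    # Return the chosen frame indices in temporal order.
    return sorted(index for _, index in perturbed[:k])


rng = random.Random(0)
frame_scores = [0.1 * i for i in range(30)]  # hypothetical scores for 30 frames
selected = gumbel_top_k(frame_scores, k=3, rng=rng)  # keep 10% of the frames
```

In a trained system the scores would come from a network and a differentiable relaxation would replace the hard top-k during training; both are outside the scope of this sketch.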

    Multiple genetic loci for bone mineral density and fractures

    BACKGROUND: Bone mineral density influences the risk of osteoporosis later in life and is useful in the evaluation of the risk of fracture. We aimed to identify sequence variants associated with bone mineral density and fracture. METHODS: We performed a quantitative trait analysis of data from 5861 Icelandic subjects (the discovery set), testing for an association between 301,019 single-nucleotide polymorphisms (SNPs) and bone mineral density of the hip and lumbar spine. We then tested for an association between 74 SNPs (most of which were implicated in the discovery set) at 32 loci in replication sets of Icelandic, Danish, and Australian subjects (4165, 2269, and 1491 subjects, respectively). RESULTS: Sequence variants in five genomic regions were significantly associated with bone mineral density in the discovery set and were confirmed in the replication sets (combined P values, 1.2 × 10^-7 to 2.0 × 10^-21). Three regions are close to or within genes previously shown to be important to the biologic characteristics of bone: the receptor activator of nuclear factor-kappaB ligand gene (RANKL) (chromosomal location, 13q14), the osteoprotegerin gene (OPG) (8q24), and the estrogen receptor 1 gene (ESR1) (6q25). The two other regions are close to the zinc finger and BTB domain containing 40 gene (ZBTB40) (1p36) and the major histocompatibility complex region (6p21). The 1p36, 8q24, and 6p21 loci were also associated with osteoporotic fractures, as were loci at 18q21, close to the receptor activator of the nuclear factor-kappaB gene (RANK), and loci at 2p16 and 11p11. CONCLUSIONS: We have discovered common sequence variants that are consistently associated with bone mineral density and with low-trauma fractures in three populations of European descent. Although these variants alone are not clinically useful in the prediction of risk to the individual person, they provide insight into the biochemical pathways underlying osteoporosis.

    A candidate gene study of the type I interferon pathway implicates IKBKE and IL8 as risk loci for SLE

    Systemic Lupus Erythematosus (SLE) is a systemic autoimmune disease in which the type I interferon pathway has a crucial role. We have previously shown that three genes in this pathway, IRF5, TYK2 and STAT4, are strongly associated with risk for SLE. Here, we investigated 78 genes involved in the type I interferon pathway to identify additional SLE susceptibility loci. First, we genotyped 896 single-nucleotide polymorphisms in these 78 genes and 14 other candidate genes in 482 Swedish SLE patients and 536 controls. Genes with P<0.01 in the initial screen were then followed up in 344 additional Swedish patients and 1299 controls. SNPs in the IKBKE, TANK, STAT1, IL8 and TRAF6 genes gave nominal signals of association with SLE in this extended Swedish cohort. To replicate these findings we extracted data from a genomewide association study on SLE performed in a US cohort. Combined analysis of the Swedish and US data, comprising a total of 2136 cases and 9694 controls, implicates IKBKE and IL8 as SLE susceptibility loci (Pmeta=0.00010 and Pmeta=0.00040, respectively). STAT1 was also associated with SLE in this cohort (Pmeta=3.3 × 10^-5), but this association signal appears to be dependent on that previously reported for the neighbouring STAT4 gene. Our study suggests additional genes from the type I interferon system in SLE, and highlights genes in this pathway for further functional analysis.

    The sequences of 150,119 genomes in the UK Biobank

    Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data (1,2). Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank (3). This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.