
    Crowd-sourced data and its applications for new algorithms in photographic imaging

    This thesis comprises two main themes. The first is concerned primarily with the validity and utility of data acquired from web-based psychophysical experiments. In recent years web-based experiments, and the crowd-sourced data they can deliver, have been rising in popularity among the research community for several key reasons – primarily ease of administration and easy access to a large population of diverse participants. However, the contrast between the tight control with which traditional experiments are performed and the severe lack of control we have over web-based alternatives may lead us to believe that these benefits come at the cost of reliable data. Indeed, the results reported early in this thesis support this assumption. However, we proceed to show that it is entirely possible to crowd-source data that is comparable with lab-based results. The second theme of the thesis explores the possibilities presented by the use of crowd-sourced data, taking a popular colour naming experiment as an example. After using the crowd-sourced data to construct a model for computational colour naming, we consider the value of colour names as image descriptors, with particular relevance to illuminant estimation and object indexing. We discover that colour names represent a particularly useful quantisation of colour space, allowing us to construct compact image descriptors for object indexing. We show that these descriptors are somewhat tolerant to errors in illuminant estimation and that their perceptual relevance offers even further utility. We go on to develop a novel algorithm which delivers perceptually-relevant, illumination-invariant image descriptors based on colour names.
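As a concrete illustration of the quantisation this abstract describes, a colour-name histogram can serve as a compact image descriptor. The sketch below is a minimal stand-in, not the thesis's fitted model: the 11 basic colour terms are standard, but the sRGB focal values and the nearest-centre pixel assignment are illustrative assumptions in place of the crowd-sourced naming model.

```python
import numpy as np

# Illustrative focal colours for the 11 basic colour terms.
# These sRGB values are placeholders, not the thesis's learned model.
FOCAL_COLOURS = {
    "black":  (0, 0, 0),       "white":  (255, 255, 255),
    "red":    (200, 30, 30),   "green":  (40, 160, 60),
    "blue":   (40, 70, 200),   "yellow": (230, 220, 40),
    "orange": (240, 140, 30),  "purple": (130, 50, 160),
    "pink":   (240, 150, 180), "brown":  (120, 80, 40),
    "grey":   (128, 128, 128),
}

def colour_name_descriptor(image):
    """Map each pixel to its nearest focal colour and return a
    normalised 11-bin histogram: a compact quantisation of colour
    space whose bins are perceptual categories."""
    names = list(FOCAL_COLOURS)
    centres = np.array([FOCAL_COLOURS[n] for n in names], dtype=float)
    pixels = image.reshape(-1, 3).astype(float)
    # Nearest-centre assignment in RGB (a learned model would be used in practice).
    dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(names)).astype(float)
    return names, hist / hist.sum()

# Usage: a tiny 2x2 image, two red-ish and two blue-ish pixels.
img = np.array([[[210, 20, 20], [190, 40, 40]],
                [[30, 60, 210], [50, 80, 190]]], dtype=np.uint8)
names, desc = colour_name_descriptor(img)
```

Because the descriptor has only 11 bins it is compact, and because the bins are perceptual categories, small illuminant-estimation errors that leave a pixel inside its colour-name region do not change the descriptor at all.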

    The problems and challenges of managing crowd sourced audio-visual evidence

    A number of recent incidents, such as the Stanley Cup Riots, the uprisings in the Middle East and the London riots have demonstrated the value of crowd sourced audio-visual evidence wherein citizens submit audio-visual footage captured on mobile phones and other devices to aid governmental institutions, responder agencies and law enforcement authorities to confirm the authenticity of incidents and, in the case of criminal activity, to identify perpetrators. The use of such evidence can present a significant logistical challenge to investigators, particularly because of the potential size of data gathered through such mechanisms and the added problems of time-lining disparate sources of evidence and, subsequently, investigating the incident(s). In this paper we explore this problem and, in particular, outline the pressure points for an investigator. We identify and explore a number of particular problems related to the secure receipt of the evidence, imaging, tagging and then time-lining the evidence, and the problem of identifying duplicate and near duplicate items of audio-visual evidence.

    The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

    We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.
    Comment: ECCV 2022. 33 pages, 12 figures.

    Improving Digital Record Annotation Capabilities with Open-sourced Ontologies and Crowd-sourced Workers

    The Museum of the City of New York has undertaken a long-term project to digitize its collection of 1.5 million objects, annotate them with metadata, and make them publicly available via the Internet. At present, Museum staff annotate images using a traditional lexicon assembled from authority sources such as the Library of Congress and the Getty Art and Architecture Thesaurus, but with limited resources the Museum cannot scale to meet its goal of providing the highest levels of accessibility and discoverability of collections to researchers as well as to the general public. This project offers a cost-effective, scalable solution that 1) consolidates the current lexicon with linked open data sources by generating alignments and reconciling semantically equivalent elements, creating a super-set lexicon, and 2) divides the work of annotating into micro-tasks that can be completed by huge labor pools available through crowd-sourced marketplaces.

    3-Dimensional Building Details from Aerial Photography for Internet Maps

    This paper introduces the automated characterization of real estate (real property) for Internet mapping. It proposes a processing framework to achieve this task from vertical aerial photography and associated property information. A demonstration of the feasibility of an automated solution builds on test data from the Austrian City of Graz. Information is extracted from vertical aerial photography and various data products derived from that photography in the form of a true orthophoto, a dense digital surface model and digital terrain model, and a classification of land cover. Maps of cadastral property boundaries aid in defining real properties. Our goal is to develop a table for each property with descriptive numbers about the buildings, their dimensions, number of floors, number of windows, roof shapes, impervious surfaces, garages, sheds, vegetation, presence of a basement floor, and other descriptors of interest for each and every property of a city. From aerial sources, at a pixel size of 10 cm, we show that we have obtained positional accuracies in the range of a single pixel, an accuracy of areas in the 10% range, floor counts at an accuracy of 93% and window counts at 86% accuracy. We also introduce 3D point clouds of facades and their creation from vertical aerial photography, and how these point clouds can support the definition of complex facades.
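The area figures above follow directly from the stated 10 cm pixel size: a classified footprint's ground area is its pixel count times the area of one pixel. The sketch below illustrates only that unit conversion; the footprint pixel count is a hypothetical number, not a value from the paper.

```python
# Illustrative only: converting a classified building footprint from a
# pixel count to ground area at the paper's stated 10 cm pixel size.
GSD_M = 0.10  # ground sampling distance, metres per pixel

def footprint_area_m2(n_pixels, gsd=GSD_M):
    """Each pixel covers gsd * gsd square metres on the ground."""
    return n_pixels * gsd * gsd

# A hypothetical footprint of 12,000 classified pixels at 10 cm
# corresponds to 120 square metres on the ground.
area = footprint_area_m2(12000)
```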

    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

    While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
    Comment: Accepted to CVPR 2018; Code and data available at https://www.github.com/richzhang/PerceptualSimilarit
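The core computation behind such a deep-feature metric can be sketched without a trained network: unit-normalise activations channel-wise, take squared differences, average them spatially, and sum over layers. The numpy sketch below uses random arrays in place of real VGG activations and fixes the learned per-channel weights to 1, so it shows the shape of the computation rather than the paper's calibrated LPIPS metric.

```python
import numpy as np

def unit_normalise(feat, eps=1e-10):
    """Scale each spatial feature vector to unit length.
    `feat` has shape (C, H, W); the channel axis is 0."""
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True))
    return feat / (norm + eps)

def feature_distance(feats_a, feats_b):
    """Distance between two images given per-layer feature stacks:
    sum over layers of the spatially averaged squared difference
    between unit-normalised features (per-channel weights fixed to 1)."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        diff = unit_normalise(fa) - unit_normalise(fb)
        total += (diff ** 2).sum(axis=0).mean()  # sum channels, average H*W
    return total

# Random stand-ins for two layers of network activations; a small
# perturbation of the features yields a small positive distance.
rng = np.random.default_rng(0)
layers_x = [rng.standard_normal((8, 4, 4)), rng.standard_normal((16, 2, 2))]
layers_y = [f + 0.1 * rng.standard_normal(f.shape) for f in layers_x]
d = feature_distance(layers_x, layers_y)
```

Identical feature stacks give a distance of exactly zero; what LPIPS adds on top of this skeleton is a learned per-channel weighting calibrated against the paper's human similarity judgments.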

    Enhancement and stylization of photographs

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 89-95).
    A photograph captured by a digital camera may be the final product for many casual photographers. However, for professional photographers, this photograph is only the beginning: experts often spend hours on enhancing and stylizing their photographs. These enhancements range from basic exposure and contrast adjustments to dramatic alterations. It is these enhancements - along with composition and timing - that distinguish the work of professionals from that of casual photographers. The goal of this thesis is to narrow the gap between casual and professional photographers. We aim to empower casual users with methods for making their photographs look better. Professional photographers could also benefit from our findings: our enhancement methods produce a better starting point for professional processing. We propose and evaluate three different methods for image enhancement and stylization. The first method is based on photographic intuition and is fully automatic. The second method relies on an expert's input for training; after training, this method can automatically predict expert adjustments for previously unseen photographs. The third method uses a grammar-based representation to sample the space of image filters and relies on user input to select novel and interesting filters.
    by Vladimir Leonid Bychkovsky. Ph.D.