Crowd-sourced data and its applications for new algorithms in photographic imaging
This thesis comprises two main themes. The first of these is concerned primarily with the validity and utility of data acquired from web-based psychophysical experiments.
In recent years, web-based experiments and the crowd-sourced data they deliver have grown in popularity among researchers, chiefly because they are easy to administer and give access to a large, diverse pool of participants. However, the tight control under which traditional experiments are run, contrasted with the very limited control we have over web-based alternatives, may lead us to believe that these benefits come at the cost of reliable data. Indeed, the results reported early in this thesis support this assumption. We then proceed to show, however, that it is entirely possible to crowd-source data that is comparable with lab-based results.
The second theme of the thesis explores the possibilities presented by the use of crowd-sourced data, taking a popular colour naming experiment as an example. After
using the crowd-sourced data to construct a model for computational colour naming, we consider the value of colour names as image descriptors, with particular relevance to illuminant estimation and object indexing. We discover that colour names represent a particularly useful quantisation of colour space, allowing us to construct compact image descriptors for object indexing. We show that these descriptors are somewhat tolerant to errors in illuminant estimation and that their perceptual relevance offers even further utility. We go on to develop a novel algorithm which delivers perceptually-relevant,
illumination-invariant image descriptors based on colour names.
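To make the idea of colour names as a quantisation of colour space concrete, here is a minimal sketch of a colour-name histogram descriptor. The 11-term palette and its RGB centres are rough illustrative assumptions, not the thesis's crowd-sourced colour naming model.

```python
# Minimal sketch of a colour-name histogram descriptor (illustrative only).
# The palette and its sRGB centres are assumptions, not the thesis's model.
import numpy as np

PALETTE = {  # basic colour terms -> approximate sRGB centres
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (200, 30, 30),
    "green": (40, 160, 60), "blue": (40, 70, 200), "yellow": (230, 220, 40),
    "orange": (240, 140, 30), "purple": (130, 50, 160), "pink": (240, 150, 180),
    "brown": (120, 80, 40), "grey": (128, 128, 128),
}
NAMES = list(PALETTE)
CENTRES = np.array([PALETTE[n] for n in NAMES], dtype=float)

def colour_name_descriptor(image_rgb: np.ndarray) -> np.ndarray:
    """Quantise pixels to their nearest named colour and return a
    normalised histogram over colour names (a compact image descriptor)."""
    pixels = image_rgb.reshape(-1, 3).astype(float)
    # Squared Euclidean distance from every pixel to every palette centre.
    dists = ((pixels[:, None, :] - CENTRES[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(NAMES)).astype(float)
    return hist / hist.sum()

if __name__ == "__main__":
    img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)  # stand-in image
    print(dict(zip(NAMES, colour_name_descriptor(img).round(3))))
```

Because the descriptor lives in a small, perceptually meaningful vocabulary rather than a fine-grained colour space, it stays compact and degrades gracefully under moderate illuminant-estimation errors.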
The problems and challenges of managing crowd sourced audio-visual evidence
A number of recent incidents, such as the Stanley Cup Riots, the uprisings in the Middle East and the London riots, have demonstrated the value of crowd-sourced audio-visual evidence, wherein citizens submit footage captured on mobile phones and other devices to help governmental institutions, responder agencies and law enforcement authorities confirm the authenticity of incidents and, in the case of criminal activity, identify perpetrators. The use of such evidence can present a significant logistical challenge to investigators, particularly because of the potential size of the data gathered through such mechanisms and the added problems of time-lining disparate sources of evidence and subsequently investigating the incident(s). In this paper we explore this problem and, in particular, outline the pressure points for an investigator. We identify and explore a number of specific problems related to the secure receipt of the evidence; imaging, tagging and then time-lining the evidence; and identifying duplicate and near-duplicate items of audio-visual evidence.
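One of the problems the paper names, flagging near-duplicate items of submitted footage, is often approached with perceptual hashing. The sketch below is an illustrative assumption of that general technique (a simple average hash on key frames compared by Hamming distance), not the paper's proposed workflow.

```python
# Illustrative sketch of near-duplicate detection for submitted footage using
# a simple average hash on key frames; not the paper's proposed workflow.
import numpy as np

def average_hash(gray_frame: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Downsample a greyscale frame to hash_size x hash_size block means and
    threshold at the mean, yielding a 64-bit perceptual fingerprint."""
    h, w = gray_frame.shape
    ys = np.arange(hash_size + 1) * h // hash_size
    xs = np.arange(hash_size + 1) * w // hash_size
    small = np.array([[gray_frame[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(hash_size)] for i in range(hash_size)])
    return (small > small.mean()).flatten()

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.count_nonzero(a != b))

# Frames whose hashes differ in only a few bits are flagged as near-duplicate
# candidates for a human investigator to review.
frame_a = np.random.rand(480, 640)
frame_b = frame_a + np.random.rand(480, 640) * 0.02  # slightly perturbed copy
print(hamming(average_hash(frame_a), average_hash(frame_b)))  # small distance
```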
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.
Comment: ECCV 2022. 33 pages, 12 figures.
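As a small illustration of how a per-video count can be derived from track-level annotations, the sketch below counts distinct track IDs. The MOTChallenge-like column order (frame, track_id, x, y, w, h, ...) and the file path are assumptions, not CFC's exact schema.

```python
# Sketch: derive a per-video fish count from MOT-style annotations by
# counting distinct track IDs. The column order is an assumed convention.
import csv

def count_tracks(annotation_csv: str) -> int:
    track_ids = set()
    with open(annotation_csv, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            _frame, track_id = int(row[0]), int(row[1])
            track_ids.add(track_id)
    return len(track_ids)

# Usage (hypothetical file path):
# print(count_tracks("cfc_video_0001.txt"))
```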
Improving Digital Record Annotation Capabilities with Open-sourced Ontologies and Crowd-sourced Workers
The Museum of the City of New York has undertaken a long-term project to digitize its collection of 1.5 million objects, annotate them with metadata, and make them publicly available via the Internet. At present, Museum staff annotate images using a traditional lexicon assembled from authority sources such as the Library of Congress and the Getty Art and Architecture Thesaurus, but with limited resources the Museum cannot scale to meet its goal of providing the highest levels of accessibility and discoverability of collections to researchers as well as to the general public. This project offers a cost-effective, scalable solution that 1) consolidates the current lexicon with linked open data sources by generating alignments and reconciling semantically equivalent elements, creating a super-set lexicon, and 2) divides the work of annotating into micro-tasks that can be completed by the large labor pools available through crowd-sourced marketplaces.
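As a toy illustration of reconciling semantically equivalent lexicon entries, the sketch below merges two term lists by normalised-label matching. The sample terms and the matching rule are assumptions; a real alignment would use linked-open-data identifiers and richer similarity measures.

```python
# Toy sketch of merging two lexicons into a super-set by normalised-label
# matching; the sample entries and matching rule are illustrative assumptions.
import re

def normalise(term: str) -> str:
    term = re.sub(r"\(.*?\)", "", term.lower())  # drop qualifiers like "(buildings)"
    return " ".join(term.replace("-", " ").split())

def merge_lexicons(primary: dict, secondary: dict) -> dict:
    """Merge term -> definition maps, reconciling entries whose normalised
    labels coincide so the super-set lexicon holds each concept once."""
    merged = {normalise(t): d for t, d in primary.items()}
    for term, definition in secondary.items():
        merged.setdefault(normalise(term), definition)
    return merged

loc = {"Skyscrapers": "Very tall buildings"}                # LoC-style entry
aat = {"skyscrapers (buildings)": "Tall habitable towers"}  # AAT-style entry
print(merge_lexicons(loc, aat))  # {'skyscrapers': 'Very tall buildings'}
```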
3-Dimensional Building Details from Aerial Photography for Internet Maps
This paper introduces the automated characterization of real estate (real property) for Internet mapping. It proposes a processing framework to achieve this task from vertical aerial photography and associated property information. A demonstration of the feasibility of an automated solution builds on test data from the Austrian city of Graz. Information is extracted from vertical aerial photography and various data products derived from that photography in the form of a true orthophoto, a dense digital surface model and digital terrain model, and a classification of land cover. Maps of cadastral property boundaries aid in defining real properties. Our goal is to develop a table for each property with descriptive numbers about the buildings, their dimensions, number of floors, number of windows, roof shapes, impervious surfaces, garages, sheds, vegetation, presence of a basement floor, and other descriptors of interest for each and every property of a city. From aerial sources, at a pixel size of 10 cm, we show that we have obtained positional accuracies in the range of a single pixel, an accuracy of areas in the 10% range, floor counts at an accuracy of 93% and window counts at 86% accuracy. We also introduce 3D point clouds of facades and their creation from vertical aerial photography, and how these point clouds can support the definition of complex facades.
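To illustrate one of the simpler descriptors in such a table, the sketch below estimates building height as the difference between a digital surface model (DSM) and a digital terrain model (DTM) inside a footprint mask, and derives a floor count from an assumed storey height. The synthetic rasters and the 3 m storey height are illustrative assumptions, not the paper's method.

```python
# Sketch: building height and rough floor count from a DSM/DTM pair.
# Synthetic rasters and the 3 m storey height are illustrative assumptions.
import numpy as np

def building_stats(dsm: np.ndarray, dtm: np.ndarray, footprint: np.ndarray,
                   storey_height_m: float = 3.0) -> dict:
    """Height above ground inside the building footprint, plus a rough
    floor-count estimate."""
    height = dsm - dtm                 # normalised surface height in metres
    roof = height[footprint]           # heights inside the footprint mask
    median_height = float(np.percentile(roof, 50))
    return {
        "median_height_m": round(median_height, 1),
        "estimated_floors": max(1, int(round(median_height / storey_height_m))),
        "footprint_area_px": int(footprint.sum()),
    }

# Synthetic example: a 12 m tall building on flat terrain.
dtm = np.zeros((100, 100))
dsm = np.zeros((100, 100)); dsm[30:70, 30:70] = 12.0
mask = np.zeros((100, 100), dtype=bool); mask[30:70, 30:70] = True
print(building_stats(dsm, dtm, mask))  # ~12 m high, ~4 floors
```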
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification are remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
Comment: Accepted to CVPR 2018; Code and data available at https://www.github.com/richzhang/PerceptualSimilarity
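The basic idea can be sketched as comparing unit-normalised intermediate VGG-16 activations instead of raw pixels. This is a simplified stand-in, not the paper's learned LPIPS metric (which additionally learns per-channel linear weights calibrated on the human judgments); the layer choice and uniform weighting are assumptions.

```python
# Simplified sketch of a deep-feature perceptual distance: unit-normalise
# activations from a few VGG-16 layers and average their squared differences.
# NOT the paper's learned LPIPS metric; layer indices/weights are assumptions.
import torch
import torchvision.models as models

vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()
LAYERS = {3, 8, 15, 22, 29}  # relu1_2, relu2_2, relu3_3, relu4_3, relu5_3

def deep_feature_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """x, y: (1, 3, H, W) images, ideally ImageNet-normalised. Returns a scalar."""
    dist = torch.zeros(())
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x, y = layer(x), layer(y)
            if i in LAYERS:
                # Unit-normalise each spatial feature vector along channels.
                xn = x / (x.norm(dim=1, keepdim=True) + 1e-8)
                yn = y / (y.norm(dim=1, keepdim=True) + 1e-8)
                dist = dist + ((xn - yn) ** 2).sum(dim=1).mean()
    return dist

a = torch.rand(1, 3, 224, 224)
b = a + 0.05 * torch.randn_like(a)
print(float(deep_feature_distance(a, b)))  # small for perceptually similar images
```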
Visual Analytic Extraction of Meaning from Photo-Sharing Services for Leisure Pedestrian Routing
Present-day routing services are able to provide travel directions for users of all modes of transport. Most of them focus on functional journeys (i.e. journeys linking a given origin and destination with minimum cost) and pay less attention to recreational trips, in particular leisure walks in an urban context. These walks have a predefined time or distance and, since their purpose is the walking itself, the attractiveness of the chosen paths plays an important role in route selection. Conventional map data that inform routing algorithms cannot be used to extract street attractiveness, as they contain no subjective component; in other words, they do not tell whether people enjoy being at a particular place. Recent research demonstrates that crowd-sourced data available from photo-sharing websites have the potential to be a good source of this measure, and thus to become the basis for a routing system that suggests attractive leisure walks.
This PhD research looks at existing projects that aim to utilize user-generated photographic data for journey planning, and suggests new techniques that make the estimation of street attractiveness from this source more reliable. First, we determine the artifacts in photographic datasets that may negatively impact the resulting attractiveness scores. Based on the findings, we suggest filtering methods that improve how well the spatial distributions of photographs match the chosen purpose. Second, we discuss several approaches to assigning attractiveness scores to street segments and draw conclusions about their differences. Finally, we experiment with the routing itself and develop a prototype system that suggests leisure walks through attractive streets in an urban area. The experiments cover Central London and involve four photographic sources: Flickr, Geograph, Panoramio and Picasa.
A visual analytic (VA) approach is used throughout the work to glean new insights. By combining computation with the analytical capabilities of the human brain, this research method has proven to work well with complex data structures in a variety of tasks. The thesis contributes to VA as an example of what can be achieved by means of the visual exploration of data.
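To make the routing idea concrete, here is a toy sketch in which street segments are scored from nearby geotagged-photo counts and a router penalises unattractive segments. The tiny graph, the photo counts, and the cost formula are illustrative assumptions, not the thesis's actual scoring, filtering, or routing method.

```python
# Toy sketch of attractiveness-aware leisure routing: segments are scored
# from photo counts and the router penalises unattractive streets.
# The graph, counts, and cost weighting are illustrative assumptions.
import networkx as nx

photo_counts = {("A", "B"): 40, ("B", "C"): 5, ("A", "D"): 2, ("D", "C"): 1}

G = nx.Graph()
for (u, v), photos in photo_counts.items():
    length_m = 500                              # pretend all segments are 500 m
    attractiveness = photos / (photos + 10)     # saturating score in [0, 1)
    # Cost trades distance against attractiveness.
    G.add_edge(u, v, cost=length_m * (1.5 - attractiveness))

route = nx.shortest_path(G, "A", "C", weight="cost")
print(route)  # prefers the photo-rich A-B-C streets over A-D-C
```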
Enhancement and stylization of photographs
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 89-95).
A photograph captured by a digital camera may be the final product for many casual photographers. However, for professional photographers, this photograph is only the beginning: experts often spend hours enhancing and stylizing their photographs. These enhancements range from basic exposure and contrast adjustments to dramatic alterations. It is these enhancements, along with composition and timing, that distinguish the work of professionals from that of casual photographers. The goal of this thesis is to narrow the gap between casual and professional photographers. We aim to empower casual users with methods for making their photographs look better. Professional photographers could also benefit from our findings: our enhancement methods produce a better starting point for professional processing. We propose and evaluate three different methods for image enhancement and stylization. The first method is based on photographic intuition and is fully automatic. The second method relies on an expert's input for training; after training, this method can automatically predict expert adjustments for previously unseen photographs. The third method uses a grammar-based representation to sample the space of image filters and relies on user input to select novel and interesting filters.
by Vladimir Leonid Bychkovsky. Ph.D.
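As a flavour of what a fully automatic enhancement of the first kind can look like, the sketch below applies a basic exposure/contrast correction by percentile stretching. The percentile choices are assumptions and this is not the thesis's actual algorithm.

```python
# Minimal sketch of automatic exposure/contrast adjustment by percentile
# stretching; illustrates a rule-based enhancement, not the thesis's methods.
import numpy as np

def auto_contrast(image: np.ndarray, low_pct: float = 1.0,
                  high_pct: float = 99.0) -> np.ndarray:
    """Stretch intensities so the low/high percentiles map to 0 and 255."""
    img = image.astype(float)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    out = (img - lo) / max(hi - lo, 1e-6) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage with a dull, low-contrast stand-in image.
dull = (np.random.rand(100, 100, 3) * 80 + 60).astype(np.uint8)
print(dull.min(), dull.max(), "->",
      auto_contrast(dull).min(), auto_contrast(dull).max())
```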