10,818 research outputs found
Automatic tagging and geotagging in video collections and communities
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010 and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features.
Living Knowledge
Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity an asset rather than a problem. Within the project, foundational ideas that emerged from the synergistic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed into concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser, which provide users with better structured information while coping with Web-scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented in the Media Content Analyser application. The scientific and technological results obtained are described in the following
Fine-Grained Car Detection for Visual Census Estimation
Targeted socioeconomic policies require an accurate understanding of a
country's demographic makeup. To that end, the United States spends more than 1
billion dollars a year gathering census data such as race, gender, education,
occupation and unemployment rates. Compared to the traditional method of
collecting surveys across many years which is costly and labor intensive,
data-driven, machine learning driven approaches are cheaper and faster--with
the potential ability to detect trends in close to real time. In this work, we
leverage the ubiquity of Google Street View images and develop a computer
vision pipeline to predict income, per capita carbon emission, crime rates and
other city attributes from a single source of publicly available visual data.
We first detect cars in 50 million images across 200 of the largest US cities
and train a model to predict demographic attributes using the detected cars. To
facilitate our work, we have collected the largest and most challenging
fine-grained dataset reported to date consisting of over 2600 classes of cars
comprised of images from Google Street View and other web sources, classified
by car experts to account for even the most subtle of visual differences. We
use this data to construct the largest scale fine-grained detection system
reported to date. Our prediction results correlate well with ground truth
income data (r=0.82), Massachusetts department of vehicle registration, and
sources investigating crime rates, income segregation, per capita carbon
emission, and other market research. Finally, we learn interesting
relationships between cars and neighborhoods allowing us to perform the first
large scale sociological analysis of cities using computer vision techniques.
Comment: AAAI 201