How Much of the Web Is Archived?
Although the Internet Archive's Wayback Machine is the largest and most
well-known web archive, a number of other public web archives have
emerged in the last several years. With different resources, audiences, and
collection development policies, these archives have varying levels of overlap
with each other. While individual archives can be measured in terms of number
of URIs, number of copies per URI, and intersection with other archives, to
date there has been no answer to the question "How much of the Web is
archived?" We study the question by approximating the Web using sample URIs
from DMOZ, Delicious, Bitly, and search engine indexes, and counting the
number of copies of the sample URIs that exist in various public web archives.
Each sample set carries its own bias. The results from our sample sets indicate
that 35%-90% of the Web has at least one archived copy, 17%-49% has
2-5 copies, 1%-8% has 6-10 copies, and 8%-63% has more than 10 copies
in public web archives. The number of URI copies varies as a function of time,
but no more than 31.3% of URIs are archived more than once per month.
Comment: This is the long version of the short paper by the same title
published at JCDL'11. 10 pages, 5 figures, 7 tables. Version 2 includes minor
typographical corrections.
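The coverage figures above come from counting archived copies per sampled URI and grouping the counts into bins (at least one copy, 2-5, 6-10, more than 10). The sketch below shows that bookkeeping, assuming the per-URI copy counts have already been collected from the archives; the function names, the input format, and the "copies per month of observation" reading of the monthly-rate claim are illustrative assumptions, not the authors' code.

```python
from collections import Counter

BINS = ("0", "1", "2-5", "6-10", ">10")

def bucket(n):
    """Map a per-URI copy count to one of the coverage bins (hypothetical helper)."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    if n <= 5:
        return "2-5"
    if n <= 10:
        return "6-10"
    return ">10"

def bin_copy_counts(copy_counts):
    """Return the fraction of sampled URIs falling in each bin, plus '>=1'."""
    total = len(copy_counts)
    if total == 0:
        raise ValueError("empty URI sample")
    counts = Counter(bucket(n) for n in copy_counts)
    fractions = {b: counts.get(b, 0) / total for b in BINS}
    # "At least one archived copy" is everything except the zero bin.
    fractions[">=1"] = 1 - fractions["0"]
    return fractions

def fraction_more_than_monthly(samples):
    """samples: (copy_count, months_observed) pairs; assumed interpretation of
    'archived more than once per month' as copies / months > 1."""
    rates = [copies / months for copies, months in samples if months > 0]
    if not rates:
        raise ValueError("no URIs with a positive observation window")
    return sum(rate > 1 for rate in rates) / len(rates)
```

For example, `bin_copy_counts([0, 1, 3, 7, 12, 0, 25, 2])` reports a quarter of this made-up sample as unarchived and three quarters as having at least one copy.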
Complex Event Recognition from Images with Few Training Examples
We propose to leverage concept-level representations for complex event
recognition in photographs given limited training examples. We introduce a
novel framework that discovers event concept attributes from the web, uses
them to extract semantic features from images, and classifies the images into social event
categories with few training examples. Discovered concepts include a variety of
objects, scenes, actions and event sub-types, leading to a discriminative and
compact representation for event images. For each discovered event concept,
we collect web images and use their (pretrained) CNN features to train a concept
classifier. Extensive experiments on challenging event datasets demonstrate
that our proposed method outperforms several baselines using deep CNN features
directly in classifying images into events with limited training examples. We
also demonstrate that our method achieves the best overall accuracy on a
dataset with unseen event categories using a single training example.
Comment: Accepted to Winter Applications of Computer Vision (WACV'17
- …
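The concept-level representation described above can be sketched as two stages: per-concept classifiers turn a pretrained CNN feature vector into a vector of concept scores, and a few labelled examples per event are then classified in that concept space. The abstract does not specify the classifier types, so this sketch assumes linear logistic concept classifiers over precomputed features and swaps in a nearest-centroid event classifier as a simple few-shot stand-in; all names, weights, and labels are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def concept_scores(cnn_feature, concept_weights, concept_biases):
    """Map a CNN feature vector to per-concept probabilities using
    assumed linear concept classifiers (one weight vector per concept)."""
    return [
        sigmoid(sum(wi * xi for wi, xi in zip(w, cnn_feature)) + b)
        for w, b in zip(concept_weights, concept_biases)
    ]

def fit_centroids(examples):
    """examples: event label -> list of concept-score vectors (few per event).
    Returns the mean concept vector of each event."""
    centroids = {}
    for label, vecs in examples.items():
        dim = len(vecs[0])
        centroids[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return centroids

def classify(concept_vec, centroids):
    """Assign the event whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(concept_vec, centroids[label]))
```

With a single training example per event, as in the unseen-category setting, each centroid is simply that example's concept vector. Working in concept space rather than raw CNN features is what keeps the representation compact: its dimension equals the number of discovered concepts.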