Discovering the representative of a search engine

Author: Adrain Santoso
Clement Yu
King-Lup Liu
Weiyi Meng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004

How Much of the Web Is Archived?

Author: Ainsworth Scott G.
AlSum Ahmed
Nelson Michael L.
SalahEldeen Hany
Weigle Michele C.
Publication date: 05/01/2013

Although the Internet Archive's Wayback Machine is the largest and best-known web archive, a number of public web archives have emerged in the last several years. With varying resources, audiences and collection development policies, these archives have varying levels of overlap with each other. While individual archives can be measured in terms of number of URIs, number of copies per URI, and intersection with other archives, to date there has been no answer to the question "How much of the Web is archived?" We study the question by approximating the Web using sample URIs from DMOZ, Delicious, Bitly, and search engine indexes, and counting the number of copies of the sample URIs that exist in various public web archives. Each sample set introduces its own bias. The results from our sample sets indicate that 35%-90% of the Web has at least one archived copy, 17%-49% has 2-5 copies, 1%-8% has 6-10 copies, and 8%-63% has more than 10 copies in public web archives. The number of URI copies varies as a function of time, but no more than 31.3% of URIs are archived more than once per month.

Comment: This is the long version of the short paper by the same title published at JCDL'11. 10 pages, 5 figures, 7 tables. Version 2 includes minor typographical corrections.

arXiv.org e-Print Archive

CiteSeerX

Complex Event Recognition from Images with Few Training Examples

Author: Ahsan Unaiza
Essa Irfan
Hays James
Sun Chen
Publication date: 17/01/2017

We propose to leverage concept-level representations for complex event recognition in photographs given limited training examples. We introduce a novel framework to discover event concept attributes from the web and use them to extract semantic features from images and classify them into social event categories with few training examples. Discovered concepts include a variety of objects, scenes, actions and event sub-types, leading to a discriminative and compact representation for event images. Web images are obtained for each discovered event concept, and we use (pretrained) CNN features to train concept classifiers. Extensive experiments on challenging event datasets demonstrate that our proposed method outperforms several baselines that use deep CNN features directly in classifying images into events with limited training examples. We also demonstrate that our method achieves the best overall accuracy on a dataset with unseen event categories using a single training example.

Comment: Accepted to Winter Applications of Computer Vision (WACV'17).

arXiv.org e-Print Archive

Crossref