100 research outputs found
How Much of the Web Is Archived?
Although the Internet Archive's Wayback Machine is the largest and most
well-known web archive, there have been a number of public web archives that
have emerged in the last several years. With varying resources, audiences and
collection development policies, these archives have varying levels of overlap
with each other. While individual archives can be measured in terms of number
of URIs, number of copies per URI, and intersection with other archives, to
date there has been no answer to the question "How much of the Web is
archived?" We study the question by approximating the Web using sample URIs
from DMOZ, Delicious, Bitly, and search engine indexes, and counting the
number of copies of each sample URI that exist in various public web archives. Each
sample set carries its own bias. The results from our sample sets indicate
that 35%-90% of the Web has at least one archived copy, 17%-49% has
between 2-5 copies, 1%-8% has 6-10 copies, and 8%-63% has more than 10 copies
in public web archives. The number of URI copies varies as a function of time,
but no more than 31.3% of URIs are archived more than once per month.
Comment: This is the long version of the short paper by the same title
published at JCDL'11. 10 pages, 5 figures, 7 tables. Version 2 includes minor
typographical corrections.
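One way to reproduce this kind of per-URI measurement is to query a public archive's capture index and bucket the counts. The sketch below is a minimal illustration, not the paper's actual method: it assumes the Internet Archive's CDX API endpoint and parameters, and the bucket boundaries simply mirror the ranges reported above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Internet Archive's CDX capture index.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def fetch_snapshot_count(uri):
    """Return the number of Wayback Machine captures recorded for `uri`."""
    query = urlencode({"url": uri, "output": "json", "fl": "timestamp"})
    with urlopen(f"{CDX_ENDPOINT}?{query}") as resp:
        rows = json.load(resp)
    # The first row of the JSON response is a header; each remaining
    # row describes one capture of the URI.
    return max(len(rows) - 1, 0)


def bucket(count):
    """Map a capture count onto the reporting buckets used above."""
    if count == 0:
        return "0 copies"
    if count == 1:
        return "1 copy"
    if count <= 5:
        return "2-5 copies"
    if count <= 10:
        return "6-10 copies"
    return "more than 10 copies"
```

For a sample set, one would call `fetch_snapshot_count` per URI and tally the `bucket` results; the study itself checked multiple public archives, not only the Wayback Machine.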
Consuming Linked Closed Data
The growth of the Linked Data corpus will eventually prevent all but the most determined of consumers from including every Linked Dataset in a single undertaking. In addition, we anticipate that the need for effective revenue models for Linked Data publishing will spur the rise of Linked Closed Data, where access to datasets is restricted. We argue that these impending changes necessitate an overhaul of our current practices for consuming Linked Data. To this end, we propose a model for consuming Linked Data, built on the notion of continuous Information Quality assessment, which brings together a range of existing research and highlights a number of avenues for future work.
Using ontology engineering for understanding needs and allocating resources in web-based industrial virtual collaboration systems
In many interactions in cross-industrial and inter-industrial collaboration, analysis and understanding of relative specialist and non-specialist language is one of the most pressing challenges when trying to build multi-party, multi-disciplinary collaboration systems. Hence, identifying the scope of the language used and then understanding the relationships between the language entities are key problems. In computer science, ontologies are used to provide a common vocabulary for a domain of interest together with descriptions of the meaning of terms and relationships between them, as in an encyclopedia. These, however, often lack the fuzziness required for human-oriented systems. This paper uses an engineering-sector business collaboration system (www.wmccm.co.uk) as a case study to illustrate the issues. The purpose of this paper is to introduce a novel ontology engineering methodology, which generates structurally enriched cross-domain ontologies economically, quickly and reliably. A semantic relationship analysis of the Google Search Engine Index was devised and evaluated. This semantic analysis appears to generate a viable list of subject terms. A social network analysis of the semantically derived terms was conducted to generate a decision support network with rich relationships between terms. The derived ontology was quicker to generate, provided richer internal relationships and relied far less on expert contribution. More importantly, it improved the collaboration matching capability of WMCCM.