6,517 research outputs found
Methodologies for the Automatic Location of Academic and Educational Texts on the Internet
Traditionally online databases of web resources have been compiled by a human editor, or though the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources, however, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex text content analysis.
This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined
Methodologies for the Automatic Location of Academic and Educational Texts on the Internet
Traditionally online databases of web resources have been compiled by a human editor, or though the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources, however, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex text content analysis.
This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined
An Evaluation of Link Neighborhood Lexical Signatures to Rediscover Missing Web Pages
For discovering the new URI of a missing web page, lexical signatures, which
consist of a small number of words chosen to represent the "aboutness" of a
page, have been previously proposed. However, prior methods relied on computing
the lexical signature before the page was lost, or using cached or archived
versions of the page to calculate a lexical signature. We demonstrate a system
of constructing a lexical signature for a page from its link neighborhood, that
is the "backlinks", or pages that link to the missing page. After testing
various methods, we show that one can construct a lexical signature for a
missing web page using only ten backlink pages. Further, we show that only the
first level of backlinks are useful in this effort. The text that the backlinks
use to point to the missing page is used as input for the creation of a
four-word lexical signature. That lexical signature is shown to successfully
find the target URI in over half of the test cases.Comment: 24 pages, 13 figures, 8 tables, technical repor
What is the problem to which interactive multimedia is the solution?
This is something of an unusual paper. It serves as both the reason for and the result of a small number of leading academics in the field, coming together to focus on the question that serves as the title to this paper: What is the problem to which interactive multimedia is the solution? Each of the authors addresses this question from their own viewpoint, offering informed insights into the development, implementation and evaluation of multimedia. The result of their collective work was also the focus of a Western Australian Institute of Educational Research seminar, convened at Edith Cowan University on 18 October, 1994.
The question posed is deliberately rhetorical - it is asked to allow those represented here to consider what they think are the significant issues in the fast-growing field of multimedia. More directly, the question is also asked here because nobody else has considered it worth asking: for many multimedia is done because it is technically possible, not because it offers anything that is of value or provides the solution to a particular problem.
The question, then, is answered in various ways by each of the authors involved and each, in their own way, consider a range of fundamental issues concerning the nature, place and use of multimedia - both in education and in society generally. By way of an introduction, the following provides a unifying context for the various contributions made here
HTTP Mailbox - Asynchronous RESTful Communication
We describe HTTP Mailbox, a mechanism to enable RESTful HTTP communication in
an asynchronous mode with a full range of HTTP methods otherwise unavailable to
standard clients and servers. HTTP Mailbox allows for broadcast and multicast
semantics via HTTP. We evaluate a reference implementation using ApacheBench (a
server stress testing tool) demonstrating high throughput (on 1,000 concurrent
requests) and a systemic error rate of 0.01%. Finally, we demonstrate our HTTP
Mailbox implementation in a human assisted web preservation application called
"Preserve Me".Comment: 13 pages, 6 figures, 8 code blocks, 3 equations, and 3 table
Characterising Web Site Link Structure
The topological structures of the Internet and the Web have received
considerable attention. However, there has been little research on the
topological properties of individual web sites. In this paper, we consider
whether web sites (as opposed to the entire Web) exhibit structural
similarities. To do so, we exhaustively crawled 18 web sites as diverse as
governmental departments, commercial companies and university departments in
different countries. These web sites consisted of as little as a few thousand
pages to millions of pages. Statistical analysis of these 18 sites revealed
that the internal link structure of the web sites are significantly different
when measured with first and second-order topological properties, i.e.
properties based on the connectivity of an individual or a pairs of nodes.
However, examination of a third-order topological property that consider the
connectivity between three nodes that form a triangle, revealed a strong
correspondence across web sites, suggestive of an invariant. Comparison with
the Web, the AS Internet, and a citation network, showed that this third-order
property is not shared across other types of networks. Nor is the property
exhibited in generative network models such as that of Barabasi and Albert.Comment: To appear at IEEE/WSE0
- …