    Methodologies for the Automatic Location of Academic and Educational Texts on the Internet

    Traditionally, online databases of web resources have been compiled by a human editor or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in such databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources; however, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex analysis of text content. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, it looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data are presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.
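
    To make the classification step concrete, below is a minimal Python sketch of the kind of text-content analysis the abstract describes: scoring a harvested page as 'academic' from surface cues. The cue list, scoring function, and threshold are illustrative assumptions, not the paper's actual feature set or method.

    ```python
    # Illustrative sketch only: score a harvested page as 'academic' by
    # the fraction of surface cue phrases it contains. The cues and the
    # threshold are assumptions, not the paper's actual classifier.
    ACADEMIC_CUES = {
        "abstract", "methodology", "references", "et al", "doi",
        "university", "journal", "hypothesis", "literature review",
    }

    def academic_score(text: str) -> float:
        """Fraction of cue phrases present in the page text."""
        lowered = text.lower()
        return sum(cue in lowered for cue in ACADEMIC_CUES) / len(ACADEMIC_CUES)

    def is_academic(text: str, threshold: float = 0.3) -> bool:
        return academic_score(text) >= threshold

    page = "Abstract. We survey erosion rates... References: Smith et al., 2001."
    print(is_academic(page))  # True: 3 of the 9 cues are present
    ```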

    archivist: An R Package for Managing, Recording and Restoring Data Analysis Results

    Everything that exists in R is an object [Chambers2016]. This article examines what would be possible if we kept copies of every R object that has ever been created: not only the objects themselves, but also their properties, meta-data, relations with other objects, and information about the context in which they were created. We introduce archivist, an R package designed to improve the management of the results of data analysis. Key functionalities of this package include: (i) management of local and remote repositories which contain R objects and their meta-data (objects' properties and the relations between them); (ii) archiving R objects to repositories; (iii) sharing and retrieving objects (and their pedigree) by their unique hooks; (iv) searching for objects with specific properties or relations to other objects; (v) verification of an object's identity and the context of its creation. In combination with packages such as knitr and Sweave, the archivist package extends the reproducible research paradigm by creating new ways to retrieve and validate previously calculated objects. These new features enable a variety of applications, such as: sharing R objects within reports or articles; adding hooks to R objects in table or figure captions; interactive exploration of object repositories; caching function calls together with their results; retrieving an object's pedigree (information about how the object was created); automated tracking of the performance of the models under consideration; and restoring R libraries to the state in which an object was archived. Comment: Submitted to JSS in 2015, conditionally accepted.
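
    As a rough illustration of the repository idea (sketched here in Python rather than archivist's actual R API), the snippet below archives objects under a content hash that plays the role of a unique hook, stores meta-data alongside them, and supports retrieval by hook and search by property. All class and function names are hypothetical.

    ```python
    # Conceptual sketch, not archivist's API: a content hash serves as
    # the unique hook (archivist likewise uses md5 hashes), with
    # meta-data stored next to each archived object.
    import hashlib
    import pickle
    import time

    class Repo:
        def __init__(self):
            self._store = {}  # hook -> (serialized object, meta-data)

        def archive(self, obj, **meta):
            blob = pickle.dumps(obj)
            hook = hashlib.md5(blob).hexdigest()
            meta["archived_at"] = time.time()
            self._store[hook] = (blob, meta)
            return hook

        def retrieve(self, hook):
            blob, _ = self._store[hook]
            return pickle.loads(blob)

        def search(self, **query):
            """Return hooks whose meta-data matches every query key."""
            return [h for h, (_, m) in self._store.items()
                    if all(m.get(k) == v for k, v in query.items())]

    repo = Repo()
    hook = repo.archive({"coef": 1.7}, kind="model", dataset="geo")
    assert repo.retrieve(hook) == {"coef": 1.7}
    print(repo.search(kind="model"))  # [hook]
    ```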

    Second-Level Digital Divide: Mapping Differences in People's Online Skills

    Much existing work on the digital divide suffers from an important limitation: it rests on a binary classification of Internet use, considering only whether someone is or is not an Internet user. To remedy this shortcoming, this project looks at differences in people's level of skill at finding information online. Findings suggest that people search for content in a myriad of ways and that there is large variance in how long people take to find various types of information online. Data are collected to see how user demographics, users' social support networks, experience with the medium, and autonomy of use influence their level of user sophistication. Comment: 29th TPRC Conference, 200

    Computing the entropy of user navigation in the web

    Navigation through the web, colloquially known as "surfing", is one of the main activities of users during web interaction. When users follow a navigation trail they often become disoriented with respect to the goals of their original query, so the discovery of typical user trails could be useful in providing navigation assistance. Herein, we give a theoretical underpinning of user navigation in terms of the entropy of an underlying Markov chain modelling the web topology. We present a novel method for online incremental computation of the entropy, and a large deviation result regarding the length of a trail needed to realise this entropy. We provide an error analysis for our estimation of the entropy in terms of the divergence between the empirical and actual probabilities. We then indicate applications of our algorithm in the area of web data mining. Finally, we present an extension of our technique to higher-order Markov chains by a suitable reduction of a higher-order Markov chain model to a first-order one.
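
    For concreteness, the sketch below (a toy Python example, not code from the paper) computes the central quantity: the entropy rate H = -Σ_i π_i Σ_j P_ij log2 P_ij of a first-order Markov chain with row-stochastic transition matrix P and stationary distribution π. The three-page chain is invented, and the power-iteration estimate of π assumes the chain is ergodic.

    ```python
    # Toy example of the entropy rate of a first-order Markov chain over
    # web pages; the 3x3 transition matrix below is invented, not data
    # from the paper.
    import numpy as np

    def stationary(P: np.ndarray, iters: int = 1000) -> np.ndarray:
        """Power iteration for the stationary distribution of a
        row-stochastic, ergodic transition matrix P."""
        pi = np.full(P.shape[0], 1.0 / P.shape[0])
        for _ in range(iters):
            pi = pi @ P
        return pi

    def entropy_rate(P: np.ndarray) -> float:
        """H = -sum_i pi_i sum_j P[i,j] log2 P[i,j], in bits per step."""
        pi = stationary(P)
        with np.errstate(divide="ignore"):
            logP = np.where(P > 0, np.log2(P), 0.0)  # 0 log 0 := 0
        return float(-np.sum(pi[:, None] * P * logP))

    P = np.array([[0.0, 0.7, 0.3],
                  [0.5, 0.0, 0.5],
                  [0.4, 0.6, 0.0]])
    print(entropy_rate(P))  # bits of uncertainty per navigation step
    ```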