268 research outputs found

    Effective web crawlers

    Web crawlers are the components of a search engine that traverse the Web, gathering documents into a local repository so that they can be indexed and ranked by their relevance to user queries. Whenever data is replicated from an autonomously updated environment, maintaining up-to-date copies of documents is a problem. When documents retrieved by a crawler are subsequently altered on the Web, the result is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that treat any change as significant. Furthermore, many crawler evaluation metrics consider neither index freshness nor the impact that crawling algorithms have on user results. Most existing work either makes assumptions about the change rate of documents on the Web or relies on the availability of a long history of change.

    Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating a dynamic crawl approach. We are concerned specifically with cases where little or no history of past changes is available from which predictions can be made. We analyse different measures of change and introduce a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results; other well-known and widely used schemes have to retrieve around twice as much data to achieve the same effectiveness. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents.

    We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change; to our knowledge, none of these has been tested for effectiveness in crawling changed documents. We show empirically that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no single stateless crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present a novel approach to maintaining freshness that uses the anchor text of links to a document to estimate the likelihood of that document changing, based on statistics gathered during the current crawl. This scheme is highly effective when combined with existing stateless schemes; combined with PageRank, it allows the crawler to improve both the freshness and the quality of a collection. The scheme improves freshness regardless of which stateless scheme it is paired with, since it uses both positive and negative reinforcement to determine which document to retrieve next. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
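
    As a rough illustration of the approach this abstract describes, the sketch below orders a crawl by combining a stateless score (such as PageRank) with change likelihoods estimated from the anchor text of links, updated by positive and negative reinforcement during the current crawl. It is an assumption-laden sketch, not the thesis implementation or the Lara crawler: the names (AnchorChangeModel, crawl, base_score, fetch, has_changed) and the smoothed-ratio estimate are invented for the example.

    # Hedged sketch (assumed names, not the thesis code): dynamic crawl ordering
    # that combines a stateless base score with anchor-text change statistics
    # gathered during the current crawl.
    from collections import defaultdict

    class AnchorChangeModel:
        """Per anchor-text term, count how often the linked page was found changed."""

        def __init__(self):
            self.changed = defaultdict(int)    # positive reinforcement
            self.unchanged = defaultdict(int)  # negative reinforcement

        def update(self, anchor_terms, page_changed):
            for term in anchor_terms:
                if page_changed:
                    self.changed[term] += 1
                else:
                    self.unchanged[term] += 1

        def change_likelihood(self, anchor_terms):
            # Smoothed fraction of "changed" observations for these terms so far.
            pos = sum(self.changed[t] for t in anchor_terms)
            neg = sum(self.unchanged[t] for t in anchor_terms)
            return (pos + 1) / (pos + neg + 2)

    def crawl(frontier, base_score, fetch, has_changed, budget):
        """frontier: list of (url, anchor_terms); base_score: stateless score, e.g. PageRank."""
        model = AnchorChangeModel()
        pending, fetched = list(frontier), 0
        while pending and fetched < budget:
            # Re-rank the remaining frontier so statistics from the current
            # crawl influence which document is retrieved next.
            pending.sort(key=lambda it: base_score(it[0]) * model.change_likelihood(it[1]),
                         reverse=True)
            url, terms = pending.pop(0)
            page = fetch(url)
            model.update(terms, has_changed(url, page))  # positive/negative reinforcement
            fetched += 1

    Re-ranking the remaining frontier each round is what makes the ordering dynamic: documents reached through anchor terms that have so far pointed at changed pages move forward, while those reached through terms associated with unchanged pages fall back.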

    International Union of Theoretical and Applied Mechanics : report 2003


    Sentiment Analysis: An Overview from Linguistics

    Sentiment analysis is a growing field at the intersection of linguistics and computer science that attempts to automatically determine the sentiment, or positive/negative opinion, contained in text. Sentiment can be characterized as positive or negative evaluation expressed through language. Common applications of sentiment analysis include automatically determining whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed. Sentiment analysis is now a common tool in the repertoire of social media analysis carried out by companies, marketers, and political analysts. Research on sentiment analysis extracts information from positive and negative words in text, from the context of those words, and from the linguistic structure of the text. This brief survey examines in particular the contributions that linguistic knowledge can make to the problem of automatically determining sentiment.
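
    As a concrete illustration of the word- and context-based signals this abstract mentions, here is a minimal lexicon-based scorer with simple negation handling. It is a sketch under assumed, tiny word lists, not a method from the survey itself.

    # Hedged sketch with tiny, assumed word lists: a lexicon-based scorer that
    # combines word polarity with one piece of context (negation).
    POSITIVE = {"good", "great", "excellent", "enjoyable", "superb"}
    NEGATIVE = {"bad", "poor", "boring", "awful", "terrible"}
    NEGATORS = {"not", "never", "hardly"}

    def sentiment_score(text):
        """Return a score > 0 for broadly positive text, < 0 for broadly negative text."""
        tokens = text.lower().split()
        score = 0
        for i, tok in enumerate(tokens):
            polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
            # Context: flip polarity when a negator appears in the two
            # preceding tokens, as in "not very good".
            if polarity and any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
                polarity = -polarity
            score += polarity
        return score

    print(sentiment_score("a great and enjoyable book"))     # positive (+2)
    print(sentiment_score("not a good movie, just boring"))  # negative (-2)

    Real systems replace the toy word lists with large sentiment lexicons and use parsing rather than a fixed token window to capture the linguistic structure the survey discusses.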

    Developing a distributed electronic health-record store for India

    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the more than one billion citizens of India.

    Third international workshop on Authoring of adaptive and adaptable educational hypermedia (A3EH), Amsterdam, 18-22 July, 2005

    The A3EH workshop follows a successful series of workshops on Adaptive and Adaptable Educational Hypermedia. This workshop focuses on models, design, and authoring of AEH, on assessment of AEH, on conversion between AEH, and on evaluation of AEH. The workshop includes paper presentations, a poster session, and panel discussions.

    DETUROPE 2019
