User evaluation of a pilot terminologies server for a distributed multi-scheme environment
The present paper reports on a user-centred evaluation of a pilot terminology service developed as part of the High Level Thesaurus (HILT) project at the Centre for Digital Library Research (CDLR) at the University of Strathclyde in Glasgow. The pilot terminology service was developed as an experimental platform to investigate issues relating to mapping between various subject schemes, namely the Dewey Decimal Classification (DDC), Library of Congress Subject Headings (LCSH), the UNESCO Thesaurus, and the MeSH thesaurus, in order to cater for cross-browsing and cross-searching across distributed digital collections and services. The aim of the evaluation reported here was to investigate users' thought processes, perceptions, and attitudes towards the pilot terminology service and to identify user requirements for developing a full-blown terminology service.
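The cross-scheme mapping the abstract describes can be pictured as a lookup from a notation in one subject scheme to its equivalents in the others. The sketch below is a deliberately minimal illustration, not HILT's actual architecture; the mapping table, scheme names, and the single sample entry are all hypothetical.

```python
# Hypothetical mapping table keyed by (scheme, notation); the real HILT
# pilot used a far richer mapping between DDC, LCSH, UNESCO, and MeSH.
MAPPINGS = {
    ("DDC", "616.89"): {"LCSH": "Psychiatry", "MeSH": "Mental Disorders"},
}

def map_term(scheme, notation, target_scheme):
    """Resolve a source-scheme notation to an equivalent caption in the
    target scheme, or None if no mapping is recorded."""
    return MAPPINGS.get((scheme, notation), {}).get(target_scheme)
```

A cross-searching client could call `map_term` once per target collection, so a query expressed in one scheme retrieves records indexed under another.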
adPerf: Characterizing the Performance of Third-party Ads
Monetizing websites and web apps through online advertising is widespread in
the web ecosystem. Today's online advertising ecosystem effectively forces
publishers to integrate ads served from third-party domains. On the one hand,
this raises several privacy and security concerns that have been actively
studied in recent years. On the other hand, given the ability of today's
browsers to load dynamic web pages with complex animations and JavaScript,
online advertising has also transformed and can have a significant impact on
webpage performance. The performance cost of online ads is critical since it
ultimately affects user satisfaction as well as users' Internet bills and
device energy consumption.
In this paper, we present an in-depth, first-of-its-kind performance
evaluation of web ads. Unlike prior efforts that rely primarily on ad
blockers, we perform a fine-grained analysis of the web browser's page loading
process to demystify the performance cost of web ads. We aim to characterize
the cost of every component of an ad, so that publishers, ad syndicates, and
advertisers can improve ad performance with detailed guidance. For this
purpose, we develop an infrastructure, adPerf, for the Chrome browser that
classifies page loading workloads into ad-related and main-content at the
granularity of browser activities (such as JavaScript and Layout). Our
evaluations show that online advertising accounts for more than 15% of the
browser's page loading workload, and approximately 88% of that is spent on
JavaScript. We also track the sources and delivery chains of web ads and
analyze performance by the origin of the ad contents. We observe that two
well-known third-party ad domains contribute 35% of the ads' performance cost
and, surprisingly, top news websites implicitly include unknown third-party
ads which in some cases account for more than 37% of the ads' performance cost.
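The core attribution step the abstract describes, splitting page-loading work into ad-related and main-content buckets, can be sketched roughly as below. This is a simplification under stated assumptions: the ad-domain list, event format, and URL-based classification are hypothetical, whereas adPerf itself classifies at the granularity of browser activities inside Chrome.

```python
from urllib.parse import urlparse

# Hypothetical ad-domain list; a real classifier would use a full filter
# list and inspect browser trace activities, not just request URLs.
AD_DOMAINS = {"doubleclick.net", "adnxs.com"}

def is_ad_related(url):
    """True if the URL's host is (a subdomain of) a known ad domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in AD_DOMAINS)

def attribute_workload(events):
    """Split per-event CPU time (ms) into ad vs. main-content totals.
    Each event is a dict with 'url' and 'cpu_ms' keys (assumed shape)."""
    totals = {"ad": 0.0, "main": 0.0}
    for ev in events:
        bucket = "ad" if is_ad_related(ev["url"]) else "main"
        totals[bucket] += ev["cpu_ms"]
    return totals
```

Dividing `totals["ad"]` by the overall sum yields the kind of per-page ad-workload share (e.g. the >15% figure) that the paper reports in aggregate.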
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
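The unsupervised idea of pairing a blog's RSS feed with its HTML can be illustrated by using the feed summary as a noisy label for the post-content region of the page. The sketch below is a minimal stand-in for that intuition, not the report's actual method: the tokenizer, similarity measure, and input shapes are all assumptions.

```python
import re

def tokens(text):
    """Lowercase word tokens as a set (crude, illustration only)."""
    return set(re.findall(r"\w+", text.lower()))

def best_matching_block(rss_summary, html_blocks):
    """Return the HTML text block with the highest Jaccard overlap with
    the RSS summary -- the block most likely to hold the post content."""
    target = tokens(rss_summary)
    def score(block):
        t = tokens(block)
        union = target | t
        return len(target & t) / len(union) if union else 0.0
    return max(html_blocks, key=score)
```

Once the content-bearing block is located for a few feed entries, an extractor can generalize its position in the page template to pages that have no feed counterpart.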
Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were recruited through crowdsourcing, and data were collected using
specialised software, the Relevance Assessment Tool (RAT). We found that while
Google outperforms Bing in both query types, the difference in performance for
informational queries was rather small. However, for navigational queries,
Google found the correct answer in 95.3 per cent of cases, whereas Bing only
found the correct answer 76.6 per cent of the time. We conclude that search
engine performance on navigational queries is of great importance, as users in
this case can clearly identify queries that have returned correct results. So,
performance on this query type may contribute to explaining user satisfaction
with search engines.
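The headline numbers for navigational queries (95.3 vs. 76.6 per cent) are simply per-engine success rates over the shared query sample. A minimal sketch of that computation, assuming one boolean juror verdict per query (correct target page found or not):

```python
def success_rate(judgments):
    """Percentage of queries judged successful; `judgments` is a list of
    booleans, one per query in the sample (assumed input shape)."""
    return 100.0 * sum(judgments) / len(judgments)
```

Running this over each engine's verdicts on the same 1,000 navigational queries yields directly comparable figures of the kind the study reports.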