Development of a machine learning-based model to autonomously estimate web page credibility
There is a broad range of information available on the Internet, some of which is considered more credible than the rest. People weigh different credibility aspects when evaluating the credibility of a web page; however, many web users find it difficult to determine the credibility of all types of web pages. An autonomous system that analyses credibility factors extracted from a web page to estimate the page's credibility could help users make better decisions about the perceived credibility of web information.
This research investigated the applicability of several machine learning approaches to the evaluation of web page credibility. First, six credibility categories were identified from peer-reviewed literature. Then, their related credibility features were investigated and automatically extracted from the web page content, metadata, or external resources.
Three sets of features (i.e., automatically extracted credibility features, bag-of-words features, and a combination of both) were used in classification experiments to compare their impact on the performance of the autonomous credibility estimation model. The Content Credibility Corpus (C3) dataset was used to develop and test the model developed in this research. XGBoost achieved the best weighted average F1 score for the extracted features. In comparison, the Logistic Regression classifier performed best both when bag-of-words features were used and when all features together were used as a single feature vector.
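The three feature representations compared in these experiments can be illustrated with a minimal pure-Python sketch. The feature names and values below are illustrative placeholders, not the actual extraction pipeline described in the thesis:

```python
# Sketch of building the three feature vectors compared in the experiments:
# extracted credibility features, bag-of-words counts, and their combination.

def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary term in the text."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocabulary]

def combine(extracted, bow):
    """Concatenate extracted credibility features with bag-of-words counts."""
    return extracted + bow

# Hypothetical example: domain-authority score, has-author flag, link count.
extracted = [0.8, 1, 12]
vocabulary = ["research", "buy", "study"]
page_text = "A peer-reviewed study of this research and that research"

bow = bag_of_words(page_text, vocabulary)   # [2, 0, 1]
features = combine(extracted, bow)          # [0.8, 1, 12, 2, 0, 1]
```

Each of the three vectors (`extracted`, `bow`, `features`) would then be fed to a classifier such as XGBoost or Logistic Regression for comparison.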
To begin to explore the legitimacy of this approach, a crowdsourcing task was conducted to evaluate how the output of the proposed model aligns with the credibility ratings given by human annotators. Thirty web pages were selected from the C3 dataset to examine how current users' ratings correlate with the ratings that were used as ground truth to train the model. In addition, 30 new web pages were selected to explore how well the algorithm generalizes to classifying new web pages.
Participants were asked to rate the credibility of each web page based on a 5-point Likert scale. Sixty-nine crowdsourced participants evaluated the credibility of the 60 web pages, for a total of 600 ratings (10 per page). Spearman's correlation between the average credibility scores given by participants and the original scores in the C3 dataset indicates a moderate positive correlation: r = 0.44, p < 0.02. A contingency table was created to compare the scores predicted by the model with the scores rated by participants. Overall, the model achieved an accuracy of 80%, which indicates that the proposed model can generalize to new web pages.
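Spearman's correlation, used above to compare participant ratings with the C3 ground-truth scores, is the Pearson correlation of the two rank vectors. A minimal pure-Python sketch (with ties given average ranks, as is standard):

```python
# Spearman's rank correlation: rank both score lists, then compute the
# Pearson correlation of the ranks. Ties share the mean of their ranks.

def ranks(values):
    """Average 1-based ranks, with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Applied to two lists of 30 averaged page scores, this yields the coefficient reported above (the p-value requires an additional significance test, e.g. via `scipy.stats.spearmanr`).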
The model outlined in this thesis outperformed previous work by using a promising set of features, some of which were presented for the first time in this research
Mining Scholarly Publications for Research Evaluation
Scientific research can lead to breakthroughs that revolutionise society by solving long-standing problems. However, investment of public funds into research requires the ability to clearly demonstrate beneficial returns, accountability, and good management. At the same time, with the amount of scholarly literature rapidly expanding, recognising key research that presents the most important contributions to science is becoming increasingly difficult and time-consuming. This creates a need for effective and appropriate research evaluation methods. However, the question of how to evaluate the quality of research outcomes is very difficult to answer and despite decades of research, there is still no standard solution to this problem.
Given this growing need for research evaluation, it is increasingly important to understand how research should be evaluated, and whether the existing methods meet this need. However, the current solutions, which are predominantly based on counting the number of interactions in the scholarly communication network, are insufficient for a number of reasons. In particular, they struggle to capture many aspects of academic culture and often significantly lag behind current developments.
This work focuses on the evaluation of research publications and aims at creating new methods which utilise publication content. It studies the concept of research publication quality, methods assessing the performance of new research publication evaluation methods, analyses and extends the existing methods, and, most importantly, presents a new class of metrics which are based on publication manuscripts. By bridging the fields of research evaluation and text- and data-mining, this work provides tools for analysing the outcomes of research, and for relieving information overload in scholarly publishing
Designing for quality in real-world mobile crowdsourcing systems
PhD Thesis
Crowdsourcing has emerged as a popular means to collect and analyse data at scale for problems that require human intelligence to resolve. Its prompt response and low cost have made it attractive to businesses and academic institutions. In response, various online crowdsourcing platforms, such as Amazon MTurk, Figure Eight and Prolific, have successfully emerged to facilitate the entire crowdsourcing process. However, the quality of results has been a major concern in the crowdsourcing literature. Previous work has identified various key factors that contribute to quality issues and need to be addressed in order to produce high-quality results. Crowd task design, in particular, is a major factor that impacts the efficiency and effectiveness of crowd workers as well as the entire crowdsourcing process.
This research investigates crowdsourcing task designs for collecting and analysing two distinct types of data, and examines the value of creating high-quality crowdwork activities on new crowdsource-enabled systems for end-users. The main contributions of this research are 1) a set of guidelines for designing crowdsourcing tasks that support quality collection, analysis and translation of speech and eye-tracking data in real-world scenarios; and 2) crowdsourcing applications that capture real-world data and coordinate the entire crowdsourcing process to analyse and feed quality results back. Furthermore, this research proposes a new quality-control method based on workers' trust and self-verification.
To achieve this, the research follows a case-study approach with a focus on two real-world data collection and analysis case studies. The first case study, Speeching, explores real-world speech data collection, analysis, and feedback for people with speech disorders, particularly those with Parkinson's. The second case study, CrowdEyes, examines the development and use of a hybrid system combining crowdsourcing and low-cost DIY mobile eye trackers for real-world visual data collection, analysis, and feedback.
Both case studies established the capability of crowdsourcing to obtain high-quality responses comparable to those of an expert. The Speeching app, and the provision of feedback in particular, was well received by participants, which opens up new opportunities in digital health and wellbeing. In addition, the proposed crowd-powered eye tracker is fully functional under real-world settings. The results showed how this approach outperforms current state-of-the-art algorithms under all conditions, opening up the technology for a wide variety of eye-tracking applications in real-world settings
Index ordering by query-independent measures
There is an ever-increasing amount of data being produced from various data sources, and this data must be organised effectively if we hope to search through it. Traditional information retrieval approaches search through all available data in a particular collection in order to find the most suitable results; however, for particularly large collections this may be extremely time-consuming.
Our proposed solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process. In doing so, we aim to limit the loss in retrieval efficacy (in terms of accuracy of results). We do this by first identifying the most "important" documents within the collection, and then sorting the documents in order of their "importance" in the collection. In this way we can choose to limit the amount of information to search through, by eliminating the documents of lesser importance, which should not only make the search more efficient, but should also limit any loss in retrieval accuracy.
In this thesis we investigate various query-independent methods that may indicate the importance of a document in a collection. The more accurately a measure identifies important documents, the more effectively we can eliminate documents from the retrieval process, improving the query throughput of the system while providing a high level of accuracy in the returned results. The effectiveness of these approaches is evaluated using the datasets provided by the Terabyte track at the Text REtrieval Conference (TREC)
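The pruning idea above can be sketched in a few lines: order the collection once by a query-independent importance score, then answer queries against only a prefix of that ordering. The importance measure here (raw document length) and the substring matching are deliberate stand-ins for the measures and retrieval models studied in the thesis:

```python
# Sketch of index ordering by a query-independent measure: sort documents
# by descending importance once, then search only the top fraction of the
# ordered collection at query time.

def order_by_importance(docs, importance):
    """Sort documents by descending query-independent importance."""
    return sorted(docs, key=importance, reverse=True)

def pruned_search(docs, query, importance, keep_fraction=0.5):
    """Search only the most important fraction of the ordered collection."""
    ordered = order_by_importance(docs, importance)
    cutoff = max(1, int(len(ordered) * keep_fraction))
    prefix = ordered[:cutoff]
    return [d for d in prefix if query.lower() in d.lower()]

docs = [
    "a short note",
    "an extensive survey of information retrieval methods",
    "retrieval tricks",
    "a very long and detailed treatise on efficient retrieval at scale",
]
# With keep_fraction=0.5, only the two "most important" documents are
# searched; the two shorter documents are never examined.
hits = pruned_search(docs, "retrieval", importance=len, keep_fraction=0.5)
```

The trade-off the thesis evaluates is exactly the one visible here: a smaller `keep_fraction` means faster queries, but a poor importance measure risks pruning away relevant documents (such as "retrieval tricks" in this toy example).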
Proceedings of the 11th Toulon-Verona International Conference on Quality in Services
The Toulon-Verona Conference was founded in 1998 by Prof. Claudio Baccarani of the University of Verona, Italy, and Prof. Michel Weill of the University of Toulon, France. It has been organized each year in a different place in Europe in cooperation with a host university (Toulon 1998, Verona 1999, Derby 2000, Mons 2001, Lisbon 2002, Oviedo 2003, Toulon 2004, Palermo 2005, Paisley 2006, Thessaloniki 2007, Florence 2008). Originally focused on higher education institutions, the research themes have over the years been extended to the health sector, local government, tourism, logistics, and banking services. Around a hundred delegates from about twenty different countries participate each year, and nearly one thousand research papers have been published over the last ten years, making the conference one of the major events in the field of quality in services