290 research outputs found

    Development of a machine learning-based model to autonomously estimate web page credibility

    There is a broad range of information available on the Internet, some of which is considered more credible than other content. People consider different credibility aspects when evaluating a web page; however, many web users find it difficult to determine the credibility of all types of web pages. An autonomous system that analyzes credibility factors extracted from a web page to estimate the page's credibility could help users make better decisions about the perceived credibility of web information. This research investigated the applicability of several machine learning approaches to the evaluation of web page credibility. First, six credibility categories were identified from the peer-reviewed literature. Then, their related credibility features were investigated and automatically extracted from the web page content, metadata, or external resources. Three sets of features (i.e., automatically extracted credibility features, bag-of-words features, and a combination of both) were used in classification experiments to compare their impact on the performance of the autonomous credibility estimation model. The Content Credibility Corpus (C3) dataset was used to develop and test the model. XGBoost achieved the best weighted average F1 score with the extracted features, while the Logistic Regression classifier performed best both when bag-of-words features were used and when all features were combined into a single feature vector. To begin to explore the legitimacy of this approach, a crowdsourcing task was conducted to evaluate how the output of the proposed model aligns with credibility ratings given by human annotators. Thirty web pages were selected from the C3 dataset to find out how current users' ratings correlate with the ratings used as ground truth to train the model. In addition, 30 new web pages were selected to explore how well the algorithm generalizes to new web pages. Participants were asked to rate the credibility of each web page on a 5-point Likert scale. Sixty-nine crowdsourced participants evaluated the credibility of the 60 web pages, for a total of 600 ratings (10 per page). Spearman's correlation between the average credibility scores given by participants and the original scores in the C3 dataset indicates a moderate positive correlation: r = 0.44, p < 0.02. A contingency table was created to compare the scores predicted by the model with the scores given by participants. Overall, the model achieved an accuracy of 80%, which indicates that the proposed model can generalize to new web pages. The model outlined in this thesis outperformed previous work by using a promising set of features, some of which were presented in this research for the first time.
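
    As a rough illustration of the classification setup described above, the sketch below compares XGBoost and Logistic Regression across the three feature sets (extracted credibility features, bag-of-words, and both combined) using the weighted F1 score. This is a minimal sketch under stated assumptions: pages, labels, and credibility_features are placeholders for the C3 data and the thesis's actual feature extraction, which the abstract does not specify, and the labels are assumed to be integer-encoded classes.

```python
# Minimal sketch of the feature-set comparison described in the abstract.
# Assumptions: `pages` is a list of web-page texts, `labels` their
# integer-encoded credibility classes (0..k-1), and `credibility_features`
# a per-page matrix of hand-extracted features; these names and the
# hyperparameters are illustrative, not taken from the thesis itself.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def compare_feature_sets(pages, labels, credibility_features):
    # Bag-of-words representation of the raw page text.
    bow = CountVectorizer(max_features=5000).fit_transform(pages).toarray()
    # Hand-extracted credibility features, one row per page.
    extracted = np.asarray(credibility_features, dtype=float)
    # Combination of both feature sets into a single feature vector.
    combined = np.hstack([extracted, bow])

    for name, X in [("extracted", extracted), ("bow", bow), ("combined", combined)]:
        for clf in (XGBClassifier(), LogisticRegression(max_iter=1000)):
            # Weighted average F1, matching the metric reported above.
            score = cross_val_score(clf, X, labels, scoring="f1_weighted", cv=5).mean()
            print(f"{name:>9} | {type(clf).__name__:<18} | weighted F1 = {score:.3f}")
```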

    Designing for quality in real-world mobile crowdsourcing systems

    PhD Thesis. Crowdsourcing has emerged as a popular means of collecting and analysing data at scale for problems that require human intelligence to resolve. Its prompt response and low cost have made it attractive to businesses and academic institutions. In response, various online crowdsourcing platforms, such as Amazon MTurk, Figure Eight and Prolific, have successfully emerged to facilitate the entire crowdsourcing process. However, the quality of results has been a major concern in the crowdsourcing literature. Previous work has identified various key factors that contribute to quality issues and need to be addressed in order to produce high-quality results. Crowd task design, in particular, is a major factor that impacts the efficiency and effectiveness of crowd workers as well as the entire crowdsourcing process. This research investigates crowdsourcing task designs for collecting and analysing two distinct types of data, and examines the value of creating high-quality crowdwork activities on new crowdsource-enabled systems for end-users. The main contributions of this research are 1) a set of guidelines for designing crowdsourcing tasks that support quality collection, analysis and translation of speech and eye tracking data in real-world scenarios; and 2) crowdsourcing applications that capture real-world data and coordinate the entire crowdsourcing process to analyse the data and feed quality results back. Furthermore, this research proposes a new quality control method based on worker trust and self-verification. To achieve this, the research follows a case study approach with a focus on two real-world data collection and analysis case studies. The first case study, Speeching, explores real-world speech data collection, analysis, and feedback for people with speech disorders, particularly those with Parkinson's. The second case study, CrowdEyes, examines the development and use of a hybrid system combining crowdsourcing with low-cost DIY mobile eye trackers for real-world visual data collection, analysis, and feedback. Both case studies established the capability of crowdsourcing to obtain high-quality responses comparable to those of an expert. The Speeching app, and the provision of feedback in particular, were well received by participants, which opens up new opportunities in digital health and wellbeing. Moreover, the proposed crowd-powered eye tracker is fully functional under real-world settings. The results showed how this approach outperforms all current state-of-the-art algorithms under all conditions, which opens up the technology for a wide variety of eye tracking applications in real-world settings.

    Index ordering by query-independent measures

    There is an ever-increasing amount of data being produced from various data sources, and this data must be organised effectively if we hope to search through it. Traditional information retrieval approaches search through all available data in a particular collection in order to find the most suitable results; however, for particularly large collections this may be extremely time consuming. Our proposed solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process, while limiting the loss in retrieval efficacy (in terms of the accuracy of results). We do this by first identifying the most "important" documents within the collection and then sorting the collection in order of document importance. In this way we can limit the amount of information to search through by eliminating the documents of lesser importance, which should not only make the search more efficient but should also limit any loss in retrieval accuracy. In this thesis we investigate various query-independent methods that may indicate the importance of a document in a collection. The more accurate the measure is at identifying important documents, the more effectively we can eliminate documents from the retrieval process, improving the query throughput of the system while maintaining a high level of accuracy in the returned results. The effectiveness of these approaches is evaluated using the datasets provided by the Terabyte Track at the Text REtrieval Conference (TREC).
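
    The pruning idea above can be sketched in a few lines: order the index once, at indexing time, by any query-independent importance measure, then search only the retained prefix at query time. This is a minimal illustration rather than the thesis's implementation; the Doc structure, the importance field, and the naive term-overlap scorer are assumptions standing in for a real collection and retrieval model.

```python
# Minimal sketch of query-independent index pruning.
# Assumptions: `importance` holds any query-independent score per
# document (e.g. an inlink count or a PageRank-style value); the
# scorer in search() is a toy stand-in for a real retrieval model.
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: int
    text: str
    importance: float  # query-independent measure, computed offline

def build_pruned_index(docs, keep_fraction=0.2):
    # Sort once at index time, most "important" documents first,
    # then keep only the top fraction; the rest are never searched.
    ranked = sorted(docs, key=lambda d: d.importance, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

def search(pruned_index, query):
    # Naive term-overlap scoring over the pruned index only.
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in pruned_index]
    return [d for s, d in sorted(scored, key=lambda t: t[0], reverse=True) if s > 0]
```

    With keep_fraction = 0.2 only the top 20% of documents by importance are ever scored at query time, which is where both the throughput gain and the potential loss in accuracy come from; the thesis evaluates how well different importance measures trade these off.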

    Proceedings of the 11th Toulon-Verona International Conference on Quality in Services

    The Toulon-Verona Conference was founded in 1998 by Prof. Claudio Baccarani of the University of Verona, Italy, and Prof. Michel Weill of the University of Toulon, France. It has been organized each year in a different place in Europe in cooperation with a host university (Toulon 1998, Verona 1999, Derby 2000, Mons 2001, Lisbon 2002, Oviedo 2003, Toulon 2004, Palermo 2005, Paisley 2006, Thessaloniki 2007, Florence 2008). Originally focused on higher education institutions, the research themes have over the years been extended to the health sector, local government, tourism, logistics, and banking services. Around a hundred delegates from about twenty different countries participate each year, and nearly one thousand research papers have been published over the last ten years, making the conference one of the major events in the field of quality in services.

    Big Data in Management Research. Exploring New Avenues

    • …