12 research outputs found

    Evaluating Rank-Coherence of Crowd Rating in Customer Satisfaction

    Get PDF
    Crowd rating is a continuous and public process of data gathering that allows general quantitative opinions on a topic to be displayed by anonymous online networks acting as crowds. Online platforms have leveraged these technologies to improve predictive tasks in marketing. However, we argue for a different employment of crowd rating, as a tool of public utility to support social contexts suffering from adverse selection, such as tourism. This aim requires dealing with issues in both the method of measurement and the analysis of data, and with the biases commonly associated with public disclosure of rating information. We propose an evaluative method to investigate the fairness of common measures of rating procedures, with the particular perspective of assessing the linearity of the ranked outcomes. The method is tested on a longitudinal observational case of 7 years of customer satisfaction ratings, for a total of 26,888 reviews. According to the results obtained from the sampled dataset, analysed with the proposed evaluative method, there is a trade-off between the loss of (potentially) biased information on ratings and the fairness of the resulting rankings. However, when an ad hoc unbiased ranking is computed, the ranking obtained through the time-weighted measure is not significantly different from the ad hoc unbiased case.
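    A minimal sketch (not the paper's implementation) of the kind of rank-coherence check described above: items are ranked by a plain mean rating and by an assumed time-weighted mean, and the agreement between the two rankings is measured with Kendall's tau. The column names and the exponential-decay weighting are illustrative assumptions.

    # Sketch: plain-mean ranking vs an assumed time-weighted ranking.
    # Assumed input: a DataFrame of reviews with columns 'item', 'rating', 'age_days'.
    import numpy as np
    import pandas as pd
    from scipy.stats import kendalltau

    def rank_scores(scores: pd.Series) -> pd.Series:
        """Rank items from best to worst (1 = highest score)."""
        return scores.rank(ascending=False, method="average")

    def time_weighted_mean(group: pd.DataFrame, half_life_days: float = 365.0) -> float:
        # Newer reviews weigh more; the weight halves every `half_life_days` (an assumption).
        weights = 0.5 ** (group["age_days"] / half_life_days)
        return float(np.average(group["rating"], weights=weights))

    def rank_coherence(reviews: pd.DataFrame) -> float:
        plain = reviews.groupby("item")["rating"].mean()
        weighted = reviews.groupby("item").apply(time_weighted_mean)
        tau, _ = kendalltau(rank_scores(plain), rank_scores(weighted))
        return tau  # 1.0 means the two rankings are fully coherent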

    Chapter Multipoint vs slider: a protocol for experiments

    Get PDF
    Since the broad diffusion of Computer-Assisted survey tools (i.e. web surveys), a lively debate about innovative scales of measure has arisen among social scientists and practitioners. The implications are relevant for applied Statistics and evaluation research, since traditional scales collect ordinal observations while data from sliders can be interpreted as continuous. The literature, however, reports excessive completion times for slider tasks in web surveys. This experimental protocol is aimed at testing hypotheses on the accuracy of prediction and the dispersion of estimates from anonymous participants who are recruited online and randomly assigned to tasks involving the recognition of shades of colour. The treatment variable is the scale: a traditional 0-10 multipoint scale vs a 0-100 slider. Shades have a unique parametrisation (true value) and participants have to guess the true value through the scale. These tasks are designed to recreate situations of uncertainty among participants while minimizing the subjective component of a perceptual assessment and maximizing information about scale-driven differences and biases. We propose to test statistical differences across the treatment variable in (i) the mean absolute error from the true value and (ii) the time of completion of the task. To correct biases due to variance in the number of completed tasks among participants, data about participants can be collected both through pre-task acceptance of web cookies and through post-task explicit questions.
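    As a rough illustration of the comparisons the protocol proposes (the protocol itself does not prescribe any code), the sketch below contrasts mean absolute error and completion time between the two scale conditions with a nonparametric test; the data layout, the rescaling of the 0-10 guesses, and the choice of the Mann-Whitney test are assumptions.

    # Sketch of a between-groups comparison for the two scale conditions.
    # Assumed layout: one row per completed task, with columns
    # 'scale' ('multipoint' or 'slider'), 'guess', 'true_value', 'seconds'.
    import pandas as pd
    from scipy.stats import mannwhitneyu

    def compare_scales(tasks: pd.DataFrame) -> dict:
        tasks = tasks.copy()
        # Rescale 0-10 multipoint guesses to the slider's 0-100 range so that
        # absolute errors are comparable across conditions (an assumption).
        tasks.loc[tasks["scale"] == "multipoint", "guess"] *= 10
        tasks["abs_error"] = (tasks["guess"] - tasks["true_value"]).abs()

        results = {}
        for outcome in ("abs_error", "seconds"):
            multipoint = tasks.loc[tasks["scale"] == "multipoint", outcome]
            slider = tasks.loc[tasks["scale"] == "slider", outcome]
            stat, p_value = mannwhitneyu(multipoint, slider, alternative="two-sided")
            results[outcome] = {"U": stat, "p": p_value,
                                "median_multipoint": multipoint.median(),
                                "median_slider": slider.median()}
        return results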

    PROTOCOL: HOW TO CORRECT THE CLASSIFICATION ERROR BY ASKING TO LARGE LANGUAGE MODELS THE SIMILARITY AMONG CATEGORIES

    No full text
    The similarity between two categories is a number between 0 and 1 that abstractly represents how much the two categories overlap, objectively or subjectively. When two categories overlap, the error of classifying one as the other is less severe. For example, misclassifying a wolf as a dog is a less severe error than misclassifying a wolf as a cat, because wolves are more similar to dogs than to cats. Nevertheless, the canonical estimation of similarity matrices for taxonomies of categories is expensive. This protocol suggests why and how to estimate a similarity matrix from one or more Large Language Models.
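    A minimal sketch of the idea under stated assumptions: query_llm_similarity below is a hypothetical placeholder (the protocol does not tie the procedure to a specific model or API); pairwise answers are collected into a symmetric similarity matrix, which can then weight classification errors so that confusing similar categories costs less.

    # Sketch: build a category-similarity matrix from a language model and use it
    # to score a classifier. `query_llm_similarity` is a hypothetical placeholder.
    import numpy as np

    def query_llm_similarity(category_a: str, category_b: str) -> float:
        """Placeholder: return a value in [0, 1] for how much the categories overlap."""
        raise NotImplementedError("wire this to the Large Language Model of your choice")

    def similarity_matrix(categories: list[str]) -> np.ndarray:
        k = len(categories)
        S = np.eye(k)  # every category is fully similar to itself
        for i in range(k):
            for j in range(i + 1, k):
                s = query_llm_similarity(categories[i], categories[j])
                S[i, j] = S[j, i] = s  # enforce symmetry
        return S

    def similarity_weighted_accuracy(y_true, y_pred, categories, S) -> float:
        """Each prediction earns the similarity between the true and predicted category,
        so 'wolf' predicted as 'dog' counts more than 'wolf' predicted as 'cat'."""
        index = {c: i for i, c in enumerate(categories)}
        scores = [S[index[t], index[p]] for t, p in zip(y_true, y_pred)]
        return float(np.mean(scores))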

    Methodological Collection

    No full text

    Categorical similarity and intercategoriality

    No full text

    Multiversal Methods and Applications

    No full text

    Supplementary files for Characterisation and Calibration of Multiversal Models

    No full text

    The polarising effect of Review Bomb

    Full text link
    This study discusses the Review Bomb, a phenomenon consisting of a massive attack by groups of Internet users on a website that displays users' reviews of products. It has gained attention especially on websites that aggregate numerical ratings. Although this phenomenon can be considered an example of online misinformation, it differs from conventional spam reviews, which unfold over larger time spans. In particular, the Bomb occurs suddenly and for a short time, because in this way it leverages the notorious cold-start problem: if reviews are submitted by many fresh new accounts, it becomes hard to justify preventative measures. The present research work focuses on the case of The Last of Us Part II, a video game published by Sony, which was the target of the widest Review Bomb phenomenon, occurring in June 2020. By performing an observational analysis of a linguistic corpus of English reviews and of the features of their users, this study confirms that the Bomb was an ideological attack aimed at breaking down the rating system of the platform Metacritic. Evidence supports that the bombing had the unintended consequence of inducing a reaction from users, ending in a consistent polarisation of ratings towards extreme values. The results not only illustrate the theory of polarity in online reviews, but they also provide insights for research on the problem of cold-start detection of spam reviews. In particular, they show the relevance of detecting users who discuss contextual elements instead of the product, and users with anomalous features.
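    As an illustration of what polarisation towards extreme values can mean operationally (the specific measure below is an assumption, not the study's), the sketch computes the share of extreme ratings before and after a given bombing date on a 0-10 scale.

    # Sketch: share of extreme ratings before vs after a review bomb.
    # Assumed layout: columns 'rating' (0-10 scale) and 'date' (datetime);
    # the extremeness thresholds and the cutoff date are illustrative.
    import pandas as pd

    def extreme_share(ratings: pd.Series, low: int = 1, high: int = 9) -> float:
        """Fraction of ratings at or beyond the extremes of the scale."""
        return float(((ratings <= low) | (ratings >= high)).mean())

    def polarisation_before_after(reviews: pd.DataFrame, bomb_date: str) -> dict:
        cutoff = pd.Timestamp(bomb_date)
        before = reviews.loc[reviews["date"] < cutoff, "rating"]
        after = reviews.loc[reviews["date"] >= cutoff, "rating"]
        return {"extreme_share_before": extreme_share(before),
                "extreme_share_after": extreme_share(after)}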