    Identifying fake Amazon reviews as learning from crowds

    Customers who buy products such as books online often rely on other customers' reviews more than on reviews in specialist magazines. Unfortunately, the confidence placed in such reviews is often misplaced because of the explosion of so-called sock puppetry: authors writing glowing reviews of their own books. Identifying such deceptive reviews is not easy. The first contribution of our work is the creation of a collection that includes a number of genuinely deceptive Amazon book reviews, built in collaboration with crime writer Jeremy Duns, who has devoted a great deal of effort to unmasking sock puppetry among his colleagues. But there can be no certainty about the other reviews in the collection: all we have is a number of cues, also developed in collaboration with Duns, suggesting that a review may be genuine or deceptive. This corpus is thus an example of a collection where it is not possible to acquire the actual label for all instances, and where the cues of deception are treated as annotators that assign heuristic labels to the reviews. A number of approaches have been proposed for such cases; we adopt here the 'learning from crowds' approach proposed by Raykar et al. (2010). Thanks to Duns' certainly fake reviews, the second contribution of this work is an evaluation of the effectiveness of different annotation methods, according to the performance of models trained to detect deceptive reviews. © 2014 Association for Computational Linguistics

    Fake Opinion Detection: How Similar are Crowdsourced Datasets to Real Data?

    Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models trained this way are generally successful at discriminating between 'genuine' online reviews and the crowdsourced deceptive reviews. It has been argued that the deceptive reviews obtained via crowdsourcing are very different from real fake reviews, but the claim has never been properly tested. In this paper, we compare (false) crowdsourced reviews with a set of 'real' fake reviews published online. We evaluate their degree of similarity and their usefulness in training models for the detection of untrustworthy reviews. We find that the deceptive reviews collected via crowdsourcing are significantly different from the fake reviews published online. In the case of the artificially produced deceptive texts, it turns out that their domain similarity with the targets affects the models' performance much more than their untruthfulness does. This suggests that the use of crowdsourced datasets for opinion spam detection may not result in models applicable to the real task of detecting deceptive reviews. As an alternative way to create large datasets for the fake review detection task, we propose methods based on the probabilistic annotation of unlabeled texts, relying on meta-information generally available on e-commerce sites. Such methods are independent of the content of the reviews and allow reliable models for the detection of fake reviews to be trained. Leticia Cagnina thanks CONICET for the continued financial support.

    Luni, Lucca and the Apennines in the Middle Ages: hospitals and roads between city and mountain

    Medieval man was the homo viator par excellence. Yet while the reasons that set people on the road can essentially be traced to three fundamental activities (religious pilgrimage, trade, and military expedition), we must acknowledge that outside these three categories were the many who travelled daily, occasionally, or only a few times in their lives, for various practical needs: shepherds practising transhumance (vertical and horizontal) and itinerant grazing, worshippers heading to local places of worship, smugglers, the marginalized. It would be a mistake to keep underestimating this complexity: roads are only one of the ways of moving, and the traceable road network (usually the most important routes) excludes many places of movement where people, animals, objects, and ideas circulate. In this contribution, therefore, we attempt to practise an archaeology of mobility, aware above all that the elements we analyse, hospitals and secondarily monasteries, are only one piece to be placed within a broader and more articulated reflection. Luni and Lucca are two medieval cities deeply marked by the proximity of the Apennines, with their roads and passes. Both lie at the mouth of river valleys that penetrate deep into the range and form natural routes to and from northern Italy; given their proximity, over the centuries they wove an intense relationship of "road" reciprocity, constituting, each with its own specific features, two road hubs of great importance: Luni also for its maritime location, Lucca for its role as a collector of land and river routes. The geographical scope is therefore the territories of the two cities in the Middle Ages, which we have identified with the extent of their dioceses, excluding in the case of Lucca the southern enclaves located south of the Arno.
The chronological scope, in turn, extends through the whole of the 13th century, with the main aim of covering a period in which the written sources give a fuller picture of the hospital phenomenon.

    Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning

    Supervised learning assumes that a ground truth label exists. However, the reliability of this ground truth depends on human annotators, who often disagree. Prior work has shown that this disagreement can be helpful in training models. We propose a novel method to incorporate this disagreement as information: in addition to the standard error computation, we use soft labels (i.e., probability distributions over the annotator labels) as an auxiliary task in a multi-task neural network. We measure the divergence between the predictions and the target soft labels with several loss functions and evaluate the models on various NLP tasks. We find that the soft-label prediction auxiliary task reduces the penalty for errors on ambiguous entities and thereby mitigates overfitting. It significantly improves performance across tasks compared with the standard approach and prior work.
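
As a rough illustration of the idea in the abstract above, the sketch below builds a soft label from annotator votes and adds a divergence term to a standard cross-entropy loss. It is not the authors' implementation: the function names, the choice of KL divergence as the divergence measure, and the `weight` parameter are assumptions made for the example, and the two prediction vectors stand in for the outputs of the main and auxiliary heads of a multi-task network.

```python
import math
from collections import Counter


def soft_label(votes, classes):
    """Turn raw annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def cross_entropy(gold_index, predicted, eps=1e-12):
    """Standard loss of the main task against a single hard label."""
    return -math.log(max(predicted[gold_index], eps))


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); classes with zero target mass contribute nothing."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def multitask_loss(gold_index, target_soft, pred_main, pred_aux, weight=1.0):
    """Main-task loss plus a weighted auxiliary soft-label loss (weight is hypothetical)."""
    return cross_entropy(gold_index, pred_main) + weight * kl_divergence(target_soft, pred_aux)


classes = ["NOUN", "VERB"]
target = soft_label(["NOUN", "NOUN", "VERB", "NOUN"], classes)  # three of four votes for NOUN
loss = multitask_loss(0, target, pred_main=[0.8, 0.2], pred_aux=[0.7, 0.3])
```

With an ambiguous item such as this one, an auxiliary prediction that spreads probability across both classes is penalized less than one that commits fully to a single class, which is the intuition the abstract describes.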

    Short report: Cysticercosis in an Egyptian mummy of the late Ptolemaic period

    We describe here an ancient case of cysticercosis that was discovered in an Egyptian mummy of a young woman of about 20 years of age who lived in the late Ptolemaic period (second to first centuries B.C.). On removal of the stomach and its rehydration, a cystic lesion in the stomach wall was observed with the naked eye. Microscopic examination of sections of this lesion revealed a cystic structure whose wall had numerous projecting eversions, a characteristic feature of the larval stage (cysticercus) of the human tapeworm Taenia solium (or "pig tapeworm"). Immunohistochemical testing with serum from a T. solium-infected human confirmed the identity of the cyst. This finding is the oldest on record of the antiquity of this zoonotic parasite. This observation also confirms that, in Hellenistic Egypt, the farming of swine, along with man an intermediate host of this parasite, was present, and supports other archeological evidence.

    Learning from disagreement: a survey

    Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation.
All datasets and models employed in this paper are freely available as supplementary materials.
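
The distinction between hard and soft evaluation drawn in the survey abstract can be sketched as follows. The abstract does not say which metrics are used, so the choice here of accuracy against gold labels for hard evaluation and cross-entropy against annotator-derived soft labels for soft evaluation is an assumption, as are the function names.

```python
import math


def hard_accuracy(gold, predicted_dists):
    """Hard evaluation: fraction of items whose most probable class equals the gold label."""
    hits = 0
    for g, dist in zip(gold, predicted_dists):
        best = max(range(len(dist)), key=dist.__getitem__)
        hits += int(best == g)
    return hits / len(gold)


def soft_cross_entropy(soft_targets, predicted_dists, eps=1e-12):
    """Soft evaluation: mean cross-entropy between annotator and predicted distributions."""
    total = 0.0
    for target, dist in zip(soft_targets, predicted_dists):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, dist))
    return total / len(soft_targets)


gold = [0, 1]                                  # one gold class index per item
soft_targets = [[0.9, 0.1], [0.5, 0.5]]        # annotator distributions for the same items
predictions = [[0.8, 0.2], [0.6, 0.4]]         # model output distributions

accuracy = hard_accuracy(gold, predictions)
mean_ce = soft_cross_entropy(soft_targets, predictions)
```

In this toy case the second item counts as a plain error under hard evaluation, while soft evaluation recognizes that the annotators themselves were evenly split, which is one way the two forms of evaluation can rank models differently.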

    Mode and tempo of the Paleocene-Eocene thermal maximum in an expanded section from the Venetian pre-Alps.

    The central part of the Piave River valley in the Venetian pre-Alps of NE Italy exposes an expanded and continuous marine sediment succession that encompasses the Paleocene series and the Paleocene to Eocene transition. The Paleocene through lowermost Eocene succession is >100 m thick and was deposited at middle to lower bathyal depths in a hemipelagic, near-continental setting in the central western Tethys. In the Forada section, the Paleocene succession of limestone-marl couplets is sharply interrupted by an ~3.30-m-thick unit of clays and marls (clay marl unit). The very base of this unit represents the biostratigraphic Paleocene-Eocene boundary, and the entire unit coincides with the main carbon isotope excursion of the Paleocene-Eocene thermal maximum event. Concentrations of hematite and biogenic carbonate, δ13C measurements, and abundance of radiolarians all oscillate in a cyclical fashion and are interpreted to represent precession cycles. The main excursion interval spans five complete cycles, that is, 105 ± 10 k.y. The overlying carbon isotope recovery interval, which is composed of six distinct limestone-marl couplets, is interpreted to represent six precessional cycles with a duration of 126 ± 12 k.y. The entire carbon isotope excursion interval in Forada has a total duration of ~231 ± 22 k.y., which is 5%–10% longer than previous estimates derived from open ocean sites (210–220 k.y.). Geochemical proxies for redox conditions indicate oxygenated conditions before, during, and after the carbon isotope excursion event. The Forada section exhibits a nonstepped sharp decrease in δ13C (−2.35‰) at the base of the clay marl unit. The hemipelagic, near-continental depositional setting of Forada and the sharply elevated sedimentation rates throughout the clay marl unit argue for continuous rather than interrupted deposition and show that the initial nonstepped carbon isotope shift was not caused by a hiatus.
A single sample at the base of the unit lacks biogenic carbonate. Preservation of carbonate thereafter improves progressively up-section in the clay marl unit, which is consistent with a prodigiously abrupt and rapid acidification of the oceans followed by a slower, successive deepening of the carbonate compensation depth. Increased sedimentation rates through the clay marl unit (approximately the main interval of the carbon isotope excursion) are consistent with an intensified hydrological cycle driven by supergreenhouse conditions and enhanced weathering and transport of terrigenous material to this near-continental, hemipelagic environment in the central western Tethys. The sharp transition in lithology from the clay marl unit to the overlying limestone-marl couplets in the recovery interval and the coincident shift toward heavier δ13C values suggest that the silicate pump and continental weathering, the cause of the enhanced terrigenous flux to Forada, stopped abruptly. This implies that the source of the light CO2 ceased to be added to the ocean-atmosphere system at the top of the clay marl unit.
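
The durations quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes a precession cycle of 21 ± 2 k.y., a value that is not stated in the abstract but is implied by five cycles spanning 105 ± 10 k.y., and it assumes the uncertainties are summed linearly, as the quoted total of ± 22 k.y. suggests. The variable and function names are illustrative only.

```python
CYCLE_KYR = 21.0      # assumed mean precession period, inferred from 105 k.y. / 5 cycles
CYCLE_ERR_KYR = 2.0   # assumed per-cycle uncertainty, inferred from 10 k.y. / 5 cycles


def interval_duration(n_cycles):
    """Return (duration, uncertainty) in k.y. for a whole number of precession cycles."""
    return n_cycles * CYCLE_KYR, n_cycles * CYCLE_ERR_KYR


main_excursion = interval_duration(5)   # clay marl unit: five cycles
recovery = interval_duration(6)         # six limestone-marl couplets

# Total excursion length, with uncertainties added linearly.
total = (main_excursion[0] + recovery[0], main_excursion[1] + recovery[1])

# Excess over the open-ocean estimates of 220 and 210 k.y., in percent.
excess_percent = [round((total[0] / previous - 1.0) * 100.0, 1) for previous in (220.0, 210.0)]

print(main_excursion, recovery, total, excess_percent)
```

Under these assumptions the script reproduces the figures in the abstract: 105 ± 10 and 126 ± 12 k.y. for the two intervals, 231 ± 22 k.y. in total, and an excess of 5% to 10% over the earlier open-ocean estimates.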

    SemEval-2021 Task 12: Learning with Disagreements

    Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision. However, most supervised machine learning methods assume that a single preferred interpretation exists for each item, which is at best an idealization. The aim of the SemEval-2021 shared task on learning with disagreements (Le-Wi-Di) was to provide a unified testing framework for methods for learning from data containing multiple and possibly contradictory annotations, covering the best-known datasets containing information about disagreements for interpreting language and classifying images. In this paper we describe the shared task and its results.