57 research outputs found

    Identifying fake Amazon reviews as learning from crowds

    Get PDF
    Customers who buy products such as books online often rely on other customers reviews more than on reviews found on specialist magazines. Unfortunately the confidence in such reviews is often misplaced due to the explosion of so-called sock puppetry-Authors writing glowing reviews of their own books. Identifying such deceptive reviews is not easy. The first contribution of our work is the creation of a collection including a number of genuinely deceptive Amazon book reviews in collaboration with crime writer Jeremy Duns, who has devoted a great deal of effort in unmasking sock puppeting among his colleagues. But there can be no certainty concerning the other reviews in the collection: All we have is a number of cues, also developed in collaboration with Duns, suggesting that a review may be genuine or deceptive. Thus this corpus is an example of a collection where it is not possible to acquire the actual label for all instances, and where clues of deception were treated as annotators who assign them heuristic labels. A number of approaches have been proposed for such cases; we adopt here the 'learning from crowds' approach proposed by Raykar et al. (2010). Thanks to Duns' certainly fake reviews, the second contribution of this work consists in the evaluation of the effectiveness of different methods of annotation, according to the performance of models trained to detect deceptive reviews. © 2014 Association for Computational Linguistics

    Fake Opinion Detection: How Similar are Crowdsourced Datasets to Real Data?

    Full text link
    [EN] Identifying deceptive online reviews is a challenging tasks for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models trained this way are generally successful at discriminating between `genuine¿ online reviews and the crowdsourced deceptive reviews. It has been argued that the deceptive reviews obtained via crowdsourcing are very different from real fake reviews, but the claim has never been properly tested. In this paper, we compare (false) crowdsourced reviews with a set of `real¿ fake reviews published on line. We evaluate their degree of similarity and their usefulness in training models for the detection of untrustworthy reviews. We find that the deceptive reviews collected via crowdsourcing are significantly different from the fake reviews published online. In the case of the artificially produced deceptive texts, it turns out that their domain similarity with the targets affects the models¿ performance, much more than their untruthfulness. This suggests that the use of crowdsourced datasets for opinion spam detection may not result in models applicable to the real task of detecting deceptive reviews. As an alternative method to create large-size datasets for the fake reviews detection task, we propose methods based on the probabilistic annotation of unlabeled texts, relying on the use of meta-information generally available on the e-commerce sites. Such methods are independent from the content of the reviews and allow to train reliable models for the detection of fake reviews.Leticia Cagnina thanks CONICET for the continued financial support. This work was funded by MINECO/FEDER (Grant No. SomEMBED TIN2015-71147-C2-1-P). The work of Paolo Rosso was partially funded by the MISMIS-FAKEnHATE Spanish MICINN research project (PGC2018-096212-B-C31). Massimo Poesio was in part supported by the UK Economic and Social Research Council (Grant Number ES/M010236/1).Fornaciari, T.; Cagnina, L.; Rosso, P.; Poesio, M. (2020). Fake Opinion Detection: How Similar are Crowdsourced Datasets to Real Data?. Language Resources and Evaluation. 54(4):1019-1058. https://doi.org/10.1007/s10579-020-09486-5S10191058544Baeza-Yates, R. (2018). Bias on the web. Communications of the ACM, 61(6), 54–61.Banerjee, S., & Chua, A. Y. (2014). Applauses in hotel reviews: Genuine or deceptive? In: Science and Information Conference (SAI), 2014 (pp. 938–942). New York: IEEE.Bhargava, R., Baoni, A., & Sharma, Y. (2018). Composite sequential modeling for identifying fake reviews. Journal of Intelligent Systems,. https://doi.org/10.1515/jisys-2017-0501.Bickel, P. J., & Doksum, K. A. (2015). Mathematical statistics: Basic ideas and selected topics (2nd ed., Vol. 1). Boca Raton: Chapman and Hall/CRC Press.Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on computational learning theory (pp. 92–100). New York: ACM.Cagnina, L. C., & Rosso, P. (2017). Detecting deceptive opinions: Intra and cross-domain classification using an efficient representation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 25(Suppl. 2), 151–174. https://doi.org/10.1142/S0218488517400165.Cardoso, E. F., Silva, R. M., & Almeida, T. A. (2018). Towards automatic filtering of fake reviews. Neurocomputing, 309, 106–116. https://doi.org/10.1016/j.neucom.2018.04.074.Carpenter, B. (2008). Multilevel bayesian models of categorical data annotation. Retrieved from http://lingpipe.files.wordpress.com/2008/11/carp-bayesian-multilevel-annotation.pdf.Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.Costa, P. T., & MacCrae, R. R. (1992). Revised NEO personality inventory (NEO PI-R) and NEO five-factor inventory (NEO FFI): Professional manual. Psychological Assessment Resources.Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 28(1), 20–28.Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological), 39(1), 1–38.Elkan, C., & Noto, K. (2008). Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 213–220). New York: ACM.Fei, G., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., & Ghosh, R. (2013). Exploiting burstiness in reviews for review spammer detection. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (Vol. 13, pp. 175–184).Feng, S., Banerjee, R., & Choi, Y. (2012). Syntactic stylometry for deception detection. In: Proceedings of the 50th annual meeting of the association for computational linguistics (Vol. 2: Short Papers, pp. 171–175). Jeju Island: Association for Computational Linguistics.Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3, 1289–1305.Fornaciari, T., & Poesio, M. (2013). Automatic deception detection in Italian court cases. Artificial intelligence and law, 21(3), 303–340. https://doi.org/10.1007/s10506-013-9140-4.Fornaciari, T., & Poesio, M. (2014). Identifying fake amazon reviews as learning from crowds. In: Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics (pp. 279–287). Gothenburg: Association for Computational Linguistics. Retrieved from http://www.aclweb.org/anthology/E14-1030.Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models., Analytical methods for social research Cambridge: Cambridge University Press.Graves, A., Jaitly, N., & Mohamed, A. R. (2013). Hybrid speech recognition with deep bidirectional LSTM. In: 2013 IEEE workshop on automatic speech recognition and understanding (ASRU) (pp. 273–278). New York: IEEE.Hernández-Castañeda, Á., & Calvo, H. (2017). Deceptive text detection using continuous semantic space models. Intelligent Data Analysis, 21(3), 679–695.Hernández Fusilier, D., Guzmán, R., Móntes y Gomez, M., & Rosso, P. (2013). Using pu-learning to detect deceptive opinion spam. In: Proc. of the 4th workshop on computational approaches to subjectivity, sentiment and social media analysis (pp. 38–45).Hernández Fusilier, D., Montes-y Gómez, M., Rosso, P., & Cabrera, R. G. (2015). Detecting positive and negative deceptive opinions using pu-learning. Information Processing & Management, 51(4), 433–443.Hovy, D. (2016). The enemy in your own camp: How well can we detect statistically-generated fake reviews–an adversarial study. In: The 54th annual meeting of the association for computational linguistics (p 351).Jelinek, F., Lafferty, J. D., & Mercer, R. L. (1992). Basic methods of probabilistic context free grammars. Speech recognition and understanding (pp. 345–360). New York: Springer.Jindal, N., & Liu, B. (2008). Opinion spam and analysis. In: Proceedings of the 2008 international conference on web search and data mining (pp. 219–230). New York: ACM.Karatzoglou, A., Meyer, D., & Hornik, K. (2006). Support vector machines in R. Journal of Statistical Software, 15(9), 1–28.Kim, S., Lee, S., Park, D., & Kang, J. (2017). Constructing and evaluating a novel crowdsourcing-based paraphrased opinion spam dataset. In: Proceedings of the 26th international conference on world wide web (pp. 827–836). Geneva: International World Wide Web Conferences Steering Committee.Li, F., Huang, M., Yang, Y., & Zhu, X. (2011). Learning to identify review spam. IJCAI Proceedings-International Joint Conference on Artificial Intelligence, 22(3), 2488–2493.Li, H., Chen, Z., Liu, B., Wei, X., & Shao, J. (2014a). Spotting fake reviews via collective positive-unlabeled learning. In: 2014 IEEE international conference on data mining (ICDM) (pp. 899–904). New York: IEEE.Li, H., Fei, G., Wang, S., Liu, B., Shao, W., Mukherjee, A., & Shao, J. (2017). Bimodal distribution and co-bursting in review spam detection. In: Proceedings of the 26th international conference on world wide web (pp. 1063–1072). Geneva: International World Wide Web Conferences Steering Committee.Li, H., Liu, B., Mukherjee, A., & Shao, J. (2014b). Spotting fake reviews using positive-unlabeled learning. Computación y Sistemas, 18(3), 467–475.Li, J., Ott, M., Cardie, C., & Hovy, E. H. (2014c). Towards a general rule for identifying deceptive opinion spam. In: ACL (Vol. 1, pp. 1566–1576).Lin, C. H., Hsu, P. Y., Cheng, M. S., Lei, H. T., & Hsu, M. C. (2017). Identifying deceptive review comments with rumor and lie theories. In: International conference in swarm intelligence (pp. 412–420). New York: Springer.Liu, B., Dai, Y., Li, X., Lee, W. S., & Yu, P. S. (2003). Building text classifiers using positive and unlabeled examples. In: Third IEEE international conference on data mining (pp. 179–186). New York: IEEE.Liu, B., Lee, W. S., Yu, P. S., & Li, X. (2002). Partially supervised classification of text documents. ICML, 2, 387–394.Martens, D., & Maalej, W. (2019). Towards understanding and detecting fake reviews in app stores. Empirical Software Engineering,. https://doi.org/10.1007/s10664-019-09706-9.Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:13013781.Mukherjee, A., Kumar, A., Liu, B., Wang, J., Hsu, M., Castellanos, M., & Ghosh, R. (2013a). Spotting opinion spammers using behavioral footprints. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 632–640) New York: ACM.Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. S. (2013b). What yelp fake review filter might be doing? In: Proceedings of the seventh international AAAI conference on weblogs and social media.Negri, M., Bentivogli, L., Mehdad, Y., Giampiccolo, D., & Marchetti, A. (2011). Divide and conquer: Crowdsourcing the creation of cross-lingual textual entailment corpora. In: Proceedings of the conference on empirical methods in natural language processing (pp. 670–679). Stroudsburg: Association for Computational Linguistics.Ott, M., Cardie, C., & Hancock, J. T. (2013). Negative deceptive opinion spam. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies (pp. 497–501).Ott, M., Choi, Y., Cardie, C., & Hancock, J. (2011). Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th Annual meeting of the association for computational linguistics: human language technologies (pp. 309–319). Portland, Oregon: Association for Computational Linguistics.Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count (LIWC): LIWC2001. Mahwah: Lawrence Erlbaum Associates.Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., et al. (2010). Learning from crowds. Journal of Machine Learning Research, 11, 1297–1322.Ren, Y., & Ji, D. (2017). Neural networks for deceptive opinion spam detection: An empirical study. Information Sciences, 385, 213–224.Rout, J. K., Dalmia, A., Choo, K. K. R., Bakshi, S., & Jena, S. K. (2017). Revisiting semi-supervised learning for online deceptive review detection. IEEE Access, 5(1), 1319–1327.Saini, M., & Sharan, A. (2017). Ensemble learning to find deceptive reviews using personality traits and reviews specific features. Journal of Digital Information Management, 12(2), 84–94.Salloum, W., Edwards, E., Ghaffarzadegan, S., Suendermann-Oeft, D., & Miller, M. (2017). Crowdsourced continuous improvement of medical speech recognition. In: The AAAI-17 workshop on crowdsourcing, deep learning, and artificial intelligence agents.Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In: Proceedings of international conference on new methods in language processing. Retrieved from http://www.ims.uni-stuttgart.de/ftp/pub/corpora/tree-tagger1.pdf.Shehnepoor, S., Salehi, M., Farahbakhsh, R., & Crespi, N. (2017). Netspam: A network-based spam detection framework for reviews in online social media. IEEE Transactions on Information Forensics and Security, 12(7), 1585–1595.Skeppstedt, M., Peldszus, A., & Stede, M. (2018). More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing. In: Proceedings of the 5th workshop on argument mining (pp. 155–163).Strapparava, C., & Mihalcea, R. (2009). The lie detector: Explorations in the automatic recognition of deceptive language. In: Proceedings of the 47th annual meeting of the association for computational linguistics and the 4th international joint conference on natural language processing.Streitfeld, D. (August 25th25{{\rm th}}, 2012). The best book reviews money can buy. The New York Times.Whitehill, J., Wu, T., Bergsma, F., Movellan, J. R., & Ruvolo, P. L. (2009). Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. Advances in neural information processing systems (pp. 2035–2043). Cambridge: MIT Press.Xie, S., Wang, G., Lin, S., & Yu, P. S. (2012). Review spam detection via temporal pattern discovery. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp 823–831). New York: ACM.Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’99 (pp. 42–49). New York: ACM.Zhang, W., Bu, C., Yoshida, T., & Zhang, S. (2016). Cospa: A co-training approach for spam review identification with support vector machine. Information, 7(1), 12.Zhang, W., Du, Y., Yoshida, T., & Wang, Q. (2018). DRI-RCNN: An approach to deceptive review identification using recurrent convolutional neural network. Information Processing & Management, 54(4), 576–592.Zhou, L., Shi, Y., & Zhang, D. (2008). A Statistical Language Modeling Approach to Online Deception Detection. IEEE Transactions on Knowledge and Data Engineering, 20(8), 1077–1081

    Luni, Lucca e l’Appennino nel Medioevo: ospedali e strade tra città e montagna

    Get PDF
    L’uomo medievale era per eccellenza un homo viator. Tuttavia se le ragioni che mettevano l’uomo in cammino sono sostanzialmente individuabili nelle tre fondamentali attività del pellegrinaggio religioso, della mercatura e della spedizione militare, dobbiamo ammettere che al di fuori di queste tre categorie ci sono i molti che si spostavano quotidianamente, saltuariamente o anche poche volte nella vita, per varie esigenze concrete, ci sono i pastori nella transumanza (verticale e orizzontale) e nel pascolo vagante, i fedeli verso luoghi di culto locali, i contrabbandieri, gli emarginati. Una complessità che sbaglieremmo nel continuare a sottostimare: le strade sono solo uno dei modi per muoversi, e la rete stradale rintracciabile (ovvero solitamente le direttrici più importanti) esclude molti luoghi di spostamento, dove circolano persone, animali, oggetti, idee. In questo contributo, dunque, cercheremo di praticare un’archeologia della mobilità, soprattutto nella consapevolezza che gli elementi da noi analizzati, ospedali e in seconda battuta monasteri, rappresentano solo un tassello, da inserire in una riflessione più ampia e più articolata. Luni e Lucca sono due città medievali segnate profondamente dalla vicinanza dell’Appennino, delle sue strade e dei suoi valichi. Situate entrambe allo sbocco di valli fluviali che penetrano a fondo la catena e che costituiscono naturali direttrici da e per l’Italia settentrionale, data la loro vicinanza hanno intessuto nei secoli un intenso rapporto di reciprocità “stradale”, rappresentando, con le loro specificità, due nodi stradali di grande rilevanza: Luni anche per la sua ubicazione marittima, Lucca per la sua funzione di collettore di vie terrestri e fluviali. L’ambito geografico è quindi dato dai territori delle due città nel Medioevo, che abbiamo identificato con l’estensione delle diocesi, togliendo nel caso di Lucca le enclave meridionali situate a sud dell’Arno. L’ambito cronologico è invece esteso fino a tutto il XIII secolo, con lo scopo precipuo di comprendere un periodo nel quale le fonti scritte ci restituiscono più compiutamente il fenomeno ospedaliero

    Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning

    Get PDF
    Supervised learning assumes that a ground truth label exists. However, the reliability of this ground truth depends on human annotators, who often disagree. Prior work has shown that this disagreement can be helpful in training models. We propose a novel method to incorporate this disagreement as information: in addition to the standard error computation, we use soft labels (i.e., probability distributions over the annotator labels) as an auxiliary task in a multi-task neural network. We measure the divergence between the predictions and the target soft labels with several loss-functions and evaluate the models on various NLP tasks. We find that the soft-label prediction auxiliary task reduces the penalty for errors on ambiguous entities and thereby mitigates overfitting. It significantly improves performance across tasks beyond the standard approach and prior work

    Short report: Cysticercosis in an Egyptian mummy of the late Ptolemaic period

    Get PDF
    Abstract We describe here an ancient case of cysticercosis that was discovered in an Egyptian mummy of a young woman of about 20 years of age who lived in the late Ptolemaic period (second to first centuries b.c.). On removal of the stomach and its rehydration, a cystic lesion in the stomach wall was observed by naked eye. Microscopical examination of sections of this lesion revealed a cystic structure, with a wall, with numerous projecting eversions, a characteristic feature of the larval stage (cysticercus) of the human tapeworm Taenia solium (or "pig tapeworm"). Immunohistochemical testing with serum from a T. solium-infected human confirmed the identity of the cyst. This finding is the oldest on record of the antiquity of this zoonotic parasite. This observation also confirms that, in Hellenistic Egypt, the farming of swine, along with man an intermediate host of this parasite, was present, and supports other archeological evidenc

    Learning from disagreement: a survey

    Get PDF
    Many tasks in Natural Language Processing (nlp) and Computer Vision (cv) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (ai) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on nlp and cv tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. All datasets and models employed in this paper are freely available as supplementary materials

    Mode and tempo of the Paleocene-Eocene thermal maximum in an expanded section from the Venetian pre-Alps.

    Get PDF
    The central part of the Piave River valley in the Venetian pre-Alps of NE Italy exposes an expanded and continuous marine sediment succession that encompasses the Paleocene series and the Paleocene to Eocene transition. The Paleocene through lowermost Eocenemsuccession is >100 m thick and was depositednat middle to lower bathyal depths in a hemipelagic, near-continental setting in the central western Tethys. In the Forada section, the Paleocene succession of limestone-marl couplets is sharply interrupted by an ~3.30- m-thick unit of clays and marls (clay marl unit). The very base of this unit represents the biostratigraphic Paleocene-Eocene boundary, and the entire unit coincides with the main carbon isotope excursion of the Paleocene-Eocene thermal maximum event. Concentrations of hematite and biogenic carbonate, δ13C measurements, and abundance of radiolarians, all oscillate in a cyclical fashion and are interpreted to represent precession cycles. The main excursion interval spans fi ve complete cycles, that is, 105 ± 10 k.y. The overlying carbon isotope recovery interval, which is composed of six distinct limestone-marl couplets, is interpreted to represent six precessional cycles with a duration of 126 ± 12 k.y. The entire carbon isotope excursion interval in Forada has a total duration of ~231 ± 22 k.y., which is 5%–10% longer than previous estimates derived from open ocean sites (210–220 k.y.). Geochemical proxies for redox conditions indicate oxygenated conditions before, during, and after the carbon isotope excursion event. The Forada section exhibits a nonstepped sharp decrease in δ13C (−2.35‰) at the base of the clay marl unit. The hemipelagic, near-continental depositional setting of Forada and the sharply elevated sedimentation rates throughout the clay marl unit argue for continuous rather than interrupted deposition and show that the initial nonstepped carbon isotope shift was not caused by a hiatus. A single sample at the base of the unit lacks biogenic carbonate. Preservation of carbonate thereafter improves progressively up-section in the clay marl unit, which is consistent with a prodigiously abrupt and rapid acidifi cation of the oceans followed by a slower, successive deepening of the carbonate compensation depth. Increased sedimentation rates through the clay marl unit (approximately the main interval of the carbon isotope excursion) are consistent with an intensifi ed hydrological cycle driven by supergreenhouse conditions and enhanced weathering and transport of terrigenous material to this near-continental, hemipelagic environment in the central western Tethys. The sharp transition in lithology from the clay marl unit to the overlying limestonemarl couplets in the recovery interval and the coincident shift toward heavier δ13C values suggest that the silicate pump and continental weathering, the cause of the enhanced terrigenous fl ux to Forada, stopped abruptly. This implies that the source of the light CO2 ceased to be added to the ocean-atmosphere system at the top of the clay marl unit

    SemEval-2021 Task 12: Learning with Disagreements

    Get PDF
    Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision. However, most supervised machine learning methods assume that a single preferred interpretation exists for each item, which is at best an idealization. The aim of the SemEval-2021 shared task on learning with disagreements (Le-Wi-Di) was to provide a unified testing framework for methods for learning from data containing multiple and possibly contradictory annotations covering the best-known datasets containing information about disagreements for interpreting language and classifying images. In this paper we describe the shared task and its results
    corecore