4 research outputs found

    Detecting Sockpuppets in Deceptive Opinion Spam

    Full text link
    This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. Two methods are explored. The first is a feature subsampling scheme that uses the KL-divergence on stylistic language models of an author to find discriminative features. The second is a transduction scheme, spy induction, which leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments using ground-truth sockpuppet data show the effectiveness of the proposed schemes.
    Comment: 18 pages. Accepted at CICLing 2017, the 18th International Conference on Intelligent Text Processing and Computational Linguistics.
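
    The first scheme scores features by how much they contribute to the divergence between two authors' stylistic language models. The following is a minimal, illustrative sketch of that general idea, not the authors' exact procedure; the unigram profiles, smoothing constant and top-k cutoff are assumptions made for the example.

        # Illustrative sketch: score features by their pointwise contribution to
        # KL(P_a || P_b) between two authors' smoothed unigram profiles, then keep
        # the most discriminative ones. Not the paper's exact feature set or models.
        import math
        from collections import Counter

        def feature_kl_scores(tokens_a, tokens_b, alpha=1.0):
            """Pointwise KL contribution of each feature, with additive smoothing."""
            counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
            vocab = set(counts_a) | set(counts_b)
            total_a = sum(counts_a.values()) + alpha * len(vocab)
            total_b = sum(counts_b.values()) + alpha * len(vocab)
            scores = {}
            for f in vocab:
                p = (counts_a[f] + alpha) / total_a
                q = (counts_b[f] + alpha) / total_b
                scores[f] = p * math.log(p / q)
            return scores

        scores = feature_kl_scores("the style of one author".split(),
                                   "a different author writes differently".split())
        top_features = sorted(scores, key=scores.get, reverse=True)[:5]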

    Character N-Grams for Detecting Deceptive Controversial Opinions

    Full text link
    Controversial topics are present in everyday life, and opinions about them can be either truthful or deceptive. Deceptive opinions are emitted to mislead other people in order to gain some advantage. In most cases humans cannot detect whether an opinion is deceptive or truthful; however, computational approaches have been used successfully for this purpose. In this work, we evaluate a representation based on character n-gram features for detecting deceptive opinions. We consider opinions on the following topics: abortion, the death penalty, and personal feelings about the best friend, three domains studied in the state of the art. We found character n-grams effective for detecting deception in these controversial domains, even more so than psycholinguistic features. Our results indicate that this representation captures relevant information about style and content that is useful for the task, which allows us to conclude that it is a competitive text representation with a good trade-off between simplicity and performance.
    Acknowledgements: We would like to thank CONACyT for partially supporting this work under grants 613411, CB-2015-01-257383, and FC-2016/2410. The work of the last author was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P).
    Sánchez-Junquera, J.J., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. (2018). Character N-Grams for Detecting Deceptive Controversial Opinions. Lecture Notes in Computer Science, vol. 11018, pp. 135-140. https://doi.org/10.1007/978-3-319-98932-7_13
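
    The core of the approach is a bag of character n-grams fed to a standard classifier. Below is a minimal sketch of that kind of pipeline, assuming scikit-learn; the n-gram range, tf-idf weighting, linear SVM and toy examples are illustrative assumptions, not the exact configuration evaluated in the paper.

        # Character n-gram classifier for deceptive vs. truthful opinions (sketch).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Toy labeled opinions (1 = deceptive, 0 = truthful); real corpora cover domains
        # such as abortion, the death penalty and feelings about the best friend.
        texts = ["I absolutely believe this with all my heart",
                 "I have mixed feelings, but on balance I agree"]
        labels = [1, 0]

        model = make_pipeline(
            TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),  # character n-grams
            LinearSVC(),
        )
        model.fit(texts, labels)
        print(model.predict(["this opinion is completely sincere"]))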

    Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks

    No full text
    Triaging unpromising lead molecules early in the drug discovery process is essential for accelerating its pace while avoiding the costs of unwarranted biological and clinical testing. Accordingly, medicinal chemists have been trying for decades to develop metrics, ranging from heuristic measures to machine-learning models, that could rapidly distinguish potential drugs from small molecules that lack drug-like features. However, none of these metrics has gained universal acceptance, and the very idea of 'drug-likeness' has recently been called into question. Here, we evaluate drug-likeness using different sets of descriptors and different state-of-the-art classifiers, reaching an out-of-sample accuracy of 87-88%. Remarkably, because these individual classifiers yield different Bayesian error distributions, combining them and selecting the minimal-variance predictions can increase the accuracy of distinguishing drug-like from non-drug-like molecules to 93%. Because the total variance is comparable with its aleatoric contribution, which reflects the irreducible error inherent in the dataset (as opposed to the epistemic contribution due to the model itself), this level of accuracy is probably the upper limit achievable with the currently known collection of drugs. When designing new drugs, there are countless ways to create molecules, yet only a few interact with biological targets. Beker and colleagues here provide a graph neural network-based metric for drug-likeness that can guide the search.
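
    The accuracy gain comes from combining several classifiers and keeping only the predictions on which they agree. The following is a rough sketch of that selection step, assuming scikit-learn and NumPy; the random stand-in data, the particular models and the variance threshold are assumptions for illustration, not the Bayesian neural networks or molecular descriptors used in the paper.

        # Combine several classifiers; retain only minimal-variance predictions (sketch).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 16))            # stand-in for molecular descriptors
        y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in drug-like / non-drug-like labels

        models = [RandomForestClassifier(random_state=0),
                  LogisticRegression(max_iter=1000),
                  SVC(probability=True, random_state=0)]
        for m in models:
            m.fit(X[:150], y[:150])

        # Probability of the "drug-like" class from each model on held-out molecules.
        probs = np.stack([m.predict_proba(X[150:])[:, 1] for m in models])
        mean_p, var_p = probs.mean(axis=0), probs.var(axis=0)

        # Keep only the molecules on which the models agree (low inter-model variance).
        confident = var_p < 0.01
        predictions = (mean_p[confident] > 0.5).astype(int)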

    360 degree view of cross-domain opinion classification: a survey

    No full text