224 research outputs found

    A Systematic Study of Knowledge Graph Analysis for Cross-language Plagiarism Detection

    This is the author's version of a work that was accepted for publication in Information Processing and Management. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Processing and Management 52 (2016) 550–570. DOI 10.1016/j.ipm.2015.12.004
    Cross-language plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. In this paper, we perform a systematic examination of Cross-language Knowledge Graph Analysis, an approach that represents text fragments using knowledge graphs as a language-independent content model. We analyse the contributions to cross-language plagiarism detection of the different aspects covered by knowledge graphs: word sense disambiguation, vocabulary expansion, and representation by similarities with a collection of concepts. In addition, we study the relevance of both concepts and their relations when detecting plagiarism. Finally, as a key component of the knowledge graph construction, we present a new weighting scheme for relations between concepts based on distributed representations of concepts. Experimental results in Spanish–English and German–English plagiarism detection show state-of-the-art performance and provide interesting insights on the use of knowledge graphs. © 2015 Elsevier Ltd. All rights reserved.
    This research has been carried out in the framework of the European Commission WIQ-EI IRSES (No. 269180) and DIANA APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects. We would like to thank Tomas Mikolov, Martin Potthast, and Luis A. Leiva for their support and comments during this research.
    Franco-Salvador, M.; Rosso, P.; Montes Gomez, M. (2016). A Systematic Study of Knowledge Graph Analysis for Cross-language Plagiarism Detection. Information Processing and Management. 52(4):550-570. https://doi.org/10.1016/j.ipm.2015.12.004

    Detection of opinion spam with character n-grams

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-18117-2_21
    In this paper we consider the detection of opinion spam as a stylistic classification task because, given a particular domain, deceptive and truthful opinions are similar in content but differ in the way they are written (style). In particular, we propose using character n-grams as features, since they have been shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus composed of 1600 hotel reviews, considering positive and negative reviews. We compared the results obtained with character n-grams against those obtained with word n-grams. Moreover, we evaluated the effectiveness of character n-grams while decreasing the training set size in order to simulate real training conditions. The results show that character n-grams are good features for the detection of opinion spam; they seem to capture the content of deceptive opinions and the writing style of the deceiver better than word n-grams. In particular, results show an improvement of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions respectively. Furthermore, character n-grams also achieve good performance with a very small training corpus. Using only 25% of the training set, a Naïve Bayes classifier showed F1 values up to 0.80 for both opinion polarities.
    This work is the result of the collaboration in the framework of the WIQ-EI IRSES project (Grant No. 269180) within the FP7 Marie Curie. The second author was partially supported by the LACCIR programme under project ID R1212LAC006.
Accordingly, the work of the third author was in the framework of the DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Hernández Fusilier, D.; Montes Gomez, M.; Rosso, P.; Guzmán Cabrera, R. (2015). Detection of opinion spam with character n-grams. In Computational Linguistics and Intelligent Text Processing: 16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part II. Springer International Publishing. 285-294. https://doi.org/10.1007/978-3-319-18117-2_21
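
The character n-gram features mentioned in the opinion spam abstract above can be illustrated with a minimal sketch. The function name `char_ngrams`, the default `n = 3`, and the lowercasing and whitespace normalization are assumptions made for illustration; the abstract does not state the n-gram length or preprocessing the authors used.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a text.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space; the paper's actual preprocessing is not given.
    """
    normalized = " ".join(text.lower().split())
    # Slide a window of width n over the string, one character at a time.
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


if __name__ == "__main__":
    review = "The room was clean and the staff were friendly."
    for gram, count in char_ngrams(review).most_common(5):
        print(repr(gram), count)
```

Because the window crosses word boundaries and keeps punctuation, such features carry both lexical fragments and stylistic cues, which is the property the abstract relies on. The resulting counts could be fed to any bag-of-features classifier.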

    A Knowledge-Based Weighted KNN for Detecting Irony in Twitter

    In this work, we propose a variant of a well-known instance-based algorithm: WKNN. Our idea is to exploit task-dependent features in order to calculate the weight of the instances according to a novel paradigm, the Textual Attraction Force, which serves to quantify the degree of relatedness between documents. The proposed method was applied to a challenging text classification task: irony detection. We experimented with corpora used in the state of the art. The results show that, despite being a simple approach, our method is competitive with more advanced techniques.
    This research was funded by CONACYT project FC 2016-2410. The work of P. Rosso has been funded by the SomEMBED TIN2015-71147-C2-1-P MINECO research project. The work of V. Patti was partially funded by Progetto di Ateneo/CSP 2016 (IhatePrejudice, S1618_L2_BOSC_01).
    Hernandez-Farias, DI.; Montes Gomez, M.; Escalante, H.; Rosso, P.; Patti, V. (2018). A Knowledge-Based Weighted KNN for Detecting Irony in Twitter. Lecture Notes in Computer Science. 11289:1-13. https://doi.org/10.1007/978-3-030-04497-8_16
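
As a rough illustration of the weighted KNN idea in the irony detection abstract above, the sketch below lets each of the k nearest training instances vote with a weight instead of a flat count. The inverse-distance weight is a generic stand-in and is not the Textual Attraction Force, which the abstract does not define; the function name `weighted_knn_predict` and the toy data are likewise hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, weight=None):
    """Return the label with the largest summed weight among the k nearest instances.

    train  -- list of (feature_vector, label) pairs
    query  -- feature vector to classify
    weight -- function mapping a distance to a vote weight
              (assumption: inverse distance, a stand-in for the paper's weighting)
    """
    if weight is None:
        weight = lambda d: 1.0 / (d + 1e-9)
    # Keep the k training instances closest to the query (Euclidean distance).
    neighbours = sorted((math.dist(vec, query), label) for vec, label in train)[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += weight(distance)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    toy = [((0.0, 0.0), "ironic"), ((3.0, 0.0), "literal"), ((3.1, 0.0), "literal")]
    print(weighted_knn_predict(toy, (0.1, 0.0)))                        # weighted vote
    print(weighted_knn_predict(toy, (0.1, 0.0), weight=lambda d: 1.0))  # plain majority
```

Passing the weighting as a function keeps the neighbour search separate from the vote, so a task-dependent relatedness score could be substituted without touching the rest of the classifier.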

    OGEO: Sistema de navegación interior para la orientación y movilidad de personas con discapacidad visual

    For visually impaired people, moving freely means acquiring spatial orientation and mobility techniques that allow them to achieve autonomy and independence. Orientation allows them to recognize their position relative to the environment, while mobility allows them to move safely and efficiently from one place to another. However, navigating an unknown place is a challenge for visually impaired people (ViP). In this research and development work we built OGeo, an orientation and navigation system for building interiors that uses beacons (iBeacon devices) and Android-based mobile phones. The system allows a ViP to move inside a building independently. It is based on voice instructions that guide a ViP from point A to point B with precise, step-by-step directions. The fundamental objective, from a social point of view, is to support the social inclusion of ViP; in particular, the system is expected to improve school inclusion.

    Learning When to Classify for Early Text Classification

    The problem of classification in supervised learning is a widely studied one. Nonetheless, there are scenarios that have received little attention despite their applicability. One such scenario is early text classification, where one needs to know the category of a document as soon as possible. The importance of this variant of the classification problem is evident in tasks like sexual predator detection, where one wants to identify an offender as early as possible. This paper presents a framework for early text classification which highlights the two main pieces involved in this problem: classification with partial information and deciding the moment of classification. In this context, a novel approach that learns the second component (when to classify) and an adaptation of a temporal measurement for multi-class problems are introduced. Results on a classical text classification corpus, compared against a model that reads entire documents, confirm the feasibility of our approach.
    Track: XVIII Workshop de Agentes y Sistemas Inteligentes (WASI). Red de Universidades con Carreras en Informática (RedUNCI)
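
The two components named in the early text classification abstract above, classifying from partial information and deciding when to classify, can be sketched as a simple reading loop. Everything here is illustrative: the keyword-count classifier, the confidence-threshold stopping rule, and all names (`early_classify`, `keyword_classifier`, `confidence_stop`, `KEYWORDS`) are assumptions. The paper learns the stopping component, whereas this sketch uses a fixed threshold as a stand-in.

```python
# Toy vocabulary per class (hypothetical, for illustration only).
KEYWORDS = {
    "sports": {"goal", "match", "team"},
    "finance": {"bank", "stock", "market"},
}


def keyword_classifier(prefix):
    """Component 1: class probabilities from the tokens read so far (add-one smoothed counts)."""
    scores = {label: 1 + sum(tok in words for tok in prefix)
              for label, words in KEYWORDS.items()}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def confidence_stop(threshold=0.7):
    """Component 2 (stand-in): stop once the top class probability reaches the threshold."""
    return lambda probs, n_read: max(probs.values()) >= threshold


def early_classify(tokens, partial_classifier, should_stop):
    """Read tokens one at a time; return (label, number of tokens read)."""
    prefix, label = [], None
    for token in tokens:
        prefix.append(token)
        probs = partial_classifier(prefix)
        label = max(probs, key=probs.get)
        if should_stop(probs, len(prefix)):
            break  # decision taken before the document ends
    return label, len(prefix)


if __name__ == "__main__":
    doc = "the team scored a late goal in the match yesterday".split()
    print(early_classify(doc, keyword_classifier, confidence_stop(0.7)))
```

Keeping `should_stop` as a separate callable mirrors the framework's split: the same partial classifier can be paired with a fixed threshold, as here, or with a learned decision model.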

    Activation of amino acid metabolic program in cardiac HIF1-alpha-deficient mice.

    HIF1-alpha expression defines metabolic compartments in the developing heart, promoting a glycolytic program in the compact myocardium and mitochondrial enrichment in the trabeculae. Nonetheless, its role in cardiogenesis is debated. To assess the importance of HIF1-alpha during heart development and the influence of glycolysis on ventricular chamber formation, we generated conditional knockout models of Hif1a in Nkx2.5 cardiac progenitors and cardiomyocytes. Deletion of Hif1a impairs embryonic glycolysis without influencing cardiomyocyte proliferation and results in increased mitochondrial number and transient activation of amino acid catabolism, together with HIF2α and ATF4 upregulation, by E12.5. Hif1a mutants display a normal fatty acid oxidation program and do not show cardiac dysfunction in adulthood. Our results demonstrate that cardiac HIF1 signaling and glycolysis are dispensable for mouse heart development and reveal the metabolic flexibility of the embryonic myocardium to consume amino acids, raising the potential use of alternative metabolic substrates as therapeutic interventions during ischemic events.
    This project has been supported by Fundación Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Spain and by grants to S.M.-P. from the European Research Council, European Union, FP7-PEOPLE-2010-RG_276891; Fundación TV3 La Marató, Spain, 201507.30.31; Comunidad de Madrid (CAM); Spain and European Union (EU), B2017/BMD-3875; Instituto de Salud Carlos III, Spain, PI17/01817; Universidad Francisco de Vitoria (UFV), Spain and LeDucq Foundation, France, 17CVD04. I.M.-M. was supported by La Caixa-CNIC and Fundacion Alfonso Martín Escudero fellowships, Spain. T.A.-G. was supported by a predoctoral award granted by CAM/EU and UFV, Spain, PEJD-2018-PRE/SAL-9529 and SM-P by a Contrato de Investigadores Miguel Servet (CPII16/00050) and UFV, Spain.

    SPINK7 expression changes accompanied by HER2, P53 and RB1 can be relevant in predicting oral squamous cell carcinoma at a molecular level

    Oral squamous cell carcinoma (OSCC), which has a high morbidity rate, affects patients worldwide. Changes in SPINK7 in precancerous lesions could promote oncogenesis. Our aim was to evaluate SPINK7 as a potential molecular biomarker that predicts OSCC stages, compared to HER2, TP53, RB1, NFKB and CYP4B1. This study used oral biopsies from three patient groups: dysplasia (n = 33), less invasive OSCC (n = 28) and highly invasive OSCC (n = 18). The control group consisted of clinically suspicious cases later confirmed as normal mucosa (n = 20). Gene levels of SPINK7, P53, RB, NFKB and CYP4B1 were quantified by qPCR. SPINK7 levels were correlated with a cohort of 330 patients from the TCGA. In addition, SPINK7, HER2, TP53 and RB1 were evaluated by immunohistofluorescence. A one-way Kruskal–Wallis test with Dunn's post-hoc test was used to analyze the data, with significance set at p < 0.05. In OSCC, SPINK7 expression was downregulated while P53, RB, NFKB and CYP4B1 were upregulated (p < 0.001). SPINK7 was also diminished in TCGA patients (p = 2.10e-6). In less invasive OSCC, SPINK7 and HER2 proteins were decreased while TP53 and RB1 were increased with respect to the other groups (p < 0.05). The changes in SPINK7, accompanied by HER2, P53 and RB1, can be used to classify the molecular stage of OSCC lesions, allowing a diagnosis at molecular and histopathological levels.
    Affiliation: Pennacchiotti, Graciela Laura. Universidad de Chile; Chile
    Affiliation: Valdes Garrido, Fabio. Instituto Nacional del Cáncer; Chile
    Affiliation: González Arriaga, Wilfredo Alejandro. Universidad de Valparaíso; Chile
    Affiliation: Montes, Héctor Federico. Universidad Nacional de Cuyo. Facultad de Odontologia; Argentina
    Affiliation: Parra, Judith Maria Roxana. Universidad Nacional de Cuyo. Facultad de Odontologia; Argentina
    Affiliation: Guida, Valeria Andrea. Universidad Nacional de Cuyo. Facultad de Odontologia; Argentina
    Affiliation: Gomez, Silvina Esther. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina
    Affiliation: Guerrero Gimenez, Martin Eduardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina
    Affiliation: Fernandez Muñoz, Juan Manuel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina
    Affiliation: Zoppino, Felipe Carlos Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina
    Affiliation: Caron, Ruben Walter. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina
    Affiliation: Ezquer, Marcelo Eduardo. Universidad del Desarrollo; Chile
    Affiliation: Ramires Fernández, Ricardo. Universidad Mayor; Chile
    Affiliation: Bruna, Flavia Alejandra. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; Argentina. Universidad Nacional de Cuyo. Facultad de Odontologia; Argentina