1,089 research outputs found

    Author Profiling and Plagiarism Detection

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-25485-2_6
    In this chapter we introduce the topics covered in the RuSSIR 2014 course on Author Profiling and Plagiarism Detection (APPD). Author profiling distinguishes between classes of authors by studying how language is shared by classes of people. This task helps to identify profiling aspects such as gender, age, native language, or even personality type. In plagiarism detection, by contrast, we are not interested in how language is shared. Instead, given a document, we investigate whether the writing style changes, in order to unveil text inconsistencies, i.e., unexpected irregularities across the document such as changes in vocabulary, style and text complexity. When the source document(s) from which the text was plagiarised cannot be retrieved, this intrinsic analysis of the suspicious document is the only way to find evidence of plagiarism. The source may be hard to retrieve because the documents are not available on the web, or because the plagiarised fragments were obfuscated via paraphrasing or translation (when the source document was in another language). In this overview we also discuss the results of the shared tasks on author profiling (gender and age identification) and plagiarism detection that we help to organise at the PAN Lab on Uncovering Plagiarism, Authorship, and Social Software Misuse.
    The PAN shared tasks on author profiling and on plagiarism detection have been organised in the framework of the WIQ-EI IRSES project (Grant No. 269180) within the EC FP 7 Marie Curie People. The research work described in the paper was carried out in the framework of the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Rosso, P. (2015). Author Profiling and Plagiarism Detection. In Information Retrieval. Springer. 229-250. https://doi.org/10.1007/978-3-319-25485-2_6
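
    The intrinsic analysis described in this abstract (looking for style changes inside a single document) can be illustrated with a minimal sketch. It is not the chapter's method: it assumes one common approach, comparing the character n-gram profile of each sliding window with the profile of the whole document and flagging windows that deviate strongly. The function names, the window size, the step and the threshold rule (mean plus `k` standard deviations) are all hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def char_ngrams(text, n=3):
    """Count the character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dissimilarity(window_profile, doc_profile):
    """Normalised difference of relative n-gram frequencies (0 = same style).

    Only n-grams present in the window are compared, so every one of them
    also occurs in the document profile when the window is part of it.
    """
    if not window_profile:
        return 0.0
    w_total = sum(window_profile.values())
    d_total = sum(doc_profile.values())
    score = 0.0
    for gram, count in window_profile.items():
        fw = count / w_total
        fd = doc_profile[gram] / d_total
        score += (2 * (fw - fd) / (fw + fd)) ** 2
    return score / (4 * len(window_profile))


def suspicious_windows(text, window=200, step=100, n=3, k=1.0):
    """Return (start, score) for windows whose style deviates from the document.

    The threshold (mean + k * standard deviation of all window scores) is an
    illustrative assumption, not a value taken from the chapter.
    """
    doc_profile = char_ngrams(text, n)
    starts = range(0, max(len(text) - window, 0) + 1, step)
    scores = [
        (s, dissimilarity(char_ngrams(text[s:s + window], n), doc_profile))
        for s in starts
    ]
    values = [v for _, v in scores]
    threshold = mean(values) + k * pstdev(values)
    return [(s, v) for s, v in scores if v > threshold]


if __name__ == "__main__":
    base = "the cat sat on the mat. " * 40
    odd = "ZXQJ vvkw qqzx jjxk! " * 10  # passage written in a different "style"
    for start, score in suspicious_windows(base + odd + base):
        print(start, round(score, 3))
```

    A sketch like this only signals where the style shifts; whether a flagged window is actually plagiarised still has to be judged separately.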

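    An example of vast area cooperation. The experience and development prospects of the Adriatic Euroregion (original title: Un ejemplo de cooperación de área vasta. La experiencia y las perspectivas de desarrollo en la Eurorregión Adriática)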

    The article analyzes the case study of the Adriatic Euroregion (AE) to exemplify the emergence of «vast area cooperation». This kind of cooperation is considered to be the latest challenge of transnational cooperation in Europe: solid motivations are needed to cooperate, and the size and number of participants are a source of problems. We start by introducing the characteristics and perspectives of the AE, and then turn to its core business, emphasizing three main points: the lobbying of Presidents, operational articulation and strategic programming. The role of the Adriatic Sea as a factor of absolute advantage for cooperation is highlighted. We conclude by emphasizing the conditions (necessary but not sufficient) for the long-term sustainability of this cooperation.

    Detecting Deceptive Opinions: Intra and Cross-domain Classification using an Efficient Representation

    Electronic version of an article published as International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 25, 2, 2017, 151-174. DOI:10.1142/S0218488517400165 © copyright World Scientific Publishing Company. https://www.worldscientific.com/worldscinet/ijufks
    Online opinions play an important role for customers and companies, who increasingly rely on them to make purchase and business decisions. One consequence is the growing tendency to post fake reviews in order to change purchase decisions and opinions about products and services. It is therefore important to filter out deceptive comments from the retrieved opinions. In this paper we propose character n-grams in tokens, an efficient and effective variant of the traditional character n-grams model, which we use to obtain a low dimensionality representation of opinions. A Support Vector Machines classifier was used to evaluate our proposal on available corpora with reviews of hotels, doctors and restaurants. To study the performance of our model, we run experiments in intra and cross-domain settings. The aim of the latter is to evaluate our approach in a realistic cross-domain scenario where deceptive opinions are available in one domain but not in another. After comparing our method with state-of-the-art ones, we conclude that character n-grams in tokens yield competitive results with a low dimensionality representation.
    This publication was made possible by NPRP grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
    Cagnina, L.; Rosso, P. (2017). Detecting Deceptive Opinions: Intra and Cross-domain Classification using an Efficient Representation. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems. 25(2):151-174. https://doi.org/10.1142/S0218488517400165
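
    The abstract above does not define character n-grams in tokens in detail, so the following is a minimal sketch under stated assumptions rather than the authors' implementation: n-grams are taken only inside whitespace-delimited tokens (never across a space), and tokens shorter than `n` contribute nothing. Both function names are hypothetical. The plain character n-gram function is included only to show why the in-token variant tends to produce a smaller feature space.

```python
from collections import Counter


def char_ngrams_in_tokens(text, n=4):
    """Count character n-grams taken inside each whitespace-delimited token.

    Assumption for illustration: tokens shorter than n yield no n-grams.
    """
    grams = Counter()
    for token in text.lower().split():
        for i in range(len(token) - n + 1):
            grams[token[i:i + n]] += 1
    return grams


def plain_char_ngrams(text, n=4):
    """Count traditional character n-grams, which may span token boundaries."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    review = "great hotel, great stay"
    in_tokens = char_ngrams_in_tokens(review)
    plain = plain_char_ngrams(review)
    # The in-token variant has fewer distinct features than the plain one.
    print(len(in_tokens), len(plain))
```

    The resulting counts could then be fed as features to any classifier; the paper reports using Support Vector Machines, which this sketch does not include.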

    On the multilingual and genre robustness of EmoGraphs for author profiling in social media

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24027-5_28
    Author profiling aims at identifying traits such as the age and gender of an author on the basis of her writings. We propose the novel graph-based EmoGraph approach, in which morphosyntactic categories are enriched with semantic and affective information. In this work we focus on testing the robustness of EmoGraphs when applied to age and gender identification. Results on the PAN-AP-14 corpus show the competitiveness of the representation across genres and languages. Finally, we present some interesting insights, for example on genres bounded by topic and emotion, such as hotel reviews.
    The research has been carried out in the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA - Finding Hidden Knowledge in Texts (TIN2012-38603-C02) projects. The work of the first author was partially funded by Autoritas Consulting SA and by Spanish Ministry of Economics under grant ECOPORTUNITY IPT-2012-1220-430000.
    Rangel, F.; Rosso, P. (2015). On the multilingual and genre robustness of EmoGraphs for author profiling in social media. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings. Springer International Publishing. 274-280. https://doi.org/10.1007/978-3-319-24027-5_28

    Deep Learning Architectures and Strategies for Early Detection of Self-harm and Depression Level Prediction

    This paper summarizes the contributions of the PRHLT-UPV team as a participant in the eRisk 2020 tasks on self-harm detection and prediction of depression levels from social media. Computational methods based on machine learning and natural language processing have great potential to assist with the early detection of mental disorders in social media users, based on their online activity. We use multi-dimensional representations of language and compare the performance of various deep learning models, exploring avenues rarely pursued in previous research, including hierarchical deep learning architectures, pre-trained transformers and language models.
    The work of Paolo Rosso was in the framework of the research project PROMETEO/2019/121 (DeepPattern) by the Generalitat Valenciana.
    Uban, A.; Rosso, P. (2020). Deep Learning Architectures and Strategies for Early Detection of Self-harm and Depression Level Prediction. CEUR Workshop Proceedings. 2696:1-12. http://hdl.handle.net/10251/166536

    On the difficulty of automatically detecting irony: beyond a simple case of negation

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10115-013-0652-8
    It is well known that irony is one of the most subtle devices used to deny, in a refined way and without a negation marker, what is literally said. As such, its automatic detection would provide valuable knowledge for tasks as diverse as sentiment analysis, information extraction, or decision making. The research described in this article focuses on identifying key values of components to represent the underlying characteristics of this linguistic phenomenon. In the absence of a negation marker, we represent the core of irony by means of three conceptual layers. These layers involve 8 different textual features. By representing four available data sets with these features, we try to find hints about how to deal with this unexplored task from a computational point of view. Our findings are assessed by human annotators in two strata: isolated sentences and entire documents. The results show how complex and subjective the task of automatically detecting irony can be.
    The research work of Paolo Rosso was done in the framework of the European Commission WIQ-EI Web Information Quality Evaluation Initiative (IRSES grant no. 269180) project within the FP 7 Marie Curie People, the DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Reyes Pérez, A.; Rosso, P. (2014). On the difficulty of automatically detecting irony: beyond a simple case of negation. Knowledge and Information Systems. 40(3):595-614. https://doi.org/10.1007/s10115-013-0652-8
Lingua 87:53–76Tsur O, Davidov D, Rappoport A (2010) ICWSM—a great catchy name: semi-supervised recognition of sarcastic sentences in online product reviews. In: Cohen WW, Gosling S (eds) Proceedings of the 4t international AAAI conference on weblogs and social media. The AAAI Press, Washington, DC, pp 162–169Utsumi A (1996) A unified theory of irony and its computational formalization. In: Proceedings of the 16th conference on computational linguistics. Association for Computational Linguistics, Morristown, NJ, USA, pp 962–967Veale T, Hao Y (2009) Support structures for linguistic creativity: a computational analysis of creative irony in similes. In: Proceedings of CogSci 2009, the 31st annual meeting of the cognitive science society, pp 1376–1381Veale T, Hao Y (2010) Detecting ironic intent in creative comparisons. In: Proceedings of 19th European conference on artificial intelligence—ECAI 2010. IOS Press, Amsterdam, The Netherlands, pp 765–770Whissell C (2009) Using the revised dictionary of affect in language to quantify the emotional undertones of samples of natural language. Psychol Rep 105(2):509–521Wilson D, Sperber D (2007) On verbal irony. In: Gibbs R, Colston H (eds) Irony in language and thought. Taylor and Francis Group, London, pp 35–56Zagibalov T, Belyatskaya K, Carroll J (2010) Comparable English-Russian book review corpora for sentiment analysis. In: Proceedings of the 1st workshop on computational approaches to subjectivity and sentiment analysis. Lisbon, Portugal, pp 67–7