138 research outputs found

    Un arbre en perill [A tree in danger]

    Spanish sentiment analysis in Twitter at the TASS workshop

    This paper describes a support vector machine-based approach to different tasks related to sentiment analysis in Twitter for Spanish. We focus on parameter optimization of the models and the combination of several models by means of voting techniques. We evaluate the proposed approach in all the tasks that were defined in the five editions of the TASS workshop, between 2012 and 2016. TASS has become a framework for sentiment analysis tasks that are focused on the Spanish language. We describe our participation in this competition and the results achieved, and then we provide an analysis of and comparison with the best approaches of the teams who participated in all the tasks defined in the TASS workshops. To our knowledge, our results exceed those published to date in the sentiment analysis tasks of the TASS workshops.
    This work has been partially funded by the Spanish MINECO and FEDER funds under project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics, TIN2014-54288-C4-3-R.
    Pla Santamaría, F.; Hurtado Oliver, LF. (2018). Spanish sentiment analysis in Twitter at the TASS workshop. Language Resources and Evaluation. 52(2):645-672. https://doi.org/10.1007/s10579-017-9394-7
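    The abstract above mentions combining several models by means of voting. As a minimal sketch of one such scheme (plain majority voting, which is an assumption here and not necessarily the scheme used in the paper), the hypothetical helpers `majority_vote` and `vote_all` below combine per-model polarity labels. Ties are broken in favour of the earliest model in the list, which is also an illustrative choice.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most models.

    `predictions` holds one label per model for a single tweet.
    Ties go to the label of the earliest model (illustrative choice).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_all(per_model_predictions):
    """Combine predictions tweet by tweet.

    `per_model_predictions` is a list with one list of labels per model;
    all inner lists must have the same length (one label per tweet).
    """
    return [majority_vote(list(column)) for column in zip(*per_model_predictions)]


if __name__ == "__main__":
    # Three hypothetical models labelling two tweets.
    model_outputs = [["P", "N"], ["P", "NEU"], ["N", "NEU"]]
    print(vote_all(model_outputs))
```

    In practice the individual labels would come from trained classifiers; here they are hard-coded so that only the combination step is shown.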

    Language identification of multilingual posts from Twitter: a case study

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10115-016-0997-x
    This paper describes a method for handling multi-class and multi-label classification problems based on the support vector machine formalism. This method has been applied to the language identification problem in Twitter. The system evaluation was performed mainly on a Twitter data set developed in the TweetLID workshop. This data set contains bilingual tweets written in the most commonly used Iberian languages (i.e., Spanish, Portuguese, Catalan, Basque, and Galician) as well as the English language. We address the following problems: (1) social media texts: we propose a suitable tokenization that processes the peculiarities of Twitter; (2) multilingual tweets: since a tweet can belong to more than one language, we need to use a multi-class and multi-label classifier; (3) similar languages: we study the main confusions among similar languages; and (4) unbalanced classes: we propose a threshold-based strategy to favor classes with less data. We have also studied the use of Wikipedia and the addition of new tweets in order to increase the training data set. Additionally, we have tested our system on the Bergsma corpus, a collection of tweets in nine languages, focusing on confusable languages using the Cyrillic, Arabic, and Devanagari alphabets. To our knowledge, we obtained the best results published on the TweetLID data set and results that are in line with the best results published on the Bergsma data set.
    This work has been partially funded by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MINECO TIN2014-54288-C4-3-R).
    Pla Santamaría, F.; Hurtado Oliver, LF. (2016). Language identification of multilingual posts from Twitter: a case study. Knowledge and Information Systems. 51(3):965-989. https://doi.org/10.1007/s10115-016-0997-x
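    The abstract above describes a multi-label classifier with a threshold-based strategy that favors classes with less data. The sketch below shows one way such a decision rule could look. The function name `assign_labels`, the score values, and the thresholds are all hypothetical, and the abstract does not specify how the authors set their thresholds.

```python
def assign_labels(scores, thresholds, default_threshold=0.0):
    """Pick every language whose score reaches its own threshold.

    `scores` maps a language code to a classifier decision score.
    `thresholds` maps a language code to its acceptance threshold; a lower
    threshold makes a class easier to select, which can favor classes that
    have little training data. If no class qualifies, fall back to the
    single best-scoring class so that every tweet receives a label.
    """
    chosen = [
        lang
        for lang, score in scores.items()
        if score >= thresholds.get(lang, default_threshold)
    ]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical scores for one tweet; "gl" gets a lowered threshold.
    scores = {"es": 0.9, "ca": -0.2, "gl": -0.4}
    thresholds = {"es": 0.0, "ca": 0.0, "gl": -0.5}
    print(assign_labels(scores, thresholds))
```

    Returning more than one label is what makes the rule multi-label: a tweet mixing two languages can reach both thresholds at once.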

    Study of calcium sparks and calcium wave propagation in cardiac cells

    Contraction of cardiac cells is initiated by an increase in the intracellular calcium concentration. The calcium response is the combination of local stochastic release from tens of thousands of release sites. The possible responses range from sparks (local release) to a global calcium increase, passing through calcium waves that propagate along the cell. In this project we model the intracellular calcium dynamics as a network of excitable elements that fire stochastically, and study the occurrence of calcium waves and spark nucleation. In the first part, we model the sparks of a homogeneous distribution of calcium nodes. We study the wave propagation of this model depending on properties accounting for the state of the heart. We develop a simplified mean-field theory that sheds light on various aspects of the dynamics of the model. In the second part, we add clustering to the model and study how it affects wave propagation and the dynamics of the model.
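    The abstract above describes a network of excitable elements that fire stochastically. As a rough illustration only, the sketch below simulates a one-dimensional ring of such nodes. The ring geometry, the three-phase cycle (rest, firing, refractory), and the parameters `p_spont`, `coupling`, and `refractory_steps` are all assumptions made for this example and are not taken from the project.

```python
import random

REST, FIRING = 0, 1  # states above FIRING count refractory steps


def step(states, p_spont, coupling, refractory_steps, rng):
    """Advance every node by one time step on a ring.

    A resting node fires with probability p_spont + coupling * k, capped
    at 1, where k is its number of firing neighbours. A firing node then
    stays refractory for `refractory_steps` steps before resting again.
    """
    n = len(states)
    new_states = []
    for i, s in enumerate(states):
        if s == REST:
            k = (states[(i - 1) % n] == FIRING) + (states[(i + 1) % n] == FIRING)
            p = min(1.0, p_spont + coupling * k)
            new_states.append(FIRING if rng.random() < p else REST)
        elif s < 1 + refractory_steps:
            new_states.append(s + 1)  # firing -> refractory, or deeper refractory
        else:
            new_states.append(REST)  # refractory period is over
    return new_states


def simulate(n_nodes, n_steps, p_spont, coupling, refractory_steps,
             initial_firing=(), seed=0):
    """Return the list of network states, one entry per time step."""
    rng = random.Random(seed)
    states = [REST] * n_nodes
    for i in initial_firing:
        states[i] = FIRING
    history = [states]
    for _ in range(n_steps):
        states = step(states, p_spont, coupling, refractory_steps, rng)
        history.append(states)
    return history


if __name__ == "__main__":
    # Weak coupling and rare spontaneous sparks (hypothetical values).
    for row in simulate(30, 10, p_spont=0.02, coupling=0.4, refractory_steps=3):
        print("".join(".*"[s == FIRING] for s in row))
```

    With `coupling=1.0` and no spontaneous firing, a single initial spark spreads one node per step in both directions, which is the deterministic limit of a propagating wave; lower coupling makes propagation fail stochastically.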

    Development Of A Virtual Environment Based On The Perceived Characteristics Of Pain In Patients With Fibromyalgia

    Fibromyalgia (FM) is a disorder characterized by chronic physical pain. The perception of this pain has psychological effects on mood, anxiety, and the degree of perceived control. In turn, these factors may increase the experience of pain. This study aims to develop a new virtual environment for the treatment of FM in order to enhance the therapeutic effects of traditional interventions. The first phase included a sample of 19 patients in order to identify, through drawing, common characteristics of the representation of pain and of the absence of pain. The results showed that patients used different colors and different physical states to depict pain (red, motionless) and the absence of pain (blue, in motion). These features were then included in a 3D representation of the human body. ANOVA showed that the degree of anxiety and depression influenced the perceived characteristic of movement.

    Transformer based contextualization of pre-trained word embeddings for irony detection in Twitter

    [EN] Human communication in natural language, especially on social media, is influenced by figurative language such as irony. Several recent workshops have explored irony detection in Twitter using computational approaches. This paper describes a model for irony detection based on the contextualization of pre-trained Twitter word embeddings by means of the Transformer architecture. This approach is based on the same powerful architecture as BERT but, unlike BERT, it allows us to use in-domain embeddings. We performed an extensive evaluation on two corpora, one for English and another for Spanish. Our system was the top-ranked system on the Spanish corpus and, to our knowledge, it achieved the second-best result on the English corpus. These results support the correctness and adequacy of our proposal. We also studied and interpreted how the multi-head self-attention mechanisms specialize in detecting irony by considering the polarity and relevance of individual words and even the relationships among words. This analysis is a first step towards understanding how the multi-head self-attention mechanisms of the Transformer architecture address the irony detection problem.
    This work has been partially supported by the Spanish Ministerio de Ciencia, Innovacion y Universidades and FEDER funds under project AMIC (TIN2017-85854-C4-2-R) and the GiSPRO project (PROMETEU/2018/176). The work of Jose-Angel Gonzalez is financed by Universitat Politecnica de Valencia under grant PAID-01-17.
    González-Barba, JÁ.; Hurtado Oliver, LF.; Pla Santamaría, F. (2020). Transformer based contextualization of pre-trained word embeddings for irony detection in Twitter. Information Processing & Management. 57(4):1-15. https://doi.org/10.1016/j.ipm.2020.102262

    Self-attention for Twitter sentiment analysis in Spanish

    Full text link
    [EN] This paper describes our proposal for Sentiment Analysis in Twitter for the Spanish language. The main characteristics of the system are the use of word embeddings trained specifically on tweets in Spanish and the use of self-attention mechanisms that make it possible to process sequences without convolutional or recurrent layers. These self-attention mechanisms are based on the encoders of the Transformer model. The results obtained on Task 1 of the TASS 2019 workshop, for all the Spanish variants proposed, support the correctness and adequacy of our proposal.
    This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R) and the GiSPRO project (PROMETEU/2018/176). The work of Jose-Angel Gonzalez is financed by Universitat Politecnica de Valencia under grant PAID-01-17.
    González-Barba, JÁ.; Hurtado Oliver, LF.; Pla Santamaría, F. (2020). Self-attention for Twitter sentiment analysis in Spanish. Journal of Intelligent & Fuzzy Systems. 39(2):2165-2175. https://doi.org/10.3233/JIFS-179881
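To make the idea of processing a sequence without convolutional or recurrent layers concrete, here is a minimal sketch of single-head scaled dot-product self-attention over a sequence of word embeddings. It is an illustration only, not the authors' system: the function names are hypothetical, and the learned query, key, and value projections of a real Transformer encoder are omitted (the embeddings themselves are assumed to play all three roles).

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are all the raw embeddings in x
    (no learned projection matrices), which keeps the sketch short.
    Returns the contextualized vectors and the attention weights.
    """
    dim = len(x[0])
    scale = math.sqrt(dim)
    # Each position scores every position in the sequence, itself included.
    weights = [
        softmax([sum(a * b for a, b in zip(query, key)) / scale for key in x])
        for query in x
    ]
    # Each output vector is a weighted average of all input vectors.
    outputs = [
        [sum(w * value[j] for w, value in zip(w_row, x)) for j in range(dim)]
        for w_row in weights
    ]
    return outputs, weights
```

Every position attends to every other position in a single step, so no recurrence or sliding window is needed to relate distant words. A full encoder would add learned projections, several heads, positional information, and feed-forward layers.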

    Choosing the right loss function for multi-label Emotion Classification

    Full text link
    [EN] Natural Language Processing problems have recently benefited from advances in Deep Learning. Many of these problems can be addressed as multi-label classification problems. Usually, the metrics used to evaluate classification models differ from the loss functions used in the learning process. In this paper, we present a strategy to incorporate evaluation metrics into the learning process in order to increase the performance of the classifier according to the measure we want to favor. Concretely, we propose soft versions of the Accuracy, micro-F-1, and macro-F-1 measures that can be used as loss functions in the back-propagation algorithm. To validate our approach experimentally, we tested our system on an Emotion Classification task proposed at the International Workshop on Semantic Evaluation, SemEval-2018. Using a Convolutional Neural Network trained with the proposed loss functions, we obtained significant improvements for both the English and the Spanish corpora.
    This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R) and the GiSPRO project (PROMETEU/2018/176). The work of Jose-Angel Gonzalez is also financed by Universitat Politecnica de Valencia under grant PAID-01-17.
    Hurtado Oliver, LF.; González-Barba, JÁ.; Pla Santamaría, F. (2019). Choosing the right loss function for multi-label Emotion Classification. Journal of Intelligent & Fuzzy Systems. 36(5):4697-4708. https://doi.org/10.3233/JIFS-179019
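As a rough illustration of what a "soft" F-1 measure can look like, the sketch below uses a common formulation in which predicted probabilities replace hard 0/1 decisions when counting true positives, false positives, and false negatives, so the resulting score varies smoothly with the model outputs. This is an assumption made for illustration: the abstract does not give the authors' exact definitions, and all function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = samples, columns = labels


def soft_counts(y_true: Matrix, y_prob: Matrix) -> Tuple[List[float], List[float], List[float]]:
    """Per-label soft TP, FP, and FN, with probabilities in place of hard predictions."""
    n_labels = len(y_true[0])
    tp = [0.0] * n_labels
    fp = [0.0] * n_labels
    fn = [0.0] * n_labels
    for truth_row, prob_row in zip(y_true, y_prob):
        for j, (t, p) in enumerate(zip(truth_row, prob_row)):
            tp[j] += t * p
            fp[j] += (1.0 - t) * p
            fn[j] += t * (1.0 - p)
    return tp, fp, fn


def soft_macro_f1_loss(y_true: Matrix, y_prob: Matrix, eps: float = 1e-8) -> float:
    """1 minus the mean of the per-label soft F-1 scores."""
    tp, fp, fn = soft_counts(y_true, y_prob)
    f1 = [2.0 * a / (2.0 * a + b + c + eps) for a, b, c in zip(tp, fp, fn)]
    return 1.0 - sum(f1) / len(f1)


def soft_micro_f1_loss(y_true: Matrix, y_prob: Matrix, eps: float = 1e-8) -> float:
    """1 minus the soft F-1 computed from counts pooled over all labels."""
    tp, fp, fn = soft_counts(y_true, y_prob)
    a, b, c = sum(tp), sum(fp), sum(fn)
    return 1.0 - 2.0 * a / (2.0 * a + b + c + eps)
```

The code is plain Python so it runs without dependencies. In practice the same expressions would be written with the tensor operations of an automatic-differentiation framework so that gradients can flow through them during back-propagation.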

    Retrobada a Catalunya (NE de la península Ibèrica) una població de Spirodela polyrhiza (Araceae)

    Get PDF
    In the summer of 2012, several populations of Spirodela polyrhiza (L.) Schleid. were detected along the final stretch of the Ebro river. These localities represent the rediscovery, in large quantities, of a plant species that until very recently was considered extinct in Catalonia and that had never been recorded in the Ebro basin. The discovery of these new populations is, both for their frequency and their abundance, a noteworthy finding in the Catalan and Iberian context.