8 research outputs found

    GTI en TASS 2016: Una aproximación supervisada para el análisis de sentimiento basado en aspectos en Twitter

    This paper describes the participation of the GTI research group of AtlantTIC, University of Vigo, in TASS 2016. The workshop is part of the XXXII edition of the Annual Congress of the Spanish Society for Natural Language Processing. We propose a supervised, classifier-based approach to the aspect-based sentiment analysis task. With this technique we improved on the performance of previous years, obtaining a solution in line with the current state of the art.
    Funding: Ministerio de Economía y Competitividad (Ref. TEC2013-47016-C2-1-R); Xunta de Galicia (Ref. GRC2014/04).

    Self-attention for Twitter sentiment analysis in Spanish

    [EN] This paper describes our proposal for sentiment analysis in Twitter for the Spanish language. The main characteristics of the system are the use of word embeddings specifically trained on tweets in Spanish and the use of self-attention mechanisms that make it possible to process sequences without convolutional or recurrent layers. These self-attention mechanisms are based on the encoders of the Transformer model. The results obtained on Task 1 of the TASS 2019 workshop, for all the Spanish variants proposed, support the correctness and adequacy of our proposal.
    This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R) and the GiSPRO project (PROMETEU/2018/176). Work of Jose-Angel Gonzalez is financed by Universitat Politecnica de Valencia under grant PAID-01-17.
    González-Barba, JÁ.; Hurtado Oliver, LF.; Pla Santamaría, F. (2020). Self-attention for Twitter sentiment analysis in Spanish. Journal of Intelligent & Fuzzy Systems. 39(2):2165-2175. https://doi.org/10.3233/JIFS-179881
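
    As an illustration of the self-attention mechanism mentioned in the abstract above, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not the authors' system: the function names, the toy projection matrices, and the omission of multiple heads, positional information, and trained tweet embeddings are all simplifying assumptions made here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    x is a sequence of token vectors (one row per token); w_q, w_k and w_v
    are hypothetical projection matrices for queries, keys and values.
    Returns the contextualized vectors and the attention weights.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Each token attends to every token in the sequence, itself included,
    # so no convolution or recurrence is needed to relate distant positions.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

    Each row of the returned weights sums to 1 and states how much one token draws on every other token when its output vector is computed.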

    Spanish sentiment analysis in Twitter at the TASS workshop

    [EN] This paper describes a support vector machine-based approach to different tasks related to sentiment analysis in Twitter for Spanish. We focus on parameter optimization of the models and the combination of several models by means of voting techniques. We evaluate the proposed approach in all the tasks that were defined in the five editions of the TASS workshop, between 2012 and 2016. TASS has become a framework for sentiment analysis tasks that are focused on the Spanish language. We describe our participation in this competition and the results achieved, and then we provide an analysis of and comparison with the best approaches of the teams who participated in all the tasks defined in the TASS workshops. To our knowledge, our results exceed those published to date in the sentiment analysis tasks of the TASS workshops.
    This work has been partially funded by the Spanish MINECO and FEDER funds under project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics, TIN2014-54288-C4-3-R.
    Pla Santamaría, F.; Hurtado Oliver, LF. (2018). Spanish sentiment analysis in Twitter at the TASS workshop. Language Resources and Evaluation. 52(2):645-672. https://doi.org/10.1007/s10579-017-9394-7
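
    The abstract above mentions combining several models by means of voting techniques without specifying the scheme. As a hedged sketch of the general idea, the snippet below implements a plain majority vote over the label sequences predicted by several classifiers. The function name, the tie-breaking rule, and the example labels are assumptions made here, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions holds one list of labels per model, all of the same length
    (one label per tweet). Ties go to the earliest model listed; this is an
    arbitrary rule chosen for the sketch.
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        winner = next(label for label in labels if counts[label] == top)
        combined.append(winner)
    return combined


# Hypothetical outputs of three classifiers on three tweets.
model_outputs = [
    ["P", "N", "NEU"],
    ["P", "P", "N"],
    ["N", "P", "NONE"],
]
print(majority_vote(model_outputs))
```

    In this toy run the third tweet receives three different labels, so the result depends entirely on the tie-breaking rule; a weighted vote based on each model's validation score is a common alternative.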

    La democratización del aprendizaje profundo en TASS 2017

    TASS 2017 advanced the state of the art in sentiment analysis in Spanish, since most of the systems submitted in 2017 were grounded in Deep Learning methods. Moreover, a new corpus of tweets written in Spanish, called InterTASS, was released. The corpus is composed of tweets manually annotated at document level. The analysis of the results on InterTASS shows that the main challenge is distinguishing tweets with a neutral opinion from those that express no opinion. The organizers also presented a plan to extend InterTASS with tweets written in the Spanish spoken in Spain and in several countries of the Americas.
    This research work is partially supported by REDES project (TIN2015-65136-C2-1-R) and SMART project (TIN2017-89517-P) from the Spanish Government, and a grant from the Fondo Europeo de Desarrollo Regional (FEDER). Eugenio Martínez Cámara was supported by the Juan de la Cierva Formación Programme (FJCI-2016-28353) from the Spanish Government

    Detection of Sarcasm and Nastiness: New Resources for Spanish Language

    The main goal of this work is to provide the cognitive computing community with valuable resources to analyze and simulate the intentionality and/or emotions embedded in the language employed in social media. Specifically, it focuses on the Spanish language and online dialogues, leading to the creation of SOFOCO (Spanish Online Forums Corpus). It is the first Spanish corpus consisting of dialogic debates extracted from social media, and it is annotated by means of crowdsourcing in order to support automatic analysis of subjective language forms such as sarcasm or nastiness. The annotators were also asked whether they needed context to reach a decision. In this way, users' intentions and their behavior inside social networks can be better understood, and more accurate text analysis becomes possible. The annotation results are analyzed and the reliability of the annotations is explored. Additionally, sarcasm and nastiness detection results (around 0.76 F-Measure in both cases) are reported. The results show that the corpus is a valuable resource that might be used in very diverse future work.
    This study was partially funded by the Spanish Government (TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R), by the European Union's H2020 program under grant 769872 and by the National Science Foundation of USA (NSF CISE R1 #1202668

    Resumen de TASS 2018: Opiniones, Salud y Emociones

    This is an overview of the Workshop on Semantic Analysis at SEPLN (TASS) held in Sevilla, Spain, in September 2018. This forum proposes to participants four different semantic tasks on texts written in Spanish. Task 1 focuses on polarity classification; Task 2 encourages the development of aspect-based polarity classification systems; Task 3 provides a scenario for discovering knowledge from eHealth documents; finally, Task 4 is about automatic classification of news articles according to safety. The latter two tasks are new in this edition of TASS. We detail the approaches and the results of the systems submitted by the different groups in each task, and discuss them.
    This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI (TIN2017-89517-P) from the Spanish Government, and “Plataforma Inteligente para Recuperación, Análisis y Representación de la Información Generada por Usuarios en Internet” (GRE16-01) from University of Alicante. Eugenio Martínez Cámara was supported by the Spanish Government Programme Juan de la Cierva Formación (FJCI-2016-28353)

    Attention-based Approaches for Text Analytics in Social Media and Automatic Summarization

    [EN] Nowadays, society has access, and the possibility to contribute, to large amounts of the content present on the internet, such as social networks, online newspapers, forums, blogs, or multimedia content platforms. These platforms have had, during the last years, an overwhelming impact on the daily life of individuals and organizations, becoming the predominant ways for sharing, discussing, and analyzing online content. Therefore, it is very interesting to work with these platforms, from different points of view, under the umbrella of Natural Language Processing. In this thesis, we focus on two broad areas inside this field, applied to analyze online content: text analytics in social media and automatic summarization. 
Neural networks are also a central topic of this thesis: all the experimentation has been performed using deep learning approaches, mainly based on attention mechanisms. Moreover, we mostly work with Spanish, because it is an underexplored language of great interest to the research projects in which we participated. On the one hand, for text analytics in social media, we focus on affective analysis tasks, including sentiment analysis and emotion detection, along with irony analysis. In this regard, we present an approach based on Transformer Encoders that contextualizes word embeddings pretrained on Spanish tweets, in order to address sentiment analysis and irony detection tasks. We also propose using evaluation metrics as loss functions to train neural networks, in order to reduce the impact of class imbalance in multi-class and multi-label emotion detection tasks. Additionally, we present a specialization of BERT for both the Spanish language and the Twitter domain that takes into account inter-sentence coherence in Twitter conversation flows. The performance of all these approaches has been tested on different corpora, from several reference evaluation benchmarks, showing very competitive results in all the tasks addressed. On the other hand, we focus on extractive summarization of news articles and TV talk shows. Regarding the summarization of news articles, we present a theoretical framework for extractive summarization based on siamese hierarchical networks with attention mechanisms, along with two instantiations of this framework: Siamese Hierarchical Attention Networks and Siamese Hierarchical Transformer Encoders. These systems were evaluated on the CNN/DailyMail and NewsRoom corpora, obtaining competitive results in comparison with other contemporary extractive approaches. 
Concerning the TV talk shows, we propose a text summarization task that consists of summarizing the transcribed interventions of the speakers on a given topic in the Spanish TV talk show "La Noche en 24 Horas". In addition, we propose a corpus of news articles, collected from several Spanish online newspapers, in order to study the domain transferability of the siamese hierarchical approaches between news articles and the interventions of debate participants. This approach shows better results than other extractive techniques, along with very promising domain transferability. González Barba, JÁ. (2021). Attention-based Approaches for Text Analytics in Social Media and Automatic Summarization [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172245