    Deep Modeling of Latent Representations for Twitter Profiles on Hate Speech Spreaders Identification Task

    In this paper, we describe the system proposed by the UO-UPV team for the Profiling Hate Speech Spreaders on Twitter shared task at PAN 2021. The system relies on a modular architecture that combines Deep Learning models with a newly introduced variant of the Impostor Method (IM). It receives a single profile composed of a fixed number of tweets. These posts are encoded as dense feature vectors using a fine-tuned transformer model and then combined to represent the whole profile. To classify a new profile as a hate speech spreader or not, the Impostor Method compares it, by means of a similarity function, against randomly sampled prototypical profiles. In the final evaluation phase, our model achieved accuracies of 74% and 82% for English and Spanish respectively, ranking our team in 2nd position and giving a starting point for further improvements. The work of the third author was carried out in the framework of the research project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31), funded by the Spanish Ministry of Science and Innovation, and DeepPattern (PROMETEO/2019/121), funded by the Generalitat Valenciana. Labadie Tamayo, R.; Castro Castro, D.; Ortega-Bueno, R. (2021). Deep Modeling of Latent Representations for Twitter Profiles on Hate Speech Spreaders Identification Task. CEUR. 2035-2046. http://hdl.handle.net/10251/1906692035204
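The profile-level comparison described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not specify how tweet vectors are combined, which similarity function is used, or how the Impostor Method variant votes. The sketch assumes mean pooling, cosine similarity, and a simple majority-style vote over randomly sampled prototypes; the names `mean_vector`, `cosine`, and `impostor_vote` are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Combine equal-length tweet vectors into one profile vector (assumed: mean pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def impostor_vote(profile, spreaders, non_spreaders, rounds=25, sample_size=3, seed=0):
    """Return the fraction of rounds in which the profile is closer to a sampled
    spreader prototype than to any sampled non-spreader prototype.

    Assumed voting scheme: in each round, sample prototypes from both classes
    and compare the best similarity per class.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        pos = rng.sample(spreaders, min(sample_size, len(spreaders)))
        neg = rng.sample(non_spreaders, min(sample_size, len(non_spreaders)))
        best_pos = max(cosine(profile, p) for p in pos)
        best_neg = max(cosine(profile, q) for q in neg)
        if best_pos > best_neg:
            wins += 1
    return wins / rounds


# Toy 2-dimensional vectors standing in for transformer encodings.
tweets = [[1.0, 0.0], [0.8, 0.2]]
profile = mean_vector(tweets)
spreaders = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3]]
non_spreaders = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
score = impostor_vote(profile, spreaders, non_spreaders)
label = "spreader" if score >= 0.5 else "non-spreader"
```

In this toy setup the score is thresholded at 0.5 to produce a label; the threshold, like the number of rounds and the sample size, is an illustrative choice rather than a value taken from the paper.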

    The Impact of Text Classification on Applications in Natural Language Processing (Uticaj klasifikacije teksta na primene u obradi prirodnih jezika)

    The main goal of this dissertation is to put different text classification tasks in the same frame by mapping the input data into a common vector space of linguistic attributes. Several classification problems of great importance for natural language processing are then solved by applying appropriate classification algorithms. The dissertation first addresses the validation of bilingual translation pairs: the goal is to construct a classifier that substitutes for human evaluation and decides, using a variety of linguistic information and methods, whether a pair is a proper translation between the two languages. In dictionaries it is useful to have a sentence that demonstrates the use of a particular dictionary entry. This task is called the classification of good dictionary examples, and the thesis develops a method that automatically estimates whether an example is good or bad for a specific dictionary entry. Two cases of short message classification are also discussed. In the first case, the classes are the authors of the messages, and the task is to assign each message to its author from a fixed set of authors. This task is called authorship identification. The other short message classification task is called opinion mining, or sentiment analysis. Starting from the assumption that a short message either carries a positive or negative attitude about something or is purely informative, the classes are positive, negative and neutral. These tasks are of great importance in the field of natural language processing, and the proposed solutions are language-independent and based on machine learning methods: support vector machines, decision trees and gradient boosting. For all of these tasks, the effectiveness of the proposed methods is demonstrated for the Serbian language.

    Author Profiling Tracks at FIRE

    Benchmarking activities are vital for fostering research and addressing new challenging problems. During the last 10 years of the FIRE initiative we have been involved in the organization of more than ten tracks, with the aim of creating new resources in several languages and making them available to the research community. This made it possible to compare several new approaches on the same datasets. In this chapter we focus on the description of three author profiling tracks, on their data creation, and on the analysis of their results. The work on the author profiling data in Arabic was made possible by NPRP Grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors. Rosso, P.; Rangel Pardo, FM. (2020). Author Profiling Tracks at FIRE. SN Computer Science. 1:1-11. https://doi.org/10.1007/s42979-020-0073-1