7,432 research outputs found

    Fuzzy multi-task learning for hate speech type identification

    Get PDF
    In traditional machine learning, classifiers training is typically un-dertaken in the setting of single-task learning, so the trained classi-fier can discriminate between different classes. However, this must be based on the assumption that different classes are mutually exclusive. In real applications, the above assumption does not always hold. For example, the same book may belong to multiple subjects. From this point of view, researchers were motivated to formulate multi-label learning problems. In this context, each instance can be assigned multiple labels but the classifiers training is still typically undertaken in the setting of single-task learning. When probabilistic approaches are adopted for classifiers training, multi-task learning can be enabled through transformation of a multi-labelled data set into several binary data sets. The above data transformation could usually result in the class imbalance issue. Without the above data transformation, multi-labelling of data results in an exponential increase of the number of classes, leading to fewer instances for each class and a higher difficulty for identifying each class. In addition, multi-labelling of data is very time consuming and expensive in some application areas, such as hate speech detection. In this paper, we introduce a novel formulation of the hate speech type identification problem in the setting of multi-task learning through our proposed fuzzy ensemble approach. In this setting, single-labelled data can be used for semi-supervised multi-label learning and two new metrics (detection rate and irrelevance rate) are thus proposed to measure more effectively the performance for this kind of learning tasks. We report an experimental study on identification of four types of hate speech, namely: religion, race, disability and sexual orientation. The experimental results show that our proposed fuzzy ensemble approach outperforms other popular probabilistic approaches, with an overall detection rate of 0.93

    Data Analysis Methods for Software Systems

    Get PDF
    Using statistics, econometrics, machine learning, and functional data analysis methods, we evaluate the consequences of the lockdown during the COVID-19 pandemics for wage inequality and unemployment. We deduce that these two indicators mostly reacted to the first lockdown from March till June 2020. Also, analysing wage inequality, we conduct analysis separately for males and females and different age groups.We noticed that young females were affected mostly by the lockdown.Nevertheless, all the groups reacted to the lockdown at some level

    Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

    Full text link
    Algorithmic bias often arises as a result of differential subgroup validity, in which predictive relationships vary across groups. For example, in toxic language detection, comments targeting different demographic groups can vary markedly across groups. In such settings, trained models can be dominated by the relationships that best fit the majority group, leading to disparate performance. We propose framing toxicity detection as multi-task learning (MTL), allowing a model to specialize on the relationships that are relevant to each demographic group while also leveraging shared properties across groups. With toxicity detection, each task corresponds to identifying toxicity against a particular demographic group. However, traditional MTL requires labels for all tasks to be present for every data point. To address this, we propose Conditional MTL (CondMTL), wherein only training examples relevant to the given demographic group are considered by the loss function. This lets us learn group specific representations in each branch which are not cross contaminated by irrelevant labels. Results on synthetic and real data show that using CondMTL improves predictive recall over various baselines in general and for the minority demographic group in particular, while having similar overall accuracy

    Women in Artificial intelligence (AI)

    Get PDF
    This Special Issue, entitled "Women in Artificial Intelligence" includes 17 papers from leading women scientists. The papers cover a broad scope of research areas within Artificial Intelligence, including machine learning, perception, reasoning or planning, among others. The papers have applications to relevant fields, such as human health, finance, or education. It is worth noting that the Issue includes three papers that deal with different aspects of gender bias in Artificial Intelligence. All the papers have a woman as the first author. We can proudly say that these women are from countries worldwide, such as France, Czech Republic, United Kingdom, Australia, Bangladesh, Yemen, Romania, India, Cuba, Bangladesh and Spain. In conclusion, apart from its intrinsic scientific value as a Special Issue, combining interesting research works, this Special Issue intends to increase the invisibility of women in AI, showing where they are, what they do, and how they contribute to developments in Artificial Intelligence from their different places, positions, research branches and application fields. We planned to issue this book on the on Ada Lovelace Day (11/10/2022), a date internationally dedicated to the first computer programmer, a woman who had to fight the gender difficulties of her times, in the XIX century. We also thank the publisher for making this possible, thus allowing for this book to become a part of the international activities dedicated to celebrating the value of women in ICT all over the world. With this book, we want to pay homage to all the women that contributed over the years to the field of AI

    A fuzzy approach to text classification with two-stage training for ambiguous instances

    Get PDF
    Sentiment analysis is a very popular application area of text mining and machine learning. The popular methods include Support Vector Machine, Naive Bayes, Decision Trees and Deep Neural Networks. However, these methods generally belong to discriminative learning, which aims to distinguish one class from others with a clear-cut outcome, under the presence of ground truth. In the context of text classification, instances are naturally fuzzy (can be multi-labeled in some application areas) and thus are not considered clear-cut, especially given the fact that labels assigned to sentiment in text represent an agreed level of subjective opinion for multiple human annotators rather than indisputable ground truth. This has motivated researchers to develop fuzzy methods, which typically train classifiers through generative learning, i.e. a fuzzy classifier is used to measure the degree to which an instance belongs to each class. Traditional fuzzy methods typically involve generation of a single fuzzy classifier and employ a fixed rule of defuzzification outputting the class with the maximum membership degree. The use of a single fuzzy classifier with the above fixed rule of defuzzification is likely to get the classifier encountering the text ambiguity situation on sentiment data, i.e. an instance may obtain equal membership degrees to both the positive and negative classes. In this paper, we focus on cyberhate classification, since the spread of hate speech via social media can have disruptive impacts on social cohesion and lead to regional and community tensions. Automatic detection of cyberhate has thus become a priority research area. In particular, we propose a modified fuzzy approach with two stage training for dealing with text ambiguity and classifying four types of hate speech, namely: religion, race, disability and sexual orientation - and compare its performance with those popular methods as well as some existing fuzzy approaches, while the features are prepared through the Bag-of-Words and Word Embedding feature extraction methods alongside the correlation based feature subset selection method. The experimental results show that the proposed fuzzy method outperforms the other methods in most cases

    Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA)

    Get PDF
    IronITA is a new shared task in the EVALITA 2018 evaluation campaign, focused on the automatic classification of irony in Italian texts from Twitter. It includes two tasks: 1) irony detection and 2) detection of different types of irony, with a special focus on sarcasm identification. We received 17 submissions for the first task and 7 submissions for the second task from 7 teams.IronITA è un nuovo esercizio di valutazione della campagna di valutazione EVALITA 2018, specificamente dedicato alla classificazione automatica dell’ironia presente in testi estratti da Twitter. Comprende due task: 1) riconoscimento dell’ironia e 2) riconoscimento di diversi tipi di ironia, con particolare attenzione all’identificazione del sarcasmo. Abbiamo ricevuto 17 sottomissioni per il primo task e 7 per il secondo, da parte di 7 gruppi partecipanti

    Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models

    Full text link
    [EN] The popularity of social media has created problems such as hate speech and sexism. The identification and classification of sexism in social media are very relevant tasks, as they would allow building a healthier social environment. Nevertheless, these tasks are considerably challenging. This work proposes a system to use multilingual and monolingual BERT and data points translation and ensemble strategies for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks shared 2021 (EXIST 2021) task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameters analysis is conducted. The main results observed were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) an ensemble model considering all individual models and the best standardized values obtained the best accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).Magnossao De Paula, AF.; Fray Da Silva, R.; Baris Schlicht, I. (2021). Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models. CEUR Workshop. 356-373. http://hdl.handle.net/10251/19056135637
    • …
    corecore