Deep Learning Methods for Register Classification
This project uses the data collected by Biber and Egbert (2018), which covers various language articles from the internet. I use the BERT model (Bidirectional Encoder Representations from Transformers), which is a deep neural network, and FastText, which is a shallow neural network, as a baseline to perform text classification. I also use deep learning models such as XLNet to see whether classification accuracy improves.
Biber and Egbert (2018) also describe what register is. We can think of register as genre. According to Biber (1988), registers are varieties defined in terms of general situational parameters. Hence, it can be inferred that there is a close relation between language and the context of the situation in which it is used. This work attempts register classification using deep learning methods that use an attention mechanism. Throughout the work, I worked with the models, dealt with the imbalanced datasets found in real-life problems, and tuned the hyperparameters for training the models. Proper evaluation metrics for various kinds of data were also determined.
The background study shows how cumbersome the classical machine learning approach used to be. Deep learning, on the other hand, can accomplish the task with ease. The work also covered selecting the metric for the classification task for different types of datasets (balanced vs. imbalanced) and dealing with overfitting.
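The point about choosing a metric for balanced versus imbalanced data can be illustrated with a minimal sketch. It is not taken from the project: the label names and toy predictions are hypothetical, and macro-averaged F1 is used here only as one common choice for imbalanced classification. It shows how a classifier that always predicts the majority register can score high accuracy while its macro F1 stays low.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy data: nine "news" documents and one "blog" document.
y_true = ["news"] * 9 + ["blog"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["news"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

On this toy data the accuracy looks strong even though the minority class is never predicted, which is why a class-averaged metric is often preferred for imbalanced datasets.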
Classifiers and text mining: application to a specific context
[Abstract]: The constant growth of social networks has not only brought us new ways of interacting
with each other, but has also given way to a severe increase in negative behaviors: hate
speech, racism, gender harassment, cyberbullying, etc. Manually trying to detect these kinds of
behaviours in millions of daily social media posts is out of the question. The solution lies in
developing intelligent systems to automate such detection tasks.
As the nature of these texts is completely subjective, this problem falls under the field
of sentiment analysis, which aims to systematically identify and study affective states and
subjective information in textual data using natural language processing techniques.
In particular, this project is focused on the research of different machine learning techniques
related to natural language processing, in order to automate and perform a reliable
detection and classification of sexist-related behaviours in social media texts. We will tackle
the task of adequately processing the extracted data from social media, as well as researching
various text classification techniques and models that we will use to develop and evaluate a
variety of classifiers.
Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2021/202
Recent Trends in Computational Intelligence
Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as reverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically-inspired technologies such as swarm intelligence as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book aims to discuss the use of CI for optimally solving various applications, demonstrating its wide reach and relevance. Combining optimization methods and data mining strategies makes a strong and reliable prediction tool for handling real-life applications.
Incremental Coreference Resolution for German
The main contributions of this thesis are as follows:
1. We introduce a general model for coreference and explore its application to German.
• The model features an incremental discourse processing algorithm which allows it to coherently address issues caused by underspecification of mentions, which is an especially pressing problem regarding certain German pronouns.
• We introduce novel features relevant for the resolution of German pronouns. A subset of these features is made accessible through the incremental architecture of the discourse processing model.
• In evaluation, we show that the coreference model combined with our features provides new state-of-the-art results for coreference and pronoun resolution for German.
2. We elaborate on the evaluation of coreference and pronoun resolution.
• We discuss evaluation from the view of prospective downstream applications that benefit from coreference resolution as a preprocessing component. Addressing the shortcomings of the general evaluation framework in this regard, we introduce an alternative framework, the Application Related Coreference Scores (ARCS).
• The ARCS framework enables a thorough comparison of different system outputs and the quantification of their similarities and differences beyond the common coreference evaluation. We demonstrate how the framework is applied to state-of-the-art coreference systems. This provides a method to track specific differences in system outputs, which assists researchers in comparing their approaches to related work in detail.
3. We explore semantics for pronoun resolution.
• Within the introduced coreference model, we explore distributional approaches to estimate the compatibility of an antecedent candidate and the occurrence context of a pronoun. We compare a state-of-the-art approach for word embeddings to syntactic co-occurrence profiles to this end.
• In comparison to related work, we extend the notion of context and thereby increase the applicability of our approach. We find that a combination of both compatibility models, coupled with the coreference model, provides a large potential for improving pronoun resolution performance.
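The idea of scoring how well an antecedent candidate fits the context of a pronoun can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's compatibility model: the three-dimensional vectors, the candidate words, and the use of cosine similarity as the compatibility score are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(context_vec, candidates):
    """Order candidate names by similarity to the pronoun's context vector, best first."""
    return sorted(
        candidates,
        key=lambda name: cosine(context_vec, candidates[name]),
        reverse=True,
    )


# Hypothetical embedding of the pronoun's occurrence context.
context = [0.9, 0.1, 0.0]
# Hypothetical embeddings of two antecedent candidates.
candidates = {
    "Hund": [0.8, 0.2, 0.1],
    "Tisch": [0.1, 0.1, 0.9],
}

ranked = rank_candidates(context, candidates)
print(ranked)
```

In a real system the vectors would come from trained word embeddings or co-occurrence profiles, and the score would be one feature among many rather than the sole decision criterion.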
We make available all our resources, including a web demo of the system, at: http://pub.cl.uzh.ch/purl/coreference-resolutio