7,538 research outputs found

    Sentiment analysis of financial Twitter posts on Twitter with the machine learning classifiers

    This paper presents a sentiment analysis that combines lexicon-based and machine learning (ML)-based approaches in Turkish to investigate public mood for predicting stock market behavior in BIST30, Borsa Istanbul. Our main motivation is to apply sentiment analysis to finance-related tweets in Turkish. We import 17189 tweets posted with the hashtags "#Borsaistanbul, #Bist, #Bist30, #Bist100" on Twitter between November 7, 2022, and November 15, 2022, via MAXQDA 2020, a qualitative data analysis program. For the lexicon-based side, we use the multilingual sentiment analysis offered by the Orange program to label the polarities of the 17189 samples as positive, negative, or neutral. Neutral samples are discarded for the machine learning experiments. For the machine learning side, we select 9076 samples labeled positive or negative and treat the task as a classification problem, using six different supervised machine learning classifiers implemented in Python 3.6 with the sklearn library. In the experiments, 80% of the selected data is used for the training phase and the rest is used for the testing and validation phase. The results show that the Support Vector Machine and Multilayer Perceptron classifiers perform better than the other classifiers, with accuracies of 0.89 and 0.88 and AUC values of 0.8729 and 0.8647, respectively. The other classifiers obtain an accuracy of approximately 78.5%. It is possible to increase sentiment analysis accuracy with parameter optimization on a larger, cleaner, and more balanced dataset by changing the pre-processing steps. This work can be expanded in the future to develop better sentiment analysis using deep learning approaches.

    Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers

    With the advent of modern pre-trained Transformers, text preprocessing has started to be neglected and is not specifically addressed in recent NLP literature. However, from both a linguistic and a computer science point of view, we believe that even when using modern Transformers, text preprocessing can significantly affect the performance of a classification model. Through this study, we investigate and compare how preprocessing affects the Text Classification (TC) performance of modern and traditional classification models. We report and discuss the preprocessing techniques found in the literature and their most recent variants or applications to TC tasks in different domains. To assess how much preprocessing affects classification performance, we apply the three most referenced preprocessing techniques (alone or in combination) to four publicly available datasets from different domains. Then, nine machine learning models, including modern Transformers, receive the preprocessed text as input. The results show that an educated choice of text preprocessing strategy should be based on the task as well as on the model considered. Outcomes in this survey show that choosing the best preprocessing technique in place of the worst can significantly improve classification accuracy (up to 25%, as in the case of XLNet on the IMDB dataset). In some cases, by means of a suitable preprocessing strategy, even a simple Naïve Bayes classifier outperformed the best performing Transformer (by 2% in accuracy). We found that Transformers and traditional models exhibit a higher impact of preprocessing on TC performance.
Our main findings are: (1) preprocessing can affect performance even on modern pre-trained language models, depending on the datasets and on the preprocessing technique or combination of techniques used; (2) in some cases, with a proper preprocessing strategy, simple models can outperform Transformers on TC tasks; (3) similar classes of models exhibit similar levels of sensitivity to text preprocessing.

    Statistical analysis of grouped text documents

    The topic of this thesis is statistical models for the analysis of textual data, with an emphasis on contexts in which text samples are grouped. When dealing with text data, the first issue is to process it, making it computationally and methodologically compatible with the mathematical and statistical methods produced and continually developed by the scientific community. Therefore, the thesis first reviews existing methods for analytically representing and processing textual datasets, including Vector Space Models, distributed representations of words and documents, and contextualized embeddings. This review involves standardizing a notation that, even within the same representation approach, appears highly heterogeneous in the literature. Then, two domains of application are explored: social media and cultural tourism. For the former, the thesis proposes a study of self-presentation among diverse groups of individuals on the StockTwits platform, where finance and stock markets are the dominant topics. The proposed methodology integrates various types of data, including textual and categorical data. This study revealed insights into how people present themselves online and found recurring patterns within groups of users. For the latter, the thesis delves into a study conducted as part of the "Data Science for Brescia - Arts and Cultural Places" project, in which a language model was trained to classify online reviews written in Italian into four distinct semantic areas related to cultural attractions in the Italian city of Brescia. The proposed model allows attractions to be identified in text documents even when they are not explicitly mentioned in the document metadata, thus opening possibilities for expanding the database of these cultural attractions with new sources, such as social media platforms, forums, and other online spaces. Lastly, the thesis presents a methodological study examining the group-specificity of words, analyzing various group-specificity estimators proposed in the literature. The study considered grouped text documents with both outcome and group variables. Its contribution consists of the proposal to model the corpus of documents as a multivariate distribution, enabling the simulation of corpora of text documents with predefined characteristics. The simulation provided valuable insights into the relationship between groups of documents and words. Furthermore, all its results can be freely explored through a web application, whose components are also described in this manuscript. In conclusion, this thesis has been conceived as a collection of papers. It aims to contribute to the field with both applications and methodological proposals, and each study presented here suggests paths for future research to address the challenges in the analysis of grouped textual data.

    Dataflow Programming and Acceleration of Computationally-Intensive Algorithms

    The volume of unstructured textual information continues to grow due to recent technological advancements. This has resulted in an exponential growth of information generated in various formats, including blogs, posts, social networking, and enterprise documents. Numerous Enterprise Architecture (EA) documents are also created daily, such as reports, contracts, agreements, frameworks, architecture requirements, designs, and operational guides. Processing this massive amount of unstructured information requires substantial computing capabilities and the implementation of new techniques. It is critical to manage this unstructured information through a centralized knowledge management platform. Knowledge management is the process of managing information within an organization. This involves creating, collecting, organizing, and storing information in a way that makes it easily accessible and usable. The research involved the development of a textual knowledge management system, and two use cases were considered for extracting textual knowledge from documents. The first case study focused on the safety-critical documents of a railway enterprise. Safety is of paramount importance in the railway industry. Several EA documents, including manuals, operational procedures, and technical guidelines, contain critical information. Digitalization of these documents is essential for analysing the vast amounts of textual knowledge they contain in order to improve the safety and security of railway operations. A case study was conducted between the University of Huddersfield and the Railway Safety Standard Board (RSSB) to analyse EA safety documents using natural language processing (NLP). A graphical user interface was developed that includes various document processing features such as semantic search, document mapping, text summarization, and visualization of key trends.
For the second case study, open-source data was utilized, and textual knowledge was extracted. Several features were also developed, including kernel distribution, analysis of key trends, and sentiment analysis of words (such as unique, positive, and negative) within the documents. Additionally, a heterogeneous framework was designed using CPU/GPU and FPGAs to analyse the computational performance of document mapping.

    UMSL Bulletin 2023-2024

    The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.

    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    From abuse to trust and back again

    oai:westminsterresearch.westminster.ac.uk:w7qv

    Designing Gamification Concepts for Expert Explainable Artificial Intelligence Evaluation Tasks: A Problem Space Exploration

    Artificial intelligence (AI) models are often complex and require additional explanations for use in high-stakes decision-making contexts like healthcare. To this end, explainable AI (XAI) developers must evaluate their explanations with domain experts to ensure understandability. As these evaluations are tedious and repetitive, we look at gamification as a means to motivate and engage experts in XAI evaluation tasks. We explore the problem space associated with gamified expert XAI evaluation. Based on a literature review of 22 relevant studies and seven interviews with experts in XAI evaluation, we elicit knowledge about affected stakeholders, eight needs, eight goals, and seven requirements. Our results help us understand better the problems associated with expert XAI evaluation and paint a broad application potential for gamification to improve XAI expert evaluations. In doing so, we lay the foundation for the design of successful gamification concepts for expert XAI evaluation

    Cognitive Inhibition as a Core Component of Executive Functions: Exploring Intra- and Interindividual Differences

    Cognitive inhibition is an essential executive function that we use in our everyday lives. Numerous factors have been claimed to influence this construct including video gaming, exercise and expertise in musical instruments. However, in this thesis, I focus on an understudied factor, the alignment of chronotype and testing time, and a heavily studied yet controversial factor, bilingualism. Throughout this thesis, with one exception, I present a series of experiments which have been conducted online. In the first empirical chapter, I examined a relatively novel Faces task which the authors have claimed to measure three cognitive processes, including two different forms of inhibition and task switching (Chapter 2). Based on this chapter's findings, I decided to use the Faces task in Chapters 3, 4 and 6. The next two chapters determined whether the alignment of time of testing and chronotype influences inhibition and task switching among the young adult (Chapter 3) and older adult (Chapter 4) population. Afterwards, I explored how conflict is resolved through a mouse tracking paradigm and by extension, whether this paradigm can be used for a variety of inhibition tasks (Chapter 5). For the final empirical chapter, I identified whether training inhibition in a verbal domain impacts inhibition in a non-verbal domain (i.e., far transfer effects). To achieve this, I investigated whether bilingualism, which can be seen as a form of cognitive training within the verbal domain, influences performance in non-verbal tasks which index inhibition (Chapter 6). The main findings of this thesis suggest that cognitive inhibition is not substantially impacted by synchrony effects nor by bilingualism. Furthermore, the findings imply that mouse tracking could be a promising tool to use to examine cognitive inhibition