174 research outputs found
Understanding user behavior aspects on emergency mobile applications during emergency communications using NLP and text mining techniques
Abstract. The use of mobile devices has been skyrocketing in our society. Through these devices, users can access and share any type of information in a timely manner using different social media applications. This has increased users' awareness of ongoing events such as election campaigns, sports updates, movie releases, disaster occurrences, and studies. Their attractiveness, affordability, and two-way communication capabilities have made mobile devices that support various social media platforms central to emergency communication as well. This makes a mobile-based emergency application an attractive communication tool during emergencies. The emergence of mobile-based emergency communication prompted us to study user behavior related to the usage of these applications. Our study was mainly conducted on emergency apps in Nordic countries such as Finland, Sweden, and Norway. To understand user objectives regarding the usage of emergency mobile applications, we leveraged various Natural Language Processing and Text Mining techniques. The VADER sentiment tool was used to predict and track the polarity of users' reviews of a particular application over time. Later, to identify factors that affect users' sentiments, we employed topic modeling techniques such as the Latent Dirichlet Allocation (LDA) model. This model identifies various themes discussed in the user reviews, and each theme is represented by a weighted sum of words in the corpus. Even though LDA succeeds in highlighting the user-related factors, it fails to identify the aspects of the user, and the topic definitions from the LDA model are vague. Hence we leveraged Aspect Based Sentiment Analysis (ABSA) methods to extract the user aspects from the user reviews. To perform this task we fine-tuned DeBERTa (a variant of the BERT model).
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based architecture that allows the model to learn the context in the text. Following this, we performed a sentence-pair sentiment classification task using different variants of BERT. Later, we examined the different sentiments to highlight the factors and categories that impact user behavior most by leveraging the Empath categorization technique. Finally, we constructed a word association by considering different ontological vocabularies related to mobile applications and emergency response and management systems. The insights from the study can be used to identify the user aspect terms, predict the sentiment of the aspect term in a given review, and find how the aspect term impacts the user perspective on the usage of mobile emergency applications.
Traitement automatique de rapports d'incidents et accidents : application à la gestion du risque dans l'aviation civile (Automatic processing of incident and accident reports: application to risk management in civil aviation)
This thesis describes the applications of natural language processing (NLP) to industrial risk management. We focus on the domain of civil aviation, where incident reporting and accident investigations produce vast amounts of information, mostly in the form of textual accounts of abnormal events, and where efficient access to the information contained in the reports is required. We start by drawing a panorama of the different types of data produced in this particular domain. We analyse the documents themselves, how they are stored and organised, and how they are used within the community. We show that the current storage and organisation paradigms are not well adapted to the data analysis requirements, and we identify the problematic areas for which NLP technologies are part of the solution. Specifically addressing the needs of aviation safety professionals, two initial solutions are implemented: automatic classification for assisting in the coding of reports within existing taxonomies, and a system based on textual similarity for exploring collections of reports. Based on the observation of real-world tool usage and on user feedback, we propose different methods and approaches for processing incident and accident reports and comprehensively discuss how NLP can be applied within the safety information processing framework of a high-risk sector. By deploying and evaluating certain approaches, we show how elusive aspects related to the variability and multidimensionality of language can be addressed in a practical manner, and we propose bottom-up methods for managing the overabundance of textual feedback data.
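The abstract above mentions a system based on textual similarity for exploring collections of reports but does not say how similarity is computed. As a rough illustration only, the sketch below assumes TF-IDF weighting with cosine similarity, a common baseline; the tokenizer, the function names, and the sample reports are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many reports each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms found in every report keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_index, docs):
    # Rank every other report by similarity to the query report.
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[query_index], v), i)
              for i, v in enumerate(vectors) if i != query_index]
    return sorted(scores, reverse=True)


# Hypothetical incident narratives, invented for illustration.
reports = [
    "Bird strike on engine during takeoff, flight returned to airport",
    "Engine ingested a bird shortly after takeoff, crew returned",
    "Passenger fell ill during cruise, medical diversion",
]
ranking = most_similar(0, reports)
```

Here `ranking` lists the other reports from most to least similar to the first one, which is the kind of "find reports like this one" query an exploration tool might support.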
Unsupervised and knowledge-poor approaches to sentiment analysis
Sentiment analysis focuses upon automatic classification of a document's sentiment (and more generally extraction of opinion from text). Ways of expressing sentiment have been shown to be dependent on what a document is about (domain-dependency). This complicates supervised methods for sentiment analysis, which rely on extensive use of training data or linguistic resources that are usually either domain-specific or generic. Both kinds of resources prevent classifiers from performing well across a range of domains, as this requires appropriate in-domain (domain-specific) data.
This thesis presents a novel unsupervised, knowledge-poor approach to sentiment analysis aimed at creating a domain-independent and multilingual sentiment analysis system.
The approach extracts domain-specific resources from documents that are to be processed, and uses them for sentiment analysis. This approach does not require any training corpora, large sets of rules or generic sentiment lexicons, which makes it domain- and language-independent but at the same time able to utilise domain- and language-specific information.
The thesis describes and tests the approach, which is applied to different data, including customer reviews of various types of products, reviews of films and books, and news items; and to four languages: Chinese, English, Russian and Japanese. The approach is applied not only to binary sentiment classification, but also to three-way sentiment classification (positive, negative and neutral), subjectivity classification of documents and sentences, and to the extraction of opinion holders and opinion targets. Experimental results suggest that the approach is often a viable alternative to supervised systems, especially when applied to large document collections.
Graph-Based Conversation Analysis in Social Media
Social media platforms offer their audience the possibility to reply to posts through comments and reactions. This allows social media users to express their ideas and opinions on shared content, thus opening virtual discussions. Most studies on social networks have focused only on user relationships or on the shared content, while ignoring the valuable information hidden in the digital conversations, in terms of structure of the discussion and relation between contents, which is essential for understanding online communication behavior. This work proposes a graph-based framework to assess the shape and structure of online conversations. The analysis was composed of two main stages: intent analysis and network generation. Users' intentions were detected using keyword-based classification, followed by machine learning-based classification algorithms for uncategorized comments. Afterwards, a human-in-the-loop step was used to improve the keyword-based classification. To extract essential information on social media communication patterns among the users, we built conversation graphs using a directed multigraph network, and we show our model at work in two real-life experiments. The first experiment used data from a real social media challenge and was able to categorize 90% of comments with 98% accuracy. The second experiment focused on COVID vaccine-related discussions in online forums and investigated stance and sentiment to understand how the comments are affected by their parent discussion. Finally, the most popular online discussion patterns were mined and interpreted. We see that the dynamics obtained from conversation graphs are similar to traditional communication activities.
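The two stages named in the abstract above, keyword-based intent detection and generation of a directed multigraph of replies, can be sketched roughly as follows. This is only an illustrative sketch: the keyword lists, intent labels, comment fields, and function names are hypothetical, since the abstract does not specify them, and the machine learning fallback for uncategorized comments is left out.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the actual lists used in the work are not given.
INTENT_KEYWORDS = {
    "disagreement": {"disagree", "wrong"},
    "agreement": {"agree", "exactly"},
}


def classify_intent(text):
    lowered = text.lower()
    if "?" in lowered:
        return "question"
    # Match whole tokens so that "disagree" does not also count as "agree".
    tokens = set(re.findall(r"[a-z']+", lowered))
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return intent
    # In the described pipeline these would go to an ML classifier (not shown).
    return "uncategorized"


def build_conversation_graph(comments):
    # Directed multigraph stored as an edge list: parallel edges between the
    # same pair of users are kept, one per reply.
    by_id = {c["id"]: c for c in comments}
    edges = []
    for c in comments:
        parent = by_id.get(c["parent"])
        if parent is None:
            continue  # a top-level post replies to nobody
        edges.append((c["author"], parent["author"], classify_intent(c["text"])))
    return edges


# Invented sample thread for illustration.
comments = [
    {"id": 1, "parent": None, "author": "ana", "text": "Vaccines roll out next week."},
    {"id": 2, "parent": 1, "author": "ben", "text": "How do I book a slot?"},
    {"id": 3, "parent": 1, "author": "ben", "text": "I agree, good news."},
    {"id": 4, "parent": 2, "author": "ana", "text": "That link is wrong."},
    {"id": 5, "parent": 1, "author": "cy", "text": "Nice weather today"},
]
edges = build_conversation_graph(comments)
reply_counts = Counter((src, dst) for src, dst, _ in edges)
```

Each edge points from the replying user to the author of the parent comment and carries the detected intent, so `reply_counts` gives the number of parallel edges between each ordered pair of users.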
Enabling parallelism and optimizations in data mining algorithms for power-law data
Today's data mining tasks aim to extract meaningful information from a large amount of data in a reasonable time, mainly by means of (a) algorithmic advances, such as fast approximate algorithms and efficient learning algorithms, and (b) architectural advances, such as machines with massive compute capacity involving distributed multi-core processors and high-throughput accelerators. For current and future generation processors, parallel algorithms are critical for fully utilizing computing resources. Furthermore, exploiting data properties for performance gain becomes crucial for data mining applications. In this work, we focus our attention on power-law behavior, a common property found in a large class of data, such as text data, internet traffic, and click-stream data. Specifically, we address the following questions in the context of power-law data: How well do the critical data mining algorithms of current interest fit with today's parallel architectures? Which algorithmic and mapping opportunities can be leveraged to further improve performance? What are the relative challenges and gains for such approaches? We first investigate the suitability of the "frequency estimation" problem for GPU-scale parallelism. Sketching algorithms are a popular choice for this task due to their desirable trade-off between estimation accuracy and space-time efficiency. However, most of the past work on sketch-based frequency estimation focused on CPU implementations. In our work, we propose a novel approach for sketches, which exploits the natural skewness in power-law data to efficiently utilize the massive amounts of parallelism in modern GPUs. Next, we explore the problem of "identifying top-K frequent elements" for distributed data streams in modern distributed settings with both multi-core and multi-node CPU parallelism.
Sketch-based approaches, such as Count-Min Sketch (CMS) with a top-K heap, have an excellent update time but lack the important property of reducibility, which is needed for exploiting data parallelism. At the other end, the popular Frequent Algorithm (FA) leads to reducible summaries, but its update costs are high. Our approach, Topkapi, gives the best of both worlds, i.e., it is reducible like FA and has an efficient update time similar to CMS. For power-law data, Topkapi possesses strong theoretical guarantees and leads to significant performance gains relative to past work. Finally, we study Word2Vec, a popular word embedding method widely used in machine learning and Natural Language Processing applications, such as machine translation, sentiment analysis, and query answering. This time, we target Single Instruction Multiple Data (SIMD) parallelism. With the increasing vector lengths in commodity CPUs, such as AVX-512 with a vector length of 512 bits, efficient utilization of the vector processing unit becomes a major performance game-changer. By employing a static multi-version code generation strategy coupled with an algorithmic approximation based on the power-law frequency distribution of words, we achieve significant reductions in training time relative to the state of the art. Ph.D.
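For readers unfamiliar with the baseline the abstract compares against, the following is a minimal single-threaded sketch of a Count-Min Sketch combined with top-K tracking. It is not Topkapi and not the author's implementation: the table sizes, the BLAKE2-based hashing, the class names, and the sample stream are illustrative assumptions, and a small candidate dictionary stands in for the top-K heap to keep the code short.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows is an
        # upper bound that never falls below the true count.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


class TopKTracker:
    def __init__(self, k, sketch):
        self.k = k
        self.sketch = sketch
        self.candidates = {}  # item -> latest estimate (stands in for a heap)

    def update(self, item):
        self.sketch.add(item)
        est = self.sketch.estimate(item)
        if item in self.candidates or len(self.candidates) < self.k:
            self.candidates[item] = est
            return
        # Evict the weakest candidate if the new item now looks more frequent.
        weakest = min(self.candidates, key=self.candidates.get)
        if est > self.candidates[weakest]:
            del self.candidates[weakest]
            self.candidates[item] = est

    def top(self):
        return sorted(self.candidates.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented skewed stream: a few heavy words and many rare ones.
stream = (["the"] * 50 + ["of"] * 25 + ["and"] * 12
          + ["rare%d" % i for i in range(20)])
tracker = TopKTracker(k=3, sketch=CountMinSketch())
for word in stream:
    tracker.update(word)
top_words = tracker.top()
```

Note that the candidate set depends on the order in which items arrive, so two trackers built over different shards of a stream cannot simply be added together. That is the reducibility limitation the abstract attributes to this family of approaches.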