174 research outputs found

    Understanding user behavior aspects on emergency mobile applications during emergency communications using NLP and text mining techniques

    Abstract. The use of mobile devices has been skyrocketing in our society. Users can access and share any type of information in a timely manner through these devices using different social media applications. This has enabled users to increase their awareness of ongoing events such as election campaigns, sports updates, movie releases, disaster occurrences, and studies. Their attractiveness, affordability, and two-way communication capabilities have made mobile devices that support various social media platforms central to emergency communication as well. This makes a mobile-based emergency application an attractive communication tool during emergencies. The emergence of mobile-based emergency communication prompted us to study user behavior related to the usage of these applications. Our study was mainly conducted on emergency apps in Nordic countries such as Finland, Sweden, and Norway. To understand user behavior regarding the usage of emergency mobile applications, we leveraged various Natural Language Processing and Text Mining techniques. The VADER sentiment tool was used to predict and track the polarity of users' reviews of a particular application over time. Later, to identify factors that affect users' sentiments, we employed topic modeling techniques such as the Latent Dirichlet Allocation (LDA) model. This model identifies various themes discussed in the user reviews, and each theme is represented by a weighted sum of words in the corpus. Even though LDA succeeds in highlighting the user-related factors, it fails to identify the aspects of the user, and the topic definition from the LDA model is vague. Hence we leveraged Aspect Based Sentiment Analysis (ABSA) methods to extract the user aspects from the user reviews. To perform this task we considered fine-tuning DeBERTa (a variant of the BERT model). BERT (Bidirectional Encoder Representations from Transformers) is a transformer architecture that allows the model to learn the context in the text. Following this, we performed a sentence-pair sentiment classification task using different variants of BERT. Later, we examined the different sentiments to highlight the factors and the categories that impact user behavior most by leveraging the Empath categorization technique. Finally, we constructed a word association by considering different ontological vocabularies related to mobile applications and emergency response and management systems. The insights from the study can be used to identify the user aspect terms, predict the sentiment of the aspect term in the review provided, and find how the aspect term impacts the user perspective on the usage of mobile emergency applications.
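The idea of tracking review polarity over time, mentioned in the abstract above, can be illustrated with a minimal sketch. This is not the study's pipeline and it does not use VADER itself: the tiny `LEXICON`, the `polarity` function, and the `(date, text)` review format are hypothetical stand-ins chosen so the example stays self-contained.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; a real tool such as VADER ships a far larger
# valence lexicon plus rules for negation, intensifiers, and punctuation.
LEXICON = {"good": 1.0, "great": 2.0, "useful": 1.0,
           "bad": -1.0, "crash": -2.0, "slow": -1.0}


def polarity(review):
    """Average lexicon score of the known words in a review (0.0 if none)."""
    words = (w.strip(".,!?") for w in review.lower().split())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return mean(scores) if scores else 0.0


def polarity_by_month(reviews):
    """Group (iso_date, text) reviews by 'YYYY-MM' and average their polarity."""
    buckets = defaultdict(list)
    for iso_date, text in reviews:
        buckets[iso_date[:7]].append(polarity(text))
    return {month: mean(values) for month, values in sorted(buckets.items())}
```

Calling `polarity_by_month` on a list of dated reviews returns one average score per month, which is the kind of series one would plot to follow sentiment for a single application over time.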

    Modeling Anticipatory Event Transitions


    Data analytics 2016: proceedings of the fifth international conference on data analytics


    Traitement automatique de rapports d’incidents et accidents : application à la gestion du risque dans l’aviation civile

    This thesis describes the applications of natural language processing (NLP) to industrial risk management. We focus on the domain of civil aviation, where incident reporting and accident investigations produce vast amounts of information, mostly in the form of textual accounts of abnormal events, and where efficient access to the information contained in the reports is required. We start by drawing a panorama of the different types of data produced in this particular domain. We analyse the documents themselves, how they are stored and organised as well as how they are used within the community. We show that the current storage and organisation paradigms are not well adapted to the data analysis requirements, and we identify the problematic areas, for which NLP technologies are part of the solution. Specifically addressing the needs of aviation safety professionals, two initial solutions are implemented: automatic classification for assisting in the coding of reports within existing taxonomies and a system based on textual similarity for exploring collections of reports. Based on the observation of real-world tool usage and on user feedback, we propose different methods and approaches for processing incident and accident reports and comprehensively discuss how NLP can be applied within the safety information processing framework of a high-risk sector. By deploying and evaluating certain approaches, we show how elusive aspects related to the variability and multidimensionality of language can be addressed in a practical manner and we propose bottom-up methods for managing the overabundance of textual feedback data.

    Unsupervised and knowledge-poor approaches to sentiment analysis

    Sentiment analysis focuses upon automatic classification of a document's sentiment (and more generally extraction of opinion from text). Ways of expressing sentiment have been shown to be dependent on what a document is about (domain-dependency). This complicates supervised methods for sentiment analysis which rely on extensive use of training data or linguistic resources that are usually either domain-specific or generic. Both kinds of resources prevent classifiers from performing well across a range of domains, as this requires appropriate in-domain (domain-specific) data. This thesis presents a novel unsupervised, knowledge-poor approach to sentiment analysis aimed at creating a domain-independent and multilingual sentiment analysis system. The approach extracts domain-specific resources from documents that are to be processed, and uses them for sentiment analysis. This approach does not require any training corpora, large sets of rules or generic sentiment lexicons, which makes it domain- and language-independent but at the same time able to utilise domain- and language-specific information. The thesis describes and tests the approach, which is applied to different data, including customer reviews of various types of products, reviews of films and books, and news items; and to four languages: Chinese, English, Russian and Japanese. The approach is applied not only to binary sentiment classification, but also to three-way sentiment classification (positive, negative and neutral), subjectivity classification of documents and sentences, and to the extraction of opinion holders and opinion targets. Experimental results suggest that the approach is often a viable alternative to supervised systems, especially when applied to large document collections.

    EDM 2011: 4th international conference on educational data mining : Eindhoven, July 6-8, 2011 : proceedings


    Graph-Based Conversation Analysis in Social Media

    Social media platforms offer their audience the possibility to reply to posts through comments and reactions. This allows social media users to express their ideas and opinions on shared content, thus opening virtual discussions. Most studies on social networks have focused only on user relationships or on the shared content, while ignoring the valuable information hidden in the digital conversations, in terms of structure of the discussion and relation between contents, which is essential for understanding online communication behavior. This work proposes a graph-based framework to assess the shape and structure of online conversations. The analysis was composed of two main stages: intent analysis and network generation. Users' intention was detected using keyword-based classification, followed by the implementation of machine learning-based classification algorithms for uncategorized comments. Afterwards, human-in-the-loop was involved in improving the keyword-based classification. To extract essential information on social media communication patterns among the users, we built conversation graphs using a directed multigraph network and we show our model at work in two real-life experiments. The first experiment used data from a real social media challenge and it was able to categorize 90% of comments with 98% accuracy. The second experiment focused on COVID vaccine-related discussions in online forums and investigated the stance and sentiment to understand how the comments are affected by their parent discussion. Finally, the most popular online discussion patterns were mined and interpreted. We see that the dynamics obtained from conversation graphs are similar to traditional communication activities
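The two stages named in the abstract above, keyword-based intent detection and conversation-graph construction, can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the `INTENT_KEYWORDS` table, the choice of users as nodes, and the `(author, reply_to, text)` comment format are hypothetical and not taken from the paper. Comments that match no keyword return `None`, which is where a machine-learning classifier would take over in the described framework.

```python
import re
from collections import defaultdict

# Hypothetical keyword table; the paper's actual categories are not given here.
INTENT_KEYWORDS = {
    "question": {"how", "why", "what", "when"},
    "gratitude": {"thanks", "thank"},
    "agreement": {"agree", "exactly", "yes"},
}


def classify_intent(text):
    """Return the first intent whose keywords appear in the text, else None."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return intent
    return None  # uncategorized: left for a downstream classifier


def build_conversation_graph(comments):
    """Build a directed multigraph from (author, reply_to, text) tuples.

    Each (source, target) pair maps to a list of edge records, so repeated
    replies between the same two users are kept as parallel edges.
    """
    graph = defaultdict(list)
    for author, reply_to, text in comments:
        if reply_to is None:
            continue  # a top-level post starts a thread and adds no edge
        graph[(author, reply_to)].append(
            {"intent": classify_intent(text), "text": text}
        )
    return graph
```

Storing a list per node pair is the simplest way to keep parallel edges without a graph library; a dedicated multigraph type would serve the same purpose in a larger analysis.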

    Enabling parallelism and optimizations in data mining algorithms for power-law data

    Today's data mining tasks aim to extract meaningful information from a large amount of data in a reasonable time, mainly by means of (a) algorithmic advances, such as fast approximate algorithms and efficient learning algorithms, and (b) architectural advances, such as machines with massive compute capacity involving distributed multi-core processors and high-throughput accelerators. For current and future generation processors, parallel algorithms are critical for fully utilizing computing resources. Furthermore, exploiting data properties for performance gain becomes crucial for data mining applications. In this work, we focus our attention on power-law behavior, a common property found in a large class of data, such as text data, internet traffic, and click-stream data. Specifically, we address the following questions in the context of power-law data: How well do the critical data mining algorithms of current interest fit with today's parallel architectures? Which algorithmic and mapping opportunities can be leveraged to further improve performance? What are the relative challenges and gains for such approaches? We first investigate the suitability of the "frequency estimation" problem for GPU-scale parallelism. Sketching algorithms are a popular choice for this task due to their desirable trade-off between estimation accuracy and space-time efficiency. However, most of the past work on sketch-based frequency estimation focused on CPU implementations. In our work, we propose a novel approach for sketches, which exploits the natural skewness in the power-law data to efficiently utilize the massive amounts of parallelism in modern GPUs. Next, we explore the problem of "identifying top-K frequent elements" for distributed data streams on modern distributed settings with both multi-core and multi-node CPU parallelism. Sketch-based approaches, such as Count-Min Sketch (CMS) with a top-K heap, have an excellent update time but lack the important property of reducibility, which is needed for exploiting data parallelism. On the other hand, the popular Frequent Algorithm (FA) leads to reducible summaries, but its update costs are high. Our approach, Topkapi, gives the best of both worlds, i.e., it is reducible like FA and has an efficient update time similar to CMS. For power-law data, Topkapi possesses strong theoretical guarantees and leads to significant performance gains relative to past work. Finally, we study Word2Vec, a popular word embedding method widely used in machine learning and Natural Language Processing applications, such as machine translation, sentiment analysis, and query answering. This time, we target Single Instruction Multiple Data (SIMD) parallelism. With the increasing vector lengths in commodity CPUs, such as AVX-512 with a vector length of 512 bits, efficient vector processing unit utilization becomes a major performance game-changer. By employing a static multi-version code generation strategy coupled with an algorithmic approximation based on the power-law frequency distribution of words, we achieve significant reductions in training time relative to the state of the art. Ph.D.
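To make the notion of a reducible summary in the abstract above concrete, here is a minimal sketch of the classic Frequent (Misra-Gries) algorithm together with a merge step for summaries built on separate data partitions. This is not Topkapi and not the thesis code; the function names and the parameter `k` (at most `k - 1` counters are kept) are choices made for this illustration. Under these assumptions each estimate undercounts the true frequency by at most `n / k`, where `n` is the total number of items seen across the merged partitions.

```python
def frequent_summary(stream, k):
    """Frequent (Misra-Gries) summary holding at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, drop those at zero.
            # The incoming item itself is discarded in this step.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def merge_summaries(left, right, k):
    """Reduce two summaries into one that again has at most k - 1 counters."""
    combined = dict(left)
    for key, count in right.items():
        combined[key] = combined.get(key, 0) + count
    if len(combined) > k - 1:
        # Subtract the k-th largest count and keep only positive counters.
        kth_largest = sorted(combined.values(), reverse=True)[k - 1]
        combined = {key: count - kth_largest
                    for key, count in combined.items()
                    if count > kth_largest}
    return combined


def top_k(summary, k):
    """Return up to k (item, estimated_count) pairs, largest estimate first."""
    return sorted(summary.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Each worker can run `frequent_summary` on its own partition and the results can be combined pairwise with `merge_summaries`, which is the property that lets a summary be used in a parallel reduction. The inner decrement loop is also where the higher update cost mentioned in the abstract comes from.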