141 research outputs found

    Analyzing and Visualizing Twitter Streams based on Trending Hashtags

    Get PDF

    Multi-task Pairwise Neural Ranking for Hashtag Segmentation

    Full text link
    Hashtags are often employed on social media and beyond to add metadata to a textual utterance with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer, as these represent ad-hoc conventions which frequently include multiple words joined together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches demonstrate a 24.6% error reduction in hashtag segmentation accuracy compared to the current state-of-the-art method. Finally, we demonstrate that a deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieved a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset. Comment: 12 pages, ACL 201
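
    The abstract frames segmentation as ranking candidate segmentations of a hashtag. Below is a minimal sketch of that framing, with a toy word-frequency score standing in for the paper's neural pairwise ranker; the hashtag, frequency table and function names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's model): enumerate candidate segmentations of a
# hashtag and rank them, using a simple word-frequency score in place of the
# neural pairwise ranker described in the abstract.
from itertools import combinations

# Toy unigram frequencies; in practice these would come from a large corpus.
WORD_FREQ = {"no": 50, "bel": 2, "nobel": 80, "prize": 90, "winner": 60, "win": 70, "ner": 1}

def candidate_segmentations(hashtag: str, max_splits: int = 3):
    """Yield all segmentations of the hashtag with up to max_splits split points."""
    text = hashtag.lstrip("#").lower()
    positions = range(1, len(text))
    for k in range(max_splits + 1):
        for cut in combinations(positions, k):
            bounds = (0, *cut, len(text))
            yield [text[i:j] for i, j in zip(bounds, bounds[1:])]

def score(segmentation):
    """Higher is better: reward segmentations made of frequent known words."""
    return sum(WORD_FREQ.get(w, 0) for w in segmentation) / len(segmentation)

best = max(candidate_segmentations("#NobelPrizeWinner"), key=score)
print(best)  # -> ['nobel', 'prize', 'winner']
```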

    Artificial Intelligence, Social Media and Supply Chain Management: The Way Forward

    Get PDF
    Supply chain management (SCM) is a complex network of multiple entities ranging from business partners to end consumers. These stakeholders frequently use social media platforms, such as Twitter and Facebook, to voice their opinions and concerns. AI-based applications, such as sentiment analysis, allow us to extract relevant information from these deliberations. We argue that the context-specific application of AI, compared to generic approaches, is more efficient in retrieving meaningful insights from social media data for SCM. We present a conceptual overview of prevalent techniques and available resources for information extraction. Subsequently, we identify specific areas of SCM where context-aware sentiment analysis can enhance overall efficiency.
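
    As a concrete illustration of the kind of AI application the abstract discusses, here is a minimal sketch of lexicon-based sentiment scoring of supply-chain-related posts using NLTK's VADER analyzer; the example posts are invented, and a generic lexicon like this represents the baseline that the authors contrast with context-aware analysis.

```python
# Minimal sketch (assumption, not the paper's pipeline): generic lexicon-based
# sentiment scoring of supply-chain-related posts with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

posts = [
    "Shipment arrived two weeks late again, terrible logistics.",
    "Great turnaround from the new supplier, order fulfilled early!",
]

sia = SentimentIntensityAnalyzer()
for post in posts:
    scores = sia.polarity_scores(post)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    label = "positive" if scores["compound"] > 0.05 else "negative" if scores["compound"] < -0.05 else "neutral"
    print(f"{label:8s} {post}")
```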

    Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech

    Full text link
    Converting written text into its spoken form is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores and abbreviations, and (2) transforming NSWs such as URLs, email addresses, hashtags and contact names into pronounceable syllables. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on the NSW type, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset of 5,819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split hashtags, email addresses, URLs, and contact names into words. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates: 8.15% with the CRF tagger, 7.11% with the BiLSTM-CNN-CRF tagger, and only 6.67% with the BERT-BiGRU-CRF tagger. Comment: The 14th International Conference on Knowledge and Systems Engineering (KSE 2022)
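
    The second phase relies on forward lexicon-based maximum matching to split hashtags, email addresses and URLs into words. Below is a minimal sketch of that matching strategy, assuming a toy English lexicon and an illustrative hashtag; it is not the paper's implementation.

```python
# Minimal sketch (assumption, not the paper's code): forward lexicon-based
# maximum matching that splits a hashtag into known words, longest match first.
LEXICON = {"happy", "new", "year", "to", "all", "tet", "holiday"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)

def forward_max_match(token: str, lexicon=LEXICON):
    """Greedily consume the longest lexicon word at each position; fall back to a
    single character when nothing matches so the scan always advances."""
    text = token.lstrip("#").lower()
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_WORD_LEN), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:  # no lexicon word starts here
            words.append(text[i])
            i += 1
    return words

print(forward_max_match("#HappyNewYearToAll"))  # ['happy', 'new', 'year', 'to', 'all']
```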

    Patterns and Variation in English Language Discourse

    Get PDF
    This publication presents the reviewed post-conference proceedings of the international 9th Brno Conference on Linguistics Studies in English, held on 16–17 September 2021 and organised by the Faculty of Education, Masaryk University in Brno. The papers revolve around the themes of patterns and variation in specialised discourses (namely the media, academic, business, tourism, educational and learner discourses), effective interaction between the addressor and addressees, and current trends and developments in specialised discourses. The principal methodological perspectives are the comparative approach involving discourses in English and another language, critical and corpus analysis, as well as the identification of pragmatic strategies and appropriate rhetorical means. The authors of the papers are researchers from the Czech Republic, Italy, Luxembourg, Serbia and Georgia.

    Automatic stance detection on political discourse in Twitter

    Get PDF
    The majority of opinion mining tasks in natural language processing (NLP) have focused on sentiment analysis of texts about products and services, while there is comparatively less research on the automatic detection of political opinion. Almost all previous work has been done for English, whereas this thesis focuses on the automatic detection of stance (whether the author is favorable or not towards an important political topic) in Twitter posts in Catalan, Spanish and English. The main objective of this work is to build and compare automatic stance detection systems using both classic supervised machine learning and deep learning techniques. We also study the influence of text normalization and experiment with different methods of word representation, such as TF-IDF measures for unigrams, word embeddings, tweet embeddings, and contextual character-based embeddings. We obtain state-of-the-art results on the stance detection task on the IberEval 2018 dataset. Our research shows that text normalization and feature selection are important for systems with unigram features, but do not affect performance when working with word vector representations. Classic methods such as unigram features with an SVM classifier still outperform deep learning techniques, but seem prone to overfitting. The classifiers trained on word vector representations and the neural network models using contextual character-based vectors show greater robustness.
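
    As a sketch of the classic baseline the abstract refers to, the following trains a unigram TF-IDF representation with a linear SVM in scikit-learn; the labelled tweets are invented placeholders, not the IberEval 2018 data.

```python
# Minimal sketch (assumption, not the thesis system): a unigram TF-IDF + linear SVM
# stance classifier of the kind the abstract describes as a strong classic baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labelled tweets; the thesis uses the IberEval 2018 stance datasets instead.
tweets = [
    "Full support for the referendum, we deserve a vote",
    "This referendum is illegal and divisive",
    "Proud to march for independence today",
    "No to separatism, stronger together",
]
labels = ["FAVOR", "AGAINST", "FAVOR", "AGAINST"]

model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 1)),  # unigram TF-IDF features
    LinearSVC(),
)
model.fit(tweets, labels)
print(model.predict(["I will vote yes in the referendum"]))  # e.g. ['FAVOR']
```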

    Humor Detection

    Get PDF
    Humor is a complex concept that characterizes us as human beings and social entities, and it is an essential component of personal communication. Creating a method or model to discover the structures behind humor, recognize humor, and even extract humor remains a challenge because of its subjective nature. Humor also provides valuable information related to linguistic, psychological, neurological and sociological phenomena. However, because of its complexity, humor is still an ill-defined phenomenon, since the reactions that make people laugh can hardly be generalized or formalized. For instance, cognitive aspects as well as cultural knowledge are some of the multi-factorial variables that should be analyzed in order to understand humor's properties. Although it is impossible to capture universal characteristics of humor, one can still capture the possible latent structures behind it. In my work, I try to uncover several latent semantic structures behind humor in terms of meaning incongruity, ambiguity, phonetic style and personal affect. In addition to humor recognition, identifying anchors, i.e., which words prompt humor in a sentence, is essential to understanding the phenomenon of humor in language. The proposed technique is built on linguistic concepts and achieves an accuracy of over 70%, compared to 23.06% for the Word Index power method.
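
    To make the notion of a latent structure more concrete, here is a minimal sketch of one illustrative phonetic-style feature, the proportion of adjacent word pairs that alliterate; the feature definition and example sentences are assumptions for illustration, not the thesis method.

```python
# Minimal sketch (assumption, not the thesis method): one illustrative "phonetic
# style" feature for humor recognition, the fraction of adjacent word pairs that
# alliterate, approximated here by sharing the same initial letter.
import re

def alliteration_ratio(sentence: str) -> float:
    """Fraction of adjacent word pairs starting with the same letter."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if len(words) < 2:
        return 0.0
    pairs = list(zip(words, words[1:]))
    hits = sum(1 for a, b in pairs if a[0] == b[0])
    return hits / len(pairs)

print(alliteration_ratio("Veni, vidi, visa: I came, I saw, I did a little shopping"))  # higher
print(alliteration_ratio("The committee approved the annual budget"))                  # 0.0
```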