141 research outputs found

    Analyzing and Visualizing Twitter Streams based on Trending Hashtags

    Get PDF

    Multi-task Pairwise Neural Ranking for Hashtag Segmentation

    Full text link
    Hashtags are often employed on social media and beyond to add metadata to a textual utterance with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer, as these represent ad-hoc conventions which frequently include multiple words joined together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches demonstrate a 24.6% error reduction in hashtag segmentation accuracy compared to the current state-of-the-art method. Finally, we demonstrate that a deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieved a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset. Comment: 12 pages, ACL 201
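
    The abstract frames segmentation as ranking candidate segmentations of a hashtag. Below is a minimal sketch of that framing, with a toy word-frequency score standing in for the paper's neural pairwise ranker; the hashtag, frequency table and function names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's model): enumerate candidate segmentations of a
# hashtag and rank them, using a simple word-frequency score in place of the
# neural pairwise ranker described in the abstract.
from itertools import combinations

# Toy unigram frequencies; in practice these would come from a large corpus.
WORD_FREQ = {"no": 50, "bel": 2, "nobel": 80, "prize": 90, "winner": 60, "win": 70, "ner": 1}

def candidate_segmentations(hashtag: str, max_splits: int = 3):
    """Yield all segmentations of the hashtag with up to max_splits split points."""
    text = hashtag.lstrip("#").lower()
    positions = range(1, len(text))
    for k in range(max_splits + 1):
        for cut in combinations(positions, k):
            bounds = (0, *cut, len(text))
            yield [text[i:j] for i, j in zip(bounds, bounds[1:])]

def score(segmentation):
    """Higher is better: reward segmentations made of frequent known words."""
    return sum(WORD_FREQ.get(w, 0) for w in segmentation) / len(segmentation)

best = max(candidate_segmentations("#NobelPrizeWinner"), key=score)
print(best)  # -> ['nobel', 'prize', 'winner']
```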

    Artificial Intelligence, Social Media and Supply Chain Management: The Way Forward

    Get PDF
    Supply chain management (SCM) is a complex network of multiple entities ranging from business partners to end consumers. These stakeholders frequently use social media platforms, such as Twitter and Facebook, to voice their opinions and concerns. AI-based applications, such as sentiment analysis, allow us to extract relevant information from these deliberations. We argue that the context-specific application of AI, compared to generic approaches, is more efficient in retrieving meaningful insights from social media data for SCM. We present a conceptual overview of prevalent techniques and available resources for information extraction. Subsequently, we identify specific areas of SCM where context-aware sentiment analysis can enhance overall efficiency.
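
    As a concrete illustration of the kind of AI application the abstract discusses, here is a minimal sketch of lexicon-based sentiment scoring of supply-chain-related posts using NLTK's VADER analyzer; the example posts are invented, and a generic lexicon like this represents the baseline that the authors contrast with context-aware analysis.

```python
# Minimal sketch (assumption, not the paper's pipeline): generic lexicon-based
# sentiment scoring of supply-chain-related posts with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

posts = [
    "Shipment arrived two weeks late again, terrible logistics.",
    "Great turnaround from the new supplier, order fulfilled early!",
]

sia = SentimentIntensityAnalyzer()
for post in posts:
    scores = sia.polarity_scores(post)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    label = "positive" if scores["compound"] > 0.05 else "negative" if scores["compound"] < -0.05 else "neutral"
    print(f"{label:8s} {post}")
```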

    Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech

    Full text link
    Converting written text into its spoken form is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores and abbreviations, and (2) transforming NSWs such as URLs, email addresses, hashtags and contact names into pronounceable syllables. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on the NSW type, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset of 5,819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split hashtags, email addresses, URLs, and contact names into words. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates: 8.15% with the CRF tagger, 7.11% with the BiLSTM-CNN-CRF tagger, and only 6.67% with the BERT-BiGRU-CRF tagger. Comment: The 14th International Conference on Knowledge and Systems Engineering (KSE 2022)
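
    The second phase relies on forward lexicon-based maximum matching to split hashtags, email addresses and URLs into words. Below is a minimal sketch of that matching strategy, assuming a toy English lexicon and an illustrative hashtag; it is not the paper's implementation.

```python
# Minimal sketch (assumption, not the paper's code): forward lexicon-based
# maximum matching that splits a hashtag into known words, longest match first.
LEXICON = {"happy", "new", "year", "to", "all", "tet", "holiday"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)

def forward_max_match(token: str, lexicon=LEXICON):
    """Greedily consume the longest lexicon word at each position; fall back to a
    single character when nothing matches so the scan always advances."""
    text = token.lstrip("#").lower()
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_WORD_LEN), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:  # no lexicon word starts here
            words.append(text[i])
            i += 1
    return words

print(forward_max_match("#HappyNewYearToAll"))  # ['happy', 'new', 'year', 'to', 'all']
```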

    Patterns and Variation in English Language Discourse

    Get PDF
    This publication presents the reviewed post-conference proceedings of the international 9th Brno Conference on Linguistics Studies in English, held on 16–17 September 2021 and organised by the Faculty of Education, Masaryk University in Brno. The papers revolve around the themes of patterns and variation in specialised discourses (namely the media, academic, business, tourism, educational and learner discourses), effective interaction between the addressor and addressees, and current trends and developments in specialised discourses. The principal methodological perspectives are the comparative approach involving discourses in English and another language, critical and corpus analysis, as well as the identification of pragmatic strategies and appropriate rhetorical means. The authors of the papers are researchers from the Czech Republic, Italy, Luxembourg, Serbia and Georgia.

    Automatic stance detection on political discourse in Twitter

    Get PDF
    The majority of opinion mining tasks in natural language processing (NLP) have focused on sentiment analysis of texts about products and services, while there is comparatively less research on the automatic detection of political opinion. Almost all previous work has been done for English, whereas this thesis focuses on the automatic detection of stance (whether the author is favorable or not towards an important political topic) in Twitter posts in Catalan, Spanish and English. The main objective of this work is to build and compare automatic stance detection systems using both classic supervised machine learning and deep learning techniques. We also study the influence of text normalization and experiment with different methods of word representation, such as TF-IDF measures for unigrams, word embeddings, tweet embeddings, and contextual character-based embeddings. We obtain state-of-the-art results on the stance detection task on the IberEval 2018 dataset. Our research shows that text normalization and feature selection are important for systems with unigram features, but do not affect performance when working with word vector representations. Classic methods such as unigram features with an SVM classifier still outperform deep learning techniques, but seem prone to overfitting. The classifiers trained on word vector representations and the neural network models using contextual character-based vectors show greater robustness.
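
    As a sketch of the classic baseline the abstract refers to, the following trains a unigram TF-IDF representation with a linear SVM in scikit-learn; the labelled tweets are invented placeholders, not the IberEval 2018 data.

```python
# Minimal sketch (assumption, not the thesis system): a unigram TF-IDF + linear SVM
# stance classifier of the kind the abstract describes as a strong classic baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labelled tweets; the thesis uses the IberEval 2018 stance datasets instead.
tweets = [
    "Full support for the referendum, we deserve a vote",
    "This referendum is illegal and divisive",
    "Proud to march for independence today",
    "No to separatism, stronger together",
]
labels = ["FAVOR", "AGAINST", "FAVOR", "AGAINST"]

model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 1)),  # unigram TF-IDF features
    LinearSVC(),
)
model.fit(tweets, labels)
print(model.predict(["I will vote yes in the referendum"]))  # e.g. ['FAVOR']
```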

    Humor Detection

    Get PDF
    Humor is a complex concept that characterizes us as human beings and social entities, and it is an essential component of personal communication. Creating a method or model to discover the structures behind humor, recognize humor, and even extract humor remains a challenge because of its subjective nature. Humor also provides valuable information related to linguistic, psychological, neurological and sociological phenomena. However, because of its complexity, humor is still an ill-defined phenomenon, since the reactions that make people laugh can hardly be generalized or formalized. For instance, cognitive aspects as well as cultural knowledge are some of the multi-factorial variables that should be analyzed in order to understand humor's properties. Although it is impossible to capture universal characteristics of humor, one can still capture the possible latent structures behind it. In my work, I try to uncover several latent semantic structures behind humor in terms of meaning incongruity, ambiguity, phonetic style and personal affect. In addition to humor recognition, identifying anchors, i.e., which words prompt humor in a sentence, is essential to understanding the phenomenon of humor in language. The proposed technique is built on linguistic concepts and achieves an accuracy of over 70%, compared to 23.06% for the Word Index power method.
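
    To make the notion of a latent structure more concrete, here is a minimal sketch of one illustrative phonetic-style feature, the proportion of adjacent word pairs that alliterate; the feature definition and example sentences are assumptions for illustration, not the thesis method.

```python
# Minimal sketch (assumption, not the thesis method): one illustrative "phonetic
# style" feature for humor recognition, the fraction of adjacent word pairs that
# alliterate, approximated here by sharing the same initial letter.
import re

def alliteration_ratio(sentence: str) -> float:
    """Fraction of adjacent word pairs starting with the same letter."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if len(words) < 2:
        return 0.0
    pairs = list(zip(words, words[1:]))
    hits = sum(1 for a, b in pairs if a[0] == b[0])
    return hits / len(pairs)

print(alliteration_ratio("Veni, vidi, visa: I came, I saw, I did a little shopping"))  # higher
print(alliteration_ratio("The committee approved the annual budget"))                  # 0.0
```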