Topical co-attention networks for hashtag recommendation on microblogs

Author: HU Jingwen
JIANG Jing
LI Yang
LIU Ting
Publication venue: 'Elsevier BV'
Publication date: 01/02/2019

Institutional Knowledge at Singapore Management University

Data Sets: Word Embeddings Learned from Tweets and General Data

Author: Li Quanzhi
Liu Xiaomo
Nourbakhsh Armineh
Shah Sameena
Publication date: 03/05/2017

A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from tweet data alone, we also built embedding sets from general data and from the combination of tweets with the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words of general text. We also present two experiments demonstrating how to use the data sets in NLP tasks such as tweet sentiment analysis and tweet topic classification.
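The abstract does not say how the embeddings feed the sentiment and topic experiments. One common approach, assumed here and not necessarily the authors' own, is to mean-pool the word vectors of a tweet into a single feature vector and compare or classify those vectors. The sketch below uses a tiny hypothetical embedding table (`TOY_EMBEDDINGS`, invented values) and hypothetical helper names `tweet_vector` and `cosine`.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy embeddings: the values are invented for illustration and
# are NOT taken from the published data sets, whose vectors are far larger.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "awesome": [0.8, 0.2, 0.1],
    "terrible": [-0.7, 0.1, 0.2],
    "game": [0.1, 0.9, 0.3],
}


def tweet_vector(tokens: Sequence[str],
                 embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Mean-pool the vectors of in-vocabulary tokens; None if no token is known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    # zip(*known) walks the vectors column by column (one column per dimension).
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    pos = tweet_vector("great game".split(), TOY_EMBEDDINGS)
    pos2 = tweet_vector("awesome game".split(), TOY_EMBEDDINGS)
    neg = tweet_vector("terrible game".split(), TOY_EMBEDDINGS)
    print(round(cosine(pos, pos2), 3), round(cosine(pos, neg), 3))
```

Mean pooling discards word order, which keeps the sketch short; the resulting vectors could then be passed to any downstream classifier.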

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Improving Marketing Intelligence Using Online User-Generated Contents

Author: Jiang Jinling
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2019

VBN

Stance detection on social media: State of the art and trends

Author: ALDayel Abeer
Magdy Walid
Publication venue: 'Elsevier BV'
Publication date: 24/02/2021

Stance detection on social media is an emerging opinion mining paradigm for various social and political applications in which sentiment analysis may be sub-optimal. There has been growing research interest in developing effective stance detection methods across multiple communities, including natural language processing, web science, and social computing. This paper surveys the work on stance detection within those communities and situates its usage within current opinion mining techniques in social media. It presents an exhaustive review of stance detection techniques on social media, including the task definition, the different types of targets in stance detection, the feature sets used, and the various machine learning approaches applied. The survey reports state-of-the-art results on the existing benchmark datasets for stance detection and discusses the most effective approaches. In addition, this study explores emerging trends and different applications of stance detection on social media. The study concludes by discussing gaps in current research and highlighting possible future directions for stance detection on social media.
Comment: We sincerely request withdrawal of this article. We will re-edit this paper. Please withdraw this article before we finish the new version.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

A Semantic and Syntactic Similarity Measure for Political Tweets

Author: Crockett Keeley
Edmonds Bruce
Little Claire
McLean David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Measurement of the semantic and syntactic similarity of human utterances is essential in allowing machines to understand dialogue with users. However, human language is complex, and the semantic meaning of an utterance usually depends on the context at a given time and on learnt experience of the meaning of the words used. This is particularly challenging when automatically interpreting social media content such as tweets, which can contain non-standard language. Short Text Semantic Similarity measures can be adapted to measure the degree of similarity between a pair of tweets. This work presents a new Semantic and Syntactic Similarity Measure (TSSSM) for political tweets. The approach uses word embeddings to determine semantic similarity and extracts syntactic features to overcome a limitation of current measures, which may miss identical sequences of words. A large dataset of tweets focusing on the political domain was collected, pre-processed and used to train the word embedding model, with various experiments performed to determine the optimal model and parameters. A selection of tweet pairs was evaluated by humans for semantic equivalence, and these judgements were correlated against the measure. The new measure can be used in a variety of applications, including identifying and analyzing political narratives. Experiments on three diverse human-labelled test datasets demonstrate that the measure outperforms an existing measure, performs well on tweets from the political domain and may also generalize outside the political domain.
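The abstract does not give the TSSSM formula, so the following is only a simplified sketch of the general idea it describes: a semantic term (cosine similarity of mean-pooled word embeddings) mixed with a syntactic term that rewards shared word sequences (here, Jaccard overlap of word bigrams). The weight `alpha`, the bigram feature, and all function names are hypothetical choices, not the published measure.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def mean_vector(tokens: Sequence[str],
                emb: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the embeddings of known tokens (None if no token is known)."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bigrams(tokens: Sequence[str]) -> Set[Tuple[str, str]]:
    """Adjacent word pairs, so the feature is sensitive to word order."""
    return set(zip(tokens, tokens[1:]))


def bigram_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of word bigrams (0.0 when neither tweet has a bigram)."""
    ba, bb = bigrams(a), bigrams(b)
    if not (ba or bb):
        return 0.0
    return len(ba & bb) / len(ba | bb)


def combined_similarity(a: Sequence[str], b: Sequence[str],
                        emb: Dict[str, List[float]], alpha: float = 0.7) -> float:
    """Hypothetical weighted mix: alpha * semantic + (1 - alpha) * syntactic."""
    va, vb = mean_vector(a, emb), mean_vector(b, emb)
    semantic = cosine(va, vb) if va is not None and vb is not None else 0.0
    return alpha * semantic + (1 - alpha) * bigram_overlap(a, b)
```

For example, `["tax", "cuts", "now"]` and `["now", "tax", "cuts"]` contain the same words, so the mean-pooled semantic term alone cannot tell them apart; only the bigram term lowers their score. This illustrates the kind of word-sequence information the abstract says purely embedding-based measures may miss.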

Crossref

E-space: Manchester Metropolitan University's Research Repository

Knowledge extraction from unstructured data

Author: Sakor Ahmad
Publication venue: Hannover : Institutionelles Repositorium der Leibniz Universität Hannover
Publication date: 01/01/2023

Data availability is becoming increasingly essential, given the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. To make web-based data available for Natural Language Processing or Data Mining tasks, the data need to be presented as machine-readable data in a structured format. Thus, techniques for capturing knowledge from unstructured data sources are needed. Research communities address this problem with knowledge extraction methods, which capture knowledge in natural language text and map the extracted knowledge to existing knowledge represented in knowledge graphs (KGs). These knowledge extraction methods include Named-Entity Recognition, Named-Entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. This approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing purpose-built catalogs of linguistic and domain-specific rules, which state the criteria for recognizing entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific settings; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data, while maintaining accuracy in recognizing and linking entities.
Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus as an instance of the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks from various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.
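The abstract does not say how posts are mapped onto a graph. As a purely hypothetical illustration, suppose each post is a vertex and an edge joins two posts judged unrelated. In a proper coloring, adjacent vertices get different colors, so each color class is then a set of posts with no "unrelated" pair among them. Finding a minimum coloring is NP-hard, so the sketch below uses a greedy Welsh-Powell-style heuristic. It is not the thesis's algorithm, and `greedy_coloring`, `groups`, and the edge definition are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def greedy_coloring(vertices: Iterable[Hashable],
                    edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Welsh-Powell-style greedy coloring: a heuristic, not guaranteed optimal."""
    adj: Dict[Hashable, Set[Hashable]] = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Visit high-degree vertices first; sorted() is stable, so ties keep input order.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color: Dict[Hashable, int] = {}
    for v in order:
        taken = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in taken:  # smallest color not used by an already-colored neighbor
            c += 1
        color[v] = c
    return color


def groups(color: Dict[Hashable, int]) -> List[List[Hashable]]:
    """Collect vertices that share a color into one group per color."""
    by_color: Dict[int, List[Hashable]] = {}
    for v, c in color.items():
        by_color.setdefault(c, []).append(v)
    return [by_color[c] for c in sorted(by_color)]
```

With posts `p1` to `p4` and hypothetical "unrelated" edges `p1-p3`, `p1-p4`, `p2-p3`, `p2-p4`, the heuristic finds two groups, `[p1, p2]` and `[p3, p4]`. The greedy ordering is a simple, deterministic choice. An exact solver could use fewer colors on harder graphs.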

Institutionelles Repositorium der Leibniz Universität Hannover