150 research outputs found

Tibetan Microblog Emotional Analysis Based on Sequential Model in Online Social Platforms

Author: Huili Zhang
Lirong Qiu
Qiumei Pu
Zhen Zhang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Detecting Traffic Information From Social Media Texts With Deep Learning Approaches

Author: Chen Yuanyuan
Li Lingxi
Lv Yisheng
Wang Fei-Yue
Wang Xiao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2018

Mining traffic-relevant information from social media data has become an emerging topic due to the real-time and ubiquitous nature of social media. In this paper, we focus on a specific problem in social media mining: extracting traffic-relevant microblogs from Sina Weibo, a Chinese microblogging platform. We cast this as a machine learning problem of short text classification. First, we apply the continuous bag-of-words model to learn word embedding representations from a data set of three billion microblogs. Compared to the traditional one-hot vector representation of words, word embeddings can capture semantic similarity between words and have proved effective in natural language processing tasks. Next, we propose using convolutional neural networks (CNNs), long short-term memory (LSTM) models, and their combination LSTM-CNN to extract traffic-relevant microblogs with the learned word embeddings as inputs. We compare the proposed methods with competitive approaches, including the support vector machine (SVM) model based on a bag of n-gram features, the SVM model based on word vector features, and the multi-layer perceptron model based on word vector features. Experiments show the effectiveness of the proposed deep learning approaches.
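The contrast the abstract draws between one-hot vectors and word embeddings can be illustrated with a small sketch. This is not the paper's code: the three-word vocabulary and the dense vectors in `toy_embeddings` are hypothetical, hand-picked values used only to show why cosine similarity is informative for dense vectors and uninformative for one-hot vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary (not taken from the paper).
vocab = ["jam", "congestion", "noodles"]


def one_hot(word):
    """One-hot vector: a single 1.0 at the word's index in the vocabulary."""
    return [1.0 if w == word else 0.0 for w in vocab]


# Hypothetical dense vectors, hand-picked for illustration only.
# Learned embeddings (for example from a CBOW model) would replace these.
toy_embeddings = {
    "jam": [0.9, 0.1, 0.0],
    "congestion": [0.8, 0.2, 0.1],
    "noodles": [0.0, 0.1, 0.9],
}

# Any two distinct one-hot vectors are orthogonal, so their similarity is 0.
one_hot_similarity = cosine(one_hot("jam"), one_hot("congestion"))

# Dense vectors can place related words closer together than unrelated ones.
related = cosine(toy_embeddings["jam"], toy_embeddings["congestion"])
unrelated = cosine(toy_embeddings["jam"], toy_embeddings["noodles"])
```

With one-hot vectors every pair of distinct words looks equally dissimilar, whereas the dense vectors allow a classifier to treat "jam" and "congestion" as related inputs.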

IUPUIScholarWorks

Rumor Identification with Maximum Entropy in MicroNet

Author: Fengming Liu
Mingcai Li
Suisheng Yu
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Crossref

Hashtag biased ranking for keyword extraction from microblog posts

Author: Li L
Su C
Sun Y
Xiong S
Xu G
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

© Springer International Publishing Switzerland 2015. Nowadays, a huge amount of text is being generated for social networking purposes on the Web. Keyword extraction from such text benefits many applications such as advertising, search, and content filtering. Recent studies show that graph-based ranking is more effective than traditional term or document frequency based approaches. However, most work in the literature constructs a word-to-word graph within a document or a collection of documents before applying a kind of random walk. Such a graph does not consider the influence of document importance on keyword extraction. Moreover, social text like a microblog post usually has special social features, such as hashtags, which can help us understand its topic. In this paper, we propose hashtag biased ranking for keyword extraction from a collection of microblog posts. We first build a word-post weighted graph by taking into account the posts themselves. Then, a hashtag biased random walk is applied on this graph, which guides our approach to extract keywords according to the hashtag topic. Last, the final ranking of a word is determined by the stationary probability after a number of iterations. We evaluate our proposed method on a real collection of Chinese microblog posts. Experiments show that our method is more effective than the traditional word-to-word graph-based ranking in terms of precision.
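The three steps in the abstract (build a word-post graph, run a hashtag biased random walk, rank words by stationary probability) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the edge weighting or the form of the bias, so the sketch assumes co-occurrence counts as edge weights and a personalized-PageRank-style restart onto hashtag words. The function name `hashtag_biased_ranking` and the parameters `damping` and `iterations` are hypothetical.

```python
def hashtag_biased_ranking(posts, hashtag_words, damping=0.85, iterations=50):
    """Rank words by a random walk with restart on a word-post bipartite graph.

    posts: list of posts, each a list of word tokens.
    hashtag_words: set of words that receive the restart (bias) mass.
    Assumption: edge weight = number of times the word occurs in the post.
    """
    # Build an undirected weighted bipartite graph between posts and words.
    weights = {}
    for idx, words in enumerate(posts):
        post_node = ("post", idx)
        for word in words:
            word_node = ("word", word)
            weights.setdefault(post_node, {})
            weights.setdefault(word_node, {})
            weights[post_node][word_node] = weights[post_node].get(word_node, 0.0) + 1.0
            weights[word_node][post_node] = weights[word_node].get(post_node, 0.0) + 1.0

    nodes = list(weights)
    if not nodes:
        return []

    # Restart distribution: uniform over hashtag word nodes,
    # falling back to all nodes when no hashtag word occurs in the posts.
    biased = {n for n in nodes if n[0] == "word" and n[1] in hashtag_words}
    if not biased:
        biased = set(nodes)
    restart = {n: (1.0 / len(biased) if n in biased else 0.0) for n in nodes}

    # Power iteration toward the stationary distribution of the biased walk.
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            total = sum(weights[node].values())
            for neighbor, weight in weights[node].items():
                new_score[neighbor] += damping * score[node] * weight / total
        score = new_score

    # Keep only word nodes and sort by descending score.
    word_scores = [(n[1], s) for n, s in score.items() if n[0] == "word"]
    return sorted(word_scores, key=lambda item: item[1], reverse=True)
```

Calling `hashtag_biased_ranking(posts, {"traffic"})` on a small tokenized collection returns `(word, score)` pairs in which words connected to posts containing the hashtag word rank above words from unrelated posts, since the restart mass only ever re-enters the graph at the hashtag word.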

OPUS - University of Technology Sydney

Crowdsourcing High-Quality Parallel Data Extraction from Twitter

Publication date: 06/03/2020

High-quality parallel data is crucial for a range of multilingual applications, from tuning and evaluating machine translation systems to cross-lingual annotation projection. Unfortunately, automatically obtained parallel data (which is available in relative abundance) tends to be quite noisy. To obtain high-quality parallel data, we introduce a crowdsourcing paradigm in which workers with only basic bilingual proficiency identify translations from an automatically extracted corpus of parallel microblog messages. For less than $350, we obtained over 5000 parallel segments in five language pairs. Evaluated against expert annotations, the quality of the crowdsourced corpus is significantly better than existing automatic methods: it obtains a performance comparable to expert annotations when used in MERT tuning of a microblog MT system, and training a parallel sentence classifier with it also leads to improved results. The crowdsourced corpora will be made available i

CiteSeerX

ADDRESSING INFORMALITY IN PROCESSING CHINESE MICROTEXT

Author: WANG AOBO
Publication date: 08/01/2015

Ph.D (Doctor of Philosophy)

ScholarBank@NUS

Mining Event-Oriented Topics in Microblog Stream with Unsupervised Multi-View Hierarchical Embedding

Author: Li X
Peng M
Tian G
Wang Hua
Zhang X
Zhang Y
Zhu J
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

This article presents an unsupervised multi-view hierarchical embedding (UMHE) framework to sufficiently reveal the intrinsic topical knowledge in social events. Event-oriented topics are highly related to such events, as they can provide explicit descriptions of what has happened in the social community. In many real-world cases, however, it is difficult to include all attributes of microblogs; more often, only textual aspects are available. Traditional topic modelling methods have failed to generate event-oriented topics from the textual aspects, since the inherent relations between topics are often overlooked in these methods. Meanwhile, the metrics in the original word vocabulary space might not effectively capture semantic distances. Our UMHE framework overcomes the severe information deficiency and poor feature representation. The UMHE first develops a multi-view Bayesian rose tree to preliminarily generate prior knowledge for latent topics and their relations. With such prior knowledge, we design an unsupervised translation-based hierarchical embedding method to make a better representation of these latent topics. By applying self-adaptive spectral clustering on the embedding space and the original space concomitantly, we eventually extract event-oriented topics in word distributions to express social events. Our framework is purely data-driven and unsupervised, without any external knowledge. Experimental results on the TREC Tweets2011 dataset and a Sina Weibo dataset demonstrate that the UMHE framework can not only construct a hierarchical structure with high fitness but also yield topic embeddings with salient semantics; therefore, it can derive event-oriented topics with meaningful descriptions.

Crossref

RMIT Research Repository

Victoria University Eprints Repository

ANALYZING IMAGE TWEETS IN MICROBLOGS

Author: CHEN TAO
Publication date: 22/01/2016

Ph.D (Doctor of Philosophy)

ScholarBank@NUS