A meta-analysis of state-of-the-art electoral prediction from Twitter data
Electoral prediction from Twitter data is an appealing research topic. It
seems relatively straightforward and the prevailing view is overly optimistic.
This is problematic because while simple approaches are assumed to be good
enough, core problems are not addressed. Thus, this paper aims to (1) provide a
balanced and critical review of the state of the art; (2) cast light on the
presumed predictive power of Twitter data; and (3) depict a roadmap to push
forward the field. Hence, a scheme to characterize Twitter prediction methods
is proposed. It covers every aspect from data collection to performance
evaluation, through data processing and vote inference. Using that scheme,
prior research is analyzed and organized to explain the main approaches taken
to date as well as their weaknesses. This is the first meta-analysis of the
whole body of research regarding electoral prediction from Twitter data. It
reveals that the presumed predictive power of Twitter data has
been rather exaggerated: although social media may provide a glimpse of
electoral outcomes, current research does not provide strong evidence that
it can replace traditional polls. Finally, future lines of research, along with
a set of requirements they must fulfill, are provided. Comment: 19 pages, 3 tables
Data mining for assessing the credit risk of local government units in Croatia
Over the past few decades, data mining techniques, especially artificial neural networks, have been used for modelling many real-world problems. This paper aims to test the performance of three methods: (1) an artificial neural network (ANN), (2) a hybrid artificial neural network and genetic algorithm approach (ANN-GA), and (3) the Tobit regression approach in determining the credit risk of local government units in Croatia. The evaluation of credit risk and prediction of debtor bankruptcy have long been regarded as an important topic in accounting and finance literature. In this research, credit risk is modelled under a regression approach, unlike typical credit risk analysis, which is generally viewed as a classification problem. Namely, a standard evaluation of credit risk is not possible due to a lack of bankruptcy data. Thus, the credit risk of a local unit is approximated using the ratio of outstanding liabilities maturing in a given year to the total expenditure of the local unit in the same period. The results indicate that the ANN-GA hybrid approach performs significantly better than the Tobit model by providing a significantly smaller average mean squared error. This work is beneficial to researchers and the government in evaluating a local government unit’s credit score.
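The credit risk proxy described in this abstract can be sketched as follows. This is a minimal illustration only: the function name `credit_risk_proxy`, the unit names, and all figures are hypothetical and do not come from the paper.

```python
def credit_risk_proxy(liabilities_due: float, total_expenditure: float) -> float:
    """Ratio of outstanding liabilities maturing in a given year to the
    total expenditure of the local unit in the same period."""
    if total_expenditure <= 0:
        raise ValueError("total_expenditure must be positive")
    return liabilities_due / total_expenditure


# Hypothetical figures for illustration only (not taken from the paper):
# (liabilities maturing in the year, total expenditure in the year)
units = {
    "unit_a": (1.2e6, 8.0e6),
    "unit_b": (0.0, 5.0e6),
}

# One proxy value per local unit; this is the regression target.
proxies = {name: credit_risk_proxy(due, spent) for name, (due, spent) in units.items()}
```

Because the proxy cannot fall below zero and some units may sit exactly at zero, a censored regression such as Tobit is presumably a natural baseline for this target; that reading is an assumption, not a statement from the abstract.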
Generating Video Descriptions with Topic Guidance
Generating video descriptions in natural language (a.k.a. video captioning)
is a more challenging task than image captioning as the videos are
intrinsically more complicated than images in two aspects. First, videos cover
a broader range of topics, such as news, music, sports and so on. Second,
multiple topics could coexist in the same video. In this paper, we propose a
novel caption model, topic-guided model (TGM), to generate topic-oriented
descriptions for videos in the wild via exploiting topic information. In
addition to predefined topics, i.e., category tags crawled from the web, we
also mine topics in a data-driven way based on training captions by an
unsupervised topic mining model. We show that data-driven topics reflect a
better topic schema than the predefined topics. To predict topics for test
videos, we treat the topic mining model as a teacher to train a student,
the topic prediction model, using all modalities of the video,
especially the speech modality. We propose a series of caption models to
exploit topic guidance, including implicitly using the topics as input features
to generate words related to the topic and explicitly modifying the weights in
the decoder with topics to function as an ensemble of topic-aware language
decoders. Our comprehensive experimental results on the current largest video
caption dataset MSR-VTT prove the effectiveness of our topic-guided model,
which significantly surpasses the winning performance in the 2016 MSR video to
language challenge. Comment: Appeared at ICMR 201
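The teacher-student setup mentioned in this abstract can be illustrated with a soft-target cross-entropy loss, where the student's predicted topic distribution is pulled toward the teacher's. This is a sketch under stated assumptions: the abstract does not specify the loss, and the function names and numbers below are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_cross_entropy(teacher_probs, student_logits):
    """Cross-entropy between the teacher's topic distribution (soft targets)
    and the student's predicted distribution. Assumed loss, not the paper's."""
    student_probs = softmax(student_logits)
    return -sum(
        t * math.log(s) for t, s in zip(teacher_probs, student_probs) if t > 0
    )


# Hypothetical topic distribution from the teacher (topic mining model).
teacher = [0.7, 0.2, 0.1]

# Two hypothetical students: one roughly agrees with the teacher, one does not.
loss_good = soft_target_cross_entropy(teacher, [2.0, 0.5, 0.0])
loss_bad = soft_target_cross_entropy(teacher, [0.0, 0.5, 2.0])
```

A student whose scores rank the topics as the teacher does receives the lower loss, which is the signal that training would minimize.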
Latent Space Model for Multi-Modal Social Data
With the emergence of social networking services, researchers enjoy the
increasing availability of large-scale heterogeneous datasets capturing online
user interactions and behaviors. Traditional analysis of techno-social systems
data has focused mainly on describing either the dynamics of social
interactions, or the attributes and behaviors of the users. However,
overwhelming empirical evidence suggests that the two dimensions affect one
another, and therefore they should be jointly modeled and analyzed in a
multi-modal framework. The benefits of such an approach include the ability to
build better predictive models, leveraging social network information as well
as user behavioral signals. To this purpose, here we propose the Constrained
Latent Space Model (CLSM), a generalized framework that combines Mixed
Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA)
incorporating a constraint that forces the latent space to concurrently
describe the multiple data modalities. We derive an efficient inference
algorithm based on Variational Expectation Maximization that has a
computational cost linear in the size of the network, thus making it feasible
to analyze massive social datasets. We validate the proposed framework on two
problems: prediction of social interactions from user attributes and behaviors,
and behavior prediction exploiting network information. We perform experiments
with a variety of multi-modal social systems, spanning location-based social
networks (Gowalla), social media services (Instagram, Orkut), e-commerce and
review sites (Amazon, Ciao), and finally citation networks (Cora). The results
indicate significant improvement in prediction accuracy over state-of-the-art
methods, and demonstrate the flexibility of the proposed approach for
addressing a variety of different learning problems commonly occurring with
multi-modal social data. Comment: 12 pages, 7 figures, 2 tables
Video Captioning with Guidance of Multimodal Latent Topics
The topic diversity of open-domain videos leads to various vocabularies and
linguistic expressions in describing video contents, and therefore, makes the
video captioning task even more challenging. In this paper, we propose a
unified caption framework, M&M TGM, which mines multimodal topics in an
unsupervised fashion from data and guides the caption decoder with these
topics. Compared to pre-defined topics, the mined multimodal topics are more
semantically and visually coherent and can reflect the topic distribution of
videos better. We formulate the topic-aware caption generation as a multi-task
learning problem, in which we add a parallel task, topic prediction, in
addition to the caption task. For the topic prediction task, we use the mined
topics as the teacher to train a student topic prediction model, which learns
to predict the latent topics from multimodal contents of videos. The topic
prediction provides intermediate supervision to the learning process. As for
the caption task, we propose a novel topic-aware decoder to generate more
accurate and detailed video descriptions with the guidance from latent topics.
The entire learning procedure is end-to-end and it optimizes both tasks
simultaneously. The results from extensive experiments conducted on the MSR-VTT
and Youtube2Text datasets demonstrate the effectiveness of our proposed model.
M&M TGM not only outperforms prior state-of-the-art methods on multiple
evaluation metrics and on both benchmark datasets, but also achieves better
generalization ability. Comment: ACM Multimedia 201
Comprehensive Review of Opinion Summarization
The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Researchers have introduced various techniques and paradigms for solving this task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of their key weaknesses. This survey also covers the evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, as this will help set the trend for future research in this area. Unpublished; not peer reviewed