    Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

    Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be predicting society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real-time monitoring" and "early prediction" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's entry in Wikipedia, the well-known online encyclopedia. Comment: 13 pages, including Supporting Information, 7 figures. Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
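
    The "minimalistic predictive model" idea can be sketched as a plain linear regression of box-office revenue on pre-release Wikipedia activity. The features (edit count, distinct editors, page views) follow the abstract's description, but all numbers below are invented for illustration; this is not the paper's fitted model.

```python
import numpy as np

# Hypothetical pre-release activity features per movie:
# [edit count, distinct editors, page views], measured before release.
# All values are made up for illustration.
X = np.array([
    [120, 35,  50000],
    [300, 80, 200000],
    [ 45, 10,   8000],
    [210, 60, 150000],
], dtype=float)
y = np.array([60.0, 250.0, 12.0, 180.0])  # opening box office, $M (synthetic)

# Fit a minimalistic linear model with an intercept term.
X1 = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predict(edits, editors, views):
    """Estimate box office from pre-release Wikipedia activity."""
    return coef @ np.array([1.0, edits, editors, views])

est = predict(150, 40, 90000)
```

    A real model would be trained on hundreds of movies and evaluated on held-out releases; the point here is only that the predictors are available well before the premiere.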

    Keyword Assisted Topic Models

    For a long time, many social scientists have conducted content analysis by using their substantive knowledge and manually coding documents. In recent years, however, fully automated content analysis based on probabilistic topic models has become increasingly popular because of its scalability. Unfortunately, applied researchers find that these models often fail to yield topics of substantive interest, inadvertently creating multiple topics with similar content or combining different themes into a single topic. In this paper, we empirically demonstrate that providing topic models with a small number of keywords can substantially improve their performance. The proposed keyword assisted topic model (keyATM) offers an important advantage: the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with the widespread practice of post-hoc topic interpretation and adjustment, which compromises the objectivity of empirical findings. In our applications, we find that the keyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than standard topic models. Finally, we show that the keyATM can also incorporate covariates and model time trends. An open-source software package is available for implementing the proposed methodology.
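
    The core intuition, labeling topics up front with researcher-supplied keywords, can be illustrated with a toy keyword-scoring sketch. This is emphatically not the keyATM model itself (which is a full probabilistic topic model, distributed as an R package); the topic names and keyword lists below are invented examples.

```python
from collections import Counter

# Toy illustration of keyword seeding (NOT the keyATM model): the
# researcher labels each topic by listing keywords before seeing any
# model output, and documents are scored against the labeled topics.
keywords = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "health":  {"hospital", "vaccine", "doctor", "disease"},
}

def score(doc, topics=keywords):
    """Count keyword hits per labeled topic for one document."""
    tokens = Counter(doc.lower().split())
    return {name: sum(tokens[w] for w in kws) for name, kws in topics.items()}

s = score("The budget raised tax to fund a new hospital")
best = max(s, key=s.get)  # topic label assigned before any model fitting
```

    In keyATM proper, the keywords enter as informative priors on the topic-word distributions rather than as hard match counts, but the workflow is the same: labels first, estimation second.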

    The Effects of Political Martyrdom on Election Results: The Assassination of Abe

    In developed nations, assassinations are rare, and thus the impact of such acts on the electoral and political landscape is understudied. In this paper, we use Twitter data to examine the effects of the assassination of Japan's former Prime Minister Abe on the Japanese House of Councillors elections in 2022. We apply sentiment analysis and emotion detection, together with topic modeling, to over 2 million tweets and compare them against tweets from previous election cycles. Our findings indicate that Twitter sentiment was negatively impacted by the event in the short term and that the social media attention span has shortened. We also discuss how "necropolitics" affected the outcome of the elections in favor of the deceased's party: there appears to have been an effect of Abe's death on the election outcome, though the findings warrant further investigation before conclusive claims can be made.

    Analysing and Visualizing Tweets for U.S. President Popularity

    In our society we are continually immersed in a stream of information (opinions, preferences, comments, etc.). Twitter shows how users react, in real time and with interest, to news and events they attend or take part in. In this context it becomes essential to have appropriate tools for analyzing and extracting the data and information hidden in the large volume of tweets. Social networks are an unrivaled source of information in terms of the amount and variety of data that can be extracted from them. We propose an approach for analyzing, with the help of automated tools, comments and opinions taken from social media in a real-time environment. We developed a software system in R, based on a Bayesian approach to text categorization, that aims to identify the sentiments expressed in tweets posted on the Twitter social platform. Analyzing the sentiment spread on social networks makes it possible to identify freely and authentically expressed opinions. In particular, we analyze the sentiments related to the U.S. President's popularity, also visualizing tweets on a map. This enables an additional analysis of people's real-time reactions by associating each tweet with its author's real-time position in the United States. In particular, we provide a visualization based on a geographical analysis of the sentiments of the users who posted the tweets.
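
    The Bayesian text-categorization step can be sketched with a minimal multinomial naive Bayes classifier. The paper's system is written in R; this is a Python sketch of the same statistical idea, and the training tweets and labels below are invented examples, not the authors' data.

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial naive Bayes for tweet sentiment (toy data).
train = [
    ("great job by the president", "pos"),
    ("love this policy", "pos"),
    ("terrible decision, very bad", "neg"),
    ("worst speech ever", "neg"),
]

word_counts = defaultdict(Counter)  # per-class word frequencies
class_counts = Counter()            # class priors
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.lower().split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    """Pick the class maximizing log P(class) + sum log P(word | class)."""
    tokens = text.lower().split()
    def logp(label):
        total = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / sum(class_counts.values()))
        for w in tokens:
            # Laplace smoothing so unseen words do not zero out the score.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return lp
    return max(class_counts, key=logp)

label = classify("what a great policy")
```

    The geographic visualization layer is then just a matter of joining each classified tweet to its geotag and plotting sentiment by state.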

    Global disease monitoring and forecasting with Wikipedia

    Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we test 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof of concept yields models with r^2 up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art. Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity
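
    The linear-model idea, regressing case counts on lagged article view counts, can be sketched on synthetic data. The 14-day lag, the coefficients, and both time series below are assumptions made up for illustration; only the modeling pattern follows the abstract.

```python
import numpy as np

# Synthetic daily Wikipedia views for a disease article, and synthetic
# case counts that trail the views by 14 days (both invented).
rng = np.random.default_rng(0)
views = 1000 + 500 * np.sin(np.arange(120) / 10) + rng.normal(0, 20, 120)
cases = 0.05 * np.roll(views, 14) + rng.normal(0, 2, 120)

# Regress cases at time t on views at time t - lag (assumed lag: 14 days).
lag = 14
X = np.column_stack([np.ones(120 - lag), views[:-lag]])
y = cases[lag:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# In-sample r^2, the fit statistic the paper reports (up to 0.92).
pred = X @ coef
ss_res = ((y - pred) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
```

    Because the predictor is lagged, the same fitted model doubles as a forecaster: today's views give an estimate of cases 14 days out.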

    Investigation into the Predictive Capability of Macro-Economic Features in Modelling Credit Risk for Small Medium Enterprises

    This research project investigates the predictive capability of macro-economic features in modelling credit risk for small and medium enterprises (SMEs). There have been indications of a strong correlation between economic growth and the size of the SME sector in an economy. However, since the financial crisis and the consequent policies and regulations, SMEs have been hampered in their attempts to access credit. It has also been noted that while there is a substantial amount of credit risk literature, there is little research on how macro-economic factors affect credit risk. Improving credit scoring by even a small amount can have a very positive effect on a financial institution's profits, reputation, and ability to support the economy. Typically, two methods of scoring are used in the credit scoring process: an application scoring model and a behavioural scoring model. These models for predicting which customers are likely to default usually rely on financial, demographic, and transactional data as the predictive inputs. This research investigates the use of a much coarser source of data at the macro-economic level, at both low-level and high-level regional granularity in Ireland. Features such as the level of employment/unemployment, educational attainment, consumer spending trends, and default levels for different banking products are evaluated as part of the research project. In the course of this research, techniques and methods are established for evaluating the usefulness of macro-economic features, which are subsequently introduced into the predictive models to be evaluated. It was found that when coarse classification was employed and the macro-economic features with the highest information value were selected for the predictive model, accuracy improved significantly across all performance measures. This shows that macro-economic features have the potential to be used in modelling credit risk for SMEs in the future.
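
    The feature-screening step mentioned above, coarse classification followed by information value (IV), follows a standard credit-scoring recipe: bin a feature, compute weight of evidence (WoE) per bin, and sum the IV contributions. The good/bad counts below are invented; only the formula is standard.

```python
import math

# Coarse-classified macro-economic feature, e.g. unemployment-rate bands
# for a region. Each bin holds (non-default count, default count).
# Counts are made up for illustration.
bins = [(400, 20), (350, 40), (150, 60), (100, 80)]

goods_total = sum(g for g, _ in bins)
bads_total = sum(b for _, b in bins)

iv = 0.0
for g, b in bins:
    pg, pb = g / goods_total, b / bads_total
    woe = math.log(pg / pb)      # weight of evidence for the bin
    iv += (pg - pb) * woe        # bin's contribution to information value
```

    A common rule of thumb treats IV above roughly 0.3 as a strong predictor, so a feature like this one would survive the screening and enter the scoring model.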

    Human Enhancement Technologies and Our Merger with Machines

    A cross-disciplinary approach is offered to consider the challenge of emerging technologies designed to enhance human bodies and minds. Perspectives from philosophy, ethics, law, and policy are applied to a wide variety of enhancements, including the integration of technology within human bodies, as well as genetic, biological, and pharmacological modifications. Humans may be permanently or temporarily enhanced with artificial parts by manipulating (or reprogramming) human DNA and through other enhancement techniques (and combinations thereof). We are on the cusp of significantly modifying (and perhaps improving) the human ecosystem. This evolution necessitates a continuing effort to re-evaluate current laws and, where appropriate, to modify them or develop new laws that address enhancement technology. A legal, ethical, and policy response to current and future human enhancements should strive to protect the rights of all involved and to recognize the responsibilities of humans to other conscious and living beings, regardless of what they look like or what abilities they have (or lack). A potential ethical approach is outlined in which rights and responsibilities should be respected even if enhanced humans are perceived by non-enhanced (or less-enhanced) humans as “no longer human” at all.

    A text segmentation approach for automated annotation of online customer reviews, based on topic modeling

    Online customer review classification and analysis have been recognized as an important problem in many domains, such as business intelligence, marketing, and e-governance. To solve this problem, a variety of machine learning methods have been developed over the past decade. Existing methods, however, either rely on human labeling or have a high computing cost, or both. This makes them a poor fit for dynamic and ever-growing collections of short but semantically noisy customer review texts. In the present study, the problem of multi-topic online review clustering is addressed by generating high-quality bronze-standard labeled sets for training efficient classifier models. A novel unsupervised algorithm is developed to break reviews into sequential, semantically homogeneous segments. Segment data is then used to fine-tune a Latent Dirichlet Allocation (LDA) model obtained for the reviews and to classify them along categories detected through topic modeling. After being tested on a benchmark text collection, the segmentation algorithm was successfully applied in a case study of tourism review classification. In all experiments conducted, the proposed approach produced results similar to or better than baseline methods. The paper critically discusses the main findings and paves the way for future work.
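
    The notion of breaking a review into "sequential semantically homogeneous segments" can be illustrated with a toy lexical-cohesion segmenter: start a new segment whenever adjacent sentences share no content words. This is not the paper's algorithm, only a minimal stand-in for the same idea; the review sentences and stop-word list are invented.

```python
# Toy cohesion-based segmenter (NOT the paper's unsupervised algorithm).
STOP = {"the", "a", "was", "is", "and", "we", "it", "very", "to", "had"}

def tokens(sentence):
    """Lowercased content words of a sentence, punctuation stripped."""
    return {w.strip(".,!").lower() for w in sentence.split()} - STOP

def segment(sentences):
    """Group consecutive sentences that share at least one content word."""
    segments, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if tokens(prev) & tokens(sent):   # lexical overlap: same topic
            current.append(sent)
        else:                             # cohesion break: new segment
            segments.append(current)
            current = [sent]
    segments.append(current)
    return segments

review = [
    "The room was clean and the room service was fast.",
    "Room size was fine too.",
    "Breakfast had many options.",
    "The breakfast buffet opened early.",
]
segs = segment(review)  # two segments: room-related, breakfast-related
```

    In the paper's pipeline, segments produced this way (by their stronger algorithm) become the units that are labeled via LDA topics and then used to train the downstream classifier.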