    Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

    Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be predicting society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real-time monitoring" and "early prediction" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's entry in Wikipedia, the well-known online encyclopedia. Comment: 13 pages, including Supporting Information, 7 figures. Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
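
    The "minimalistic predictive model" idea can be sketched as a plain linear regression of box-office revenue on pre-release Wikipedia activity. The features (edit count, distinct editors, page views) follow the abstract's description, but all numbers below are invented for illustration; this is not the paper's fitted model.

```python
import numpy as np

# Hypothetical pre-release activity features per movie:
# [edit count, distinct editors, page views], measured before release.
# All values are made up for illustration.
X = np.array([
    [120, 35,  50000],
    [300, 80, 200000],
    [ 45, 10,   8000],
    [210, 60, 150000],
], dtype=float)
y = np.array([60.0, 250.0, 12.0, 180.0])  # opening box office, $M (synthetic)

# Fit a minimalistic linear model with an intercept term.
X1 = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predict(edits, editors, views):
    """Estimate box office from pre-release Wikipedia activity."""
    return coef @ np.array([1.0, edits, editors, views])

est = predict(150, 40, 90000)
```

    A real model would be trained on hundreds of movies and evaluated on held-out releases; the point here is only that the predictors are available well before the premiere.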

    Keyword Assisted Topic Models

    For a long time, many social scientists have conducted content analysis by using their substantive knowledge and manually coding documents. In recent years, however, fully automated content analysis based on probabilistic topic models has become increasingly popular because of its scalability. Unfortunately, applied researchers find that these models often fail to yield topics of substantive interest, inadvertently creating multiple topics with similar content or combining different themes into a single topic. In this paper, we empirically demonstrate that providing topic models with a small number of keywords can substantially improve their performance. The proposed keyword assisted topic model (keyATM) offers an important advantage: the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with the widespread practice of post-hoc topic interpretation and adjustment, which compromises the objectivity of empirical findings. In our applications, we find that the keyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than standard topic models. Finally, we show that the keyATM can also incorporate covariates and model time trends. An open-source software package is available for implementing the proposed methodology.
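
    The core intuition, labeling topics up front with researcher-supplied keywords, can be illustrated with a toy keyword-scoring sketch. This is emphatically not the keyATM model itself (which is a full probabilistic topic model, distributed as an R package); the topic names and keyword lists below are invented examples.

```python
from collections import Counter

# Toy illustration of keyword seeding (NOT the keyATM model): the
# researcher labels each topic by listing keywords before seeing any
# model output, and documents are scored against the labeled topics.
keywords = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "health":  {"hospital", "vaccine", "doctor", "disease"},
}

def score(doc, topics=keywords):
    """Count keyword hits per labeled topic for one document."""
    tokens = Counter(doc.lower().split())
    return {name: sum(tokens[w] for w in kws) for name, kws in topics.items()}

s = score("The budget raised tax to fund a new hospital")
best = max(s, key=s.get)  # topic label assigned before any model fitting
```

    In keyATM proper, the keywords enter as informative priors on the topic-word distributions rather than as hard match counts, but the workflow is the same: labels first, estimation second.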

    The Effects of Political Martyrdom on Election Results: The Assassination of Abe

    In developed nations, assassinations are rare, and thus the impact of such acts on the electoral and political landscape is understudied. In this paper, we use Twitter data to examine the effects of the assassination of Japan's former Prime Minister Abe on the Japanese House of Councillors elections in 2022. We apply sentiment analysis and emotion detection, together with topic modeling, to over 2 million tweets and compare them against tweets from previous election cycles. Our findings indicate that Twitter sentiment was negatively impacted by the event in the short term and that the social media attention span has shortened. We also discuss how "necropolitics" affected the outcome of the elections in favor of the deceased's party: there appears to have been an effect of Abe's death on the election outcome, though the findings warrant further investigation before conclusive claims can be made.

    Analysing and Visualizing Tweets for U.S. President Popularity

    In our society we are continually immersed in a stream of information (opinions, preferences, comments, etc.). Twitter shows how users react, in real time and with interest, to news and events they attend or take part in. In this context it becomes essential to have appropriate tools for analyzing and extracting the data and information hidden in the large volume of tweets. Social networks are an unrivaled source of information in terms of the amount and variety of data that can be extracted from them. We propose an approach for analyzing, with the help of automated tools, comments and opinions taken from social media in a real-time environment. We developed a software system in R, based on a Bayesian approach to text categorization, that aims to identify the sentiments expressed in tweets posted on the Twitter social platform. Analyzing the sentiment spread on social networks makes it possible to identify freely and authentically expressed opinions. In particular, we analyze the sentiments related to the U.S. President's popularity, also visualizing tweets on a map. This enables an additional analysis of people's real-time reactions by associating each tweet with its author's real-time position in the United States. In particular, we provide a visualization based on a geographical analysis of the sentiments of the users who posted the tweets.
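
    The Bayesian text-categorization step can be sketched with a minimal multinomial naive Bayes classifier. The paper's system is written in R; this is a Python sketch of the same statistical idea, and the training tweets and labels below are invented examples, not the authors' data.

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial naive Bayes for tweet sentiment (toy data).
train = [
    ("great job by the president", "pos"),
    ("love this policy", "pos"),
    ("terrible decision, very bad", "neg"),
    ("worst speech ever", "neg"),
]

word_counts = defaultdict(Counter)  # per-class word frequencies
class_counts = Counter()            # class priors
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.lower().split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    """Pick the class maximizing log P(class) + sum log P(word | class)."""
    tokens = text.lower().split()
    def logp(label):
        total = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / sum(class_counts.values()))
        for w in tokens:
            # Laplace smoothing so unseen words do not zero out the score.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return lp
    return max(class_counts, key=logp)

label = classify("what a great policy")
```

    The geographic visualization layer is then just a matter of joining each classified tweet to its geotag and plotting sentiment by state.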

    Global disease monitoring and forecasting with Wikipedia

    Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we test 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof of concept yields models with r^2 up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art. Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity
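
    The linear-model idea, regressing case counts on lagged article view counts, can be sketched on synthetic data. The 14-day lag, the coefficients, and both time series below are assumptions made up for illustration; only the modeling pattern follows the abstract.

```python
import numpy as np

# Synthetic daily Wikipedia views for a disease article, and synthetic
# case counts that trail the views by 14 days (both invented).
rng = np.random.default_rng(0)
views = 1000 + 500 * np.sin(np.arange(120) / 10) + rng.normal(0, 20, 120)
cases = 0.05 * np.roll(views, 14) + rng.normal(0, 2, 120)

# Regress cases at time t on views at time t - lag (assumed lag: 14 days).
lag = 14
X = np.column_stack([np.ones(120 - lag), views[:-lag]])
y = cases[lag:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# In-sample r^2, the fit statistic the paper reports (up to 0.92).
pred = X @ coef
ss_res = ((y - pred) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
```

    Because the predictor is lagged, the same fitted model doubles as a forecaster: today's views give an estimate of cases 14 days out.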

    Investigation into the Predictive Capability of Macro-Economic Features in Modelling Credit Risk for Small Medium Enterprises

    This research project investigates the predictive capability of macro-economic features in modelling credit risk for small and medium enterprises (SMEs). There have been indications of a strong correlation between economic growth and the size of the SME sector in an economy. However, since the financial crisis and the consequent policies and regulations, SMEs have been hampered in their attempts to access credit. It has also been noted that while there is a substantial amount of credit risk literature, there is little research on how macro-economic factors affect credit risk. Improving credit scoring by even a small amount can have a very positive effect on a financial institution's profits, reputation, and ability to support the economy. Typically, two methods of scoring are used in the credit scoring process: an application scoring model and a behavioural scoring model. These models for predicting which customers are likely to default usually rely on financial, demographic, and transactional data as the predictive inputs. This research investigates the use of a much coarser source of data at the macro-economic level, at both low-level and high-level regional granularity in Ireland. Features such as the level of employment/unemployment, educational attainment, consumer spending trends, and default levels for different banking products are evaluated as part of the research project. In the course of this research, techniques and methods are established for evaluating the usefulness of macro-economic features, which are subsequently introduced into the predictive models to be evaluated. It was found that when coarse classification was employed and the macro-economic features with the highest information value were selected for the predictive model, accuracy improved significantly across all performance measures. This shows that macro-economic features have the potential to be used in modelling credit risk for SMEs in the future.
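
    The feature-screening step mentioned above, coarse classification followed by information value (IV), follows a standard credit-scoring recipe: bin a feature, compute weight of evidence (WoE) per bin, and sum the IV contributions. The good/bad counts below are invented; only the formula is standard.

```python
import math

# Coarse-classified macro-economic feature, e.g. unemployment-rate bands
# for a region. Each bin holds (non-default count, default count).
# Counts are made up for illustration.
bins = [(400, 20), (350, 40), (150, 60), (100, 80)]

goods_total = sum(g for g, _ in bins)
bads_total = sum(b for _, b in bins)

iv = 0.0
for g, b in bins:
    pg, pb = g / goods_total, b / bads_total
    woe = math.log(pg / pb)      # weight of evidence for the bin
    iv += (pg - pb) * woe        # bin's contribution to information value
```

    A common rule of thumb treats IV above roughly 0.3 as a strong predictor, so a feature like this one would survive the screening and enter the scoring model.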

    Human Enhancement Technologies and Our Merger with Machines

    A cross-disciplinary approach is offered to consider the challenge of emerging technologies designed to enhance human bodies and minds. Perspectives from philosophy, ethics, law, and policy are applied to a wide variety of enhancements, including the integration of technology within human bodies, as well as genetic, biological, and pharmacological modifications. Humans may be permanently or temporarily enhanced with artificial parts by manipulating (or reprogramming) human DNA and through other enhancement techniques (and combinations thereof). We are on the cusp of significantly modifying (and perhaps improving) the human ecosystem. This evolution necessitates a continuing effort to re-evaluate current laws and, where appropriate, to modify them or develop new laws that address enhancement technology. A legal, ethical, and policy response to current and future human enhancements should strive to protect the rights of all involved and to recognize the responsibilities of humans to other conscious and living beings, regardless of what they look like or what abilities they have (or lack). A potential ethical approach is outlined in which rights and responsibilities should be respected even if enhanced humans are perceived by non-enhanced (or less-enhanced) humans as “no longer human” at all.

    A text segmentation approach for automated annotation of online customer reviews, based on topic modeling

    Online customer review classification and analysis have been recognized as an important problem in many domains, such as business intelligence, marketing, and e-governance. To solve this problem, a variety of machine learning methods have been developed over the past decade. Existing methods, however, either rely on human labeling or have a high computing cost, or both. This makes them a poor fit for dynamic and ever-growing collections of short but semantically noisy customer review texts. In the present study, the problem of multi-topic online review clustering is addressed by generating high-quality bronze-standard labeled sets for training efficient classifier models. A novel unsupervised algorithm is developed to break reviews into sequential, semantically homogeneous segments. Segment data is then used to fine-tune a Latent Dirichlet Allocation (LDA) model obtained for the reviews and to classify them along categories detected through topic modeling. After being tested on a benchmark text collection, the segmentation algorithm was successfully applied in a case study of tourism review classification. In all experiments conducted, the proposed approach produced results similar to or better than baseline methods. The paper critically discusses the main findings and paves the way for future work.
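
    The notion of breaking a review into "sequential semantically homogeneous segments" can be illustrated with a toy lexical-cohesion segmenter: start a new segment whenever adjacent sentences share no content words. This is not the paper's algorithm, only a minimal stand-in for the same idea; the review sentences and stop-word list are invented.

```python
# Toy cohesion-based segmenter (NOT the paper's unsupervised algorithm).
STOP = {"the", "a", "was", "is", "and", "we", "it", "very", "to", "had"}

def tokens(sentence):
    """Lowercased content words of a sentence, punctuation stripped."""
    return {w.strip(".,!").lower() for w in sentence.split()} - STOP

def segment(sentences):
    """Group consecutive sentences that share at least one content word."""
    segments, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if tokens(prev) & tokens(sent):   # lexical overlap: same topic
            current.append(sent)
        else:                             # cohesion break: new segment
            segments.append(current)
            current = [sent]
    segments.append(current)
    return segments

review = [
    "The room was clean and the room service was fast.",
    "Room size was fine too.",
    "Breakfast had many options.",
    "The breakfast buffet opened early.",
]
segs = segment(review)  # two segments: room-related, breakfast-related
```

    In the paper's pipeline, segments produced this way (by their stronger algorithm) become the units that are labeled via LDA topics and then used to train the downstream classifier.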