89 research outputs found

    A meta-analysis of state-of-the-art electoral prediction from Twitter data

    Electoral prediction from Twitter data is an appealing research topic. It seems relatively straightforward, and the prevailing view is overly optimistic. This is problematic because, while simple approaches are assumed to be good enough, core problems are not addressed. Thus, this paper aims to (1) provide a balanced and critical review of the state of the art; (2) cast light on the presumed predictive power of Twitter data; and (3) depict a roadmap to push the field forward. Hence, a scheme to characterize Twitter prediction methods is proposed. It covers every aspect from data collection to performance evaluation, through data processing and vote inference. Using that scheme, prior research is analyzed and organized to explain the main approaches taken to date as well as their weaknesses. This is the first meta-analysis of the whole body of research regarding electoral prediction from Twitter data. It reveals that the presumed predictive power of Twitter data has been rather exaggerated: although social media may provide a glimpse of electoral outcomes, current research does not provide strong evidence that it can replace traditional polls. Finally, future lines of research are provided, along with a set of requirements they must fulfill. Comment: 19 pages, 3 tables

    Predicting self‐declared movie watching behavior using Facebook data and information‐fusion sensitivity analysis

    The main purpose of this paper is to evaluate the feasibility of predicting whether or not a Facebook user has self-reported having watched a given movie genre. To that end, we apply a data analytical framework that (1) builds and evaluates several predictive models explaining self-declared movie watching behavior, and (2) provides insight into the importance of the predictors and their relationship with self-reported movie watching behavior. For the first outcome, we benchmark several algorithms (logistic regression, random forest, adaptive boosting, rotation forest, and naive Bayes) and evaluate their performance using the area under the receiver operating characteristic curve. For the second outcome, we evaluate variable importance and build partial dependence plots using information-fusion sensitivity analysis for different movie genres. To gather the data, we developed a custom native Facebook app. We resampled our dataset to make it representative of the general Facebook population with respect to age and gender. The results indicate that adaptive boosting outperforms all other algorithms. Time- and frequency-based variables related to media (movies, videos, and music) consumption constitute the list of top variables. To the best of our knowledge, this study is the first to fit predictive models of self-reported movie watching behavior and provide insights into the relationships that govern these models. Our models can be used as a decision tool for movie producers to target potential movie-watchers and market their movies more efficiently.
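    The abstract above compares classifiers by the area under the receiver operating characteristic curve (AUC). As a rough illustration of what that metric measures, the sketch below computes AUC through its pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive example gets the higher score. The function name `roc_auc`, the toy labels, and the two score lists are assumptions made for illustration; they are not data or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = user self-reported watching the genre, 0 = did not.
labels = [1, 0, 1, 0, 1]
model_a = [0.9, 0.2, 0.7, 0.4, 0.6]  # scores from a hypothetical model A
model_b = [0.9, 0.8, 0.3, 0.4, 0.6]  # scores from a hypothetical model B

print("model A AUC:", roc_auc(labels, model_a))
print("model B AUC:", roc_auc(labels, model_b))
```

    The double loop is quadratic in the number of examples, which is fine for a toy illustration; a benchmark on real data would more likely rely on a sorted-rank computation or a library routine.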

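    Desempenho de porcas e leitoes em maternidades com diferentes sistemas de acondionamento termico no inverno. [Performance of sows and piglets in farrowing rooms with different thermal conditioning systems in winter.]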

    The objective of this research was to study the effects of thermal conditioning systems in swine farrowing rooms on the physiological characteristics and performance of the animals during winter. An experiment was conducted in a completely randomized split-plot design, with two treatments and two periods in the subplots. The treatments were a conventional room (SSV) and a room with wide window openings and a system for regulating the openings by means of curtains (SAC). The following data were collected: rectal temperature (TR) and respiratory rate (FR) of the sows, sow weight loss (PPP), piglet weight gain (GPL), feed intake (CRP), water intake (CA), and weaning-to-estrus interval (IDC). Sows in the SAC treatment had the lowest TR and FR values. There were no differences between treatments in CRP and IDC. There was a difference between treatments for GPP, with sows in the SAC treatment showing the smallest losses. There was no difference between treatments in GPL. The conditioning system affected the physiological characteristics and the weight gain of the sows. The SAC treatment is the best for sows in winter.

    On the Relationship between Novelty and Popularity of User-Generated Content


    Source code retrieval using conceptual similarity

    We propose a method for retrieving segments of source code from a large repository. The method is based on conceptual modeling of the code, combining information extracted from the structure of the code with standard information distance measures. Our results show an improvement over traditional retrieval models, indicating that, for this type of highly structured document, using structure is indeed beneficial for retrieval.

    Learning to Predict the Future using Web Knowledge and Dynamics


    Multi Word Term Queries for Focused Information Retrieval.

    In this paper, we address both standard and focused retrieval tasks based on comprehensible language models and interactive query expansion (IQE). Query topics are expanded using an initial set of Multi Word Terms (MWTs) selected from the top n ranked documents. MWTs are special text units that represent domain concepts and objects. As such, they can represent query topics better than ordinary phrases or n-grams. We tested different query representations: bag-of-words, phrases, a flat list of MWTs, and subsets of MWTs. We also combined the initial set of MWTs obtained in an IQE process with automatic query expansion (AQE) using language models and a smoothing mechanism. We chose as baseline the Indri IR engine, based on the language model using Dirichlet smoothing. The experiments were carried out on two benchmarks: the TREC Enterprise track (TRECent) 2007 and 2008 collections, and the INEX 2008 Ad-hoc track using the Wikipedia collection.
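    The baseline named in the abstract above is a language model with Dirichlet smoothing. A minimal sketch of that general scoring idea follows: each query term's probability in a document is interpolated with its collection-wide probability, p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). This is a textbook query-likelihood illustration, not Indri's implementation and not the paper's setup; the function name, the two toy documents, and the value of `mu` are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def dirichlet_log_likelihood(query: List[str], doc: List[str],
                             collection_tf: Dict[str, int],
                             collection_len: int,
                             mu: float = 2500.0) -> float:
    """Log query likelihood of `doc` under a Dirichlet-smoothed unigram model."""
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        # Background probability of the term in the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        # Smoothed estimate: document counts plus mu pseudo-counts from the collection.
        p = (doc_tf[term] + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:
            # Term unseen in both the document and the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Hypothetical two-document collection, tokenized by whitespace.
docs = {
    "d1": "multi word terms represent domain concepts".split(),
    "d2": "language models use smoothing".split(),
}
collection_tf = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(len(tokens) for tokens in docs.values())

query = ["domain", "concepts"]
scores = {
    name: dirichlet_log_likelihood(query, tokens, collection_tf, collection_len)
    for name, tokens in docs.items()
}
# Rank documents from most to least likely to have generated the query.
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 4))
```

    A larger `mu` pulls every document toward the collection statistics, which matters most for short documents; the value used here is only a placeholder for the sketch.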