86 research outputs found

    A meta-analysis of state-of-the-art electoral prediction from Twitter data

    Electoral prediction from Twitter data is an appealing research topic. It seems relatively straightforward, and the prevailing view is overly optimistic. This is problematic because, while simple approaches are assumed to be good enough, core problems are not addressed. Thus, this paper aims to (1) provide a balanced and critical review of the state of the art; (2) cast light on the presumed predictive power of Twitter data; and (3) depict a roadmap to push the field forward. Hence, a scheme to characterize Twitter prediction methods is proposed. It covers every aspect from data collection to performance evaluation, through data processing and vote inference. Using that scheme, prior research is analyzed and organized to explain the main approaches taken to date, as well as their weaknesses. This is the first meta-analysis of the whole body of research regarding electoral prediction from Twitter data. It reveals that the presumed predictive power of Twitter data has been rather exaggerated: although social media may provide a glimpse of electoral outcomes, current research does not provide strong evidence that it can replace traditional polls. Finally, future lines of research, along with a set of requirements they must fulfill, are provided. Comment: 19 pages, 3 tables.

    Predicting self‐declared movie watching behavior using Facebook data and information‐fusion sensitivity analysis

    The main purpose of this paper is to evaluate the feasibility of predicting whether or not a Facebook user has self-reported having watched a given movie genre. To that end, we apply a data analytical framework that (1) builds and evaluates several predictive models explaining self-declared movie watching behavior, and (2) provides insight into the importance of the predictors and their relationship with self-reported movie watching behavior. For the first outcome, we benchmark several algorithms (logistic regression, random forest, adaptive boosting, rotation forest, and naive Bayes) and evaluate their performance using the area under the receiver operating characteristic curve. For the second outcome, we evaluate variable importance and build partial dependence plots using information-fusion sensitivity analysis for different movie genres. To gather the data, we developed a custom native Facebook app. We resampled our dataset to make it representative of the general Facebook population with respect to age and gender. The results indicate that adaptive boosting outperforms all other algorithms. Time- and frequency-based variables related to media (movies, videos, and music) consumption constitute the list of top variables. To the best of our knowledge, this study is the first to fit predictive models of self-reported movie watching behavior and provide insights into the relationships that govern these models. Our models can be used as a decision tool for movie producers to target potential movie-watchers and market their movies more efficiently.
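The abstract above compares classifiers by the area under the receiver operating characteristic curve (AUC). As a rough illustration of what that metric measures, the sketch below computes AUC as the fraction of (positive, negative) pairs that a model ranks correctly, with ties counted as half. The function name `auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the paper.

```python
def auc(labels, scores):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: iterable of 0/1 ground-truth values (1 = watched the genre).
    scores: iterable of predicted scores, higher meaning "more likely 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two users who reported watching, two who did not.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric is convenient for benchmarking several algorithms on the same data. The pairwise loop is quadratic, so a sort-based variant would be preferable for large datasets.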

    Multi Word Term Queries for Focused Information Retrieval.

    In this paper, we address both standard and focused retrieval tasks based on comprehensible language models and interactive query expansion (IQE). Query topics are expanded using an initial set of Multi Word Terms (MWTs) selected from the top n ranked documents. MWTs are special text units that represent domain concepts and objects. As such, they can better represent query topics than ordinary phrases or n-grams. We tested different query representations: bag-of-words, phrases, a flat list of MWTs, and subsets of MWTs. We also combined the initial set of MWTs obtained in an IQE process with automatic query expansion (AQE) using language models and a smoothing mechanism. We chose as baseline the Indri IR engine based on the language model using Dirichlet smoothing. The experiment is carried out on two benchmarks: the TREC Enterprise track (TRECent) 2007 and 2008 collections, and the INEX 2008 Ad-hoc track using the Wikipedia collection.
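The baseline mentioned above ranks documents with a language model using Dirichlet smoothing. A minimal sketch of that scoring idea follows, using the standard estimate p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu). It is not Indri's implementation: the function name `dirichlet_score`, the default `mu`, the choice to skip terms unseen in the collection, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection_counts, collection_len, mu=2000.0):
    """Query log-likelihood of `query` under a Dirichlet-smoothed document model.

    query, doc: lists of tokens.
    collection_counts: term -> count over the whole collection.
    collection_len: total number of tokens in the collection.
    mu: smoothing parameter (the default here is an assumed value).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # assumption: ignore terms never seen in the collection
        p_term = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


# Hypothetical two-document collection.
docs = [
    ["language", "model", "retrieval"],
    ["query", "expansion", "terms"],
]
coll_counts = Counter(tok for d in docs for tok in d)
coll_len = sum(coll_counts.values())
query = ["language", "model"]
scores = [dirichlet_score(query, d, coll_counts, coll_len, mu=3.0) for d in docs]
```

Under this scheme a document that lacks a query term still receives a non-zero probability from the collection model, so an expanded query (for example one enriched with multi word terms) can be scored against every document without zeroing out the product.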

    Which features of repetitive negative thinking and positive reappraisal predict depression?: An in-depth investigation using artificial neural networks with feature selection

    Emotion regulation habits have long been implicated in risk for depression. However, research in this area traditionally adopts an approach that ignores the multifaceted nature of emotion regulation strategies, the clinical heterogeneity of depression, and potential differential relations between emotion regulation features and individual symptoms. To address limitations associated with the dominant aggregate-level approach, this study aimed to identify which features of key emotion regulation strategies are most predictive and when those features are most predictive of individual symptoms of depression across different time lags. Leveraging novel developments in the field of machine learning, artificial neural network models with feature selection were estimated using data from 460 participants who participated in a twenty-wave longitudinal study with weekly assessments. At each wave, participants completed measures of repetitive negative thinking, positive reappraisal, perceived stress, and depression symptoms. Results revealed that specific features of repetitive negative thinking (wondering “why can’t I get going?” and having thoughts or images about feelings of loneliness) and positive reappraisal (looking for positive sides) were important indicators for detecting various depressive symptoms, above and beyond perceived stress. These features had overlapping and unique predictive relations with individual cognitive, affective, and somatic symptoms. Examining temporal fluctuations in the predictive utility, results showed that the utility of these emotion regulation features was stable over time. These findings illuminate potential pathways through which emotion regulation features may confer risk for depression and help to identify actionable targets for its prevention and treatment

    Language Models for Searching in Web Corpora

    We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchor text, and document titles, with a range of web-centric priors. We provide a detailed analysis of the effect on relevance of document length, URL structure, and link topology. The resulting web-centric priors are applied to three types of topics (distillation, home page, and named page) and improve effectiveness for all topic types, as well as for the mixed query set. For the terabyte track, we experimented with building an index based only on the document titles, or on the incoming anchor texts. Very selective indexing leads to a compact index that is effective in terms of early precision, catering to typical web searcher behavior.
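The abstract above combines several document representations (full text, anchor text, titles) in a mixture language model and multiplies in a document prior. The sketch below shows one simple way such a score could be assembled: a weighted sum of per-representation maximum-likelihood estimates, interpolated with a background model, plus the log of a prior. The interpolation form, the weights, `bg_weight`, the function name `mixture_score`, and the toy pages are assumptions made for illustration, not the authors' configuration.

```python
import math
from collections import Counter


def mixture_score(query, representations, weights, background,
                  prior=1.0, bg_weight=0.1):
    """log P(d) + sum over query terms of log P(term | mixture model of d).

    representations: name -> token list (e.g. "body", "anchor", "title").
    weights: name -> mixture weight (assumed to sum to 1).
    background: term -> collection probability, used for smoothing.
    prior: document prior, e.g. derived from URL structure (assumed given).
    """
    score = math.log(prior)
    for term in query:
        p_mix = 0.0
        for name, weight in weights.items():
            tokens = representations[name]
            if tokens:  # an empty representation contributes nothing
                p_mix += weight * Counter(tokens)[term] / len(tokens)
        p_term = bg_weight * background.get(term, 0.0) + (1 - bg_weight) * p_mix
        if p_term == 0.0:
            return float("-inf")  # term unseen everywhere: impossible query
        score += math.log(p_term)
    return score


# Hypothetical pages: a home page and a news article mentioning the same name.
home_page = {"body": ["welcome", "page"], "anchor": ["acme", "home"], "title": ["acme"]}
news_page = {"body": ["acme", "news", "story", "today"], "anchor": [], "title": ["news"]}
mix_weights = {"body": 0.4, "anchor": 0.3, "title": 0.3}
bg_model = {"acme": 0.01}

home_score = mixture_score(["acme"], home_page, mix_weights, bg_model)
news_score = mixture_score(["acme"], news_page, mix_weights, bg_model)
```

In this toy case the home page wins even though the query term is absent from its body, because the anchor text and title carry it. That is the kind of effect a mixture over representations is meant to capture for home page and named page topics.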