This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News
The problem of fake news has gained a lot of attention as it is claimed to
have had a significant impact on the 2016 US Presidential Elections. Fake news is
not a new problem and its spread in social networks is well-studied. Often an
underlying assumption in fake news discussion is that it is written to look
like real news, fooling the reader who does not check for reliability of the
sources or the arguments in its content. Through a unique study of three data
sets and features that capture the style and the language of articles, we show
that this assumption is not true. Fake news in most cases is more similar to
satire than to real news, leading us to conclude that persuasion in fake news
is achieved through heuristics rather than the strength of arguments. We show
that overall title structure and the use of proper nouns in titles are very
significant in differentiating fake from real news. This leads us to conclude that
fake news is targeted at audiences who are not likely to read beyond titles
and is aimed at creating mental associations between entities and claims.
Comment: Published at The 2nd International Workshop on News and Public
Opinion at ICWS
Application of support vector machines on the basis of the first Hungarian bankruptcy model
In our study we apply a data mining procedure known as the support vector machine (SVM) to the database of the first Hungarian bankruptcy model. The resulting models are then compared with earlier bankruptcy models in terms of classification accuracy and the area under the ROC curve. In addition to conventional kernel functions, we examine the possibilities of applying the ANOVA kernel function and take a detailed look at the data preparation tasks recommended when using the SVM method (handling of outliers). The results suggest that, on the database of the first Hungarian bankruptcy model, the SVM method can achieve a significant improvement in classification accuracy over neural networks.
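The two quantities this abstract leans on, the ANOVA kernel and the area under the ROC curve, can be sketched in a few lines of plain Python. This is an illustrative sketch only: the kernel form used here, (sum_k exp(-sigma * (x_k - y_k)^2))^degree, is one commonly cited definition and is an assumption, since the abstract does not say which form or parameter values the authors used. The function names `anova_kernel` and `roc_auc` and the toy scores are hypothetical.

```python
import math


def anova_kernel(x, y, sigma=1.0, degree=2):
    """ANOVA kernel in one commonly cited form (an assumption, not taken
    from the paper): (sum_k exp(-sigma * (x_k - y_k)**2)) ** degree."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = sum(math.exp(-sigma * (a - b) ** 2) for a, b in zip(x, y))
    return total ** degree


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = bankrupt, 0 = solvent, with made-up model scores.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.2, 0.3, 0.1]
    print("AUC on toy data:", roc_auc(toy_labels, toy_scores))
    print("k(x, x) for a 2-feature vector:", anova_kernel([1.0, 2.0], [1.0, 2.0]))
```

The pairwise AUC computation is quadratic in the number of cases, which is fine for a small sample; a rank-based formulation would be the usual choice for larger data.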
Adjusting Imperfect Data: Overview and Case Studies
[Excerpt] In this chapter, instead of using the similarity in the cleaned datasets to investigate economic fundamentals, we focus on the differences in the underlying ‘dirty’ data. We describe two data elements that remain fundamentally different across countries, and the extent to which they differ. We then proceed to document some of the problems that affect longitudinally linked administrative data in general, and we describe some of the solutions analysts and statistical agencies have implemented, and some that they did not implement. In each case, we explain the reasons for and against implementing a particular adjustment, and explore, through a select set of case studies, how each adjustment or absence thereof might affect the data. By giving the reader a look behind the scenes, we intend to strengthen the reader’s understanding of the data. Thus equipped, the reader can form his or her own opinion as to the degree of comparability of the findings across the different countries.
A principal component analysis of 39 scientific impact measures
The impact of scientific publications has traditionally been expressed in
terms of citation counts. However, scientific activity has moved online over
the past decade. To better capture scientific impact in the digital era, a
variety of new impact measures has been proposed on the basis of social network
analysis and usage log data. Here we investigate how these new measures relate
to each other, and how accurately and completely they express scientific
impact. We performed a principal component analysis of the rankings produced by
39 existing and proposed measures of scholarly impact that were calculated on
the basis of both citation and usage log data. Our results indicate that the
notion of scientific impact is a multi-dimensional construct that cannot be
adequately measured by any single indicator, although some measures are more
suitable than others. The commonly used citation Impact Factor is not
positioned at the core of this construct, but at its periphery, and should thus
be used with caution.
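The kind of analysis this abstract describes, a principal component analysis over the rankings produced by several impact measures, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the rankings are made up, the function names are hypothetical, and power iteration is simply a convenient way to find the leading component without external libraries. The abstract does not say how the authors computed their decomposition.

```python
import math


def covariance_matrix(rows):
    """Sample covariance between columns; each row is one item (e.g. a
    journal) and each column is its rank under one impact measure."""
    n = len(rows)
    m = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def first_principal_component(cov, iterations=500):
    """Leading eigenvalue and eigenvector of cov via power iteration.
    The non-uniform start vector lowers the chance of starting exactly
    orthogonal to the leading eigenvector."""
    m = len(cov)
    v = [1.0 / (i + 1) for i in range(m)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
    eigenvalue = sum(v[i] * cv[i] for i in range(m))
    return eigenvalue, v


def explained_share(cov, eigenvalue):
    """Fraction of total variance captured by one component."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / trace if trace else 0.0


if __name__ == "__main__":
    # Made-up rankings of five items under three hypothetical measures.
    toy_rankings = [[1, 2, 5], [2, 1, 4], [3, 3, 2], [4, 5, 3], [5, 4, 1]]
    cov = covariance_matrix(toy_rankings)
    lam, vec = first_principal_component(cov)
    print("share of variance on first component:", explained_share(cov, lam))
    print("loadings:", vec)
```

If a single indicator captured everything, the first component would account for nearly all of the variance; a noticeably smaller share is what a multi-dimensional construct would look like in this kind of output.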