The role of handbooks in knowledge creation and diffusion: A case of science and technology studies
Genre is considered to be an important element in scholarly communication and
in the practice of scientific disciplines. However, scientometric studies have
typically focused on a single genre, the journal article. The goal of this
study is to understand the role that handbooks play in knowledge creation and
diffusion and their relationship with the genre of journal articles,
particularly in highly interdisciplinary and emergent social science and
humanities disciplines. To shed light on these questions we focused on
handbooks and journal articles published over the last four decades belonging
to the research area of Science and Technology Studies (STS), broadly defined.
To get a detailed picture we used the full text of five handbooks (500,000
words) and a well-defined set of 11,700 STS articles. We confirmed the
methodological split of STS into qualitative and quantitative (scientometric)
approaches. Even when the two traditions explore similar topics (e.g., science
and gender) they approach them from different starting points. The change in
cognitive foci in both handbooks and articles partially reflects the changing
trends in STS research, often driven by technology. Using text similarity
measures we found that, in the case of STS, handbooks play no special role in
either focusing the research efforts or marking their decline. In general, they
do not represent the summaries of research directions that have emerged since
the previous edition of the handbook.
Comment: Accepted for publication in Journal of Informetrics
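The abstract above mentions text similarity measures without naming one. As a minimal sketch, assuming cosine similarity over raw term counts (a common choice, not necessarily the measure the authors used), two texts can be compared as follows. The sample strings and the function names `term_counts` and `cosine_similarity` are illustrative only.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for a handbook chapter and two articles.
handbook = term_counts("science and gender in laboratory studies")
article = term_counts("citation analysis of gender in science")
unrelated = term_counts("quarterly rainfall totals")

sim_related = cosine_similarity(handbook, article)
sim_unrelated = cosine_similarity(handbook, unrelated)
```

In practice one would likely weight terms (for example with TF-IDF) and remove stop words before comparing full handbook chapters with article sets; the sketch leaves those steps out.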
Towards Real-Time, Country-Level Location Classification of Worldwide Tweets
In contrast to much previous work that has focused on location classification
of tweets restricted to a specific country, here we undertake the task in a
broader context by classifying global tweets at the country level, which is so
far unexplored in a real-time scenario. We analyse the extent to which a
tweet's country of origin can be determined by making use of eight
tweet-inherent features for classification. Furthermore, we use two datasets,
collected a year apart from each other, to analyse the extent to which a model
trained from historical tweets can still be leveraged for classification of new
tweets. With classification experiments on all 217 countries in our datasets,
as well as on the top 25 countries, we offer some insights into the best use of
tweet-inherent features for an accurate country-level classification of tweets.
We find that the use of a single feature, such as the use of tweet content
alone -- the most widely used feature in previous work -- leaves much to be
desired. Choosing an appropriate combination of both tweet content and metadata
can actually lead to substantial improvements of between 20% and 50%. We
observe that tweet content, the user's self-reported location and the user's
real name, all of which are inherent in a tweet and available in a real-time
scenario, are particularly useful to determine the country of origin. We also
experiment on the applicability of a model trained on historical tweets to
classify new tweets, finding that the choice of a particular combination of
features whose utility does not fade over time can actually lead to comparable
performance, avoiding the need to retrain. However, the difficulty of achieving
accurate classification increases slightly for countries with multiple
commonalities, especially for English- and Spanish-speaking countries.
Comment: Accepted for publication in IEEE Transactions on Knowledge and Data
Engineering (IEEE TKDE)
De retibus socialibus et legibus momenti
Online Social Networks (OSNs) are a cutting edge topic. Almost everybody
--users, marketers, brands, companies, and researchers-- is approaching OSNs to
better understand them and take advantage of their benefits. Perhaps one of the
key concepts underlying OSNs is that of influence, which is closely related,
although not identical, to those of popularity and centrality.
Influence is, according to Merriam-Webster, "the capacity of causing an effect
in indirect or intangible ways". Hence, in the context of OSNs, it has been
proposed to analyze the clicks received by promoted URLs in order to check for
any positive correlation between the number of visits and different "influence"
scores. Such an evaluation methodology is used in this paper to compare a
number of those techniques with a new method first described here. That new
method is a simple and rather elegant solution which tackles influence in
OSNs by applying a physical metaphor.
Comment: Changes made for third revision: Brief description of the dataset
employed added to Introduction. Minor changes to the description of
preparation of the bit.ly datasets. Minor changes to the captions of Tables 1
and 3. Brief addition in the Conclusions section (future line of work added).
Added references 16 and 18. Some typos and grammar polished
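The evaluation idea in the abstract above, checking whether "influence" scores correlate positively with the clicks received by promoted URLs, can be illustrated with a rank correlation. The sketch below assumes Spearman's coefficient, which the abstract does not specify. The scores and click counts are made-up numbers, and the function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up influence scores and click counts for four accounts.
influence_scores = [0.9, 0.4, 0.7, 0.1]
clicks = [120, 30, 45, 5]
rho = spearman(influence_scores, clicks)
```

A coefficient near 1 would suggest the score orders accounts the same way their click counts do; comparing the coefficient across several scoring methods is one plausible reading of the comparison the abstract describes.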
Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted an extensive evaluation on 60
GB of real-world Chinese news data and through detailed pilot user experience
studies, although our ideas are not language-dependent and can easily be
extended to other languages. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.
Comment: Accepted by CIKM 2017, 9 pages
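The online constraint described above, assigning each incoming document to an event without restructuring earlier assignments, can be illustrated with a generic single-pass clustering loop. This is not the Story Forest algorithm, whose details the abstract does not give. It is a minimal sketch assuming each document is reduced to a keyword set and matched by Jaccard similarity; the threshold value, the function names and the sample stream are all illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_online(events, keywords, threshold=0.3):
    """Attach a keyword set to its most similar event, or open a new event.

    Earlier assignments are never revisited, so labels already shown to a
    reader stay stable. Returns the index of the chosen event.
    """
    best_index, best_score = None, 0.0
    for index, event in enumerate(events):
        score = jaccard(event["keywords"], keywords)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score >= threshold:
        events[best_index]["keywords"] |= keywords
        events[best_index]["size"] += 1
        return best_index
    events.append({"keywords": set(keywords), "size": 1})
    return len(events) - 1


# Hypothetical stream of documents, each already reduced to keywords.
stream = [
    {"earthquake", "japan", "tsunami"},
    {"earthquake", "japan", "rescue"},
    {"election", "vote", "senate"},
    {"tsunami", "rescue", "japan"},
]
events = []
labels = [assign_online(events, doc) for doc in stream]
```

The sketch covers only the grouping of documents into events; linking related events into growing story trees, as the abstract describes, would need an additional step.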
Laws and Limits of Econometrics
We start by discussing some general weaknesses and limitations of the econometric approach. A template from sociology is used to formulate six laws that characterize mainstream activities of econometrics and the scientific limits of those activities. Next, we discuss some proximity theorems that quantify, by means of explicit bounds, how close we can get to the generating mechanism of the data and the optimal forecasts of next-period observations using a finite number of observations. The magnitude of the bound depends on the characteristics of the model and the trajectory of the observed data. The results show that trends are more elusive to model than stationary processes in the sense that the proximity bounds are larger. By contrast, the bounds are of smaller order for models that are unidentified or nearly unidentified, so that lack or near lack of identification may not be as fatal to the use of a model in practice as some recent results on inference suggest. Finally, we look at one possible future of econometrics that involves the use of advanced econometric methods interactively by way of a web browser. With these methods users may access a suite of econometric methods and data sets online. They may also upload data to remote servers and by simple web browser selections initiate the implementation of advanced econometric software algorithms, returning the results online and by file and graphics downloads.
Keywords: Activities and limitations of econometrics, automated modeling, nearly unidentified models, nonstationarity, online econometrics, policy analysis, prediction, quantitative bounds, trends, unit roots, weak instruments
Studying Social Networks at Scale: Macroscopic Anatomy of the Twitter Social Graph
Twitter is one of the largest social networks using exclusively directed
links among accounts. This makes the Twitter social graph much closer to the
social graph supporting real life communications than, for instance, Facebook.
Therefore, understanding the structure of the Twitter social graph is
interesting not only for computer scientists, but also for researchers in other
fields, such as sociologists. However, little is known about how the
information propagation in Twitter is constrained by its inner structure. In
this paper, we present an in-depth study of the macroscopic structure of the
Twitter social graph unveiling the highways on which tweets propagate, the
specific user activity associated with each component of this macroscopic
structure, and the evolution of this macroscopic structure with time for the
past 6 years. For this study, we crawled Twitter to retrieve all accounts and
all social relationships (follow links) among accounts; the crawl completed in
July 2012 with 505 million accounts interconnected by 23 billion links. Then,
we present a methodology to unveil the macroscopic structure of the Twitter
social graph. This macroscopic structure consists of 8 components defined by
their connectivity characteristics. Each component groups users with a specific
usage of Twitter. For instance, we identified components gathering together
spammers, or celebrities. Finally, we present a method to approximate the
macroscopic structure of the Twitter social graph in the past, validate this
method using old datasets, and discuss the evolution of the macroscopic
structure of the Twitter social graph during the past 6 years.
Comment: ACM Sigmetrics 2014 (2014)
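The abstract above describes components of a directed follow graph defined by their connectivity. The paper's own eight-component methodology is not spelled out here, so the sketch below instead assumes the classic coarser "bow-tie" split: the largest strongly connected component (core), the nodes that can reach it (IN), the nodes reachable from it (OUT), and everything else. The tiny edge list and all function names are illustrative.

```python
def build_adjacency(nodes, edges):
    """Forward and reverse adjacency lists for a directed graph."""
    fwd = {n: [] for n in nodes}
    rev = {n: [] for n in nodes}
    for source, target in edges:
        fwd[source].append(target)
        rev[target].append(source)
    return fwd, rev


def strongly_connected_components(nodes, fwd, rev):
    """Kosaraju's algorithm with iterative depth-first search."""
    visited, finish_order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All neighbours explored: record finish time and backtrack.
                finish_order.append(node)
                stack.pop()
    assigned, components = set(), []
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


def reachable(starts, adjacency):
    """All nodes reachable from `starts` (inclusive) along `adjacency`."""
    seen, todo = set(starts), list(starts)
    while todo:
        node = todo.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def bow_tie(nodes, edges):
    """Split nodes into core, IN, OUT and other, relative to the largest SCC."""
    fwd, rev = build_adjacency(nodes, edges)
    core = max(strongly_connected_components(nodes, fwd, rev), key=len)
    out_part = reachable(core, fwd) - core
    in_part = reachable(core, rev) - core
    other = set(nodes) - core - in_part - out_part
    return {"core": core, "in": in_part, "out": out_part, "other": other}


# Toy follow graph: an edge (u, v) means u follows v.
nodes = ["a", "b", "c", "x", "y", "z"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "a"), ("c", "y")]
parts = bow_tie(nodes, edges)
```

On a graph with hundreds of millions of accounts this in-memory approach would not be practical as written; it is meant only to show what "components defined by connectivity" can look like.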