From Social Data Mining to Forecasting Socio-Economic Crisis
Socio-economic data mining has great potential for gaining a better
understanding of problems that our economy and society are facing, such as
financial instability, shortages of resources, or conflicts. Without
large-scale data mining, progress in these areas seems hard or impossible.
Therefore, a suitable, distributed data mining infrastructure and research
centers should be built in Europe. It also appears appropriate to build a
network of Crisis Observatories. They can be imagined as laboratories devoted
to the gathering and processing of enormous volumes of data on both natural
systems such as the Earth and its ecosystem, as well as on human
techno-socio-economic systems, so as to gain early warnings of impending
events. Reality mining provides the chance to adapt more quickly and more
accurately to changing situations. Further opportunities arise from individually
customized services, which however should be provided in a privacy-respecting
way. This requires the development of novel ICT (such as a self-organizing
Web), but most likely new legal regulations and suitable institutions as well.
As long as such regulations are lacking on a world-wide scale, it is in the
public interest that scientists explore what can be done with the huge data
available. Big data do have the potential to change or even threaten democratic
societies. The same applies to sudden and large-scale failures of ICT systems.
Therefore, dealing with data must be done with a large degree of responsibility
and care. The self-interests of individuals, companies or institutions have limits
where the public interest is affected, and the public interest is not a sufficient
justification to violate human rights of individuals. Privacy is a high good,
as confidentiality is, and damaging it would have serious side effects for
society.
Comment: 65 pages, 1 figure, Visioneer White Paper, see
http://www.visioneer.ethz.c
Investigating the Impact of the Blogsphere: Using PageRank to Determine the Distribution of Attention
Much has been written in recent years about the blogosphere and its impact on political, educational and scientific debates. Lately the issue has received significant attention from industry. As the blogosphere continues to grow, even doubling its size every six months, this paper investigates its apparent impact on the overall Web itself. We use the popular Google PageRank algorithm, which employs a model of Web use, to measure the distribution of user attention across sites in the blogosphere. The paper is based on an analysis of the PageRank distribution for 8.8 million blogs in 2005 and 2006. This paper addresses the following key questions: How is PageRank distributed across the blogosphere? Does it indicate the existence of measurable, visible effects of blogs on the overall mediasphere? Can we compare the distribution of attention to blogs as characterised by the PageRank with the situation for other forms of Web content? Has there been a growth in the impact of the blogosphere on the Web over the two years analysed here? Finally, it will also be necessary to examine the limitations of a PageRank-centred approach.
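PageRank, the measure this study relies on, can be sketched as a simple power iteration. This is a minimal illustration, not the authors' pipeline: the damping factor of 0.85, the uniform redistribution of rank held by pages without outgoing links, and the toy graph with hypothetical site names are all assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank (illustrative sketch).

    links maps each page to the list of pages it links to; pages that only
    appear as link targets are treated as having no outgoing links.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by dangling pages (no outlinks) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        # Each page passes a damped share of its rank along every outgoing link.
        for v in nodes:
            out = links.get(v)
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph with hypothetical site names (not data from the paper).
toy_web = {
    "blogA": ["blogB", "news"],
    "blogB": ["news"],
    "news": ["blogA"],
}
ranks = pagerank(toy_web)
for site, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{site}: {score:.3f}")
```

In this toy graph "news" ends up with the highest score because it receives links from both blogs, which is the sense in which PageRank serves as a proxy for how attention is distributed.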
Big Data and Analysis of Data Transfers for International Research Networks Using NetSage
Modern science is increasingly data-driven and collaborative in nature. Many scientific disciplines, including genomics, high-energy physics, astronomy, and atmospheric science, produce petabytes of data that must be shared with collaborators all over the world. The National Science Foundation-supported International Research Network Connection (IRNC) links have been essential to enabling this collaboration, but as data sharing has increased, so has the amount of information being collected to understand network performance. New capabilities to measure and analyze the performance of international wide-area networks are essential to ensure end-users are able to take full advantage of such infrastructure for their big data applications. NetSage is a project to develop a unified, open, privacy-aware network measurement and visualization service to address the needs of monitoring today's high-speed international research networks. NetSage collects data on both backbone links and exchange points, which can amount to as much as 1 Tb per month. This puts a significant strain on hardware, not only in terms of the storage needed to hold multi-year historical data, but also in terms of the processor and memory resources needed to analyze the data and understand network behaviors. This paper describes the basic NetSage architecture and its current data collection and archiving approach, and details the constraints of handling vast amounts of monitoring data while providing useful, extensible visualization to end users.
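One common way to keep multi-year monitoring archives manageable is to roll fine-grained samples up into coarser time bins before long-term storage. The sketch below illustrates that general idea only; it is not NetSage's archiving code, and the `(timestamp, link_id, bits_per_second)` record layout, the one-hour bin, and the function name `rollup` are hypothetical.

```python
def rollup(samples, bin_seconds=3600):
    """Aggregate fine-grained throughput samples into coarser time bins.

    samples: iterable of (unix_timestamp, link_id, bits_per_second) tuples
    (a hypothetical record layout). Returns
    {(link_id, bin_start): (mean_bps, peak_bps)}.
    """
    stats = {}
    for ts, link, bps in samples:
        # Align each sample to the start of its bin.
        bin_start = int(ts) // bin_seconds * bin_seconds
        key = (link, bin_start)
        total, count, peak = stats.get(key, (0.0, 0, float("-inf")))
        stats[key] = (total + bps, count + 1, max(peak, bps))
    # Keep only the per-bin mean and peak instead of every raw sample.
    return {
        key: (total / count, peak)
        for key, (total, count, peak) in stats.items()
    }


# Hypothetical 30-second samples for one link; two fall in the first hour.
raw = [(0, "link-1", 10.0), (30, "link-1", 20.0), (3600, "link-1", 5.0)]
print(rollup(raw))
```

Keeping the peak alongside the mean preserves short bursts that an average alone would hide; the bin size trades storage volume against temporal resolution.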
Scienceography: the study of how science is written
Scientific literature has itself been the subject of much scientific study,
for a variety of reasons: understanding how results are communicated, how ideas
spread, and assessing the influence of areas or individuals. However, most
prior work has focused on extracting and analyzing citation and stylistic
patterns. In this work, we introduce the notion of 'scienceography', which
focuses on the writing of science. We provide a first large scale study using
data derived from the arXiv e-print repository. Crucially, our data includes
the "source code" of scientific papers (the LaTeX source), which enables us to
study features not present in the "final product", such as the tools used and
private comments between authors. Our study identifies broad patterns and
trends in two example areas, computer science and mathematics, as well as
highlighting key differences in the way that science is written in these
fields. Finally, we outline future directions to extend the new topic of
scienceography.
Comment: 13 pages, 16 figures. Sixth International Conference on FUN WITH
ALGORITHMS, 201
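Recovering the private comments that a LaTeX source contains, but the compiled paper does not, is a small parsing task. The sketch below is a minimal illustration, not the tooling used in the study. It assumes comments are introduced by an unescaped `%` and deliberately ignores verbatim environments and catcode changes, where that assumption fails. The function name `extract_comments` is hypothetical.

```python
def extract_comments(tex):
    """Return the text of each % comment in a LaTeX source string.

    Simplifying assumption: an unescaped % starts a comment; verbatim
    environments and catcode changes are not handled.
    """
    comments = []
    for line in tex.splitlines():
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == "\\":
                # Skip a control character pair such as \% or \\ so that an
                # escaped percent sign is not mistaken for a comment.
                i += 2
                continue
            if ch == "%":
                comments.append(line[i + 1:].strip())
                break
            i += 1
    return comments


source = "x = 50\\% of cases % TODO: check with Bob\n% draft note\n"
print(extract_comments(source))
```

Treating `\\` as a pair matters: in `\\% note` the backslashes form a line break, so the following `%` really does start a comment.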
Usage Bibliometrics
Scholarly usage data provides unique opportunities to address the known
shortcomings of citation analysis. However, the collection, processing and
analysis of usage data remains an area of active research. This article
provides a review of the state of the art in usage-based informetrics, i.e. the
use of usage data to study the scholarly process.
Comment: Publisher's PDF (by permission). Publisher web site:
books.infotoday.com/asist/arist44.shtm
Quantifying the digital traces of Hurricane Sandy on Flickr
Society’s increasing interactions with technology are creating extensive “digital traces” of our collective human behavior. These new data sources are fuelling the rapid development of the new field of computational social science. To investigate user attention to the Hurricane Sandy disaster in 2012, we analyze data from Flickr, a popular website for sharing personal photographs. In this case study, we find that the number of photos taken and subsequently uploaded to Flickr with titles, descriptions or tags related to Hurricane Sandy bears a striking correlation to the atmospheric pressure in the US state of New Jersey during this period. Appropriately leveraged, such information could be useful to policy makers and others charged with emergency crisis management.
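A correlation like this is commonly quantified with the Pearson coefficient between two time-aligned series, for example photo counts and pressure readings per time bin. The abstract does not say which measure or binning the authors used, so both are assumptions here. The sketch below is a plain implementation; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(vx * vy)
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes the same coefficient. In either case, both series must first be aligned to the same time bins, for instance hourly photo counts against hourly pressure readings.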