Visual and interactive exploration of point data
Point data, such as Unit Postcodes (UPC), can provide very detailed information at fine
scales of resolution. For instance, socio-economic attributes are commonly assigned to
UPC. Hence, they can be represented as points and observed at the postcode level.
Using UPC as a common field allows the concatenation of variables from disparate data
sources that can potentially support sophisticated spatial analysis. However, visualising
UPC in urban areas has at least three limitations. First, at small scales UPC occurrences
can be very dense, making their visualisation as points difficult. On the other hand,
patterns in the associated attribute values are often hardly recognisable at large scales.
Secondly, UPC can be used as a common field to allow the concatenation of highly
multivariate data sets with an associated postcode. Finally, socio-economic variables
assigned to UPC (such as the ones used here) can be non-Normal in their distributions
as a result of many zero values and high variances, which constrain their
analysis using traditional statistics.
This paper discusses a Point Visualisation Tool (PVT), a proof-of-concept system
developed to visually explore point data. Various well-known visualisation techniques
were implemented to enable their interactive and dynamic interrogation. PVT provides
multiple representations of point data to facilitate the understanding of the relations
between attributes or variables as well as their spatial characteristics. Brushing between
alternative views is used to link several representations of a single attribute, as well as
to simultaneously explore more than one variable. PVT’s functionality shows how the
use of visual techniques embedded in an interactive environment enables the exploration
of large amounts of multivariate point data.
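The brushing idea described in this abstract can be sketched as follows. This is a minimal illustration and not PVT's implementation: the `Point` record, the rectangular brush, the function names, and the sample values are all assumptions made for the example. A selection made in a map view is passed to a histogram view, which reports how many points in each attribute bin are currently brushed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    pid: str      # hypothetical postcode identifier
    x: float      # easting (arbitrary units)
    y: float      # northing (arbitrary units)
    value: float  # one attribute attached to the point


def brush_map(points, xmin, ymin, xmax, ymax):
    """Return the ids of points inside a rectangular brush in the map view."""
    return {p.pid for p in points if xmin <= p.x <= xmax and ymin <= p.y <= ymax}


def histogram_view(points, selected, bin_width):
    """Per attribute bin, count (all points, brushed points)."""
    bins = {}
    for p in points:
        b = int(p.value // bin_width)
        total, hit = bins.get(b, (0, 0))
        bins[b] = (total + 1, hit + (p.pid in selected))
    return dict(sorted(bins.items()))


# Made-up sample data, for illustration only.
points = [
    Point("A1", 0.0, 0.0, 0.0),
    Point("A2", 1.0, 1.0, 3.0),
    Point("A3", 2.0, 2.0, 12.0),
    Point("A4", 5.0, 5.0, 14.0),
]

selected = brush_map(points, 0.5, 0.5, 2.5, 2.5)        # brush in the map view
linked = histogram_view(points, selected, bin_width=10)  # linked second view
```

Sharing a set of point identifiers between views, rather than coordinates, is one simple way to keep any number of representations in step with a single brush.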
Functional Data Analysis in Electronic Commerce Research
This paper describes opportunities and challenges of using functional data
analysis (FDA) for the exploration and analysis of data originating from
electronic commerce (eCommerce). We discuss the special data structures that
arise in the online environment and why FDA is a natural approach for
representing and analyzing such data. The paper reviews several FDA methods and
motivates their usefulness in eCommerce research by providing a glimpse into
new domain insights that they allow. We argue that the wedding of eCommerce
with FDA leads to innovations both in statistical methodology, due to the
challenges and complications that arise in eCommerce data, and in online
research, by being able to ask (and subsequently answer) new research questions
that classical statistical methods are not able to address, and also by
expanding on research questions beyond the ones traditionally asked in the
offline environment. We describe several applications originating from online
transactions which are new to the statistics literature, and point out
statistical challenges accompanied by some solutions. We also discuss some
promising future directions for joint research efforts between researchers in
eCommerce and statistics.
Comment: Published at http://dx.doi.org/10.1214/088342306000000132 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine-tuning distributed systems is considered a craft, relying
on intuition and experience. This becomes even more challenging when the
systems need to react in near real time, as streaming engines have to do to
maintain pre-agreed service quality metrics. In this article, we present an
automated approach that builds on a combination of supervised and reinforcement
learning methods to recommend the most appropriate lever configurations based
on previous load. With this, streaming engines can be automatically tuned
without requiring a human to determine the right way and proper time to deploy
them. This opens the door to new configurations that are not being applied
today since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can find
substantially better configurations in less time than their human counterparts
and adapt to changing workloads.
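As a rough illustration of learning a configuration from observed performance, the sketch below uses an epsilon-greedy bandit, which is a much simpler technique than the combined supervised and reinforcement learning approach the abstract describes. The latency model `simulated_latency`, the `parallelism` lever, and all numbers are hypothetical stand-ins for a real streaming engine.

```python
import random


def simulated_latency(parallelism, load):
    """Hypothetical cost model: work is split across workers, each adding overhead."""
    return load / parallelism + 1.0 * parallelism


def tune(configs, load, rounds=200, epsilon=0.1, seed=0):
    """Epsilon-greedy search for the configuration with the lowest latency."""
    rng = random.Random(seed)
    counts = {c: 0 for c in configs}
    mean_reward = {c: 0.0 for c in configs}
    for step in range(rounds):
        if step < len(configs):
            choice = configs[step]                      # try every configuration once
        elif rng.random() < epsilon:
            choice = rng.choice(configs)                # explore
        else:
            choice = max(configs, key=mean_reward.get)  # exploit the best estimate
        reward = -simulated_latency(choice, load)       # lower latency, higher reward
        counts[choice] += 1
        # Incremental update of the running mean reward for this configuration.
        mean_reward[choice] += (reward - mean_reward[choice]) / counts[choice]
    return max(configs, key=mean_reward.get)


best = tune([1, 2, 4, 8, 16], load=64.0)
```

With a real engine the reward would be noisy and the load would change over time, so the estimates would need to be discounted or conditioned on the workload rather than averaged over all rounds.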
Community Detection and Growth Potential Prediction from Patent Citation Networks
The scoring of patents is useful for technology management analysis.
Practical patent scoring therefore requires both clustering of citation
networks and prediction of future citations. In this paper, we
propose a community detection method using Node2vec. To
analyze growth potential, we compare three time series analysis methods:
Long Short-Term Memory (LSTM), the ARIMA model, and the Hawkes process. In
our experiments, we found common technical points within the clusters produced by
Node2vec. Furthermore, we found that the prediction accuracy of the ARIMA model
was higher than that of the other models.
Comment: arXiv admin note: text overlap with arXiv:1607.00653 by other author
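To give a sense of what an autoregressive forecast of citation counts involves, the sketch below fits a first-order autoregressive model, y_t = c + phi * y_{t-1}, by ordinary least squares. This is only the AR(1) special case of ARIMA, not the model configuration used in the paper, and the series is a made-up example rather than patent data.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


# Hypothetical yearly citation counts generated by y_t = 2 + 0.5 * y_{t-1}.
citations = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
c, phi = fit_ar1(citations)
next_value = forecast_next(citations)
```

A full comparison of the kind the abstract reports would hold out the last observations of each series and score every model's forecasts against them.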
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest Astronomers, Statisticians,
Computer Scientists, Space and Earth Scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume.
Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011
in New York City, Goddard Institute for Space Studie