Interactive tag maps and tag clouds for the multiscale exploration of large spatio-temporal datasets
'Tag clouds' and 'tag maps' are introduced to represent geographically referenced text. In combination, these aspatial and spatial views are used to explore a large structured spatio-temporal data set by providing overviews and filtering by text and geography. Prototypes are implemented using freely available technologies including Google Earth and Yahoo!'s Tag Map applet. The interactive tag map and tag cloud techniques and the rapid prototyping method used are informally evaluated through successes and limitations encountered. Preliminary evaluation suggests that the techniques may be useful for generating insights when visualizing large data sets containing geo-referenced text strings. The rapid prototyping approach enabled the technique to be developed and evaluated, leading to geovisualization through which a number of ideas were generated. Limitations of this approach are reflected upon. Tag placement, generalisation and prominence at different scales are issues which have come to light in this study that warrant further work.
Revisiting Guerry's data: Introducing spatial constraints in multivariate analysis
Standard multivariate analysis methods aim to identify and summarize the main
structures in large data sets containing the description of a number of
observations by several variables. In many cases, spatial information is also
available for each observation, so that a map can be associated to the
multivariate data set. Two main objectives are relevant in the analysis of
spatial multivariate data: summarizing covariation structures and identifying
spatial patterns. In practice, achieving both goals simultaneously is a
statistical challenge, and a range of methods have been developed that offer
trade-offs between these two objectives. In an applied context, this
methodological question has been and remains a major issue in community
ecology, where species assemblages (i.e., covariation between species
abundances) are often driven by spatial processes (and thus exhibit spatial
patterns). In this paper we review a variety of methods developed in community
ecology to investigate multivariate spatial patterns. We present different ways
of incorporating spatial constraints in multivariate analysis and illustrate
these different approaches using the famous data set on moral statistics in
France published by André-Michel Guerry in 1833. We discuss and compare the
properties of these different approaches both from a practical and theoretical
viewpoint. Comment: Published at http://dx.doi.org/10.1214/10-AOAS356 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Approximate Inference in Continuous Determinantal Point Processes
Determinantal point processes (DPPs) are random point processes well-suited
for modeling repulsion. In machine learning, the focus of DPP-based models has
been on diverse subset selection from a discrete and finite base set. This
discrete setting admits an efficient sampling algorithm based on the
eigendecomposition of the defining kernel matrix. Recently, there has been
growing interest in using DPPs defined on continuous spaces. While the
discrete-DPP sampler extends formally to the continuous case, computationally,
the steps required are not tractable in general. In this paper, we present two
efficient DPP sampling schemes that apply to a wide range of kernel functions:
one based on low rank approximations via Nyström and random Fourier feature
techniques and another based on Gibbs sampling. We demonstrate the utility of
continuous DPPs in repulsive mixture modeling and synthesizing human poses
spanning activity spaces.
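The abstract above mentions low rank kernel approximation via random Fourier features. As a rough illustration of that one ingredient only (not the paper's DPP sampler), the sketch below approximates a Gaussian (RBF) kernel with random cosine features. The function names `make_rff`, `rbf` and `dot`, the choice of a Gaussian kernel, and all parameter values are assumptions made for this example.

```python
import math
import random


def make_rff(dim, num_features, lengthscale, seed=0):
    """Return a feature map z such that dot(z(x), z(y)) approximates
    exp(-||x - y||^2 / (2 * lengthscale^2)). Hypothetical helper."""
    rng = random.Random(seed)
    # Frequencies are drawn from the spectral density of the RBF kernel,
    # a zero-mean Gaussian with standard deviation 1 / lengthscale.
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    # Random phases, uniform on [0, 2*pi].
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return z


def rbf(x, y, lengthscale):
    """Exact Gaussian kernel, used here only for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Compare the approximate and exact kernel values at two arbitrary points.
z = make_rff(dim=2, num_features=5000, lengthscale=1.0)
x, y = [0.3, -0.2], [1.0, 0.5]
approx = dot(z(x), z(y))
exact = rbf(x, y, 1.0)
```

With more features the inner product of the feature vectors should track the exact kernel more closely. The finite-dimensional feature map is what makes a low rank treatment of the kernel possible.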
A review of data visualization: opportunities in manufacturing sequence management.
Data visualization now benefits from developments in technologies that offer innovative ways of presenting complex data. Potentially these have widespread application in communicating the complex information domains typical of manufacturing sequence management environments for global enterprises. In this paper the authors review the visualization functionalities, techniques and applications reported in the literature, map these to manufacturing sequence information presentation requirements and identify the opportunities available and likely development paths. Current leading-edge practice in dynamic updating and communication with suppliers is not being exploited in manufacturing sequence management; it could provide significant benefits to manufacturing business. In the context of global manufacturing operations and broad-based user communities with differing needs served by common data sets, tool functionality is generally ahead of user application.
Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science.