Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh
We present a case study about the spatial indexing and regional
classification of billions of geographic coordinates from geo-tagged social
network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft
SQL Server. Because the HTM library lacks certain features, we use it
in conjunction with the GIS functions of SQL Server to significantly increase
the efficiency of pre-filtering for spatial filter and join queries. For
example, we implemented a new algorithm to compute the HTM tessellation of
complex geographic regions and precomputed the intersections of HTM triangles
and geographic regions for faster false-positive filtering. With full control
over the index structure, HTM-based pre-filtering of simple containment
searches outperforms SQL Server spatial indices by a factor of ten and
HTM-based spatial joins run about a hundred times faster.
Comment: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014).
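The hierarchical indexing idea behind the abstract above can be sketched in a few lines: HTM starts from eight spherical triangles on an octahedron and recursively splits each into four children by projecting edge midpoints back onto the sphere, so a point's index is the path of child choices down the tree. The sketch below is an illustration of this subdivision scheme, not the Microsoft SQL Server HTM library; the function name `htm_id` and the octahedral vertex ordering are assumptions for the example.

```python
import numpy as np

def _inside(p, v0, v1, v2):
    # A unit vector p lies in the spherical triangle (v0, v1, v2) iff it is
    # on the inner side of each edge's great-circle plane.
    return (np.dot(np.cross(v0, v1), p) >= 0 and
            np.dot(np.cross(v1, v2), p) >= 0 and
            np.dot(np.cross(v2, v0), p) >= 0)

def htm_id(lon_deg, lat_deg, depth):
    """Return the trixel path (root index, then child indices) for a point."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    p = np.array([np.cos(lat) * np.cos(lon),
                  np.cos(lat) * np.sin(lon),
                  np.sin(lat)])
    # Octahedron vertices: poles plus four equatorial points.
    V = [np.array(v, float) for v in
         [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]]
    roots = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1),
             (5, 2, 1), (5, 3, 2), (5, 4, 3), (5, 1, 4)]
    for r, (a, b, c) in enumerate(roots):
        if _inside(p, V[a], V[b], V[c]):
            tri, path = (V[a], V[b], V[c]), [r]
            break
    for _ in range(depth):
        v0, v1, v2 = tri
        # Edge midpoints, renormalized onto the unit sphere.
        m01 = (v0 + v1) / np.linalg.norm(v0 + v1)
        m12 = (v1 + v2) / np.linalg.norm(v1 + v2)
        m20 = (v2 + v0) / np.linalg.norm(v2 + v0)
        children = [(v0, m01, m20), (v1, m12, m01), (v2, m20, m12), (m01, m12, m20)]
        for i, child in enumerate(children):
            if _inside(p, *child):
                tri = child
                path.append(i)
                break
    return path
```

Because paths are prefixes of deeper paths, a range of trixel IDs covers a region, which is what makes pre-filtering a point-in-region query cheap before an exact GIS check.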
ANTARES: Progress towards building a `Broker' of time-domain alerts
The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is
a joint effort of NOAO and the Department of Computer Science at the University
of Arizona to build prototype software to process alerts from time-domain
surveys, especially LSST, to identify those alerts that must be followed up
immediately. Value is added by annotating incoming alerts with existing
information from previous surveys and compilations across the electromagnetic
spectrum and from the history of past alerts. Comparison against a knowledge
repository of properties and features of known or predicted kinds of variable
phenomena is used for categorization. The architecture and algorithms being
employed are described.
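The annotation step described above — attaching existing catalog information to each incoming alert — can be sketched as a simple positional cross-match. This is a hypothetical illustration, not the ANTARES implementation; the function names, the dictionary schema, and the one-arcsecond match radius are all assumptions for the example.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    # Haversine formula on the celestial sphere; all angles in degrees.
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2 +
         math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def annotate(alert, catalog, radius_deg=1.0 / 3600):
    """Attach catalog sources within radius_deg of the alert position."""
    matches = [src for src in catalog
               if angular_sep_deg(alert["ra"], alert["dec"],
                                  src["ra"], src["dec"]) <= radius_deg]
    return {**alert, "annotations": matches}
```

A real broker would replace the linear scan with a spatial index to keep up with survey alert rates.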
Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages
Principal component analysis (PCA) and related techniques have been
successfully employed in natural language processing. Text mining applications
in the age of the online social media (OSM) face new challenges due to
properties specific to these use cases (e.g. spelling issues specific to texts
posted by users, the presence of spammers and bots, service announcements,
etc.). In this paper, we employ a Robust PCA technique to separate typical
outliers and highly localized topics from the low-dimensional structure present
in language use in online social networks. Our focus is on identifying
geospatial features among the messages posted by the users of the Twitter
microblogging service. Using a dataset which consists of over 200 million
geolocated tweets collected over the course of a year, we investigate whether
the information present in word usage frequencies can be used to identify
regional features of language use and topics of interest. Using the PCA pursuit
method, we are able to identify important low-dimensional features, which
constitute smoothly varying functions of the geographic location.
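The separation described above — a low-dimensional structure plus sparse outliers — is the principal component pursuit (PCP) decomposition M = L + S. The sketch below is a simplified version of the standard alternating scheme (singular-value thresholding for the low-rank part, elementwise soft thresholding for the sparse part), not the authors' solver; the parameter choices follow the common PCP defaults and are assumptions here.

```python
import numpy as np

def shrink(X, tau):
    # Elementwise soft thresholding.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(M, n_iter=200):
    """Decompose M into low-rank L and sparse S via principal component pursuit."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))          # common default weight on the sparse term
    mu = m * n / (4.0 * np.abs(M).sum())    # common default penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # dual variable for M = L + S
    for _ in range(n_iter):
        # Singular-value thresholding step for the low-rank component.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(sig, 1.0 / mu)) @ Vt
        # Soft-thresholding step for the sparse (outlier) component.
        S = shrink(M - L + Y / mu, lam / mu)
        # Dual ascent enforcing M = L + S.
        Y = Y + mu * (M - L - S)
    return L, S
```

In the setting of the paper, rows of M would be word-frequency vectors per location: L captures the smooth regional structure and S absorbs spam, bots, and highly localized topics.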
Scaling in Words on Twitter
Scaling properties of language are a useful tool for understanding generative
processes in texts. We investigate the scaling relations in citywise Twitter
corpora coming from the Metropolitan and Micropolitan Statistical Areas of the
United States. We observe a slightly superlinear urban scaling with the city
population for the total volume of the tweets and words created in a city. We
then find that a certain core vocabulary follows the scaling relationship of
that of the bulk text, but most words are sensitive to city size, exhibiting a
super- or a sublinear urban scaling. For both regimes we can offer a plausible
explanation based on the meaning of the words. We also show that the parameters
of Zipf's law and Heaps' law on Twitter differ from those of other texts, and
that the exponent of Zipf's law changes with city size.
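An urban scaling relation Y ~ N^beta is conventionally estimated by ordinary least squares on the log-log form log Y = beta log N + c; beta > 1 is the superlinear regime reported above for total tweet volume. The sketch below fits the exponent on synthetic data standing in for the citywise corpora; the data-generating parameters are assumptions for the example, not values from the paper.

```python
import numpy as np

def scaling_exponent(population, volume):
    """Fit beta in volume ~ population**beta by least squares in log-log space."""
    beta, _intercept = np.polyfit(np.log(population), np.log(volume), 1)
    return beta

# Synthetic cities: populations over three decades, volumes generated with a
# known superlinear exponent beta = 1.12 plus lognormal noise (illustrative values).
rng = np.random.default_rng(0)
N = rng.uniform(1e4, 1e7, size=300)
Y = 2.0 * N ** 1.12 * np.exp(rng.normal(0, 0.1, size=300))
beta = scaling_exponent(N, Y)
```

The same one-line fit, applied per word, is what distinguishes the core vocabulary (beta near 1) from super- and sublinearly scaling words.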
The Application of the Montage Image Mosaic Engine To The Visualization Of Astronomical Images
The Montage Image Mosaic Engine was designed as a scalable toolkit, written
in C for performance and portability across *nix platforms, that assembles FITS
images into mosaics. The code is freely available and has been widely used in
the astronomy and IT communities for research, product generation and for
developing next-generation cyber-infrastructure. Recently, it has begun to
find applicability in the field of visualization. This has come about
because the toolkit design allows easy integration into scalable systems that
process data for subsequent visualization in a browser or client, and because
the toolkit includes a visualization tool suitable for automation and for integration into
Python: mViewer creates, with a single command, complex multi-color images
overlaid with coordinate displays, labels, and observation footprints, and
includes an adaptive image histogram equalization method that preserves the
structure of a stretched image over its dynamic range. The Montage toolkit
contains functionality originally developed to support the creation and
management of mosaics but which also offers value to visualization: a
background rectification algorithm that reveals the faint structure in an
image; and tools for creating cutout and down-sampled versions of large images.
Version 5 of Montage offers support for visualizing data written in the HEALPix
sky-tessellation scheme, and functionality for processing and organizing images
to comply with the TOAST sky-tessellation scheme required for consumption by
the World Wide Telescope (WWT). Four online tutorials enable readers to
reproduce and extend all the visualizations presented in this paper.
Comment: 16 pages, 9 figures; accepted for publication in the PASP Special Focus Issue: Techniques and Methods for Astrophysical Data Visualization.
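The stretch mentioned above is based on histogram equalization: remapping pixel values through their empirical CDF so that output intensities are spread evenly over the display range. The sketch below shows the plain, non-adaptive form of that idea on a NumPy array; it is an illustration of the underlying technique, not mViewer's adaptive algorithm, and the function name is an assumption.

```python
import numpy as np

def hist_equalize(img, n_bins=256):
    """Map pixel values through the empirical CDF so output
    intensities are approximately uniform over [0, 1]."""
    hist, edges = np.histogram(img.ravel(), bins=n_bins)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]
    # Monotone lookup from pixel value to cumulative fraction.
    return np.interp(img.ravel(), edges[1:], cdf).reshape(img.shape)
```

An adaptive variant, as in mViewer, would compute the mapping locally rather than from one global histogram, preserving faint structure across the full dynamic range of an astronomical image.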