
    Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh

    We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Because the HTM library lacks certain features, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering in spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten, and HTM-based spatial joins run about a hundred times faster. Comment: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014)
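The pre-filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' HTM or SQL Server implementation: a flat square grid stands in for the HTM triangles and a disk stands in for a complex geographic region, and every name here (`classify_cell`, `build_index`, `contains`) is hypothetical. What it shows is the general pattern of precomputing which cells lie fully inside a region, so that only points in partially covered cells need the exact (expensive) test.

```python
import math

def classify_cell(ix, iy, size, cx, cy, r):
    """Classify one grid cell against a disk: 'full', 'partial' or 'outside'."""
    x0, y0 = ix * size, iy * size
    x1, y1 = x0 + size, y0 + size
    # The disk is convex, so the cell is fully inside when its farthest corner is.
    far = max(math.hypot(x - cx, y - cy) for x in (x0, x1) for y in (y0, y1))
    if far <= r:
        return "full"
    # Nearest point of the cell to the disk centre (clamp the centre to the cell).
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    if math.hypot(nx - cx, ny - cy) > r:
        return "outside"
    return "partial"

def build_index(size, cx, cy, r):
    """Precompute the cells that are fully or partially covered by the region."""
    index = {}
    for ix in range(math.floor((cx - r) / size), math.floor((cx + r) / size) + 1):
        for iy in range(math.floor((cy - r) / size), math.floor((cy + r) / size) + 1):
            kind = classify_cell(ix, iy, size, cx, cy, r)
            if kind != "outside":
                index[(ix, iy)] = kind
    return index

def contains(point, index, size, cx, cy, r):
    """Containment test that runs the exact check only for 'partial' cells."""
    x, y = point
    kind = index.get((math.floor(x / size), math.floor(y / size)))
    if kind is None:
        return False   # cell not covered at all: rejected by the pre-filter
    if kind == "full":
        return True    # cell fully covered: accepted without the exact test
    return math.hypot(x - cx, y - cy) <= r  # exact test for boundary cells

# Illustrative parameters (assumptions, not values from the paper).
SIZE, CX, CY, R = 0.25, 0.0, 0.0, 2.0
index = build_index(SIZE, CX, CY, R)
```

In a database setting the cell identifier would typically be an indexed column, so the "full" and "outside" cases reduce to an index lookup and only boundary cells incur the geometric test.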

    ANTARES: Progress towards building a `Broker' of time-domain alerts

    The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is a joint effort of NOAO and the Department of Computer Science at the University of Arizona to build prototype software to process alerts from time-domain surveys, especially LSST, to identify those alerts that must be followed up immediately. Value is added by annotating incoming alerts with existing information from previous surveys and compilations across the electromagnetic spectrum and from the history of past alerts. Comparison against a knowledge repository of properties and features of known or predicted kinds of variable phenomena is used for categorization. The architecture and algorithms being employed are described.

    Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

    Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset which consists of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.
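As a rough illustration of the Robust PCA decomposition this abstract refers to, the sketch below splits a matrix into a low-rank part plus a sparse part by principal component pursuit, solved with a basic ADMM loop. The solver choice, the default values of `lam` and `mu`, and the synthetic data are assumptions drawn from commonly used settings, not details taken from the paper.

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding: proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(s, tau)) @ vt

def robust_pca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Approximately solve  min ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m = np.asarray(m, dtype=float)
    n1, n2 = m.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(m).sum()) if mu is None else mu
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = shrink(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse

# Synthetic example (not the Twitter data): rank-2 structure plus 5% large outliers.
rng = np.random.default_rng(0)
low_rank_true = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
sparse_true = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
sparse_true[mask] = 10.0 * rng.choice([-1.0, 1.0], size=mask.sum())

low_rank_hat, sparse_hat = robust_pca(low_rank_true + sparse_true)
rel_err = np.linalg.norm(low_rank_hat - low_rank_true) / np.linalg.norm(low_rank_true)
```

In the setting described above, the rows and columns of the input would presumably correspond to geographic cells and word frequencies, with the sparse component absorbing outliers and highly localized topics.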

    Scaling in Words on Twitter

    Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with the city population for the total volume of the tweets and words created in a city. We then find that a certain core vocabulary follows the same scaling relationship as the bulk text, but most words are sensitive to city size, exhibiting a super- or a sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters for Zipf's law and Heaps' law differ on Twitter from those of other texts, and that the exponent of Zipf's law changes with city size.
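Urban scaling of this kind is usually expressed as a power law, Y = c * N^beta, where N is the city population and Y is a quantity such as tweet volume; beta > 1 is superlinear and beta < 1 is sublinear. The sketch below estimates beta by ordinary least squares in log-log space. The fitting procedure and the synthetic numbers are assumptions for illustration; the abstract does not state which estimator the authors used.

```python
import numpy as np

def fit_scaling_exponent(population, volume):
    """Fit log(volume) = log(c) + beta * log(population); return (beta, c)."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(volume, dtype=float))
    beta, log_c = np.polyfit(log_n, log_y, deg=1)
    return beta, np.exp(log_c)

# Synthetic cities (not data from the paper) with a slightly superlinear exponent.
rng = np.random.default_rng(0)
population = np.logspace(4, 7, 50)
noise = np.exp(rng.normal(0.0, 0.05, size=population.size))
volume = 0.5 * population ** 1.1 * noise

beta, prefactor = fit_scaling_exponent(population, volume)
print(f"estimated beta = {beta:.3f}")
```

The same function can be applied word by word, using each word's count per city as `volume`, to sort words into super- and sublinear regimes.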

    The Application of the Montage Image Mosaic Engine To The Visualization Of Astronomical Images

    The Montage Image Mosaic Engine was designed as a scalable toolkit, written in C for performance and portability across *nix platforms, that assembles FITS images into mosaics. The code is freely available and has been widely used in the astronomy and IT communities for research, product generation and for developing next-generation cyber-infrastructure. Recently, it has begun to find applicability in the field of visualization. This has come about because the toolkit design allows easy integration into scalable systems that process data for subsequent visualization in a browser or client. It also includes a visualization tool suitable for automation and for integration into Python: mViewer creates, with a single command, complex multi-color images overlaid with coordinate displays, labels, and observation footprints, and includes an adaptive image histogram equalization method that preserves the structure of a stretched image over its dynamic range. The Montage toolkit contains functionality originally developed to support the creation and management of mosaics but which also offers value to visualization: a background rectification algorithm that reveals the faint structure in an image, and tools for creating cutout and down-sampled versions of large images. Version 5 of Montage offers support for visualizing data written in the HEALPix sky-tessellation scheme, and functionality for processing and organizing images to comply with the TOAST sky-tessellation scheme required for consumption by the World Wide Telescope (WWT). Four online tutorials enable readers to reproduce and extend all the visualizations presented in this paper. Comment: 16 pages, 9 figures; accepted for publication in the PASP Special Focus Issue: Techniques and Methods for Astrophysical Data Visualization