
33 research outputs found

Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh

Author: Bodor András
Budavári Tamás
Csabai István
Dobos László
Kondor Dániel
Szalay Alexander S.
Vattay Gábor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Because the HTM library lacks certain features, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering in spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten, and HTM-based spatial joins run about a hundred times faster. Comment: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014)

arXiv.org e-Print Archive

Crossref

ELTE Digital Institutional Repository (EDIT)
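
The point-to-trixel step that HTM indexing relies on can be sketched as follows. This is a minimal illustration, not the SQL Server HTM library used in the paper: it assumes the commonly published HTM layout (root triangles S0..S3 with IDs 8..11 and N0..N3 with IDs 12..15, each child ID formed as parent * 4 + k), and the helper names `lonlat_to_vector` and `htm_id` are hypothetical. The region tessellation algorithm and the precomputed triangle/region intersections from the abstract are not shown.

```python
import math

# Octahedron vertices v0..v5 (assumed standard HTM vertex layout).
_V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
      (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]

# Root trixels as (id, vertex indices), counterclockwise seen from outside.
_ROOTS = [
    (8, (1, 5, 2)), (9, (2, 5, 3)), (10, (3, 5, 4)), (11, (4, 5, 1)),    # S0..S3
    (12, (1, 0, 4)), (13, (4, 0, 3)), (14, (3, 0, 2)), (15, (2, 0, 1)),  # N0..N3
]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _midpoint(a, b):
    """Edge midpoint projected back onto the unit sphere."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    r = math.sqrt(_dot(m, m))
    return (m[0] / r, m[1] / r, m[2] / r)


def _contains(tri, p, eps=1e-15):
    """p is inside a CCW spherical triangle if it lies left of all three edges."""
    a, b, c = tri
    return (_dot(_cross(a, b), p) >= -eps and
            _dot(_cross(b, c), p) >= -eps and
            _dot(_cross(c, a), p) >= -eps)


def lonlat_to_vector(lon_deg, lat_deg):
    """Geographic coordinates to a unit vector (spherical Earth assumed)."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def htm_id(p, depth):
    """Return the integer ID of the depth-`depth` trixel containing unit vector p."""
    for root_id, idx in _ROOTS:
        tri = tuple(_V[i] for i in idx)
        if _contains(tri, p):
            trixel_id = root_id
            break
    else:
        raise ValueError("p must be a unit vector")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k in range(3):
            if _contains(children[k], p):
                break
        else:
            k = 3  # the four children tile the parent, so it must be the center one
        tri = children[k]
        trixel_id = trixel_id * 4 + k  # each level appends two bits
    return trixel_id
```

Because each level appends two bits, shifting a depth-d ID right by 2(d - k) bits yields its ancestor at depth k; this nesting is what lets a region be described as ranges of integer IDs that an ordinary B-tree index can pre-filter.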

ANTARES: Progress towards building a 'Broker' of time-domain alerts

Author: Axelrod Tim
Jenness Tim
Kececioglu John
Matheson Thomas
Narayan Gautham
Ridgway Stephen
Saha Abhijit
Scheidegger Carlos
Seaman Robert
Snodgrass Richard
Taylor Clark
Toeniskoetter Jackson
Wang Zhe
Welch Eric
Yang Shuo
Zaidi Tayeb
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 17/11/2016

The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is a joint effort of NOAO and the Department of Computer Science at the University of Arizona to build prototype software to process alerts from time-domain surveys, especially LSST, to identify those alerts that must be followed up immediately. Value is added by annotating incoming alerts with existing information from previous surveys and compilations across the electromagnetic spectrum and from the history of past alerts. Comparison against a knowledge repository of properties and features of known or predicted kinds of variable phenomena is used for categorization. The architecture and algorithms being employed are described.

arXiv.org e-Print Archive

Crossref

Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

Author: Barankai Norbert
Csabai István
Dobos László
Hanyecz Tamás
Kallus Zsófia
Kondor Dániel
Sebők Tamás
Szüle János
Vattay Gábor
Publication date: 01/01/2013

Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset that consists of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.

arXiv.org e-Print Archive

CiteSeerX

Crossref
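
The abstract's "PCA pursuit method" presumably refers to principal component pursuit (PCP), the standard convex formulation of Robust PCA that splits a data matrix M into a low-rank part L (the smooth, shared structure) and a sparse part S (outliers and highly localized topics). A minimal NumPy sketch of PCP solved with a fixed-penalty augmented Lagrangian (ADMM) iteration follows. The defaults λ = 1/√max(m, n) and μ = mn / (4‖M‖₁) are common textbook choices, not the paper's settings; `robust_pca` is a hypothetical name, and taking rows as regions and columns as word frequencies would be an assumed layout.

```python
import numpy as np


def _shrink(x, tau):
    """Soft-thresholding: the proximal operator of tau * (L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: M ~ L (low rank) + S (sparse).

    Solves  min ||L||_* + lam * ||S||_1  subject to  L + S = M
    by alternating minimization of the augmented Lagrangian with fixed mu.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())
    norm_M = np.linalg.norm(M)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multipliers for the constraint L + S = M
    for _ in range(max_iter):
        # L-step: singular value thresholding shrinks the spectrum by 1/mu.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * _shrink(sig, 1.0 / mu)) @ Vt
        # S-step: entrywise soft-thresholding keeps only large deviations.
        S = _shrink(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint residual.
        R = M - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

The nuclear norm favors a few dominant components while the L1 term absorbs sparse, large deviations, which is why outliers such as spam accounts or one-off local events end up in S instead of distorting the principal components in L.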

Scaling in Words on Twitter

Author: Bokányi Eszter
Kondor Dániel
Vattay Gábor
Publication date: 01/01/2019

Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with city population for the total volume of tweets and words created in a city. We then find that a certain core vocabulary follows the scaling relationship of the bulk text, but most words are sensitive to city size, exhibiting super- or sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters of Zipf's law and Heaps' law differ on Twitter from those of other texts, and that the exponent of Zipf's law changes with city size.

arXiv.org e-Print Archive

Repository of the Academy's Library
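
Both the urban scaling relation (volume ≈ a · population^β, superlinear when β > 1) and Zipf's law (frequency ≈ C · rank^(−α)) are power laws, so the simplest way to estimate their exponents is a least-squares line on log-log data. The sketch below assumes that estimator; the paper's actual fitting procedure is not stated here, and `fit_power_law` and `zipf_exponent` are hypothetical names.

```python
import math


def fit_power_law(x, y):
    """Fit y ~ a * x**b by ordinary least squares on (log x, log y); return (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b = sxy / sxx              # slope in log-log space is the exponent
    a = math.exp(my - b * mx)  # intercept gives the prefactor
    return a, b


def zipf_exponent(word_counts):
    """Estimate alpha in f(r) ~ r**(-alpha) from a {word: count} mapping."""
    freqs = sorted(word_counts.values(), reverse=True)
    ranks = list(range(1, len(freqs) + 1))
    _, slope = fit_power_law(ranks, freqs)
    return -slope
```

For the urban scaling case, `x` would hold city populations and `y` the per-city tweet or word totals; an exponent slightly above 1 corresponds to the slightly superlinear scaling reported in the abstract. Log-log least squares is easy to read but sensitive to the noisy low-frequency tail of rank-frequency data, so maximum-likelihood estimators are often preferred in practice.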

The Application of the Montage Image Mosaic Engine To The Visualization Of Astronomical Images

Author: Berriman G. Bruce
Good J. C.
Publication venue: 'IOP Publishing'
Publication date: 08/02/2017

The Montage Image Mosaic Engine was designed as a scalable toolkit, written in C for performance and portability across *nix platforms, that assembles FITS images into mosaics. The code is freely available and has been widely used in the astronomy and IT communities for research, product generation and for developing next-generation cyber-infrastructure. Recently, it has begun to find applicability in the field of visualization. This has come about because the toolkit design allows easy integration into scalable systems that process data for subsequent visualization in a browser or client. It also includes a visualization tool suitable for automation and for integration into Python: mViewer creates, with a single command, complex multi-color images overlaid with coordinate displays, labels, and observation footprints, and includes an adaptive image histogram equalization method that preserves the structure of a stretched image over its dynamic range. The Montage toolkit contains functionality originally developed to support the creation and management of mosaics but which also offers value to visualization: a background rectification algorithm that reveals the faint structure in an image; and tools for creating cutout and down-sampled versions of large images. Version 5 of Montage offers support for visualizing data written in the HEALPix sky-tessellation scheme, and functionality for processing and organizing images to comply with the TOAST sky-tessellation scheme required for consumption by the World Wide Telescope (WWT). Four online tutorials enable readers to reproduce and extend all the visualizations presented in this paper. Comment: 16 pages, 9 figures; accepted for publication in the PASP Special Focus Issue: Techniques and Methods for Astrophysical Data Visualization

arXiv.org e-Print Archive

Caltech Authors
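
The abstract does not describe mViewer's adaptive equalization algorithm. As a point of reference only, plain global histogram equalization (a different, simpler technique) maps each pixel value through the image's cumulative histogram, so a handful of very bright pixels cannot compress the contrast of the faint majority. In this sketch the function name `equalize` and the 256 output levels are assumptions.

```python
from bisect import bisect_right


def equalize(pixels, levels=256):
    """Global histogram equalization: map each value through the empirical CDF.

    Output values are integers in [0, levels - 1]; the darkest value maps to 0
    and the brightest to levels - 1.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    if n == 0:
        return []
    cdf_min = bisect_right(ordered, ordered[0])  # count of pixels at the minimum
    if cdf_min == n:  # constant image: nothing to stretch
        return [0] * n
    out = []
    for p in pixels:
        cdf = bisect_right(ordered, p)  # count of pixels <= p
        out.append(round((cdf - cdf_min) * (levels - 1) / (n - cdf_min)))
    return out
```

With input `[0, 0, 1, 2, 100]`, a linear stretch would squeeze the values 0 to 2 into the bottom few output levels, whereas equalization spreads them evenly over the output range because the mapping depends on rank, not on raw value.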