Bigger Buffer k-d Trees on Multi-Many-Core Systems
A buffer k-d tree is a k-d tree variant for massively parallel nearest neighbor search. While providing valuable speed-ups on modern many-core devices when both a large number of reference points and a large number of query points are given, buffer k-d trees are limited by the number of points that can fit on a single device. In this work, we show how to modify the original data structure and the associated workflow to make the overall approach capable of dealing with massive data sets. We further provide a simple yet efficient way of using multiple devices in a single workstation. The applicability of the modified framework is demonstrated in the context of astronomy, a field that is faced with huge amounts of data.
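For readers unfamiliar with the underlying structure, the following is a minimal sketch of nearest neighbor search in a plain k-d tree. It is not the buffer k-d tree from the abstract: it handles one query at a time on the CPU, with no buffering of queries and no many-core device. The function names (`build_kdtree`, `nearest`, `brute_force`) and the dictionary-based node layout are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree, splitting on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (squared distance, point) of the reference point closest to query."""
    if node is None:
        return best
    d = sq_dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    # Signed distance from the query to the splitting plane of this node.
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def brute_force(points, query):
    """Reference answer: scan every point."""
    return min((sq_dist(p, query), p) for p in points)


# Small demonstration on random 3-D points (hypothetical data).
rng = random.Random(0)
reference_points = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
tree = build_kdtree(reference_points)
query_point = (0.5, 0.5, 0.5)
print(nearest(tree, query_point))
```

The pruning test in `nearest` is what makes the tree faster than a full scan. The buffer variant described in the abstract is aimed at the case where very many such queries have to be answered at once.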
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
A First Catalog of Variable Stars Measured by the Asteroid Terrestrial-impact Last Alert System (ATLAS)
The Asteroid Terrestrial-impact Last Alert System (ATLAS) carries out its
primary planetary defense mission by surveying about 13000 deg^2 at least four
times per night. The resulting data set is useful for the discovery of variable
stars to a magnitude limit fainter than r~18, with amplitudes down to 0.01 mag
for bright objects. Here we present a Data Release One catalog of variable
stars based on analyzing 142 million stars measured at least 100 times in the
first two years of ATLAS operations. Using a Lomb-Scargle periodogram and other
variability metrics, we identify 4.7 million candidate variables which we
analyze in detail. Through Space Telescope Science Institute, we publicly
release lightcurves for all of them, together with a vector of 169
classification features for each star. We do this at the level of unconfirmed
candidate variables in order to provide the community with a large set of
homogeneously analyzed photometry and avoid pre-judging which types of objects
others may find most interesting. We use machine learning to classify the
candidates into fifteen different broad categories based on lightcurve
morphology. About 10% (430,000 stars) pass extensive tests designed to screen
out spurious variability detections: we label these as 'probable' variables. Of
these, 230,000 receive specific classifications as eclipsing binaries,
pulsating, Mira-type, or sinusoidal variables: these are the 'classified'
variables. New discoveries among the probable variables number more than
300,000, while 150,000 of the classified variables are new, including about
10,000 pulsating variables, 2,000 Mira stars, and 70,000 eclipsing binaries.
Comment: Accepted by AJ; gives instructions for querying ATLAS variable star
database; this new version has nicer lightcurve figure
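The abstract above names the Lomb-Scargle periodogram as one of its variability metrics. As a rough illustration of how that technique finds a period in unevenly sampled photometry, here is a minimal pure-Python sketch of the classical (unnormalized) Lomb-Scargle power. It is not the ATLAS pipeline: the function name `lomb_scargle`, the synthetic lightcurve, and the frequency grid are all assumptions chosen for the example.

```python
import math
import random


def lomb_scargle(times, values, frequencies):
    """Classical Lomb-Scargle power at each frequency (cycles per unit time).

    Frequencies must be positive. Values are mean-subtracted before fitting.
    """
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    powers = []
    for f in frequencies:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        yc = ys = cc = ss = 0.0
        for t, v in zip(times, y):
            c = math.cos(w * (t - tau))
            s = math.sin(w * (t - tau))
            yc += v * c
            ys += v * s
            cc += c * c
            ss += s * s
        powers.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return powers


# Synthetic, irregularly sampled lightcurve (hypothetical data):
# a sinusoid with a 2.5-day period plus Gaussian noise.
rng = random.Random(42)
true_frequency = 0.4  # cycles per day
times = sorted(rng.uniform(0.0, 30.0) for _ in range(150))
values = [math.sin(2.0 * math.pi * true_frequency * t) + rng.gauss(0.0, 0.1) for t in times]

# Trial frequencies from 0.01 to 2.0 cycles per day in steps of 0.005.
frequencies = [0.01 + 0.005 * i for i in range(399)]
powers = lomb_scargle(times, values, frequencies)
best_frequency = frequencies[powers.index(max(powers))]
print("strongest frequency (cycles/day):", best_frequency)
```

In practice a tested library implementation would normally be preferred over a hand-written loop like this one, particularly for catalogs of the size described in the abstract.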
The Evryscope Fast Transient Engine: Real-time Discovery of Rapidly Evolving Transients with Evryscope and the Argus Optical Array
Modern synoptic sky surveys are typically designed to detect supernovae-like transients, using a tiling strategy to identify objects that evolve on day-to-month timescales. Astrophysical phenomena with sub-hour durations, ranging from galactic stellar flares to optical flashes accompanying gamma-ray bursts, have largely escaped scrutiny. Due to their low intrinsic rates and short durations, surveys for fast transients must simultaneously cover significant fractions of the sky at sub-hour cadences, often by combining multiple telescopes. The Evryscopes represent an extreme of this approach, combining 43 small telescopes to image 38% of the entire sky every two minutes. To investigate bright and fast transients with the Evryscopes, I developed the Evryscope Fast Transient Engine (EFTE), a real-time transient detection and photometric analysis pipeline. EFTE uses a unique direct image subtraction routine suited to continuously monitoring the transient sky at minute cadence. Candidates are produced within two minutes for 98.5% of images, and are internally filtered using VetNet, a machine learning algorithm trained to sort real astrophysical events from false positives, both instrumental and astronomical, including millisecond-timescale reflections, or “glints” from satellites and debris in Earth orbit. Glints are a dominating foreground for astronomical surveys in the extreme time domain. I present the first measurements of the glint rate, noting that it exceeds the combined rate of public alerts from all active all-sky, fast-timescale transient searches, including neutrino, gravitational-wave, gamma-ray, and radio observatories. I further report spectroscopic followup of two stellar flares identified in real-time from the EFTE alert stream using glint-mitigation and science-driven selection metrics. 
These are the closest spectra relative to peak ever observed for flare stars outside of dedicated staring campaigns on known active stars, and provide unique constraints on the evolution of the flare continuum and temperature. Finally, EFTE is the software test bed for the pipelines of the Argus Optical Array, an upcoming all-sky survey based on the Evryscope concept scaled to the depths of the deepest operating sky surveys and a terabit per second data rate. This work concludes with a description of the Argus prototype series and pipelines, and an overview of fast transient science with the Array.
Doctor of Philosophy