4,948 research outputs found

    DROP: Dimensionality Reduction Optimization for Time Series

    Full text link
    Dimensionality reduction is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for dimensionality reduction, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, termination conditions for stochastic PCA either execute for a predetermined number of iterations, or until convergence of the solution, frequently sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after operating over small (e.g., 1%) subsamples of input data, reducing whole workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition-based PCA techniques, and exceeds conventional approaches like FFT and PAA by up to 16x in end-to-end workloads

    Time Series Management Systems:A Survey

    Get PDF
    The collection of time series data increases as more monitoring and automation are being deployed. These deployments range in scale from an Internet of things (IoT) device located in a household to enormous distributed Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity. To store and analyze these vast amounts of data, specialized Time Series Management Systems (TSMSs) have been developed to overcome the limitations of general purpose Database Management Systems (DBMSs) for times series management. In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. Our classification is organized into categories based on the architectures observed during our analysis. In addition, we provide an overview of each system with a focus on the motivational use case that drove the development of the system, the functionality for storage and querying of time series a system implements, the components the system is composed of, and the capabilities of each system with regard to Stream Processing and Approximate Query Processing (AQP). Last, we provide a summary of research directions proposed by other researchers in the field and present our vision for a next generation TSMS.Comment: 20 Pages, 15 Figures, 2 Tables, Accepted for publication in IEEE TKD

    K2P2^2 - A photometry pipeline for the K2 mission

    Full text link
    With the loss of a second reaction wheel, resulting in the inability to point continuously and stably at the same field of view, the NASA Kepler satellite recently entered a new mode of observation known as the K2 mission. The data from this redesigned mission present a specific challenge; the targets systematically drift in position on a ~6 hour time scale, inducing a significant instrumental signal in the photometric time series --- this greatly impacts the ability to detect planetary signals and perform asteroseismic analysis. Here we detail our version of a reduction pipeline for K2 target pixel data, which automatically: defines masks for all targets in a given frame; extracts the target's flux- and position time series; corrects the time series based on the apparent movement on the CCD (either in 1D or 2D) combined with the correction of instrumental and/or planetary signals via the KASOC filter (Handberg & Lund 2014), thus rendering the time series ready for asteroseismic analysis; computes power spectra for all targets, and identifies potential contaminations between targets. From a test of our pipeline on a sample of targets from the K2 campaign 0, the recovery of data for multiple targets increases the amount of potential light curves by a factor 10{\geq}10. Our pipeline could be applied to the upcoming TESS (Ricker et al. 2014) and PLATO 2.0 (Rauer et al. 2013) missions.Comment: 14 pages, 20 figures, Accepted for publication in The Astrophysical Journal (Apj

    Model-Based Time Series Management at Scale

    Get PDF

    A chemical survey of exoplanets with ARIEL

    Get PDF
    Thousands of exoplanets have now been discovered with a huge range of masses, sizes and orbits: from rocky Earth-like planets to large gas giants grazing the surface of their host star. However, the essential nature of these exoplanets remains largely mysterious: there is no known, discernible pattern linking the presence, size, or orbital parameters of a planet to the nature of its parent star. We have little idea whether the chemistry of a planet is linked to its formation environment, or whether the type of host star drives the physics and chemistry of the planet’s birth, and evolution. ARIEL was conceived to observe a large number (~1000) of transiting planets for statistical understanding, including gas giants, Neptunes, super-Earths and Earth-size planets around a range of host star types using transit spectroscopy in the 1.25–7.8 μm spectral range and multiple narrow-band photometry in the optical. ARIEL will focus on warm and hot planets to take advantage of their well-mixed atmospheres which should show minimal condensation and sequestration of high-Z materials compared to their colder Solar System siblings. Said warm and hot atmospheres are expected to be more representative of the planetary bulk composition. Observations of these warm/hot exoplanets, and in particular of their elemental composition (especially C, O, N, S, Si), will allow the understanding of the early stages of planetary and atmospheric formation during the nebular phase and the following few million years. ARIEL will thus provide a representative picture of the chemical nature of the exoplanets and relate this directly to the type and chemical environment of the host star. ARIEL is designed as a dedicated survey mission for combined-light spectroscopy, capable of observing a large and well-defined planet sample within its 4-year mission lifetime. Transit, eclipse and phase-curve spectroscopy methods, whereby the signal from the star and planet are differentiated using knowledge of the planetary ephemerides, allow us to measure atmospheric signals from the planet at levels of 10–100 part per million (ppm) relative to the star and, given the bright nature of targets, also allows more sophisticated techniques, such as eclipse mapping, to give a deeper insight into the nature of the atmosphere. These types of observations require a stable payload and satellite platform with broad, instantaneous wavelength coverage to detect many molecular species, probe the thermal structure, identify clouds and monitor the stellar activity. The wavelength range proposed covers all the expected major atmospheric gases from e.g. H2O, CO2, CH4 NH3, HCN, H2S through to the more exotic metallic compounds, such as TiO, VO, and condensed species. Simulations of ARIEL performance in conducting exoplanet surveys have been performed – using conservative estimates of mission performance and a full model of all significant noise sources in the measurement – using a list of potential ARIEL targets that incorporates the latest available exoplanet statistics. The conclusion at the end of the Phase A study, is that ARIEL – in line with the stated mission objectives – will be able to observe about 1000 exoplanets depending on the details of the adopted survey strategy, thus confirming the feasibility of the main science objectives.Peer reviewedFinal Published versio
    corecore