Pruning and Nonparametric Multiple Change Point Detection
Change point analysis is a statistical tool to identify homogeneity within
time series data. We propose a pruning approach for approximate nonparametric
estimation of multiple change points. This general purpose change point
detection procedure `cp3o' applies a pruning routine within a dynamic program
to greatly reduce the search space and computational costs. Existing
goodness-of-fit change point objectives can immediately be utilized within the
framework. We further propose novel change point algorithms by applying cp3o to
two popular nonparametric goodness of fit measures: `e-cp3o' uses E-statistics,
and `ks-cp3o' uses Kolmogorov-Smirnov statistics. Simulation studies highlight
the performance of these algorithms in comparison with parametric and other
nonparametric change point methods. Finally, we illustrate these approaches
with climatological and financial applications.
Comment: 9 pages. arXiv admin note: text overlap with arXiv:1505.0430
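The KS-based goodness-of-fit idea can be illustrated with a much-simplified single-change-point scan (the full ks-cp3o algorithm embeds the statistic in a pruned dynamic program over multiple change points; the function names and the minimum-segment-length parameter below are illustrative, not from the paper):

```python
import bisect

def ks_stat(x, y):
    # Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    # the empirical CDFs of the two samples.
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in set(xs + ys):
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def best_single_change_point(series, min_seg=5):
    # Scan candidate split points; keep the split whose two segments
    # look most different under the KS statistic.
    best = max(range(min_seg, len(series) - min_seg + 1),
               key=lambda tau: ks_stat(series[:tau], series[tau:]))
    return best, ks_stat(series[:best], series[best:])

tau, d = best_single_change_point([0.0] * 20 + [5.0] * 20)
```

A clean level shift of this kind yields the maximal statistic of 1.0 at the true split; the dynamic program generalizes this scan to several change points while the pruning step discards candidate splits that cannot improve the objective.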
A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data
Many interesting data sets available on the Internet are of a medium
size---too big to fit into a personal computer's memory, but not so large that
they won't fit comfortably on its hard disk. In the coming years, data sets of
this magnitude will inform vital research in a wide array of application
domains. However, due to a variety of constraints they are cumbersome to
ingest, wrangle, analyze, and share in a reproducible fashion. These
obstructions hamper thorough peer-review and thus disrupt the forward progress
of science. We propose a predictable and pipeable framework for R (the
state-of-the-art statistical computing environment) that leverages SQL (the
venerable database architecture and query language) to make reproducible
research on medium data a painless reality.
Comment: 30 pages, plus supplementary material
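The extract-transform-load shape the abstract describes can be sketched in Python with the standard-library sqlite3 module (the paper's framework is for R and a full SQL backend; the table name, columns, and sample data below are purely illustrative):

```python
import csv
import io
import sqlite3

RAW = """city,year,temp_c
Boston,2015,10.6
Boston,2016,11.2
Albany,2015,9.4
"""

def extract(text):
    # Extract: parse raw CSV text into dictionaries.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: coerce types and derive a Fahrenheit column.
    return [(r["city"], int(r["year"]), float(r["temp_c"]),
             round(float(r["temp_c"]) * 9 / 5 + 32, 1)) for r in rows]

def load(conn, rows):
    # Load: push the cleaned rows into a SQL table so later analysis
    # is a reproducible query rather than an in-memory one-off.
    conn.execute("CREATE TABLE IF NOT EXISTS temps "
                 "(city TEXT, year INT, temp_c REAL, temp_f REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
avg = conn.execute(
    "SELECT AVG(temp_c) FROM temps WHERE city = 'Boston'").fetchone()[0]
```

Because each stage is a plain function over the previous stage's output, the pipeline is composable in the "pipeable" sense the abstract emphasizes, and the loaded table can be shared and re-queried without re-running the ingest.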
Deceit: A flexible distributed file system
Deceit, a distributed file system (DFS) being developed at Cornell, focuses on flexible file semantics in relation to efficiency, scalability, and reliability. Deceit servers are interchangeable and collectively provide the illusion of a single, large server machine to any clients of the Deceit service. Non-volatile replicas of each file are stored on a subset of the file servers. Users can set per-file parameters to achieve different levels of availability, performance, and one-copy serializability. Deceit also supports a file version control mechanism. In contrast with many recent DFS efforts, Deceit can behave like a plain Sun Network File System (NFS) server and can be used by any NFS client without modifying any client software. The current Deceit prototype uses the ISIS Distributed Programming Environment for all communication and process group management, an approach that reduces system complexity and increases system robustness.
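As a toy illustration of per-file replication over interchangeable servers, a deterministic placement rule can pick the subset of servers that holds a file's non-volatile replicas, with the replica count acting as the per-file availability parameter (the rendezvous-style hashing here is an illustrative choice of my own; the abstract does not specify Deceit's placement policy, and the server names are hypothetical):

```python
import hashlib

SERVERS = ["s1", "s2", "s3", "s4", "s5"]  # interchangeable file servers

def replica_set(filename, n_replicas):
    # Rank every server by a hash of (filename, server) and keep the top
    # n_replicas: any node can recompute the same subset with no directory
    # lookup, and raising n_replicas trades storage for availability.
    ranked = sorted(SERVERS,
                    key=lambda s: hashlib.sha256(
                        (filename + "/" + s).encode()).hexdigest())
    return ranked[:n_replicas]
```

A client or server asking "who holds report.txt?" simply recomputes `replica_set("report.txt", k)` and always obtains the same answer.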
Automated data processing architecture for the Gemini Planet Imager Exoplanet Survey
The Gemini Planet Imager Exoplanet Survey (GPIES) is a multi-year direct
imaging survey of 600 stars to discover and characterize young Jovian
exoplanets and their environments. We have developed an automated data
architecture to process and index all data related to the survey uniformly. An
automated and flexible data processing framework, which we term the Data
Cruncher, combines multiple data reduction pipelines together to process all
spectroscopic, polarimetric, and calibration data taken with GPIES. With no
human intervention, fully reduced and calibrated data products are available
less than an hour after the data are taken to expedite follow-up on potential
objects of interest. The Data Cruncher can run on a supercomputer to reprocess
all GPIES data in a single day as improvements are made to our data reduction
pipelines. A backend MySQL database indexes all files, which are synced to the
cloud, and a front-end web server allows for easy browsing of all files
associated with GPIES. To help observers, quicklook displays show reduced data
as they are processed in real-time, and chatbots on Slack post observing
information as well as reduced data products. Together, the GPIES automated
data processing architecture reduces our workload, provides real-time data
reduction, optimizes our observing strategy, and maintains a homogeneously
reduced dataset to study planet occurrence and instrument performance.
Comment: 21 pages, 3 figures, accepted in JATI
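The file-indexing backend can be sketched as a small relational index that the pipeline updates as each reduced product is written (GPIES uses MySQL; the sqlite3 stand-in, the schema, and the file names below are illustrative, not the survey's actual database layout):

```python
import sqlite3

def make_index(conn):
    # Minimal file index: one row per data product (schema is illustrative).
    conn.execute("""CREATE TABLE files (
        path     TEXT PRIMARY KEY,
        obs_date TEXT,
        mode     TEXT,
        reduced  INTEGER)""")

def ingest(conn, path, obs_date, mode, reduced=1):
    # Called after each product is written; INSERT OR REPLACE keeps the
    # index current when a file is reprocessed by a newer pipeline.
    conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                 (path, obs_date, mode, reduced))

conn = sqlite3.connect(":memory:")
make_index(conn)
ingest(conn, "S20180101S0001_spdc.fits", "2018-01-01", "spectroscopy")
ingest(conn, "S20180101S0002_podc.fits", "2018-01-01", "polarimetry")
night = conn.execute(
    "SELECT COUNT(*) FROM files WHERE obs_date = '2018-01-01' "
    "AND reduced = 1").fetchone()[0]
```

With an index of this shape, a front-end web server or a Slack chatbot only has to issue SQL queries to answer "what was reduced last night?"-style questions.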
Transportable Applications Environment (TAE) Plus: A NASA user interface development and management system
The Transportable Applications Environment Plus (TAE Plus), developed at the NASA Goddard Space Flight Center, is a portable, what-you-see-is-what-you-get (WYSIWYG) user interface development and management system. Its primary objective is to provide an integrated software environment that allows interactive prototyping and development of graphical user interfaces, as well as management of the user interface within the operational domain. TAE Plus is being applied to many types of applications. The discussion covers what TAE Plus provides, how its implementation utilizes state-of-the-art technologies on graphics workstations, and how it has been used both within and outside NASA.