Citizen Science 2.0 : Data Management Principles to Harness the Power of the Crowd
Citizen science refers to voluntary participation by the general public in scientific endeavors. Although citizen science has a long tradition, the rise of online communities and user-generated web content has the potential to greatly expand its scope and contributions. Citizens spread across a large area can collect more information than an individual researcher can. Because citizen scientists tend to make observations about areas they know well, their data are likely to be very detailed. Although the potential for engaging citizen scientists is extensive, there are challenges as well. In this paper we consider one such challenge – creating an environment in which non-experts in a scientific domain can provide appropriate and accurate data regarding their observations. We describe the problem in the context of a research project that includes the development of a website to collect citizen-generated data on the distribution of plants and animals in a geographic region. We propose an approach that can improve the quantity and quality of data collected in such projects by organizing data using instance-based data structures. Potential implications of this approach are discussed and plans for future research to validate the design are described.
On the selection of secondary indices in relational databases
An important problem in the physical design of databases is the selection of secondary indices. In general, this problem cannot be solved optimally because of the complexity of the selection process. Heuristics such as the well-known ADD and DROP algorithms are often used instead. In this paper it will be shown that frequently used cost functions can be classified as super- or submodular functions. For these functions several mathematical properties have been derived which reduce the complexity of the index selection problem. These properties will be used to develop a tool for physical database design and also give a mathematical foundation for the success of the aforementioned ADD and DROP algorithms.
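As an illustration of the kind of heuristic the abstract mentions, the following is a minimal sketch of a greedy ADD-style selection: start with no indices and repeatedly add the candidate that lowers the total cost the most, stopping when no addition helps. The function name `greedy_add`, the toy workload, and the cost model are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable, Set

def greedy_add(candidates: Iterable[str],
               cost: Callable[[FrozenSet[str]], float]) -> Set[str]:
    """ADD-style greedy heuristic: grow the index set one index at a time."""
    chosen: Set[str] = set()
    remaining = set(candidates)
    current = cost(frozenset(chosen))
    while remaining:
        best, best_cost = None, current
        # Try every remaining candidate and remember the largest improvement.
        for idx in sorted(remaining):
            c = cost(frozenset(chosen | {idx}))
            if c < best_cost:
                best, best_cost = idx, c
        if best is None:  # no single addition lowers the cost any further
            break
        chosen.add(best)
        remaining.remove(best)
        current = best_cost
    return chosen

# Hypothetical workload: (indexed column, cost without index, cost with index).
QUERIES = [("name", 100.0, 10.0), ("age", 50.0, 20.0), ("city", 8.0, 5.0)]
MAINTENANCE = 5.0  # assumed update overhead per index

def toy_cost(indices: FrozenSet[str]) -> float:
    query_cost = sum(fast if col in indices else slow
                     for col, slow, fast in QUERIES)
    return query_cost + MAINTENANCE * len(indices)

selected = greedy_add(["name", "age", "city"], toy_cost)
print(sorted(selected), toy_cost(frozenset(selected)))
```

In this toy workload the index on `city` is never added, because its maintenance overhead outweighs the saving on the one query it speeds up. A DROP-style variant would start from the full candidate set and remove indices instead.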
PynPoint: a modular pipeline architecture for processing and analysis of high-contrast imaging data
The direct detection and characterization of planetary and substellar
companions at small angular separations is a rapidly advancing field. Dedicated
high-contrast imaging instruments deliver unprecedented sensitivity, enabling
detailed insights into the atmospheres of young low-mass companions. In
addition, improvements in data reduction and PSF subtraction algorithms are
equally relevant for maximizing the scientific yield, both from new and
archival data sets. We aim to develop a generic and modular data reduction
pipeline for processing and analysis of high-contrast imaging data obtained
with pupil-stabilized observations. The package should be scalable and robust
for future implementations and in particular well suited for the 3-5 micron
wavelength range where typically (tens of) thousands of frames have to be processed
and an accurate subtraction of the thermal background emission is critical.
PynPoint is written in Python 2.7 and applies various image processing
techniques, as well as statistical tools for analyzing the data, building on
open-source Python packages. The current version of PynPoint has evolved from
an earlier version that was developed as a PSF subtraction tool based on PCA.
The architecture of PynPoint has been redesigned with the core functionalities
decoupled from the pipeline modules. Modules have been implemented for
dedicated processing and analysis steps, including background subtraction,
frame registration, PSF subtraction, photometric and astrometric measurements,
and estimation of detection limits. The pipeline package enables end-to-end
data reduction of pupil-stabilized data and supports classical dithering and
coronagraphic data sets. As an example, we processed archival VLT/NACO L' and
M' data of beta Pic b and reassessed the planet's brightness and position with
an MCMC analysis, and we provide a derivation of the photometric error budget.
Comment: 16 pages, 9 figures, accepted for publication in A&A, PynPoint is
available at https://github.com/PynPoint/PynPoin
Relational Approach to Knowledge Engineering for POMDP-based Assistance Systems as a Translation of a Psychological Model
Assistive systems for persons with cognitive disabilities (e.g. dementia) are
difficult to build due to the wide range of different approaches people can
take to accomplishing the same task, and the significant uncertainties that
arise from both the unpredictability of clients' behaviours and from noise in
sensor readings. Partially observable Markov decision process (POMDP) models
have been used successfully as the reasoning engine behind such assistive
systems for small multi-step tasks such as hand washing. POMDP models are a
powerful, yet flexible framework for modelling assistance that can deal with
uncertainty and utility. Unfortunately, POMDPs usually require a very
labour-intensive, manual procedure for their definition and construction. Our previous
work has described a knowledge driven method for automatically generating POMDP
activity recognition and context sensitive prompting systems for complex tasks.
We call the resulting POMDP a SNAP (SyNdetic Assistance Process). The
spreadsheet-like result of the analysis does not correspond to the POMDP model
directly, so a translation to a formal POMDP representation is required. To
date, this translation had to be performed manually by a trained POMDP expert.
In this paper, we formalise and automate this translation process using a
probabilistic relational model (PRM) encoded in a relational database. We
demonstrate the method by eliciting three assistance tasks from non-experts. We
validate the resulting POMDP models using case-based simulations to show that
they are reasonable for the domains. We also show a complete case study of a
designer specifying one database, including an evaluation in a real-life
experiment with a human actor.
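To make concrete how a POMDP deals with noisy sensor readings, here is a minimal sketch of the standard belief update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). It does not reproduce the SNAP method or the probabilistic relational model from the abstract; the hand-washing states, the action name, and all probabilities are invented for illustration.

```python
from typing import Dict, Tuple

State = str
Belief = Dict[State, float]
Model = Dict[Tuple[State, str], Dict[str, float]]

def belief_update(belief: Belief, action: str, observation: str,
                  transition: Model, observation_model: Model) -> Belief:
    """Bayes filter step: predict with T, weight by O, then normalise."""
    states = list(belief)
    unnormalised: Belief = {}
    for s_next in states:
        # Probability of reaching s_next, summed over current states.
        predicted = sum(transition[(s, action)].get(s_next, 0.0) * belief[s]
                        for s in states)
        likelihood = observation_model[(s_next, action)].get(observation, 0.0)
        unnormalised[s_next] = likelihood * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}

# Hypothetical two-state hand-washing model (all numbers are assumptions).
transition: Model = {
    ("hands_dry", "prompt_tap"): {"hands_wet": 0.8, "hands_dry": 0.2},
    ("hands_wet", "prompt_tap"): {"hands_wet": 1.0},
}
observation_model: Model = {
    ("hands_wet", "prompt_tap"): {"tap_on": 0.9, "tap_off": 0.1},
    ("hands_dry", "prompt_tap"): {"tap_on": 0.2, "tap_off": 0.8},
}

prior = {"hands_dry": 1.0, "hands_wet": 0.0}
new_belief = belief_update(prior, "prompt_tap", "tap_on",
                           transition, observation_model)
print(new_belief)
```

The updated belief puts most, but not all, of its mass on `hands_wet`: the sensor reading supports that state, while the assumed false-positive rate of the tap sensor keeps some probability on `hands_dry`.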
The Design and Operation of The Keck Observatory Archive
The Infrared Processing and Analysis Center (IPAC) and the W. M. Keck
Observatory (WMKO) operate an archive for the Keck Observatory. At the end of
2013, KOA completed the ingestion of data from all eight active observatory
instruments. KOA will continue to ingest all newly obtained observations, at an
anticipated volume of 4 TB per year. The data are transmitted electronically
from WMKO to IPAC for storage and curation. Access to data is governed by a
data use policy, and approximately two-thirds of the data in the archive are
public.
Comment: 12 pages, 4 figs, 4 tables. Presented at Software and
Cyberinfrastructure for Astronomy III, SPIE Astronomical Telescopes +
Instrumentation 2014. June 2014, Montreal, Canada
Using concept lattices to mine functional dependencies
Concept lattices have proven to be a valuable tool for representing
the knowledge in a database.
In this paper we show how functional dependencies in databases
can be extracted using concept lattices, without preprocessing the original
database,
but by providing a new closure operator. We also prove that this method
generalizes the previous methods and
closure operators that are used to find association rules in binary
databases.
Postprint (published version)
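To illustrate the link between closure operators and functional dependencies that the concept-lattice abstract refers to, the sketch below uses a simple pairwise-agreement closure: the closure of an attribute set X is the set of all attributes on which every pair of rows agreeing on X also agrees, and X → Y holds exactly when Y lies inside that closure. This is a textbook-style construction chosen for the example, not necessarily the operator defined in the paper; the table and the names `fd_closure` and `holds` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]

def fd_closure(rows: List[Row], attrs: Iterable[str]) -> Set[str]:
    """Attributes on which every pair of rows agreeing on `attrs` also agrees."""
    attrs = set(attrs)
    all_attrs = set(rows[0]) if rows else set()
    closed = set(all_attrs)
    for r1, r2 in combinations(rows, 2):
        if all(r1[a] == r2[a] for a in attrs):
            # Keep only the attributes this agreeing pair has in common.
            closed &= {a for a in all_attrs if r1[a] == r2[a]}
    return closed

def holds(rows: List[Row], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """X -> Y holds in `rows` iff Y is contained in the closure of X."""
    return set(rhs) <= fd_closure(rows, lhs)

# Hypothetical table, invented for this example.
rows = [
    {"emp": "ann", "dept": "db", "floor": 2},
    {"emp": "bob", "dept": "db", "floor": 2},
    {"emp": "cy",  "dept": "ai", "floor": 3},
    {"emp": "di",  "dept": "hw", "floor": 3},
]

print(fd_closure(rows, {"dept"}))         # attributes determined by dept
print(holds(rows, {"dept"}, {"floor"}))   # does dept -> floor hold?
print(holds(rows, {"floor"}, {"dept"}))   # does floor -> dept hold?
```

The pairwise comparison is quadratic in the number of rows, so it only serves to show the definition at small scale.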