Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees
We provide classifications for all 143 million non-repeat photometric objects
in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision
trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate
that these star/galaxy classifications are expected to be reliable for
approximately 22 million objects with r < ~20. The general machine learning
environment Data-to-Knowledge and supercomputing resources enabled extensive
investigation of the decision tree parameter space. This work presents the
first public release of objects classified in this way for an entire SDSS data
release. The objects are classified as either galaxy, star or nsng (neither
star nor galaxy), with an associated probability for each class. To demonstrate
how to effectively make use of these classifications, we perform several
important tests. First, we detail selection criteria within the probability
space defined by the three classes to extract samples of stars and galaxies to
a given completeness and efficiency. Second, we investigate the efficacy of the
classifications and the effect of extrapolating from the spectroscopic regime
by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF
QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic
training data, we effectively begin to extrapolate past our star-galaxy
training set at r ~ 18. By comparing the number counts of our training sample
with the classified sources, however, we find that our efficiencies appear to
remain robust to r ~ 20. As a result, we expect our classifications to be
accurate for 900,000 galaxies and 6.7 million stars, and remain robust via
extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
[Abridged] Comment: 27 pages, 12 figures, to be published in ApJ, uses emulateapj.cl
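The selection criteria described in this abstract trade completeness against efficiency as the probability cut changes. A minimal sketch of that trade-off follows, assuming completeness means the fraction of true galaxies that pass the cut and efficiency means the fraction of selected objects that are true galaxies. The function name, the single-threshold cut, and the toy data are illustrative assumptions, not the paper's actual procedure or catalogue values.

```python
def completeness_efficiency(p_galaxy, is_galaxy, threshold):
    """Return (completeness, efficiency) for the cut p_galaxy >= threshold.

    p_galaxy  : classifier probabilities that each object is a galaxy
    is_galaxy : spectroscopic truth labels (True for a galaxy)
    Both definitions are assumptions stated in the lead-in above.
    """
    selected = [p >= threshold for p in p_galaxy]
    true_pos = sum(1 for s, g in zip(selected, is_galaxy) if s and g)
    n_selected = sum(selected)
    n_true = sum(is_galaxy)
    # Guard against empty selections or samples with no true galaxies.
    completeness = true_pos / n_true if n_true else float("nan")
    efficiency = true_pos / n_selected if n_selected else float("nan")
    return completeness, efficiency


# Hypothetical toy sample: five objects with known spectroscopic classes.
p_galaxy = [0.95, 0.80, 0.60, 0.40, 0.10]
is_galaxy = [True, True, False, True, False]

for cut in (0.5, 0.9):
    c, e = completeness_efficiency(p_galaxy, is_galaxy, cut)
    print(f"cut={cut}: completeness={c:.2f}, efficiency={e:.2f}")
```

Raising the cut in this toy sample makes the selection purer but recovers fewer of the true galaxies, which is the kind of balance a user of the released probabilities would need to choose.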
Competitive Altruism Explains Labor Exchange Variation in a Dominican Community
Smallholder farmers rely on labor exchange to generate agricultural work when cash is rare and credit unavailable. Reciprocal altruism, biased by genetic kinship, has been implicated as the mechanism responsible for labor exchange; however, few empirical tests confirm this proposition. Competitive altruism could be operating if people differ in ability and use this information as a criterion for partnership selection. Labor exchange data are presented from a Dominican smallholder village over a 10-month period within the village’s primary cash economic opportunity, bay oil production. Results indicate that competitive altruism better explains variation in labor exchange relationships and group size than reciprocal altruism and kinship, suggesting the presence of a biologic market for male exchange relationships. Bay oil laborers vary in altruistic behaviors, causing reputations for altruism to emerge. Men with reputations as high-quality altruists generate larger labor groups in bay oil production than do poor-quality ones. Larger groups induce bargaining wars, causing men to compete through altruistic acts, which allows high-quality individuals to discriminate potential partners for labor exchange relationships. Men with better reputations achieve more same-sex reciprocal partnerships but not a greater incidence of conjugal partnership, suggesting that male altruism is intra- but not intersexually selected. This is the publisher’s final pdf. The published article is copyrighted by the University of Chicago Press and can be found at: http://www.press.uchicago.edu/ucp/journals/journal/ca.html
A Home-Based Telerehabilitation Program for Patients with Stroke
Background. Although rehabilitation therapy is commonly provided after stroke, many patients do not derive maximal benefit because of access, cost, and compliance. A telerehabilitation-based program may overcome these barriers. We designed, then evaluated a home-based telerehabilitation system in patients with chronic hemiparetic stroke. Methods. Patients were 3 to 24 months poststroke with stable arm motor deficits. Each received 28 days of telerehabilitation using a system delivered to their home. Each day consisted of 1 structured hour focused on individualized exercises and games, stroke education, and an hour of free play. Results. Enrollees (n = 12) had baseline Fugl-Meyer (FM) scores of 39 ± 12 (mean ± SD). Compliance was excellent: participants engaged in therapy on 329/336 (97.9%) assigned days. Arm repetitions across the 28 days averaged 24,607 ± 9934 per participant. Arm motor status showed significant gains (FM change 4.8 ± 3.8 points, P = .0015), with half of the participants exceeding the minimal clinically important difference. Although scores on tests of computer literacy declined with age (r = −0.92; P < .0001), neither the motor gains nor the amount of system use varied with computer literacy. Daily stroke education via the telerehabilitation system was associated with a 39% increase in stroke prevention knowledge (P = .0007). Depression scores obtained in person correlated with scores obtained via the telerehabilitation system 16 days later (r = 0.88; P = .0001). In-person blood pressure values closely matched those obtained via this system (r = 0.99; P < .0001). Conclusions. This home-based system was effective in providing telerehabilitation, education, and secondary stroke prevention to participants. Use of a computer-based interface offers many opportunities to monitor and improve the health of patients after stroke.
Prediction of survival probabilities with Bayesian Decision Trees
Practitioners use Trauma and Injury Severity Score (TRISS) models for predicting the survival probability of an injured patient. The accuracy of TRISS predictions is acceptable for patients with up to three typical injuries, but unacceptable for patients with a larger number of injuries or with atypical injuries. Based on a regression model, the TRISS methodology does not provide the predictive density required for accurate assessment of risk. Moreover, the regression model is difficult to interpret. We therefore consider Bayesian inference for estimating the predictive distribution of survival. The inference is based on decision tree models which recursively split data along explanatory variables, and so practitioners can understand these models. We propose the Bayesian method for estimating the predictive density and show that it outperforms the TRISS method in terms of both goodness-of-fit and classification accuracy. The developed method has been made available for evaluation purposes as a stand-alone application
The empirical replicability of task-based fMRI as a function of sample size
Replicating results (i.e. obtaining consistent results using a new independent dataset) is an essential part of good science. As replicability has consequences for theories derived from empirical studies, it is of utmost importance to better understand the underlying mechanisms influencing it. A popular tool for non-invasive neuroimaging studies is functional magnetic resonance imaging (fMRI). While the effect of underpowered studies is well documented, the empirical assessment of the interplay between sample size and replicability of results for task-based fMRI studies remains limited. In this work, we extend existing work on this assessment in two ways. Firstly, we use a large database of 1400 subjects performing four types of tasks from the IMAGEN project to subsample a series of independent samples of increasing size. Secondly, replicability is evaluated using a multi-dimensional framework consisting of 3 different measures: (un)conditional test-retest reliability, coherence and stability. We demonstrate not only a positive effect of sample size, but also a trade-off between spatial resolution and replicability. When replicability is assessed voxelwise or when observing small areas of activation, a larger sample size than typically used in fMRI is required to replicate results. On the other hand, when focussing on clusters of voxels, we observe a higher replicability. In addition, we observe variability in the size of clusters of activation between experimental paradigms or contrasts of parameter estimates within these
VAST: An ASKAP Survey for Variables and Slow Transients
The Australian Square Kilometre Array Pathfinder (ASKAP) will give us an
unprecedented opportunity to investigate the transient sky at radio
wavelengths. In this paper we present VAST, an ASKAP survey for Variables and
Slow Transients. VAST will exploit the wide-field survey capabilities of ASKAP
to enable the discovery and investigation of variable and transient phenomena
from the local to the cosmological, including flare stars, intermittent
pulsars, X-ray binaries, magnetars, extreme scattering events, interstellar
scintillation, radio supernovae and orphan afterglows of gamma ray bursts. In
addition, it will allow us to probe unexplored regions of parameter space where
new classes of transient sources may be detected. In this paper we review the
known radio transient and variable populations and the current results from
blind radio surveys. We outline a comprehensive program based on a multi-tiered
survey strategy to characterise the radio transient sky through detection and
monitoring of transient and variable sources on the ASKAP imaging timescales of
five seconds and greater. We also present an analysis of the expected source
populations that we will be able to detect with VAST. Comment: 29 pages, 8 figures. Submitted for publication in Pub. Astron. Soc.
Australi
TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets
Background. Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results. TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions. TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner.
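The post-trimming filters mentioned in this abstract (duplicates, short reads, and reads with many ambiguous bases) can be illustrated with a short sketch. This is not TagCleaner's implementation: the function name, the default thresholds, and the choice to count only `N` as an ambiguous base are assumptions made for illustration.

```python
def filter_reads(reads, min_length=50, max_ambiguous_fraction=0.05):
    """Drop short reads, reads rich in ambiguous bases, and exact duplicates.

    All parameter names and defaults are hypothetical. Only 'N' is treated
    as ambiguous here; other IUPAC ambiguity codes are ignored for brevity.
    """
    seen = set()
    kept = []
    for read in reads:
        seq = read.upper()  # compare case-insensitively
        if not seq or len(seq) < min_length:
            continue  # empty or too short
        if seq.count("N") / len(seq) > max_ambiguous_fraction:
            continue  # too many ambiguous bases
        if seq in seen:
            continue  # exact duplicate of an earlier read
        seen.add(seq)
        kept.append(read)
    return kept


# Hypothetical toy reads, with thresholds loosened to suit their length.
reads = ["ACGTACGT", "ACGTACGT", "ACG", "ACNNNNGT", "acgtacgt", "TTTTGGGG"]
print(filter_reads(reads, min_length=5, max_ambiguous_fraction=0.25))
```

The order of the checks is a design choice: applying the cheap length and ambiguity tests first keeps rejected reads out of the duplicate set.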
Spire, an Actin Nucleation Factor, Regulates Cell Division during Drosophila Heart Development
The Drosophila dorsal vessel is a beneficial model system for studying the regulation of early heart development. Spire (Spir), an actin-nucleation factor, regulates actin dynamics in many developmental processes, such as cell shape determination, intracellular transport, and locomotion. Through protein expression pattern analysis, we demonstrate that the absence of spir function affects cell division in Myocyte enhancer factor 2-, Tinman (Tin)-, Even-skipped- and Seven up (Svp)-positive heart cells. In addition, genetic interaction analysis shows that spir functionally interacts with Dorsocross, tin, and pannier to properly specify the cardiac fate. Furthermore, through visualization of double heterozygous embryos, we determine that spir cooperates with CycA for heart cell specification and division. Finally, when comparing the spir mutant phenotype with that of a CycA mutant, the results suggest that most Svp-positive progenitors in spir mutant embryos cannot undergo full cell division at cell cycle 15, and that Tin-positive progenitors are arrested at cell cycle 16 as double-nucleated cells. We conclude that Spir plays a crucial role in controlling dorsal vessel formation and has a function in cell division during heart tube morphogenesis.