Oblivious Median Slope Selection
We study the median slope selection problem in the oblivious RAM model. In
this model memory accesses have to be independent of the data processed, i.e.,
an adversary cannot use observed access patterns to derive additional
information about the input. We show how to modify the randomized algorithm of
Matoušek (1991) to obtain an oblivious version with O(n log^2 n) expected
time for n points in R^2. This complexity matches a theoretical upper bound
that can be obtained through general oblivious transformation. In addition,
results from a proof-of-concept implementation show that our algorithm is also
practically efficient. Comment: 14 pages, to appear in Proceedings of CCCG 202
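For orientation, the quantity being selected can be written down directly. The sketch below is a brute-force reference, not the algorithm described in the abstract: it enumerates all pairwise slopes in quadratic time and its memory accesses are not oblivious. The function name `median_slope` and the sample points are illustrative assumptions.

```python
from itertools import combinations
from statistics import median


def median_slope(points):
    """Return the median slope over all point pairs with distinct x-coordinates.

    Brute-force reference only: O(n^2) slopes, no obliviousness guarantees.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2  # a vertical pair has no finite slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x-coordinates")
    # statistics.median averages the two middle values when the count is even.
    return median(slopes)


# Three points give three slopes: 1.0, 2.5 and 4.0.
sample = [(0, 0), (1, 1), (2, 5)]
print(median_slope(sample))
```

A reference of this kind is mainly useful for checking a faster implementation on small inputs.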
How to measure metallicity from five-band photometry with supervised machine learning algorithms
We demonstrate that it is possible to measure metallicity from the SDSS
five-band photometry to better than 0.1 dex using supervised machine learning
algorithms. Using spectroscopic estimates of metallicity as ground truth, we
build, optimize and train several estimators to predict metallicity. We use the
observed photometry, as well as derived quantities such as stellar mass and
photometric redshift, as features, and we build two sample data sets at median
redshifts of 0.103 and 0.218 and median r-band magnitude of 17.5 and 18.3
respectively. We find that ensemble methods, such as Random Forests of Trees
and Extremely Randomized Trees, and Support Vector Machines all perform
comparably well and can measure metallicity with a Root Mean Square Error
(RMSE) of 0.081 and 0.090 for the two data sets when all objects are included.
The fraction of outliers (objects for which |Z_true - Z_pred| > 0.2 dex) is 2.2%
and 3.9%, respectively, and the RMSE decreases to 0.068 and 0.069 if those
objects are excluded. Because of the ability of these algorithms to capture
complex relationships between data and target, our technique performs better
than previously proposed methods that sought to fit metallicity using an
analytic fitting formula, and has 3x more constraining power than SED
fitting-based methods. Additionally, this method is extremely forgiving of
contamination in the training set, and can be used with very satisfactory
results for training sample sizes of just a few hundred objects. We distribute
all the routines to reproduce our results and apply them to other data sets. Comment: Minor revisions, matching version published in MNRA
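The three evaluation figures quoted in this abstract (RMSE, outlier fraction at 0.2 dex, and RMSE after excluding outliers) can be computed as in the following minimal sketch. The function names and the toy values are assumptions for illustration; they are not the authors' distributed routines or data.

```python
import math


def rmse(z_true, z_pred):
    """Root mean square error between true and predicted metallicities."""
    if len(z_true) != len(z_pred) or not z_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(z_true, z_pred)) / len(z_true))


def outlier_fraction(z_true, z_pred, threshold=0.2):
    """Fraction of objects with |Z_true - Z_pred| above the threshold (in dex)."""
    return sum(abs(t - p) > threshold for t, p in zip(z_true, z_pred)) / len(z_true)


def rmse_without_outliers(z_true, z_pred, threshold=0.2):
    """RMSE restricted to objects whose absolute error is within the threshold."""
    kept = [(t, p) for t, p in zip(z_true, z_pred) if abs(t - p) <= threshold]
    return rmse([t for t, _ in kept], [p for _, p in kept])


# Hypothetical toy values, not taken from the paper.
z_true = [8.5, 8.7, 8.9, 9.0]
z_pred = [8.6, 8.7, 8.8, 9.5]
print(rmse(z_true, z_pred))
print(outlier_fraction(z_true, z_pred))
print(rmse_without_outliers(z_true, z_pred))
```

Reporting the RMSE both with and without outliers, as the abstract does, separates the typical scatter from the contribution of a few badly predicted objects.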
Approximate selective inference via maximum likelihood
This article considers a conditional approach to selective inference via
approximate maximum likelihood for data described by Gaussian models. There are
two important considerations in adopting a post-selection inferential
perspective. While one of them concerns the effective use of information in
data, the other aspect deals with the computational cost of adjusting for
selection. Our approximate proposal serves both these purposes: (i) it exploits
randomness to make efficient use of the information left over from
selection; (ii) it enables us to bypass potentially expensive MCMC sampling from
conditional distributions. At the core of our method is the solution to a
convex optimization problem which assumes a separable form across multiple
selection queries. This allows us to address the problem of tractable and
efficient inference in many practical scenarios, where more than one learning
query is conducted to define and perhaps redefine models and their
corresponding parameters. Through an in-depth analysis, we illustrate the
potential of our proposal and provide extensive comparisons with other
post-selective schemes in both randomized and non-randomized paradigms of
inference.
The role of mentorship in protege performance
The role of mentorship in protege performance is a matter of importance to
academic, business, and governmental organizations. While the benefits of
mentorship for proteges, mentors and their organizations are apparent, the
extent to which proteges mimic their mentors' career choices and acquire their
mentorship skills is unclear. Here, we investigate one aspect of mentor
emulation by studying mentorship fecundity---the number of proteges a mentor
trains---with data from the Mathematics Genealogy Project, which tracks the
mentorship record of thousands of mathematicians over several centuries. We
demonstrate that fecundity among academic mathematicians is correlated with
other measures of academic success. We also find that the average fecundity of
mentors remains stable over 60 years of recorded mentorship. We further uncover
three significant correlations in mentorship fecundity. First, mentors with
small mentorship fecundity train proteges that go on to have a 37% larger than
expected mentorship fecundity. Second, in the first third of their career,
mentors with large fecundity train proteges that go on to have a 29% larger
than expected fecundity. Finally, in the last third of their career, mentors
with large fecundity train proteges that go on to have a 31% smaller than
expected fecundity. Comment: 23 pages double-spaced, 4 figure
Risk Stratification in Post-MI Patients Based on Left Ventricular Ejection Fraction and Heart-Rate Turbulence
Objectives: To develop risk stratification criteria for predicting mortality in post-infarction patients, taking into account LVEF and heart-rate turbulence (HRT). Methods: Based on previous results, the two parameters LVEF (as a continuous variable) and turbulence slope (TS), as an indicator of HRT, were combined for risk stratification. The method was applied to two independent data sets (the MPIP trial and the EMIAT study). Results: The criteria were defined so as to match, in sensitivity, the outcome of applying LVEF ≤ 30 %. In the MPIP trial the optimal criteria selected are TS normal and LVEF ≤ 21 %, or TS abnormal and LVEF ≤ 40 %. Within the placebo group of the EMIAT study the corresponding criteria are TS normal and LVEF ≤ 23 %, or TS abnormal and LVEF ≤ 40 %. Combining both studies, the following criteria were obtained: TS normal and LVEF ≤ 20 %, or TS abnormal and LVEF ≤ 40 %. In the MPIP study 83 of the 581 patients (= 14.3 %) fulfil these criteria; 30 patients in this group died during follow-up. In the EMIAT trial 218 of the 591 patients (= 37.9 %) are classified as high-risk patients, with 53 deaths. Combining both studies, the high-risk group contains 301 patients with 83 deaths (ppv = 27.7 %). Using the MADIT criterion as the classification rule (LVEF ≤ 30 %), a sample of 375 patients with 85 deaths (ppv = 24 %) is selected. Conclusions: The stratification rule based on LVEF and TS is able to select high-risk patients suitable for implanting an ICD. The rule performs better than the classical one based on LVEF alone: the high-risk group under the new criteria is smaller, with about the same number of deaths, and therefore has a higher positive predictive value. The classification criteria were validated in a bootstrap study with 100 replications. In all samples the rule based on TS and LVEF (= NEW) was superior to LVEF alone: the high-risk group was smaller (mean ± s: 301 ± 14.5 (NEW) vs. 375 ± 14.5 (LVEF)) and the positive predictive value was larger (mean ± s: 27.2 ± 2.6 % (NEW) vs. 23.3 ± 2.2 % (LVEF)). The new criteria are less expensive because fewer high-risk patients are selected.
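The combined rule can be sketched as a two-threshold classifier, with the positive predictive value computed as deaths among the selected patients divided by the number selected. This is a minimal sketch under stated assumptions: the comparison symbol is damaged in the source, so reading each threshold as "LVEF at or below the cutoff" is an assumption, and the function names and patient records below are hypothetical.

```python
def is_high_risk(lvef_percent, ts_abnormal,
                 cutoff_ts_normal=20.0, cutoff_ts_abnormal=40.0):
    """Apply the combined-study rule: the LVEF cutoff depends on the TS status.

    Assumption: "at or below the cutoff" counts as high risk.
    """
    cutoff = cutoff_ts_abnormal if ts_abnormal else cutoff_ts_normal
    return lvef_percent <= cutoff


def positive_predictive_value(patients):
    """patients: iterable of (lvef_percent, ts_abnormal, died) tuples."""
    outcomes = [died for lvef, ts, died in patients if is_high_risk(lvef, ts)]
    if not outcomes:
        raise ValueError("no patient was classified as high risk")
    return sum(outcomes) / len(outcomes)


# Hypothetical records for illustration only, not trial data.
toy_patients = [
    (18, False, True),   # TS normal, LVEF 18 %: selected
    (35, True, False),   # TS abnormal, LVEF 35 %: selected
    (35, False, False),  # TS normal, LVEF 35 %: not selected
    (45, True, True),    # TS abnormal, LVEF 45 %: not selected
    (25, True, True),    # TS abnormal, LVEF 25 %: selected
]
print(positive_predictive_value(toy_patients))
```

Keeping the two cutoffs as parameters makes it straightforward to compare the study-specific thresholds (21 % and 23 % for normal TS) with the combined one.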
Percolation Analysis of a Wiener Reconstruction of the IRAS 1.2 Jy Redshift Catalog
We present percolation analyses of Wiener Reconstructions of the IRAS 1.2 Jy
Redshift Survey. There are ten reconstructions of galaxy density fields in real
space spanning a range of $\beta$, where
$\beta = \Omega^{0.6}/b$, $\Omega$ is the present dimensionless density and
$b$ is the bias factor. Our method uses the growth of the largest cluster
statistic to characterize the topology of a density field, where Gaussian
randomized versions of the reconstructions are used as standards for analysis.
For the reconstruction volume of radius, Mpc,
percolation analysis reveals a slight `meatball' topology for the real-space
galaxy distribution of the IRAS survey.
cosmology - galaxies: clustering - methods: numerical
Comment: Revised version accepted for publication in The Astrophysical Journal, January 10, 1997 issue, Vol.47
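A largest-cluster statistic of the kind mentioned in this abstract can be illustrated on a small grid: threshold the density field, group neighbouring above-threshold cells into clusters, and record the share held by the biggest one. The sketch below makes several simplifying assumptions that are not from the paper: a 2D grid instead of a 3D volume, 4-neighbour connectivity, and normalisation by the number of above-threshold cells. The function name and the toy field are hypothetical.

```python
from collections import deque


def largest_cluster_fraction(field, threshold):
    """Fraction of above-threshold cells that belong to the largest cluster."""
    rows, cols = len(field), len(field[0])
    above = [[value > threshold for value in row] for row in field]
    total = sum(sum(row) for row in above)
    if total == 0:
        return 0.0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not above[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected cluster.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and above[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            best = max(best, size)
    return best / total


# Hypothetical toy density field.
toy_field = [
    [0.9, 0.8, 0.1],
    [0.1, 0.7, 0.1],
    [0.6, 0.1, 0.1],
]
for level in (0.5, 0.75, 0.95):
    print(level, largest_cluster_fraction(toy_field, level))
```

Sweeping the threshold and comparing the resulting curve with that of a Gaussian-randomized field is the kind of comparison the abstract describes, though the exact statistic and normalisation used there may differ.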