
    Oblivious Median Slope Selection

    We study the median slope selection problem in the oblivious RAM model. In this model, memory accesses must be independent of the data processed, i.e., an adversary cannot use observed access patterns to derive additional information about the input. We show how to modify the randomized algorithm of Matoušek (1991) to obtain an oblivious version with O(n log^2 n) expected time for n points in R^2. This complexity matches a theoretical upper bound that can be obtained through general oblivious transformation. In addition, results from a proof-of-concept implementation show that our algorithm is also practically efficient. Comment: 14 pages, to appear in Proceedings of CCCG 202
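    To make the problem statement concrete, the following is a minimal brute-force sketch of median slope selection. It is not the oblivious algorithm described in the abstract: it takes O(n^2) time and its branching depends on the data, so it leaks access patterns. The function name `median_slope` and the sample points are illustrative assumptions, and `statistics.median` averages the two middle slopes for an even count, which may differ from a rank-based definition of the median.

```python
from itertools import combinations
from statistics import median


def median_slope(points):
    """Return the median slope over all point pairs with distinct x-coordinates.

    Brute-force reference only: quadratic time and NOT data-oblivious.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2  # vertical pairs have no finite slope and are skipped
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x-coordinates")
    return median(slopes)


# Hypothetical sample: points near the line y = 2x with one perturbed value.
sample = [(0, 0), (1, 2), (2, 4), (3, 7), (4, 8)]
print(median_slope(sample))
```

    A reference of this kind is mainly useful for checking a faster implementation on small inputs.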

    How to measure metallicity from five-band photometry with supervised machine learning algorithms

    We demonstrate that it is possible to measure metallicity from the SDSS five-band photometry to better than 0.1 dex using supervised machine learning algorithms. Using spectroscopic estimates of metallicity as ground truth, we build, optimize and train several estimators to predict metallicity. We use the observed photometry, as well as derived quantities such as stellar mass and photometric redshift, as features, and we build two sample data sets at median redshifts of 0.103 and 0.218 and median r-band magnitudes of 17.5 and 18.3, respectively. We find that ensemble methods, such as Random Forests of Trees and Extremely Randomized Trees, and Support Vector Machines all perform comparably well and can measure metallicity with a Root Mean Square Error (RMSE) of 0.081 and 0.090 for the two data sets when all objects are included. The fraction of outliers (objects for which |Z_true - Z_pred| > 0.2 dex) is 2.2% and 3.9%, respectively, and the RMSE decreases to 0.068 and 0.069 if those objects are excluded. Because of the ability of these algorithms to capture complex relationships between data and target, our technique performs better than previously proposed methods that sought to fit metallicity using an analytic fitting formula, and has 3x more constraining power than SED fitting-based methods. Additionally, this method is extremely forgiving of contamination in the training set, and can be used with very satisfactory results for training sample sizes of just a few hundred objects. We distribute all the routines to reproduce our results and apply them to other data sets. Comment: Minor revisions, matching version published in MNRA
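    The two evaluation metrics quoted above, RMSE and the fraction of outliers with |Z_true - Z_pred| > 0.2 dex, can be sketched as follows. This is an illustrative helper, not the routines the authors distribute; the function names and the toy values are assumptions made up for the example and are not taken from the paper.

```python
import math


def rmse(z_true, z_pred):
    """Root mean square error between true and predicted values."""
    if not z_true:
        return float("nan")
    squared = [(t - p) ** 2 for t, p in zip(z_true, z_pred)]
    return math.sqrt(sum(squared) / len(squared))


def outlier_summary(z_true, z_pred, threshold=0.2):
    """Return (outlier fraction, RMSE with outliers excluded).

    An object counts as an outlier when |z_true - z_pred| > threshold (in dex).
    """
    pairs = list(zip(z_true, z_pred))
    kept = [(t, p) for t, p in pairs if abs(t - p) <= threshold]
    fraction = (len(pairs) - len(kept)) / len(pairs)
    clean_rmse = rmse([t for t, _ in kept], [p for _, p in kept])
    return fraction, clean_rmse


# Made-up toy values for illustration only (not data from the paper).
z_true = [8.5, 8.7, 8.9, 9.0, 8.6]
z_pred = [8.55, 8.65, 8.6, 9.05, 8.6]
print(rmse(z_true, z_pred))
print(outlier_summary(z_true, z_pred))
```

    Reporting both numbers, as the abstract does, separates the typical scatter from the influence of a few catastrophic predictions.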

    Approximate selective inference via maximum likelihood

    This article considers a conditional approach to selective inference via approximate maximum likelihood for data described by Gaussian models. There are two important considerations in adopting a post-selection inferential perspective. One concerns the effective use of information in the data; the other concerns the computational cost of adjusting for selection. Our approximate proposal serves both purposes: (i) it exploits randomness to make efficient use of the information left over from selection; (ii) it enables us to bypass potentially expensive MCMC sampling from conditional distributions. At the core of our method is the solution to a convex optimization problem which assumes a separable form across multiple selection queries. This allows us to address the problem of tractable and efficient inference in many practical scenarios, where more than one learning query is conducted to define and perhaps redefine models and their corresponding parameters. Through an in-depth analysis, we illustrate the potential of our proposal and provide extensive comparisons with other post-selective schemes in both randomized and non-randomized paradigms of inference.

    The role of mentorship in protege performance

    The role of mentorship in protege performance is a matter of importance to academic, business, and governmental organizations. While the benefits of mentorship for proteges, mentors and their organizations are apparent, the extent to which proteges mimic their mentors' career choices and acquire their mentorship skills is unclear. Here, we investigate one aspect of mentor emulation by studying mentorship fecundity (the number of proteges a mentor trains) with data from the Mathematics Genealogy Project, which tracks the mentorship record of thousands of mathematicians over several centuries. We demonstrate that fecundity among academic mathematicians is correlated with other measures of academic success. We also find that the average fecundity of mentors remains stable over 60 years of recorded mentorship. We further uncover three significant correlations in mentorship fecundity. First, mentors with small mentorship fecundity train proteges that go on to have a 37% larger than expected mentorship fecundity. Second, in the first third of their career, mentors with large fecundity train proteges that go on to have a 29% larger than expected fecundity. Finally, in the last third of their career, mentors with large fecundity train proteges that go on to have a 31% smaller than expected fecundity. Comment: 23 pages double-spaced, 4 figure

    Risk Stratification in Post-MI Patients Based on Left Ventricular Ejection Fraction and Heart-Rate Turbulence

    Objectives: Development of risk stratification criteria for predicting mortality in post-infarction patients, taking into account LVEF and heart-rate turbulence (HRT). Methods: Based on previous results, the two parameters LVEF (continuously) and turbulence slope (TS) as an indicator of the HRT were combined for risk stratification. The method has been applied within two independent data sets (the MPIP trial and the EMIAT study). Results: The criteria were defined in order to match the outcome of applying LVEF ≤ 30 % in sensitivity. In the MPIP trial the optimal criteria selected are TS normal and LVEF ≤ 21 % or TS abnormal and LVEF ≤ 40 %. Within the placebo group of the EMIAT study the corresponding criteria are: TS normal and LVEF ≤ 23 % or TS abnormal and LVEF ≤ 40 %. Combining both studies, the following criteria could be obtained: TS normal and LVEF ≤ 20 % or TS abnormal and LVEF ≤ 40 %. In the MPIP study 83 out of the 581 patients (= 14.3 %) fulfil these criteria. Within this group 30 patients died during the follow-up. In the EMIAT trial 218 out of the 591 patients (= 37.9 %) are classified as high-risk patients, with 53 deaths. Combining both studies, the high-risk group contains 301 patients with 83 deaths (ppv = 27.7 %). Using the MADIT criterion as classification rule (LVEF ≤ 30 %), a sample of 375 patients with 85 deaths (ppv = 24 %) can be selected. Conclusions: The stratification rule based on LVEF and TS is able to select high-risk patients suitable for implanting an ICD. The rule performs better than the classical one with LVEF alone. The high-risk group obtained with the new criteria is smaller, with about the same number of deaths, and therefore has a higher positive predictive value. The classification criteria have been validated within a bootstrap study with 100 replications. In all samples the rule based on TS and LVEF (= NEW) was superior to LVEF alone: the high-risk group was smaller (mean ± s: 301 ± 14.5 (NEW) vs. 375 ± 14.5 (LVEF)) and the positive predictive value was larger (mean ± s: 27.2 ± 2.6 % (NEW) vs. 23.3 ± 2.2 % (LVEF)). The new criteria are less expensive due to a reduced number of high-risk patients selected.

    Percolation Analysis of a Wiener Reconstruction of the IRAS 1.2 Jy Redshift Catalog

    We present percolation analyses of Wiener Reconstructions of the IRAS 1.2 Jy Redshift Survey. There are ten reconstructions of galaxy density fields in real space spanning the range $\beta = 0.1$ to $1.0$, where $\beta = \Omega^{0.6}/b$, $\Omega$ is the present dimensionless density and $b$ is the bias factor. Our method uses the growth of the largest cluster statistic to characterize the topology of a density field, where Gaussian randomized versions of the reconstructions are used as standards for analysis. For the reconstruction volume of radius $R \approx 100\,h^{-1}$ Mpc, percolation analysis reveals a slight `meatball' topology for the real-space galaxy distribution of the IRAS survey. Keywords: cosmology; galaxies: clustering; methods: numerical. Comment: Revised version accepted for publication in The Astrophysical Journal, January 10, 1997 issue, Vol.47
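    As a rough illustration of a largest-cluster statistic, the sketch below thresholds a density field and reports the share of above-threshold cells that belong to the largest connected cluster. It is a toy under stated assumptions, not the authors' pipeline: it works on a small 2D grid rather than a 3D reconstruction, uses 4-neighbour connectivity, and normalises by the number of occupied cells. The function name and the field values are hypothetical.

```python
from collections import deque


def largest_cluster_fraction(field, threshold):
    """Fraction of above-threshold cells that lie in the largest cluster.

    `field` is a list of equal-length rows; cells are connected through
    their 4 nearest neighbours (an assumption made for this toy example).
    """
    occupied = {
        (r, c)
        for r, row in enumerate(field)
        for c, value in enumerate(row)
        if value > threshold
    }
    if not occupied:
        return 0.0

    seen = set()
    largest = 0
    for start in occupied:
        if start in seen:
            continue
        # Breadth-first search over one connected cluster.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            r, c = queue.popleft()
            size += 1
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in occupied and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        largest = max(largest, size)
    return largest / len(occupied)


# Hypothetical toy density field.
toy_field = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.1, 0.1, 0.6],
    [0.1, 0.1, 0.5, 0.6],
    [0.3, 0.1, 0.1, 0.1],
]
for level in (0.65, 0.4, 0.15):
    print(level, largest_cluster_fraction(toy_field, level))
```

    Sweeping the threshold and comparing the resulting curve against the same curve for a Gaussian randomized field is the kind of comparison the abstract describes.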