
    Where are the missing gamma ray burst redshifts?

    In the redshift range z = 0-1, the gamma-ray burst (GRB) redshift distribution should increase rapidly because of increasing differential volume sizes and strong evolution in the star formation rate. This feature is not observed in the Swift redshift distribution, and to account for this discrepancy a dominant bias, independent of the Swift sensitivity, is required. Furthermore, despite rapid localization, about 40-50% of Swift and pre-Swift GRBs do not have a measured redshift. We employ a heuristic technique to extract this redshift bias using 66 GRBs localized by Swift with redshifts determined from absorption or emission spectroscopy. For the Swift and HETE+BeppoSAX redshift distributions, the best model fit to the bias at z < 1 implies that if GRB rate evolution follows the SFR, the bias cancels this rate increase. We find that the same bias affects both Swift and HETE+BeppoSAX measurements similarly at z < 1. Using a bias model constrained at a 98% KS probability, we find that 72% of GRBs at z < 2 will not have measurable redshifts, and about 55% at z > 2. Achieving this high KS probability requires increasing the GRB rate density at low z relative to the high-z rate. This provides further evidence for a low-luminosity population of GRBs that are observed in only a small volume because of their faintness. Comment: 5 pages, submitted to MNRAS
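The abstract's model comparison rests on a Kolmogorov-Smirnov (KS) probability. A minimal sketch of that kind of test, with a toy redshift distribution and a toy selection function of my own choosing (not the paper's SFR-based rate model), shows how a redshift-dependent bias registers in a two-sample KS comparison:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Toy intrinsic redshift distribution rising toward high z (illustrative only,
# not the paper's SFR-based rate model)
intrinsic_z = rng.power(3, 5000) * 8.0

# Toy selection bias: higher-z (fainter) bursts are less likely to yield a redshift
keep = rng.random(intrinsic_z.size) < np.exp(-0.5 * intrinsic_z)
observed_z = intrinsic_z[keep]

# Two-sample KS comparison of the observed vs intrinsic distributions
result = ks_2samp(observed_z, intrinsic_z)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.2e}")
```

A strong selection bias drives the p-value toward zero; the paper works in the other direction, tuning a bias model until the KS probability between model and data is high.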

    How good Neural Networks interpretation methods really are? A quantitative benchmark

    Saliency Maps (SMs) have been extensively used to interpret deep learning models' decisions by highlighting the features deemed relevant by the model. They are used on highly nonlinear problems, where linear feature selection (FS) methods fail to highlight the relevant explanatory variables. However, the reliability of gradient-based feature attribution methods such as SMs has mostly been assessed only qualitatively (visually), and quantitative benchmarks are currently missing, partially due to the lack of a definite ground truth on image data. Concerned about the apophenic biases introduced by visual assessment of these methods, in this paper we propose a synthetic quantitative benchmark for Neural Network (NN) interpretation methods. For this purpose, we built synthetic datasets with nonlinearly separable classes and an increasing number of decoy (random) features, illustrating the challenge of FS in high-dimensional settings. We also compare these methods to conventional approaches such as mRMR or Random Forests. Our results show that our simple synthetic datasets are sufficient to challenge most of the benchmarked methods. TreeShap, mRMR and LassoNet are the best-performing FS methods. We also show that, when quantifying the relevance of a few nonlinearly entangled predictive features diluted in a large number of irrelevant noisy variables, neural-network-based FS and interpretation methods are still far from being reliable.
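A benchmark of this general shape, nonlinearly separable informative features diluted among decoys, can be sketched in a few lines. This is my own toy construction (XOR-style class, Random Forest importances as the FS baseline), not the paper's actual datasets or benchmarked methods:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Two informative features with an XOR-style (nonlinearly separable) class,
# so no linear FS method can rank them above the decoys
x1, x2 = rng.normal(size=(2, n))
y = ((x1 > 0) ^ (x2 > 0)).astype(int)

# Decoy (random) features dilute the informative ones
decoys = rng.normal(size=(n, 20))
X = np.column_stack([x1, x2, decoys])

# Random Forest importances as a simple nonlinear FS baseline
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print("top-ranked features:", ranking[:4])
```

A good FS method should place features 0 and 1 at the top of the ranking; increasing the number of decoys makes this progressively harder, which is exactly the stress test the benchmark applies.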

    Data Deluge in Astrophysics: Photometric Redshifts as a Template Use Case

    Astronomy has entered the big data era, and Machine Learning-based methods have found widespread use in a large variety of astronomical applications. This is demonstrated by the recent huge increase in the number of publications making use of this new approach. The usage of machine learning methods, however, is still far from trivial, and many problems still need to be solved. Using the evaluation of photometric redshifts as a case study, we outline the main problems and some ongoing efforts to solve them. Comment: 13 pages, 3 figures, Springer's Communications in Computer and Information Science (CCIS), Vol. 82
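Photometric redshift estimation is typically framed as supervised regression from broadband colours to redshift. A minimal sketch under toy assumptions (synthetic colours that are simple noisy functions of redshift, and a Random Forest chosen by me as one common regressor, not the paper's specific method):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Toy photometric catalogue: four "colours" that are noisy functions of
# redshift (purely illustrative; real photo-z features are survey magnitudes)
z = rng.uniform(0.0, 2.0, 2000)
colours = np.column_stack([z + rng.normal(0.0, 0.1, z.size) for _ in range(4)])

# Supervised regression from colours to redshift, the standard ML photo-z setup
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(colours[:1500], z[:1500])
pred = model.predict(colours[1500:])
scatter = float(np.std(pred - z[1500:]))
print(f"photo-z scatter on held-out objects: {scatter:.3f}")
```

The problems the abstract alludes to, non-representative training sets, censored magnitudes, and reliable uncertainty estimates, are exactly the parts this toy setup glosses over.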

    Cluster membership probabilities from proper motions and multiwavelength photometric catalogues: I. Method and application to the Pleiades cluster

    We present a new technique designed to take full advantage of the high dimensionality (photometric, astrometric, temporal) of the DANCe survey to derive self-consistent and robust membership probabilities for the Pleiades cluster. We aim to develop a methodology for inferring membership probabilities for the Pleiades cluster from the DANCe multidimensional astro-photometric data set in a consistent way throughout the entire derivation. The determination of the membership probabilities has to be applicable to censored data and must incorporate the measurement uncertainties into the inference procedure. We use Bayes' theorem and a curvilinear forward model for the likelihood of the measurements of cluster members in colour-magnitude space to infer posterior membership probabilities. The distribution of the cluster members' proper motions and the distribution of contaminants in the full multidimensional astro-photometric space are modelled with a mixture-of-Gaussians likelihood. We analyse several representation spaces composed of the proper motions plus a subset of the available magnitudes and colour indices. We select two prominent representation spaces composed of variables chosen using feature relevance determination techniques based on Random Forests, and analyse the resulting samples of high-probability candidates. We consistently find lists of high-probability (p > 0.9975) candidates with \approx 1000 sources, 4 to 5 times more than obtained in the most recent astro-photometric studies of the cluster. The methodology presented here is ready for application to data sets that include more dimensions, such as radial and/or rotational velocities, spectral indices and variability. Comment: 14 pages, 4 figures, accepted by A&A
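The cluster-versus-field decomposition can be illustrated with an ordinary Gaussian mixture in a toy two-dimensional proper-motion plane. This is a deliberately simplified stand-in: the paper's model is a full multidimensional forward model that handles measurement uncertainties and censored data, which a plain sklearn mixture does not.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy 2-D "proper motion" plane: a tight cluster plus a broad field population
cluster = rng.normal(loc=[20.0, -45.0], scale=1.0, size=(300, 2))
field = rng.normal(loc=[0.0, 0.0], scale=25.0, size=(2000, 2))
data = np.vstack([cluster, field])

# Two-component Gaussian mixture, initialised near each population so the
# toy fit is stable; predict_proba then gives posterior membership
gmm = GaussianMixture(
    n_components=2,
    means_init=np.array([[20.0, -45.0], [0.0, 0.0]]),
    random_state=0,
).fit(data)
probs = gmm.predict_proba(data)

# The cluster component is the one with the smaller (tighter) covariance
comp = int(np.argmin([np.linalg.det(c) for c in gmm.covariances_]))
members = probs[:, comp] > 0.9975  # the paper's high-probability threshold
print(members.sum(), "high-probability members")
```

Only sources whose posterior membership clears the 0.9975 threshold survive; in the real analysis the same thresholding is applied in a much higher-dimensional astro-photometric space.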

    Cosmological constraints from Sunyaev-Zeldovich cluster counts: an approach to account for missing redshifts

    The accumulation of redshifts is a significant observational bottleneck when using galaxy cluster surveys to constrain cosmological parameters. We propose a simple method that allows the use of samples in which a fraction of the redshifts is not known. The simplest assumption is that the missing redshifts are randomly drawn from the catalogue, but the method also allows one to take into account known selection effects in the accumulation of redshifts. We quantify the reduction in statistical precision of cosmological parameter constraints as a function of the fraction of missing redshifts for simulated surveys, and also investigate the impact of making an incorrect assumption for the distribution of missing redshifts. Comment: 6 pages, 5 figures, accepted by Ap
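The headline effect, a constraint degrading as redshifts go missing at random, can be mimicked with a toy estimator. The redshift distribution, sample size and stand-in "constraint" below are all placeholders of my own, not the paper's survey model or likelihood:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy cluster redshift sample (the distribution and size are placeholders)
z = rng.gamma(shape=2.0, scale=0.3, size=400)

def mean_err(sample):
    # Standard error on the mean: a crude stand-in for a parameter constraint
    return sample.std(ddof=1) / np.sqrt(sample.size)

# Randomly discard a growing fraction of redshifts and watch the error grow
for frac_missing in (0.0, 0.25, 0.5):
    keep = rng.random(z.size) >= frac_missing
    print(f"missing {frac_missing:.0%}: constraint width = {mean_err(z[keep]):.4f}")
```

The statistical error grows roughly as 1/sqrt(N(1 - f)) for missing fraction f; the paper's contribution is to keep the incomplete sample in the likelihood rather than discarding it, and to model non-random patterns of missingness.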

    Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

    This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search. Comment: See http://www.jair.org/ for any accompanying file
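The fitness function described above, average classification cost combining test costs and error costs, can be sketched directly. Every name below (the feature names, their costs, the error penalty) is a hypothetical example of mine, not taken from the paper's datasets:

```python
# Sketch of a cost-sensitive fitness in the spirit of ICET: the score to be
# minimised is the cost of the tests the tree performs plus the average cost
# of its misclassifications.

def average_cost(predictions, labels, used_features, test_costs, error_cost):
    """Per-case cost: feature-test costs plus a misclassification penalty."""
    test_total = sum(test_costs[f] for f in used_features)
    errors = sum(p != y for p, y in zip(predictions, labels))
    return test_total + error_cost * errors / len(labels)

# Hypothetical example: two tests used, costing 1.0 and 2.5 units; each
# classification error costs 50 units; one error out of four cases
cost = average_cost(
    predictions=[0, 1, 1, 0], labels=[0, 1, 0, 0],
    used_features=["blood_test", "x_ray"],
    test_costs={"blood_test": 1.0, "x_ray": 2.5},
    error_cost=50.0,
)
print(cost)  # 3.5 in test costs + 50 * 1/4 in error costs = 16.0
```

In ICET this value scores each individual in the genetic population, so evolution trades expensive-but-informative tests against the errors made without them.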

    Damped Lyman alpha Absorbing Galaxies At Low Redshifts z<1 From Hierarchical Galaxy Formation Models

    We investigate damped Ly-alpha absorbing galaxies (DLA galaxies) at low redshifts z<1 in the hierarchical structure formation scenario, in order to clarify the nature of DLA galaxies, since observational data for such galaxies are currently available mainly at low redshifts. We find that our model reproduces well the distributions of fundamental properties of DLA galaxies, such as luminosities, column densities and impact parameters, obtained by optical and near-infrared imaging. Our results suggest that DLA systems primarily consist of low-luminosity galaxies with small impact parameters (typical radius about 3 kpc, surface brightness from 22 to 27 mag arcsec^{-2}), similar to low surface brightness (LSB) galaxies. In addition, we investigate selection biases arising from the faintness of DLA galaxies and from the masking effect, which prevents us from identifying a DLA galaxy hidden by, or contaminated by, the point spread function of a background quasar. We find that the latter affects the distributions of DLA properties more seriously than the former, and that the observational data are well reproduced only when the masking effect is taken into account. The fraction of DLA galaxies missed through the masking effect reaches 60-90% in the sample at redshift 0<z<1 when the angular size limit is as small as 1 arcsec. Furthermore, we find a tight correlation between the HI mass and cross section of DLA galaxies, and also find that HI-rich galaxies with M(HI) \sim 10^{9} M_sun dominate DLA systems. These features are entirely consistent with those from the Arecibo Dual-Beam Survey, a blind 21 cm survey. Finally, we discuss star formation rates and find that they are typically about 10^{-2} M_sun yr^{-1}, as low as those in LSB galaxies. Comment: 21 pages, 13 figures, accepted for publication in the Astrophysical Journal