Where are the missing gamma ray burst redshifts?
In the redshift range z = 0-1, the gamma ray burst (GRB) redshift
distribution should increase rapidly because of increasing differential volume
sizes and strong evolution in the star formation rate. This feature is not
observed in the Swift redshift distribution and to account for this
discrepancy, a dominant bias, independent of the Swift sensitivity, is
required. Furthermore, despite rapid localization, about 40-50% of Swift and
pre-Swift GRBs do not have a measured redshift. We employ a heuristic technique
to extract this redshift bias using 66 GRBs localized by Swift with redshifts
determined from absorption or emission spectroscopy. For the Swift and
HETE+BeppoSAX redshift distributions, the best-fitting bias model at z < 1
implies that if GRB rate evolution follows the SFR, the bias cancels this rate
increase. We find that the same bias affects the Swift and HETE+BeppoSAX
measurements similarly at z < 1. Using a bias model constrained at a 98% KS
probability, we find that 72% of GRBs at z < 2, and about 55% at z > 2, will
not have measurable redshifts. Achieving this high KS probability requires
increasing the GRB rate density at low z relative to the high-z rate. This
provides further evidence for a low-luminosity population of GRBs that is
observed in only a small volume because of its faintness.
Comment: 5 pages, submitted to MNRAS
How Good Are Neural Network Interpretation Methods Really? A Quantitative Benchmark
Saliency Maps (SMs) have been extensively used to interpret deep learning
models' decisions by highlighting the features deemed relevant by the model. They
are used on highly nonlinear problems, where linear feature selection (FS)
methods fail at highlighting relevant explanatory variables. However, the
reliability of gradient-based feature attribution methods such as SM has mostly
been only qualitatively (visually) assessed, and quantitative benchmarks are
currently missing, partially due to the lack of a definite ground truth on
image data. Concerned about the apophenic biases introduced by visual
assessment of these methods, in this paper we propose a synthetic quantitative
benchmark for Neural Networks (NNs) interpretation methods. For this purpose,
we built synthetic datasets with nonlinearly separable classes and an
increasing number of decoy (random) features, illustrating the challenge of FS in
high-dimensional settings. We also compare these methods to conventional
approaches such as mRMR or Random Forests. Our results show that our simple
synthetic datasets are sufficient to challenge most of the benchmarked methods.
TreeShap, mRMR and LassoNet are the best performing FS methods. We also show
that, when quantifying the relevance of a few nonlinearly entangled predictive
features diluted in a large number of irrelevant noisy variables, neural
network-based FS and interpretation methods are still far from being reliable.
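The decoy-feature setup can be sketched as follows, assuming an XOR-style nonlinear signal in two informative features and Random Forest importances as the scored FS method; the dataset shapes and scoring choice are my assumptions, not the paper's exact protocol:

```python
# Sketch of a decoy-feature benchmark: a nonlinearly separable signal in
# 2 informative features, diluted with 50 pure-noise decoy features, then
# ranked with Random Forest feature importances. Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, n_decoys = 2000, 50

X_signal = rng.uniform(-1, 1, size=(n, 2))
y = (X_signal[:, 0] * X_signal[:, 1] > 0).astype(int)  # XOR-like labels
X = np.hstack([X_signal, rng.normal(size=(n, n_decoys))])  # append decoys

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(clf.feature_importances_)[::-1]

# A good FS method should rank the informative features (indices 0, 1) high.
print("top-5 ranked features:", ranking[:5])
```

Scaling `n_decoys` up while keeping `n` fixed reproduces the high-dimensional regime in which many attribution methods degrade.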
Data Deluge in Astrophysics: Photometric Redshifts as a Template Use Case
Astronomy has entered the big data era, and Machine Learning-based methods
have found widespread use in a large variety of astronomical applications. This
is demonstrated by the recent huge increase in the number of publications
making use of this new approach. The usage of machine learning methods,
however, is still far from trivial, and many problems remain to be solved. Using the
evaluation of photometric redshifts as a case study, we outline the main
problems and some ongoing efforts to solve them.
Comment: 13 pages, 3 figures, Springer's Communications in Computer and
Information Science (CCIS), Vol. 82
Cluster membership probabilities from proper motions and multiwavelength photometric catalogues: I. Method and application to the Pleiades cluster
We present a new technique designed to take full advantage of the high
dimensionality (photometric, astrometric, temporal) of the DANCe survey to
derive self-consistent and robust membership probabilities of the Pleiades
cluster. We aim to develop a methodology to infer membership probabilities
for the Pleiades cluster from the DANCe multidimensional astro-photometric data
set in a consistent way throughout the entire derivation. The determination of
the membership probabilities has to be applicable to censored data and must
incorporate the measurement uncertainties into the inference procedure.
We use Bayes' theorem and a curvilinear forward model for the likelihood of
the measurements of cluster members in the colour-magnitude space, to infer
posterior membership probabilities. The distribution of the cluster members'
proper motions and the distribution of contaminants in the full
multidimensional astro-photometric space are modelled with a
mixture-of-Gaussians likelihood. We analyse several representation spaces
composed of the proper motions plus a subset of the available magnitudes and
colour indices. We select two prominent representation spaces composed of
variables selected using feature relevance determination techniques based on
Random Forests, and analyse the resulting samples of high probability
candidates. We consistently find lists of high probability (p > 0.9975)
candidates with 1000 sources, 4 to 5 times more than obtained in the
most recent astro-photometric studies of the cluster.
The methodology presented here is ready for application in data sets that
include more dimensions, such as radial and/or rotational velocities, spectral
indices and variability.
Comment: 14 pages, 4 figures, accepted by A&A
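The mixture-of-Gaussians membership idea can be sketched in a two-dimensional proper-motion space with one tight "cluster" component and one broad "field" component. The DANCe pipeline additionally models photometry, censored data, and measurement uncertainties; none of that is reproduced here, and all numbers are invented:

```python
# Minimal sketch: posterior membership probabilities from a 2-component
# Gaussian mixture in (pmra, pmdec). Mock data only; the real method adds
# photometric dimensions, censoring, and per-source uncertainties.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
cluster = rng.normal(loc=[20.0, -45.0], scale=1.0, size=(300, 2))   # members
field = rng.normal(loc=[0.0, 0.0], scale=25.0, size=(3000, 2))      # contaminants
pm = np.vstack([cluster, field])

gmm = GaussianMixture(n_components=2, random_state=1).fit(pm)
proba = gmm.predict_proba(pm)

# Identify the cluster component as the one with the tighter covariance.
k = int(np.argmin([np.linalg.det(c) for c in gmm.covariances_]))
p_member = proba[:, k]
print("sources with p > 0.9975:", int((p_member > 0.9975).sum()))
```

The p > 0.9975 threshold mirrors the cut quoted in the abstract; here it is applied to the mixture posterior rather than to the full astro-photometric likelihood.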
Cosmological constraints from Sunyaev-Zeldovich cluster counts: an approach to account for missing redshifts
The accumulation of redshifts provides a significant observational bottleneck
when using galaxy cluster surveys to constrain cosmological parameters. We
propose a simple method to allow the use of samples where there is a fraction
of the redshifts that are not known. The simplest assumption is that the
missing redshifts are randomly extracted from the catalogue, but the method
also allows one to take into account known selection effects in the
accumulation of redshifts. We quantify the reduction in statistical precision
of cosmological parameter constraints as a function of the fraction of missing
redshifts for simulated surveys, and also investigate the impact of making an
incorrect assumption for the distribution of missing redshifts.
Comment: 6 pages, 5 figures, accepted by ApJ
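Under the simplest assumption stated above (redshifts missing at random), the binned counts can be rescaled by the inverse completeness. The mock catalogue and single correction factor below are illustrative only and do not reproduce the paper's likelihood analysis:

```python
# Sketch of the missing-at-random assumption: drop a fraction f_missing of
# mock cluster redshifts, then rescale binned counts by 1/(1 - f_missing).
import numpy as np

rng = np.random.default_rng(7)
z_true = rng.uniform(0.1, 1.5, size=5000)      # mock cluster redshifts

f_missing = 0.3
has_z = rng.random(z_true.size) > f_missing    # randomly measured subset
z_obs = z_true[has_z]

bins = np.linspace(0.1, 1.5, 8)
counts_obs, _ = np.histogram(z_obs, bins=bins)
counts_corr = counts_obs / (1.0 - f_missing)   # correct for random losses
counts_true, _ = np.histogram(z_true, bins=bins)

print("corrected/true per bin:", np.round(counts_corr / counts_true, 2))
```

A known selection effect (e.g. redshift completeness falling with z) would replace the constant `f_missing` with a z-dependent completeness function in the same rescaling.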
Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm
This paper introduces ICET, a new algorithm for cost-sensitive
classification. ICET uses a genetic algorithm to evolve a population of biases
for a decision tree induction algorithm. The fitness function of the genetic
algorithm is the average cost of classification when using the decision tree,
including both the costs of tests (features, measurements) and the costs of
classification errors. ICET is compared here with three other algorithms for
cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5,
which classifies without regard to cost. The five algorithms are evaluated
empirically on five real-world medical datasets. Three sets of experiments are
performed. The first set examines the baseline performance of the five
algorithms on the five datasets and establishes that ICET performs
significantly better than its competitors. The second set tests the robustness
of ICET under a variety of conditions and shows that ICET maintains its
advantage. The third set looks at ICET's search in bias space and discovers a
way to improve the search.
Comment: See http://www.jair.org/ for any accompanying files
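The fitness function described above, the average cost of classification including both test costs and error costs, can be sketched for a single decision tree. The genetic search over induction biases is omitted, and the per-feature test costs and error cost below are invented for illustration:

```python
# Sketch of an ICET-style fitness evaluation: charge each test case for every
# distinct feature tested along its decision path, plus an error cost when
# the prediction is wrong. Costs here are assumptions, not ICET's settings.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

feature_cost = np.full(X.shape[1], 1.0)    # assumed cost per feature test
error_cost = 50.0                          # assumed cost per misclassification

node_feat = tree.tree_.feature             # feature index per node (-2 = leaf)
paths = tree.decision_path(X_te)           # sparse node-indicator matrix
pred = tree.predict(X_te)

total = 0.0
for i in range(X_te.shape[0]):
    nodes = paths.indices[paths.indptr[i]:paths.indptr[i + 1]]
    tested = {node_feat[n] for n in nodes if node_feat[n] >= 0}
    total += sum(feature_cost[f] for f in tested)
    if pred[i] != y_te[i]:
        total += error_cost

avg_cost = total / X_te.shape[0]
print(f"average cost per case: {avg_cost:.2f}")
```

In ICET this quantity is the fitness that the genetic algorithm minimises by evolving the biases (perceived feature costs) handed to the tree inducer.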
Damped Lyman alpha Absorbing Galaxies At Low Redshifts z<1 From Hierarchical Galaxy Formation Models
We investigate Damped Ly-alpha absorbing galaxies (DLA galaxies) at low
redshifts z < 1 in the hierarchical structure formation scenario, to clarify
their nature, since observational data for such galaxies are currently
available mainly at low redshifts. We find that our model well reproduces the
distributions of fundamental properties of DLA galaxies, such as luminosities,
column densities, and impact parameters, obtained from optical and
near-infrared imaging. Our results suggest that DLA systems primarily consist of low
imagings. Our results suggest that DLA systems primarily consist of low
luminosity galaxies with small impact parameters (typical radius about 3 kpc,
surface brightness from 22 to 27 mag arcsec^{-2}) similar to low surface
brightness (LSB) galaxies. In addition, we investigate selection biases arising
from the faintness of DLA galaxies and from the masking effect, which prevents
us from identifying a DLA galaxy hidden by, or contaminated by, the point
spread function of a background quasar. We find that the latter affects the
distributions of DLA properties more seriously than the former, and that the
observational data are well reproduced only when the masking effect is taken
into account. The missing rate of DLA galaxies due to the masking effect
reaches 60-90% in the sample at redshift 0 < z < 1 when the angular size limit
is as small as 1 arcsec.
Furthermore we find a tight correlation between HI mass and cross section of
DLA galaxies, and also find that HI-rich galaxies with M(HI) \sim 10^{9} M_sun
dominate DLA systems. These features are entirely consistent with those from
the Arecibo Dual-Beam Survey which is a blind 21 cm survey. Finally we discuss
star formation rates and find that they are typically about 10^{-2} M_sun
yr^{-1}, as low as those in LSB galaxies.
Comment: 21 pages, 13 figures, Accepted for publication in the Astrophysical
Journal
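The masking selection effect lends itself to a small Monte Carlo estimate: a DLA galaxy counts as missed when its angular impact parameter falls inside the quasar's PSF mask. The exponential impact-parameter profile and its 1-arcsec scale below are assumptions for illustration, not the paper's model:

```python
# Monte Carlo sketch of the PSF-masking selection effect: the fraction of
# mock DLA galaxies whose angular impact parameter falls inside the mask.
# The exponential profile and 1" scale are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical angular impact parameters (arcsec).
theta = rng.exponential(scale=1.0, size=100_000)

for mask_limit in (0.5, 1.0, 2.0):
    missed = np.mean(theta < mask_limit)
    print(f"mask = {mask_limit:.1f} arcsec -> missing rate = {missed:.0%}")
```

With these assumed numbers the missing rate at a 1-arcsec mask is about 1 - 1/e ≈ 63%, inside the 60-90% range the abstract quotes for its own profile model.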