1,277 research outputs found
A Framework for Genetic Algorithms Based on Hadoop
Genetic Algorithms (GAs) are powerful metaheuristic techniques widely used in
many real-world applications. The sequential execution of GAs requires
considerable computational power, both in time and resources. Nevertheless, GAs
are naturally parallel and accessing a parallel platform such as Cloud is easy
and cheap. Apache Hadoop is one of the common services that can be used for
parallel applications. However, developing a parallel version of GAs with
Hadoop is not simple without dealing with its inner workings. Even though some
sequential frameworks for GAs already exist, there is no framework supporting
the development of GA applications that can be executed in parallel. This
paper describes a framework for parallel GAs on the Hadoop platform,
following the paradigm of MapReduce. The main purpose of this framework is to
allow the user to focus on the aspects of GA that are specific to the problem
to be addressed, being sure that this task is going to be correctly executed on
the Cloud with good performance. The framework has also been used to
develop an application for the Feature Subset Selection problem. A preliminary
analysis of the performance of the developed GA application has been performed
using three datasets and has shown very promising performance.
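To make the MapReduce framing concrete, here is a minimal single-process sketch of one way a GA generation could be split into a map step (independent fitness evaluation) and a reduce step (selection, crossover, mutation). It does not use Hadoop and is not the API of the framework described above; the OneMax objective, the function names, and all parameter values are assumptions chosen for illustration.

```python
import random

def fitness(individual):
    # Toy OneMax objective (an assumption): number of 1-bits.
    return sum(individual)

def map_phase(population):
    # Map step: score each individual independently. This is the part
    # a cluster could spread across workers.
    return [(fitness(ind), ind) for ind in population]

def reduce_phase(scored, rng, mutation_rate=0.01):
    # Reduce step: binary tournament selection, one-point crossover,
    # and bit-flip mutation produce the next population.
    def pick():
        a, b = rng.sample(scored, 2)
        return max(a, b, key=lambda pair: pair[0])[1]

    children = []
    while len(children) < len(scored):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
        children.append(child)
    return children

def run(generations=30, pop_size=40, length=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population = reduce_phase(map_phase(population), rng)
    return max(fitness(ind) for ind in population)
```

Only `fitness` is problem-specific in this sketch, which mirrors the stated goal of letting the user focus on the problem-dependent parts of the GA.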
Orion revisited. II. The foreground population to Orion A
Following the recent discovery of a large population of young stars in front
of the Orion Nebula, we carried out an observational campaign with the DECam
wide-field camera covering ~10 deg^2 centered on NGC 1980 to confirm, probe the
extent of, and characterize this foreground population of pre-main-sequence
stars. We confirm the presence of a large foreground population towards the
Orion A cloud. This population contains several distinct subgroups, including
NGC1980 and NGC1981, and stretches across several degrees in front of the Orion
A cloud. By comparing the location of their sequence in various color-magnitude
diagrams with other clusters, we found a distance and an age of 380 pc and
5-10 Myr, in good agreement with previous estimates. Our final sample includes
2123 candidate members and is complete from below the hydrogen-burning limit to
about 0.3 Msun, where the data start to be limited by saturation. Extrapolating
the mass function to the high masses, we estimate a total number of ~2600
members in the surveyed region. We confirm the presence of a rich, contiguous,
and essentially coeval population of about 2600 foreground stars in front of
the Orion A cloud, loosely clustered around NGC1980, NGC1981, and a new group
in the foreground of the OMC-2/3. For the area of the cloud surveyed, this
result implies that there are more young stars in the foreground population
than young stars inside the cloud. Assuming a normal initial mass function, we
estimate that between one and a few supernovae must have exploded in the
foreground population in the past few million years, close to the surface of
Orion A, which might be responsible, together with stellar winds, for the
structure and star formation activity in these clouds. This long-overlooked
foreground stellar population is of great significance, calling for a revision
of the star formation history in this region of the Galaxy. Comment: Accepted for publication in A&
Did You Do Your Homework? Raising Awareness on Software Fairness and Discrimination
Machine Learning is a vital part of various modern-day decision-making software.
At the same time, it has been shown to exhibit bias, which can cause unjust treatment of individuals and population groups. One method to achieve fairness in machine learning software is to provide individuals with the same degree of benefit, regardless of sensitive attributes (e.g., students receive the same grade, independent of their sex or race). However, there can be other attributes that one might want to discriminate against (e.g., students with homework should receive higher grades). We call such attributes anti-protected attributes. When reducing the bias of machine learning software, one risks losing the discriminatory behaviour of anti-protected attributes. To combat this, we use grid search to show that machine learning software can be debiased (e.g., reduce gender bias) while also improving the ability to discriminate against anti-protected attributes.
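As an illustration of the distinction between protected and anti-protected attributes, the sketch below computes the gap in favourable-outcome rate between two groups (often called statistical parity difference). This is one common fairness measure, not necessarily the one used in the study, and the toy data and names (`rate_gap`, `sex`, `homework`) are hypothetical.

```python
def positive_rate(predictions, mask):
    # Share of favourable outcomes (1) among the selected individuals.
    selected = [p for p, m in zip(predictions, mask) if m]
    return sum(selected) / len(selected)

def rate_gap(predictions, attribute):
    # Favourable-outcome rate for attribute == 1 minus that for attribute == 0.
    in_group = [a == 1 for a in attribute]
    out_group = [a == 0 for a in attribute]
    return positive_rate(predictions, in_group) - positive_rate(predictions, out_group)

# Hypothetical toy data: 1 = passing grade.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
sex         = [0, 1, 0, 1, 0, 1, 0, 1]  # protected attribute
homework    = [1, 1, 0, 1, 0, 0, 1, 0]  # anti-protected attribute

protected_gap = rate_gap(predictions, sex)            # ideally close to 0
anti_protected_gap = rate_gap(predictions, homework)  # ideally large
```

Under this reading, a debiasing method should shrink the first gap without shrinking the second.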
The Effect of Offspring Population Size on NSGA-II: A Preliminary Study
Non-Dominated Sorting Genetic Algorithm (NSGA-II) is one of the
most popular Multi-Objective Evolutionary Algorithms (MOEA)
and has been applied to a large range of problems.
Previous studies have shown that parameter tuning can improve
NSGA-II performance. However, the tuning of the offspring population size, which guides the exploration-exploitation trade-off in
NSGA-II, has been overlooked so far. Previous work has generally
used the population size as the default offspring population size for
NSGA-II.
We therefore investigate the impact of offspring population size
on the performance of NSGA-II. We carry out an empirical study by
comparing the effectiveness of three configurations vs. the default
NSGA-II configuration on six optimization problems based on four
Pareto front quality indicators and statistical tests.
Our findings show that the performance of NSGA-II can be improved by reducing the offspring population size and in turn increasing the number of generations. This leads to similar or statistically
significantly better results than those obtained by using the default
NSGA-II configuration in 92% of the experiments performed.
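The trade-off between offspring population size and number of generations can be made explicit with a small calculation, assuming a fixed budget of fitness evaluations. The abstract does not state the budget or the configurations tested, so the function name and all numbers below are hypothetical.

```python
def generations_for_budget(budget, population_size, offspring_size):
    # Evaluations left after scoring the initial population, divided by
    # the evaluations spent on offspring in each generation.
    if offspring_size <= 0 or budget < population_size:
        raise ValueError("budget must cover the initial population and offspring_size must be positive")
    return (budget - population_size) // offspring_size

# Hypothetical setting: 10,000 evaluations, population of 100.
for offspring_size in (100, 50, 25, 10):
    print(offspring_size, generations_for_budget(10_000, 100, offspring_size))
```

With these assumed numbers, halving the offspring size doubles the number of generations available for the same evaluation cost.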
A hierarchical Bayesian model to infer PL(Z) relations using Gaia parallaxes
Aims. We aim to create a Bayesian model to infer the coefficients of PL or
PLZ relations that propagates uncertainties in the observables in a rigorous
and well-founded way. Methods. We propose a directed acyclic graph to encode
the conditional probabilities of the inference model that will allow us to
infer probability distributions for the PL and PL(Z) relations. We evaluate the
model with several semi-synthetic data sets and apply it to a sample of 200
fundamental mode and first overtone mode RR Lyrae stars for which Gaia DR1
parallaxes and literature Ks-band mean magnitudes are available. We define and
test several hyperprior probabilities to verify their adequacy and check the
sensitivity of the solution with respect to the prior choice. Results. The main
conclusion of this work is the absolute necessity of incorporating the existing
correlations between the observed variables (periods, metallicities and
parallaxes) in the form of model priors in order to avoid systematically biased
results, especially in the case of non-negligible uncertainties in the
parallaxes. The tests with the semi-synthetic data based on the data set used
in Gaia Collaboration et al. (2017) reveal the significant impact that the
existing correlations between parallax, metallicity and periods have on the
inferred parameters. The relation coefficients obtained here have been
superseded by those presented in Muraveva et al. (2018a), which incorporates the
findings of this work and the more recent Gaia DR2 measurements. Comment: 14 pages, 12 figures. Submitted to A&
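For readers unfamiliar with PL(Z) relations, the sketch below shows the generic linear form and how a predicted absolute magnitude maps to a predicted parallax through the distance modulus. It is not the hierarchical model of the paper: extinction is ignored, and the function names and coefficient values are placeholders, not fitted results.

```python
import math

def absolute_magnitude(period_days, metallicity, slope, zero_point, metal_coeff=0.0):
    # Generic linear PL(Z) form: M = slope * log10(P) + metal_coeff * [Fe/H] + zero_point.
    return slope * math.log10(period_days) + metal_coeff * metallicity + zero_point

def predicted_parallax_mas(apparent_mag, absolute_mag):
    # Distance modulus m - M = 5 * log10(d / 10 pc) with parallax [mas] = 1000 / d [pc],
    # which gives parallax = 10 ** ((M - m + 10) / 5). Extinction is ignored here.
    return 10 ** ((absolute_mag - apparent_mag + 10.0) / 5.0)

# Placeholder coefficients, for illustration only.
M = absolute_magnitude(period_days=0.5, metallicity=-1.5,
                       slope=-2.5, zero_point=-1.0, metal_coeff=0.1)
parallax = predicted_parallax_mas(apparent_mag=11.0, absolute_mag=M)
```

Comparing the predicted parallax with the measured one, rather than converting a noisy parallax into a distance, is a common way to keep the measurement uncertainty in the space where it was observed.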
Statistical techniques for the detection and analysis of solar explosive events
Solar explosive events are commonly explained as small-scale magnetic
reconnection events, although unambiguous confirmation of this scenario remains
elusive due to the lack of spatial resolution and of statistical analyses
of large enough samples of this type of event. In this work, we propose a
sound statistical treatment of data cubes consisting of a temporal sequence of
long slit spectra of the solar atmosphere. The analysis comprises all the
stages from the explosive event detection to its characterization and the
subsequent sample study. We have designed two complementary approaches based on
the combination of standard statistical techniques (Robust Principal Component
Analysis in one approach and wavelet decomposition and Independent Component
Analysis in the second) in order to obtain least biased samples. These
techniques are implemented in the spirit of letting the data speak for
themselves. The analysis is carried out for two spectral lines: the C IV line
at 1548.2 angstroms and the Ne VIII line at 770.4 angstroms. We find
significant differences between the characteristics of the line profiles
emitted in the vicinity of two active regions and in the quiet Sun, most
visible in the relative importance of a separate population of redshifted
profiles. We also find a higher frequency of explosive events near the active
regions, and in the C IV line. The distribution of the explosive event
characteristics is interpreted in the light of recent numerical simulations.
Finally, we point out several regions of the parameter space where the
reconnection model has to be refined in order to explain the observations. Comment: Accepted for publication in Astronomy and Astrophysics (in Section 9.
The Sun) on 18/01/2011. 17 pages, 22 Figure