Cooperative Coevolution for Non-Separable Large-Scale Black-Box Optimization: Convergence Analyses and Distributed Accelerations
Given the ubiquity of non-separable optimization problems in the real world, in this paper we analyze and extend the large-scale version of the well-known cooperative coevolution (CC), a divide-and-conquer optimization framework, on non-separable functions. First, we reveal the empirical reasons why decomposition-based methods are, or are not, preferred in practice on some non-separable large-scale problems, which have not been clearly pointed out in many previous CC papers. Then, we formalize CC as a continuous game model via a simplification that preserves its essential properties. Unlike previous evolutionary game theory for CC, our new model provides a much simpler yet useful viewpoint for analyzing its convergence, since only the pure Nash equilibrium concept is needed and more general fitness landscapes can be explicitly considered. Based on these convergence analyses, we propose a hierarchical decomposition strategy for better generalization, since for any decomposition there is a risk of getting trapped in a suboptimal Nash equilibrium. Finally, we use powerful distributed computing to accelerate it under the multi-level learning framework, which combines the fine-tuning ability of decomposition with the invariance property of CMA-ES. Experiments on a set of high-dimensional functions validate both its search performance and its scalability (w.r.t. CPU cores) on a computing cluster with 400 CPU cores.
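The divide-and-conquer scheme the abstract describes can be illustrated with a minimal sketch. All names here are illustrative, and a trivial random-search suboptimizer stands in for the CMA-ES subcomponent optimizer the paper actually uses:

```python
import numpy as np

def cc_minimize(f, dim, n_blocks=2, cycles=50, samples=30, sigma=0.3, seed=0):
    # Round-robin cooperative coevolution: each block of variables is
    # optimized in turn while the rest of the context vector is frozen.
    # (Random search is an illustrative stand-in for CMA-ES.)
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(dim), n_blocks)
    x = rng.standard_normal(dim)              # shared context vector
    for _ in range(cycles):
        for b in blocks:
            for _ in range(samples):
                cand = x.copy()
                cand[b] += sigma * rng.standard_normal(len(b))
                if f(cand) < f(x):            # accept only improvements
                    x = cand
    return x

# A non-separable quadratic: neighbouring coordinates are coupled, so no
# block-wise decomposition makes the problem fully separable.
f = lambda v: float(np.sum(v ** 2) + 0.5 * np.sum(v[:-1] * v[1:]))
x_best = cc_minimize(f, dim=8)
```

On such coupled functions the block-wise search can still stall at a point that is optimal per block but not globally, which is exactly the suboptimal-Nash-equilibrium risk the abstract mentions.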
Development of an R package to facilitate the learning of clustering techniques
This project explores the development of a tool, in the form of an R package, to ease the process of learning clustering techniques: how they work and what their pros and cons are. This tool should provide implementations of several different clustering techniques, with explanations, in order to allow the student to become familiar with the characteristics of each algorithm by testing them against several different datasets while deepening their understanding of them through the explanations. Additionally, these explanations should adapt to the input data, making the tool suitable not only for self-regulated learning but also for teaching.
Grado en Ingeniería Informática (B.Sc. in Computer Engineering)
Advances in Evolutionary Algorithms
With the recent trends towards massive data sets and significant computational power, combined with advances in evolutionary algorithms, evolutionary computation is becoming much more relevant in practice. The aim of this book is to present recent improvements, innovative ideas, and concepts from a part of the huge field of evolutionary algorithms.
A probabilistic approach to emission-line galaxy classification
We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and EW(Hα) vs. [NII]/Hα (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the [OIII]/Hβ, [NII]/Hα, and EW(Hα) optical parameters. The best-fit GMM, based on several statistical criteria, suggests a solution of around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's Active Galactic Nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence, based on four GCs, for the existence of a Seyfert/LINER dichotomy in our sample. Notwithstanding, the inclusion of an additional component, GC5, unravels it: GC5 appears associated with the LINER and passive galaxies on the BPT and WHAN diagrams, respectively. Subtleties aside, we demonstrate the potential of our methodology to recover/unravel different classes of objects inside the wilderness of astronomical datasets, without losing the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox (https://cointoolbox.github.io/GMM_Catalogue/).
Comment: Accepted for publication in MNRAS
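The model-selection step the abstract relies on (choosing the number of Gaussian components by statistical criteria) can be sketched with scikit-learn. The data below are synthetic stand-ins, not the SDSS line measurements:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy stand-in for the 3-D space of optical parameters: three well
# separated Gaussian blobs of 300 points each.
X = np.vstack([rng.normal(loc=c, scale=0.15, size=(300, 3))
               for c in ([0, 0, 0], [2, 2, 0], [-2, 1, 2])])

# Choose the number of components by BIC, one common statistical
# criterion for GMM model selection.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 7)}
best_k = min(bic, key=bic.get)
labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X)
```

Each point then gets soft (probabilistic) memberships rather than a hard class, which is what lets the paper publish probabilistic classifications.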
Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification
Detecting faults in electrical power grids is of paramount importance from both the electricity operator's and the consumer's viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that make it possible to gather real-time information regarding the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers, and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected as well. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows, first, from the heterogeneity of the information that actually determines a typical fault condition. The second point is that, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful; therefore, a suitable recognition model should be synthesized by making use of the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by following a combined approach of multiple dissimilarity measure customization and one-class classification techniques. We provide an in-depth study of the available data and of the models synthesized by the proposed one-class classifier. We also offer a comprehensive analysis of the fault recognition results by exploiting a fuzzy-set-based reliability decision rule.
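The one-class setting the abstract describes, training only on observed fault conditions and flagging everything dissimilar, can be sketched with a generic one-class SVM (an illustrative stand-in; the paper builds its own dissimilarity-based classifier, and the features below are hypothetical):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
# Hypothetical standardized fault descriptors (e.g. load, weather, cable
# age); per the abstract, only observed fault conditions are available.
fault_train = rng.normal(0.0, 1.0, size=(200, 3))

# One-class model of the fault region: +1 = fault-like, -1 = dissimilar.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(fault_train)

fault_like = clf.predict(rng.normal(0.0, 1.0, size=(50, 3)))
dissimilar = clf.predict(rng.normal(8.0, 1.0, size=(50, 3)))
```

The `nu` parameter bounds the fraction of training faults treated as atypical; tuning it trades missed faults against false alarms.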
Multispecies Coevolution Particle Swarm Optimization Based on Previous Search History
A hybrid coevolution particle swarm optimization algorithm with a dynamic multispecies strategy based on K-means clustering and a nonrevisit strategy based on a Binary Space Partitioning (BSP) fitness tree, called MCPSO-PSH, is proposed. Previous search history, memorized in the BSP fitness tree, can effectively restrain the individuals' revisit phenomenon. The whole population is partitioned into several subspecies, and cooperative coevolution is realized by an information communication mechanism between subspecies, which can enhance the global search ability of particles and avoid premature convergence to local optima. To demonstrate the power of the method, comparisons between the proposed algorithm and state-of-the-art algorithms are grouped into three categories: 10 basic benchmark functions (10-dimensional and 30-dimensional), 10 CEC2005 benchmark functions (30-dimensional), and a real-world problem (multilevel image segmentation). Experimental results show that MCPSO-PSH displays competitive performance compared to other swarm-based and evolutionary algorithms in terms of solution accuracy and statistical tests.
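The multispecies idea can be illustrated with a toy version: k-means repeatedly repartitions the swarm into subspecies, and each particle follows its personal best and its subspecies' best. This is a sketch under simplified assumptions, omitting the paper's BSP-tree nonrevisit mechanism and its communication scheme:

```python
import numpy as np
from sklearn.cluster import KMeans

def multispecies_pso(f, dim=2, n=30, k=3, iters=80, seed=3):
    # Toy multispecies PSO: subspecies are re-formed by k-means each
    # iteration; attraction is toward personal and subspecies bests.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pval = np.array([f(p) for p in x])
    for _ in range(iters):
        labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(x)
        for s in range(k):
            idx = np.flatnonzero(labels == s)
            if idx.size == 0:
                continue
            sbest = pbest[idx[np.argmin(pval[idx])]]    # subspecies best
            r1, r2 = rng.random((2, idx.size, dim))
            v[idx] = (0.7 * v[idx] + 1.5 * r1 * (pbest[idx] - x[idx])
                      + 1.5 * r2 * (sbest - x[idx]))
            x[idx] = x[idx] + v[idx]
        vals = np.array([f(p) for p in x])
        better = vals < pval
        pbest[better], pval[better] = x[better], vals[better]
    return pbest[np.argmin(pval)], float(pval.min())

sphere = lambda p: float(np.sum(p ** 2))
best_x, best_val = multispecies_pso(sphere)
```

Because each subspecies tracks its own best rather than a single global best, the subswarms can explore different basins before information is exchanged, which is the premature-convergence safeguard the abstract claims.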
UNSUPERVISED LEARNING IN PHYLOGENOMIC ANALYSIS OVER THE SPACE OF PHYLOGENETIC TREES
A phylogenetic tree is a tree representing the evolutionary history between species or other entities. Phylogenomics is a new field at the intersection of phylogenetics and genomics, and it is well known that statistical learning methods are needed to handle and analyze the large amounts of data that can be generated relatively cheaply with new technologies. Based on existing Markov models, we introduce a new method, CURatio, to identify outliers in a given gene data set. This method, intrinsically unsupervised, can find outliers among thousands or more genes; this ability to analyze large numbers of genes (even with missing information) makes it unique among many parametric methods. At the same time, the exploration of statistical analysis in the high-dimensional space of phylogenetic trees has never stopped, and many tree metrics have been proposed for statistical methodology; the tropical metric is one of them. We implement an MCMC sampling method to estimate the principal components in tree space with the tropical metric, achieving dimension reduction and visualizing the result in a 2-D tropical triangle.
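The tropical metric referred to above is, in the tropical-geometry literature on tree space, usually the generalized Hilbert projective metric. Assuming that is the metric meant here, it is a one-liner:

```python
import numpy as np

def tropical_distance(x, y):
    # Generalized Hilbert projective (tropical) metric on the quotient
    # R^n / R(1, ..., 1):  d(x, y) = max_i (x_i - y_i) - min_i (x_i - y_i).
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d.max() - d.min())
```

The distance is invariant under adding a constant to every coordinate, which is what makes it well defined on the tropical projective torus where vectorized trees live.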
STATISTICS IN THE BILLERA-HOLMES-VOGTMANN TREESPACE
This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees than currently exists.
For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g., horizontal gene transfer, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from those of the majority of genes. Such "outlying" gene trees are considered biologically interesting, and identifying these genes has become an important problem in phylogenetics.
The R software package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalization of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method.
A few first theoretical steps in the adaptation of principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there is some ambiguity about how best to proceed. We show that convex hulls are one reasonable approach to the problem. The Nye PCA algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method.
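The outlier-detection idea behind kdetrees can be illustrated in plain Euclidean space (the BHV-specific kernel normalization is exactly the hard part the dissertation addresses; this sketch ignores it): estimate a density over all trees and flag the lowest-density ones as outlying.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
# Euclidean stand-in for vectorized gene trees: 95 trees generated by a
# common process plus 5 planted "outlying" trees (indices 95-99).
ordinary = rng.normal(0.0, 1.0, size=(95, 2))
outlying = np.array([[10.0, 10.0], [-10.0, 10.0], [10.0, -10.0],
                     [-10.0, -10.0], [0.0, 15.0]])
pts = np.vstack([ordinary, outlying])

kde = gaussian_kde(pts.T)          # density estimate over all "trees"
scores = kde(pts.T)                # low density = candidate outlier
flagged = np.argsort(scores)[:5]   # the five lowest-density trees
```

In the dissertation's setting the Gaussian kernel is replaced by one defined via BHV geodesic distances, whose normalizing constant varies across treespace, hence Chapter 3.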
- …