
    How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

    Whatever the phylogenetic method, genetic sequences are often described as strings of characters, so molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the surprising features of high-dimensional spaces, such as the concentration of measure phenomenon.
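
    Viewing sequences as points in a high-dimensional space can be made concrete with multidimensional scaling. As a hedged sketch (the toy sequences and the Hamming metric below are illustrative, not taken from the paper), pairwise sequence distances can be embedded into low-dimensional coordinates with scikit-learn's metric MDS:

```python
import numpy as np
from sklearn.manifold import MDS

# Illustrative toy sequences (not from the paper).
seqs = ["ACGTACGT", "ACGTACGA", "TCGTACGA", "TTGTACGA"]

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

# Pairwise dissimilarity matrix between sequences.
n = len(seqs)
D = np.array([[hamming(seqs[i], seqs[j]) for j in range(n)]
              for i in range(n)], dtype=float)

# Metric MDS on the precomputed distances yields coordinates that a
# distance-based method such as Fitch-Margoliash could work with.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
print(coords.shape)  # (4, 2)
```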

    Towards explainable metaheuristics: PCA for trajectory mining in evolutionary algorithms.

    The generation of explanations regarding decisions made by population-based metaheuristics is often a difficult task due to the nature of the mechanisms employed by these approaches. With the increasing use of these methods for optimisation in industries that require end-user confirmation, the need for explanations has also grown. We present a novel approach to the extraction of features capable of supporting an explanation through the use of trajectory mining: extracting key features from the populations of NDAs. We apply Principal Component Analysis techniques to identify new methods of tracking population diversity post-runtime, after projection into a lower-dimensional space. These methods are applied to a set of benchmark problems solved by a Genetic Algorithm and a Univariate Estimation of Distribution Algorithm. We show that the new sub-space derived metrics can capture key learning steps in the algorithm run, and how solution variable patterns that explain the fitness function may be captured in the principal component coefficients.
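
    The post-runtime trajectory-mining idea can be sketched with a toy run: log one population per generation, fit PCA on all logged individuals, and track diversity in the projected sub-space. The converging populations below are synthetic stand-ins, not output of the paper's algorithms:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for logged populations: 50 individuals with 10
# variables per generation, converging as the run progresses.
generations = [
    rng.normal(loc=1.0 - 0.1 * g, scale=1.0 / (g + 1), size=(50, 10))
    for g in range(10)
]

# Fit PCA once, post-runtime, on every logged individual.
pca = PCA(n_components=2).fit(np.vstack(generations))

# Per-generation diversity: mean spread along the principal components.
diversity = [pca.transform(g).std(axis=0).mean() for g in generations]

# Diversity shrinks as the population converges.
print(diversity[0] > diversity[-1])  # True
```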

    MEDVIR: 3D visual interface applied to gene profile analysis.

    The origins of this work lie in the increasing need for biologists and doctors to obtain tools for the visual analysis of data. When dealing with multidimensional data, such as medical data, traditional data mining techniques can be tedious and complex, even for medical experts. It is therefore necessary to develop useful visualization techniques that complement the expert’s criterion and, at the same time, visually stimulate and ease the process of obtaining knowledge from a dataset. In this way, the process of interpreting and understanding the data can be greatly enriched. Multidimensionality is inherent to medical data, and a time-consuming effort is usually required to obtain a clinically useful outcome; unfortunately, neither clinicians nor biologists are trained in managing more than four dimensions. Our specific aim was to design a 3D visual interface for gene profile analysis that is easy to use for both medical and biological experts. To this end, a new analysis method is proposed: MedVir. This is a simple and intuitive analysis mechanism based on the visualization of any multidimensional medical data in a three-dimensional space, which allows experts to interact with, collaborate on, and enrich the representation. In other words, MedVir performs a powerful reduction in data dimensionality in order to represent the original information in a three-dimensional environment. Experts can then interact with the data and draw conclusions in a visual and quick way.

    A methodology to compare dimensionality reduction algorithms in terms of loss of quality

    Dimensionality Reduction (DR) is attracting more attention these days as a result of the increasing need to handle huge amounts of data effectively. DR methods allow the number of initial features to be reduced considerably until a set is found that preserves the original properties of the data. However, their use entails an inherent loss of quality that is likely to affect the understanding of the data for analysis purposes. This loss of quality can be decisive when selecting a DR method, because of the nature of each method. In this paper, we propose a methodology that allows different DR methods to be analyzed and compared with regard to the loss of quality they produce. This methodology uses the concept of preservation of geometry (quality assessment criteria) to assess the loss of quality. Experiments have been carried out using the most well-known DR algorithms and quality assessment criteria from the literature, applied to 12 real-world datasets. The results obtained so far show that it is possible to establish a method for selecting the most appropriate DR technique in terms of minimum loss of quality. The experiments have also highlighted some interesting relationships between the quality assessment criteria. Finally, the methodology allows the appropriate target dimensionality for reducing data to be established, whilst giving rise to a minimum loss of quality.
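
    One widely available quality criterion of this kind is neighbourhood trustworthiness. As an illustrative sketch (not the paper's exact methodology), scikit-learn's `trustworthiness` can score how much local structure a PCA reduction preserves at different target dimensionalities:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

X = load_iris().data  # 4-dimensional example dataset

# Score each target dimensionality on a 0-1 scale: 1 means local
# neighbourhoods are fully preserved after the reduction.
scores = {}
for k in (1, 2, 3):
    emb = PCA(n_components=k).fit_transform(X)
    scores[k] = trustworthiness(X, emb, n_neighbors=10)

# More retained components typically lose less neighbourhood quality.
print(scores)
```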

    An Interactive Visualisation System for Engineering Design using Evolutionary Computing

    This thesis describes a system designed to promote collaboration between the human and computer during engineering design tasks. Evolutionary algorithms (in particular the genetic algorithm) can find good solutions to engineering design problems in a small number of iterations, but a review of the interactive evolutionary computing literature reveals that users would benefit from understanding the design space and having the freedom to direct the search. The main objective of this research is to fulfil a dual requirement: the computer should generate data and analyse the design space to identify high performing regions in terms of the quality and robustness of solutions, while at the same time the user should be allowed to interact with the data and use their experience and the information provided to guide the search inside and outside regions already found. To achieve these goals a flexible user interface was developed that links and clarifies the research fields of evolutionary computing, interactive engineering design and multivariate visualisation. A number of accessible visualisation techniques were incorporated into the system. An innovative algorithm based on univariate kernel density estimation is introduced that quickly identifies the relevant clusters in the data from the point of view of the original design variables or a natural coordinate system such as the principal or independent components. The robustness of solutions inside a region can be investigated by novel use of 'negative' genetic algorithm search to find the worst case scenario. New high performance regions can be discovered in further runs of the evolutionary algorithm; penalty functions are used to avoid previously found regions. The clustering procedure was also successfully applied to multiobjective problems and used to force the genetic algorithm to find desired solutions in the trade-off between objectives. 
The system was evaluated by a small number of users who were asked to solve simulated engineering design scenarios by finding and comparing robust regions in artificial test functions. Empirical comparison with benchmark algorithms was inconclusive, but it was shown that even a dedicated hybrid algorithm needs help to solve a design task. A critical analysis of the feedback and results suggested modifications to the clustering algorithm and a more practical way to evaluate the robustness of solutions. The system was also shown to experienced engineers working on their real-world problems; new solutions were found in pertinent regions of objective space, and links to the artefact aided comparison of results. It was confirmed that in practice a lot of design knowledge is encoded into design problems, but experienced engineers use subjective knowledge of the problem to make decisions and evaluate the robustness of solutions. The full potential of the system was therefore seen in its ability to support decision making by supplying a diverse range of alternative design options, thereby enabling knowledge discovery in a wide range of applications.
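
    The univariate kernel-density clustering step can be sketched as follows; the two-cluster data is synthetic and the peak-finding is a simplified stand-in for the thesis's algorithm:

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Synthetic 1-D projection of solutions: two high-performing regions.
x = np.concatenate([rng.normal(-2.0, 0.3, 200),
                    rng.normal(2.0, 0.3, 200)])

# Univariate KDE; peaks of the estimated density mark candidate
# cluster centres (a simplified stand-in for the thesis's procedure).
kde = gaussian_kde(x)
grid = np.linspace(x.min() - 1.0, x.max() + 1.0, 500)
density = kde(grid)
peaks, _ = find_peaks(density)
centres = grid[peaks]
print(len(centres))
```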

    Single- and Multi-Distribution Dimensionality Reduction Approaches for a Better Data Structure Capturing

    In recent years, the huge expansion of digital technologies has vastly increased the volume of data to be explored, such that reducing the dimensionality of data is an essential step in data exploration. The integrity of a dimensionality reduction technique relates to how well it maintains the data structure. Techniques such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) preserve the global distance ranking at the expense of neglecting small-distance preservation. Conversely, the structure capturing of other methods, such as Isomap, Locally Linear Embedding (LLE), Laplacian Eigenmaps, t-Stochastic Neighbour Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and TriMap, relies on the number of neighbours considered. This paper presents a dimensionality reduction technique, Same Degree Distribution (SDD), that does not rely on the number of neighbours, thanks to using degree-distributions in both the high- and low-dimensional spaces. The degree-distribution is similar to the Student-t distribution and is less expensive than the Gaussian distribution. As such, it enables better global data preservation in less computational time. Moreover, to improve the data structure capturing, SDD has been extended to Multi-SDDs (MSDD), which employs various degree-distributions on top of SDD. The proposed approach and its extension demonstrated greater performance than eight other benchmark methods, tested on several popular synthetic and real datasets such as Iris, Breast Cancer, Swiss Roll, MNIST, and Make Blob, evaluated by the co-ranking matrix and Kendall’s Tau coefficient. For further work, we aim to approximate the number of distributions and their degrees in relation to the given dataset. Reducing the computational complexity is another objective for further work.
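
    The Kendall's Tau evaluation mentioned above can be sketched independently of SDD itself (which is not implemented here): rank-correlate the pairwise distances before and after a reduction, with a generic PCA standing in for the reducer:

```python
from scipy.spatial.distance import pdist
from scipy.stats import kendalltau
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Generic stand-in reducer; SDD itself is not implemented here.
X = load_iris().data
Y = PCA(n_components=2).fit_transform(X)

# Rank correlation between pairwise distances in the high- and
# low-dimensional spaces; closer to 1 means better preservation.
tau, _ = kendalltau(pdist(X), pdist(Y))
print(round(tau, 3))
```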

    Projection-Based Clustering through Self-Organization and Swarm Intelligence

    This work covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape visualization and clustering of data. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering and the number of clusters, or the absence of cluster structure, can be verified by the 3D landscape at a glance. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization, and the Nash equilibrium concept from game theory, resulting in the elimination of a global objective function and of parameter settings. Via the downloadable R package, DBS can be applied to data drawn from diverse research fields and used even by non-professionals in the field of data mining.

    A Visualization Technique for Accessing Solution Pool in Interactive Methods of Multiobjective Optimization

    Interactive methods of multiobjective optimization repetitively derive Pareto optimal solutions based on the decision maker's preference information and present the obtained solutions for his/her consideration. Some interactive methods save the obtained solutions into a solution pool and, at each iteration, allow the decision maker to consider any of the solutions obtained earlier. This feature contributes to the flexibility of exploring the Pareto optimal set and learning about the optimization problem. However, in the case of many objective functions, the accumulation of derived solutions makes accessing the solution pool cognitively difficult for the decision maker. We propose to enhance interactive methods with a visualization of the set of solution outcomes using dimensionality reduction, together with interactive mechanisms for exploring the solution pool. We describe the proposed visualization technique and demonstrate its usage with an example problem solved using the interactive method NIMBUS.
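
    A minimal version of the visualization step, with a randomly generated solution pool standing in for NIMBUS output, projects the accumulated objective vectors into two dimensions for browsing:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Hypothetical solution pool: objective vectors of 30 solutions to a
# five-objective problem, accumulated over interactive iterations.
pool = rng.random((30, 5))

# Project to 2-D so the whole pool fits on one scatter plot that the
# decision maker can explore.
coords = PCA(n_components=2).fit_transform(pool)
print(coords.shape)  # (30, 2)
```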