390 research outputs found

    REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit

    The R package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful as a preprocessing step to find clusters or as an outlier detection tool for multivariate data. Apart from the packages tourr and rggobi, there is no implementation of exploratory projection pursuit tools available in R. REPPlab is an R interface for the Java program EPP-lab, which implements four projection indices and three biologically inspired optimization algorithms. It also provides new tools for plotting and combining the results, as well as specific tools for outlier detection. The functionality of the package is illustrated through simulations and real data.
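
As a rough illustration of one-dimensional projection pursuit, the sketch below scores each candidate direction with a projection index and keeps the best one. It is not REPPlab's API: the kurtosis index, the 2D angle grid (standing in for the package's biologically inspired optimizers), and the names `kurtosis`, `project`, and `pursue` are all assumptions made for this example. Low kurtosis tends to flag bimodal (clustered) projections, while high kurtosis tends to flag outliers.

```python
import math


def kurtosis(values):
    """Fourth standardized moment of a one-dimensional sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def project(points, angle):
    """Project 2D points onto the unit vector (cos(angle), sin(angle))."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x + s * y for x, y in points]


def pursue(points, steps=180):
    """Grid search over directions, minimizing kurtosis (hypothetical helper).

    A real tool would use a proper optimizer; the grid keeps the sketch short.
    """
    angles = [math.pi * i / steps for i in range(steps)]
    best = min(angles, key=lambda a: kurtosis(project(points, a)))
    direction = (math.cos(best), math.sin(best))
    return direction, kurtosis(project(points, best))


# Toy data: two clusters separated along the x axis, spread along the y axis.
points = [(cx, -1.0 + 0.2 * j) for cx in (-2.0, 2.0) for j in range(11)]
direction, index_value = pursue(points)
print(direction, index_value)
```

On this toy data the search should settle on a direction close to the x axis, the one along which the two groups separate.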

    Non-convex Optimization for Machine Learning

    A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the relaxed (convex) optimization problems. However, this approach may be lossy and still presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these heuristics are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. It leads the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems. Comment: The official publication is available from now publishers via http://dx.doi.org/10.1561/220000005
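
To make projected gradient descent under a sparsity constraint concrete, here is a minimal sketch of iterative hard thresholding for sparse least squares. It is an illustration under stated assumptions, not code from the monograph: the objective 0.5 * ||Ax - b||^2, the fixed step size, the toy matrix, and the names `matvec`, `hard_threshold`, and `iht` are all chosen for this example. The projection onto the non-convex set of k-sparse vectors simply keeps the k entries of largest magnitude.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def hard_threshold(v, k):
    """Project v onto the set of k-sparse vectors: keep the k largest magnitudes."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def iht(A, b, k, step, iters=100):
    """Projected gradient descent on 0.5 * ||Ax - b||^2 subject to ||x||_0 <= k."""
    At = [list(col) for col in zip(*A)]  # transpose of A
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [r - bi for r, bi in zip(matvec(A, x), b)]
        grad = matvec(At, residual)  # gradient is A^T (Ax - b)
        x = hard_threshold([xi - step * gi for xi, gi in zip(x, grad)], k)
    return x


# Toy problem (assumed for illustration): a well-conditioned A and a 2-sparse signal.
A = [
    [1.0, 0.1, 0.0, 0.0],
    [0.1, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
x_true = [0.0, 3.0, 0.0, -2.0]
b = matvec(A, x_true)
x_hat = iht(A, b, k=2, step=0.7)
print(x_hat)
```

The step size 0.7 is picked for this near-identity matrix only; in general the iteration is sensitive to the conditioning of A and to the step size.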

    Doctor of Philosophy

    With the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced that reduces the exploration space by identifying a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, insights and conclusions based solely on 2D representations are limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections.

    Quantification of uncertainty of geometallurgical variables for mine planning optimisation

    Interest in geometallurgy has increased significantly over the past 15 years or so because of the benefits it brings to mine planning and operation. Its use and integration into design, planning and operation is becoming increasingly critical, especially in the context of declining ore grades and increasing mining and processing costs. This thesis, comprising four papers, offers methodologies and methods to quantify geometallurgical uncertainty and enrich the block model with geometallurgical variables, which contribute to improved optimisation of mining operations. This enhanced block model is termed a geometallurgical block model. Bootstrapped non-linear regression models based on projection pursuit were built to predict grindability indices and recovery, and to quantify model uncertainty. These models are useful for populating the geometallurgical block model with response attributes. New multi-objective optimisation formulations for block caving mining were developed and solved by a meta-heuristic solver, focussing on maximising the project revenue while at the same time minimising several risk measures. A novel clustering method, which is able to use both continuous and categorical attributes and to incorporate expert knowledge, was also developed for geometallurgical domaining, which characterises the deposit according to its metallurgical response. The concept of geometallurgical dilution was formulated and used for optimising production scheduling in an open-pit case study. Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Civil, Environmental and Mining Engineering, 201

    An Interactive Visualisation System for Engineering Design using Evolutionary Computing

    This thesis describes a system designed to promote collaboration between the human and computer during engineering design tasks. Evolutionary algorithms (in particular the genetic algorithm) can find good solutions to engineering design problems in a small number of iterations, but a review of the interactive evolutionary computing literature reveals that users would benefit from understanding the design space and having the freedom to direct the search. The main objective of this research is to fulfil a dual requirement: the computer should generate data and analyse the design space to identify high performing regions in terms of the quality and robustness of solutions, while at the same time the user should be allowed to interact with the data and use their experience and the information provided to guide the search inside and outside regions already found. To achieve these goals, a flexible user interface was developed that links and clarifies the research fields of evolutionary computing, interactive engineering design and multivariate visualisation. A number of accessible visualisation techniques were incorporated into the system. An innovative algorithm based on univariate kernel density estimation is introduced that quickly identifies the relevant clusters in the data from the point of view of the original design variables or a natural coordinate system such as the principal or independent components. The robustness of solutions inside a region can be investigated by novel use of 'negative' genetic algorithm search to find the worst case scenario. New high performance regions can be discovered in further runs of the evolutionary algorithm; penalty functions are used to avoid previously found regions. The clustering procedure was also successfully applied to multiobjective problems and used to force the genetic algorithm to find desired solutions in the trade-off between objectives.
    The system was evaluated by a small number of users who were asked to solve simulated engineering design scenarios by finding and comparing robust regions in artificial test functions. Empirical comparison with benchmark algorithms was inconclusive, but it was shown that even a devoted hybrid algorithm needs help to solve a design task. A critical analysis of the feedback and results suggested modifications to the clustering algorithm and a more practical way to evaluate the robustness of solutions. The system was also shown to experienced engineers working on their real world problems: new solutions were found in pertinent regions of objective space, and links to the artefact aided comparison of results. It was confirmed that in practice a lot of design knowledge is encoded into design problems, but experienced engineers use subjective knowledge of the problem to make decisions and evaluate the robustness of solutions. The full potential of the system was therefore seen in its ability to support decision making by supplying a diverse range of alternative design options, thereby enabling knowledge discovery in a wide range of applications.

    Practical tools for exploring data and models


    PPCI: an R Package for Cluster Identification using Projection Pursuit

    This paper presents the R package PPCI, which implements three recently proposed projection pursuit methods for clustering. The methods are unified by the approach of defining an optimal hyperplane to separate clusters, and deriving a projection index whose optimiser is the vector normal to this separating hyperplane. Divisive hierarchical clustering algorithms that can detect clusters defined in different subspaces are readily obtained by recursively bi-partitioning the data through such hyperplanes. Projecting onto the vector normal to the optimal hyperplane enables visualisations of the data that can be used to validate the partition at each level of the cluster hierarchy. PPCI also provides a simplified framework in which the clustering models can be modified in an interactive manner. Extensions to problems involving clusters which are not linearly separable, and to the problem of finding maximum hard margin hyperplanes for clustering, are also discussed.
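
The idea of bi-partitioning data through a hyperplane found by projection pursuit can be sketched as follows. This does not reproduce PPCI's functions or its three projection indices: as a stand-in, the index here is the largest gap between consecutive projected points, loosely in the spirit of a maximum hard margin split, and the 2D angle grid and the names `best_split` and `assign` are assumptions made for this example.

```python
import math


def best_split(points, steps=180):
    """Search 2D directions for the hyperplane with the widest empty margin.

    Returns (normal, offset, gap), where the hyperplane is
    {x : normal . x = offset}. Hypothetical helper for illustration.
    """
    best = None
    for i in range(steps):
        angle = math.pi * i / steps
        normal = (math.cos(angle), math.sin(angle))
        proj = sorted(normal[0] * x + normal[1] * y for x, y in points)
        # Widest gap between neighbouring projected points along this direction.
        for lo, hi in zip(proj, proj[1:]):
            gap = hi - lo
            if best is None or gap > best[2]:
                best = (normal, (lo + hi) / 2.0, gap)
    return best


def assign(points, normal, offset):
    """Label each point by the side of the hyperplane it falls on."""
    return [1 if normal[0] * x + normal[1] * y > offset else 0 for x, y in points]


# Toy data (assumed): two tight groups, one near (0, 0) and one near (3, 3).
points = [
    (0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (0.6, 0.5),
    (3.0, 3.0), (3.4, 3.2), (3.1, 3.6), (3.5, 3.5),
]
normal, offset, gap = best_split(points)
labels = assign(points, normal, offset)
print(normal, offset, labels)
```

A divisive hierarchy would follow by applying `best_split` again to each side of the partition. The largest-gap index is sensitive to isolated points, which is one reason a density-based or cut-based index may be preferred in practice.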