REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit
The R package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful as a preprocessing step to find clusters or as an outlier detection tool for multivariate data. Apart from the packages tourr and rggobi, there is no implementation of exploratory projection pursuit tools available in R. REPPlab is an R interface for the Java program EPP-lab that implements four projection indices and three biologically inspired optimization algorithms. It also proposes new tools for plotting and combining the results and specific tools for outlier detection. The functionality of the package is illustrated through simulations and real data.
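As an illustration of one-dimensional projection pursuit, the sketch below scans candidate directions in the plane and keeps the one whose projection minimises a projection index. It is a hedged toy, not REPPlab's implementation: the kurtosis index, the grid search over angles (used here in place of the biologically inspired optimisers), the toy data, and the names `kurtosis` and `best_projection` are all assumptions made for this example.

```python
import math


def kurtosis(values):
    """Fourth standardised moment (about 3 for Gaussian data, lower for two-cluster data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def best_projection(points, n_angles=180):
    """Grid-search unit directions in 2D and return the one with the lowest kurtosis.

    Assumption: low kurtosis serves as a stand-in index for cluster structure.
    """
    best_dir, best_index = None, float("inf")
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        direction = (math.cos(theta), math.sin(theta))
        projected = [x * direction[0] + y * direction[1] for x, y in points]
        index = kurtosis(projected)
        if index < best_index:
            best_dir, best_index = direction, index
    return best_dir, best_index


# Toy data (made up): two clusters separated along the first coordinate.
points = [(x, y)
          for x in (-3.2, -3.0, -2.8, 2.8, 3.0, 3.2)
          for y in (-0.3, -0.1, 0.1, 0.3)]
direction, index = best_projection(points)
projected = [x * direction[0] + y * direction[1] for x, y in points]
```

A histogram of `projected` would then be inspected for clusters or outliers; a real analysis would use higher-dimensional data and a proper optimiser rather than an exhaustive scan.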
Non-convex Optimization for Machine Learning
A vast majority of machine learning algorithms train their models and perform
inference by solving optimization problems. In order to capture the learning
and prediction problems accurately, structural constraints such as sparsity or
low rank are frequently imposed or else the objective itself is designed to be
a non-convex function. This is especially true of algorithms that operate in
high-dimensional spaces or that train non-linear models such as tensor models
and deep networks.
The freedom to express the learning problem as a non-convex optimization
problem gives immense modeling power to the algorithm designer, but often such
problems are NP-hard to solve. A popular workaround to this has been to relax
non-convex problems to convex ones and use traditional methods to solve the
(convex) relaxed optimization problems. However, this approach may be lossy and
nevertheless presents significant challenges for large-scale optimization.
On the other hand, direct approaches to non-convex optimization have met with
resounding success in several domains and remain the methods of choice for the
practitioner, as they frequently outperform relaxation-based techniques.
Popular heuristics include projected gradient descent and alternating
minimization. However, these are often poorly understood in terms of their
convergence and other properties.
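To make one of these heuristics concrete, here is a minimal sketch of projected gradient descent on a sparsity-constrained least-squares problem, where the projection is hard thresholding. It is an illustration under stated assumptions rather than an algorithm taken from the monograph: the toy matrix, the conservative step size, and the function names `hard_threshold`, `objective`, and `projected_gradient_descent` are all made up for this example.

```python
def hard_threshold(x, k):
    """Project x onto the (non-convex) set of vectors with at most k non-zeros."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def objective(A, y, x):
    """Least-squares loss 0.5 * ||A x - y||^2."""
    return 0.5 * sum(
        (sum(a * xj for a, xj in zip(row, x)) - yi) ** 2 for row, yi in zip(A, y)
    )


def projected_gradient_descent(A, y, k, iters=200):
    """Alternate a gradient step on the loss with projection onto the constraint."""
    n = len(A[0])
    # Conservative step size 1 / ||A||_F^2, which never exceeds 1 / L for this loss.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(a * xj for a, xj in zip(row, x)) - yi
                    for row, yi in zip(A, y)]
        grad = [sum(A[r][j] * residual[r] for r in range(len(A)))
                for j in range(n)]
        x = hard_threshold([xj - step * g for xj, g in zip(x, grad)], k)
    return x


# Toy problem (made-up numbers): y equals A times the 1-sparse vector [2, 0, 0].
A = [[1.0, 0.2, 0.0],
     [0.0, 1.0, 0.3],
     [0.5, 0.0, 1.0],
     [0.1, 0.4, 0.2]]
y = [2.0, 0.0, 1.0, 0.2]
x_hat = projected_gradient_descent(A, y, k=1)
```

With this step size and a feasible starting point, the loss cannot increase from one iterate to the next, even though the constraint set is non-convex; whether the iterates reach the true sparse vector depends on properties of `A` that this sketch does not check.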
This monograph presents a selection of recent advances that bridge a
long-standing gap in our understanding of these heuristics. The monograph will
lead the reader through several widely used non-convex optimization techniques,
as well as applications thereof. The goal of this monograph is both to
introduce the rich literature in this area and to equip the reader with
the tools and techniques needed to analyze these simple procedures for
non-convex problems.
Comment: The official publication is available from now publishers via
http://dx.doi.org/10.1561/220000005
Doctor of Philosophy
dissertation
With the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced that reduces the exploration space by identifying a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, insights and conclusions based solely on 2D representations are limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections.
Quantification of uncertainty of geometallurgical variables for mine planning optimisation
Interest in geometallurgy has increased significantly over the past 15 years or
so because of the benefits it brings to mine planning and operation. Its use
and integration into design, planning and operation is becoming increasingly
critical especially in the context of declining ore grades and increasing mining
and processing costs.
This thesis, comprising four papers, offers methodologies and methods to
quantify geometallurgical uncertainty and enrich the block model with geometallurgical
variables, which contribute to improved optimisation of mining
operations. This enhanced block model is termed a geometallurgical block
model.
Bootstrapped non-linear regression models by projection pursuit were built
to predict grindability indices and recovery, and quantify model uncertainty.
These models are useful for populating the geometallurgical block model with
response attributes. New multi-objective optimisation formulations for block
caving mining were developed and solved by a metaheuristic solver, focussing
on maximising the project revenue and, at the same time, minimising
several risk measures. A novel clustering method, which is able to use
both continuous and categorical attributes and incorporate expert knowledge,
was also developed for geometallurgical domaining which characterises the
deposit according to its metallurgical response. The concept of geometallurgical
dilution was formulated and used for optimising production scheduling in
an open-pit case study.
Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Civil, Environmental and Mining Engineering, 201
An Interactive Visualisation System for Engineering Design using Evolutionary Computing
This thesis describes a system designed to promote collaboration between the human and computer
during engineering design tasks. Evolutionary algorithms (in particular the genetic algorithm) can
find good solutions to engineering design problems in a small number of iterations, but a review of
the interactive evolutionary computing literature reveals that users would benefit from
understanding the design space and having the freedom to direct the search. The main objective of
this research is to fulfil a dual requirement: the computer should generate data and analyse the
design space to identify high performing regions in terms of the quality and robustness of solutions,
while at the same time the user should be allowed to interact with the data and use their experience
and the information provided to guide the search inside and outside regions already found.
To achieve these goals a flexible user interface was developed that links and clarifies the
research fields of evolutionary computing, interactive engineering design and multivariate
visualisation. A number of accessible visualisation techniques were incorporated into the system.
An innovative algorithm based on univariate kernel density estimation is introduced that quickly
identifies the relevant clusters in the data from the point of view of the original design variables or
a natural coordinate system such as the principal or independent components. The robustness of
solutions inside a region can be investigated by novel use of 'negative' genetic algorithm search to
find the worst case scenario. New high performance regions can be discovered in further runs of
the evolutionary algorithm; penalty functions are used to avoid previously found regions. The
clustering procedure was also successfully applied to multiobjective problems and used to force the
genetic algorithm to find desired solutions in the trade-off between objectives.
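The idea of finding clusters along a single variable from a univariate kernel density estimate can be sketched as follows. This is a hedged illustration only, not the thesis's algorithm: the Gaussian kernel, the fixed bandwidth, the rule of cutting at local minima of the density, the toy values, and the names `kde` and `kde_clusters` are all assumptions made for this example.

```python
import bisect
import math


def kde(grid, data, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at each grid point."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data) / norm
        for g in grid
    ]


def kde_clusters(data, bandwidth, n_grid=200):
    """Label points by the density valleys (local minima) that separate them."""
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    density = kde(grid, data, bandwidth)
    cuts = [
        grid[i]
        for i in range(1, n_grid - 1)
        if density[i] <= density[i - 1] and density[i] < density[i + 1]
    ]
    # A point's label is the number of cut points lying to its left.
    return [bisect.bisect_left(cuts, x) for x in data], cuts


# Toy values (made up) for a single design variable or component score.
values = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 5.2, 4.9]
labels, cuts = kde_clusters(values, bandwidth=0.5)
```

Applying the same routine to each original design variable, or to each principal or independent component score, gives one set of candidate cluster boundaries per coordinate; the choice of bandwidth controls how many valleys appear.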
The system was evaluated by a small number of users who were asked to solve simulated
engineering design scenarios by finding and comparing robust regions in artificial test functions.
Empirical comparison with benchmark algorithms was inconclusive but it was shown that even a
devoted hybrid algorithm needs help to solve a design task. A critical analysis of the feedback and
results suggested modifications to the clustering algorithm and a more practical way to evaluate the
robustness of solutions. The system was also shown to experienced engineers working on their
real-world problems: new solutions were found in pertinent regions of objective space, and links to the
artefact aided comparison of results. It was confirmed that in practice a lot of design knowledge is
encoded into design problems but experienced engineers use subjective knowledge of the problem
to make decisions and evaluate the robustness of solutions. So the full potential of the system was
seen in its ability to support decision making by supplying a diverse range of alternative design
options, thereby enabling knowledge discovery in a wide range of applications.
PPCI: an R Package for Cluster Identification using Projection Pursuit
This paper presents the R package PPCI, which implements three recently proposed projection pursuit methods for clustering. The methods are unified by the approach of defining an optimal hyperplane to separate clusters, and deriving a projection index whose optimiser is the vector normal to this separating hyperplane. Divisive hierarchical clustering algorithms that can detect clusters defined in different subspaces are readily obtained by recursively bi-partitioning the data through such hyperplanes. Projecting onto the vector normal to the optimal hyperplane enables visualisations of the data that can be used to validate the partition at each level of the cluster hierarchy. PPCI also provides a simplified framework in which the clustering models can be modified in an interactive manner. Extensions to problems involving clusters which are not linearly separable, and to the problem of finding maximum hard-margin hyperplanes for clustering, are also discussed.