8 research outputs found

    Data-Driven Geometric Design Space Exploration and Design Synthesis

    A design space is the space of all potential design candidates. While a design space can be of any kind, this work focuses on exploring geometric design spaces, where geometric parameters are used to represent designs and largely determine a given design's functionality or performance (e.g., airfoil, hull, and car body designs). By exploring the design space, we evaluate different design choices and look for desired solutions. However, a design space may have unnecessarily high dimensionality and implicit boundaries, which make it difficult to explore. Also, if we synthesize new designs by randomly sampling design variables in the high-dimensional design space, there is a high chance that the designs are infeasible, since feasible design variables are correlated. This dissertation introduces ways of capturing a compact representation (which we call a latent space) that describes the variability of designs, so that we can synthesize designs and explore design options using this compact representation instead of the original high-dimensional design variables. The main research question answered by this dissertation is: how does one effectively learn this compact representation from data and efficiently explore this latent space so that we can quickly find desired design solutions? The word "quickly" here means eliminating or reducing the iterative ideation, prototyping, and evaluation steps in a conventional design process. This also reduces human intervention, and hence facilitates design automation. This work bridges the gap between machine learning and geometric design in engineering. It contributes new knowledge within two topics: design space exploration and design synthesis. Specifically, the main contributions are:
    1. A method for measuring the intrinsic complexity of a design space based on design data manifolds.
    2. Machine learning models that incorporate prior knowledge from the design domain to improve latent space exploration and design synthesis quality.
    3. New design space exploration tools that expand the design space and search for desired designs in an unbounded space.
    4. Geometric design space benchmarks with controllable complexity for testing data-driven design space exploration and design synthesis.
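
    As a purely illustrative example of the latent-space idea described above, the sketch below uses PCA as a simple stand-in for the dissertation's learned representations: it fits a compact latent space to hypothetical geometric design data and synthesizes new designs by sampling latent codes rather than the raw design variables.

```python
# Illustrative sketch (not the dissertation's actual models): learn a
# low-dimensional latent space from geometric design data with PCA, then
# synthesize new designs by sampling in that latent space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical dataset: 500 designs, each described by 200 correlated
# geometric parameters (e.g., surface coordinates of an airfoil).
base = rng.normal(size=(500, 5))            # 5 hidden factors of variation
mixing = rng.normal(size=(5, 200))
designs = base @ mixing + 0.01 * rng.normal(size=(500, 200))

# Learn a compact latent representation that captures most design variability.
latent = PCA(n_components=5).fit(designs)
print("explained variance:", latent.explained_variance_ratio_.sum())

# Synthesize new designs by sampling latent codes and decoding them,
# instead of sampling the 200 raw design variables independently.
codes = rng.normal(size=(10, 5)) * np.sqrt(latent.explained_variance_)
new_designs = latent.inverse_transform(codes)
print(new_designs.shape)                    # (10, 200)
```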

    Learning in high dimensions with projected linear discriminants

    The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience with such modelling has made researchers aware of many issues associated with working in high-dimensional domains, collectively known as "the curse of dimensionality", which can confound practitioners' desires to build good models of the world from these data. When the dimensionality is very large, low-dimensional methods and geometric intuition both break down in these high-dimensional spaces. To mitigate the dimensionality curse we can use low-dimensional representations of the original data that capture most of the information it contained. However, little is currently known about the effect of such dimensionality reduction on classifier performance. In this thesis we develop theory quantifying the effect of random projection, a recent and very promising non-adaptive dimensionality reduction technique, on the classification performance of Fisher's Linear Discriminant (FLD), a successful and widely used linear classifier. We tackle the issues associated with small sample size and high dimensionality by using randomly projected FLD ensembles, and we develop theory explaining why our new approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.
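
    A minimal sketch of the randomly projected FLD ensemble recipe described above, built from scikit-learn components on synthetic data; the ensemble size, projected dimension, and vote-averaging rule here are illustrative choices, not the thesis's exact settings.

```python
# Sketch of a randomly projected FLD ensemble: project the high-dimensional
# data with independent random projections, fit Fisher's Linear Discriminant
# in each projected space, and average the members' class probabilities.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.random_projection import GaussianRandomProjection

# Small-sample, high-dimensional synthetic problem: 100 points, 1000 features.
X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=20, random_state=0)

n_members, k = 25, 30          # ensemble size and projected dimension
scores = np.zeros(len(y))
for seed in range(n_members):
    proj = GaussianRandomProjection(n_components=k, random_state=seed)
    Xp = proj.fit_transform(X)                  # random low-dimensional view
    fld = LinearDiscriminantAnalysis().fit(Xp, y)
    scores += fld.predict_proba(Xp)[:, 1]       # accumulate member votes

pred = (scores / n_members > 0.5).astype(int)
print("training accuracy (sanity check):", (pred == y).mean())
```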

    Kernel Feature Extraction Methods for Remote Sensing Data Analysis

    Technological advances in recent decades have improved our capabilities for collecting and storing large data volumes. In some fields, however, such as remote sensing, the peculiar characteristics of the data create problems in data processing. High data volume, high dimensionality, heterogeneity, and nonlinearity can make the analysis and extraction of relevant information from these images a bottleneck for many real applications. Research applying image processing and machine learning techniques together with feature extraction allows the dimensionality of the data to be reduced while preserving the maximum amount of information, improving data visualization and knowledge discovery. Several feature extraction methods have been addressed in the literature depending on data availability, and they can be classified as supervised, semisupervised, and unsupervised. In particular, feature extraction can be used in combination with (nonlinear) kernel methods. This combination facilitates obtaining a space that preserves greater information content, and one of its most important properties is that it can be used directly for general tasks including classification, regression, clustering, ranking, compression, or data visualization. In this Thesis, we address different nonlinear feature extraction approaches based on kernel methods for remote sensing data analysis. Several improvements to current feature extraction methods are proposed to transform the data so as to make high-dimensional data tasks, such as classification or biophysical parameter estimation, easier. The Thesis focuses on three main objectives to reach these improvements. The first objective is to include invariances in supervised kernel feature extraction methods. Through these invariances it is possible to generate virtual samples that help to mitigate the reduced number of samples available to supervised methods. The proposed algorithm is a simple method that essentially generates new (synthetic) training samples from the available labeled samples. Using these samples together with the original ones in feature extraction methods yields features that are more independent of each other than those obtained without virtual samples, and introducing prior knowledge by means of virtual samples can make classification and biophysical parameter estimation methods more robust. The second objective is to use generative, i.e. probabilistic, kernels that are learned directly from the original data by means of clustering techniques, finding local-to-global similarities along the manifold. The proposed kernel is useful for general feature extraction purposes. Furthermore, it attempts to improve on current methods because it exploits not only the labeled data but also the unlabeled information of the manifold. Moreover, the proposed kernel is parameter-free, in contrast with parameterized functions such as the radial basis function (RBF). Probabilistic kernels are used to obtain new unsupervised and semisupervised methods that reduce the number and cost of labeled data in remote sensing.
    The third objective is to develop new kernel feature extraction methods that improve on the features obtained by current methods. Optimizing the functional yields improvements in the new algorithms; for instance, the Optimized Kernel Entropy Component Analysis (OKECA) method, which is based on the Independent Component Analysis (ICA) framework, is more efficient in terms of dimensionality reduction than the standard Kernel Entropy Component Analysis (KECA) method. In this Thesis, the methods focus on remote sensing data analysis. Nevertheless, feature extraction methods can be used to analyze data from any research field where the data are multidimensional. For these reasons, the results are presented as an experimental sequence: first, the projections are analyzed on toy examples; then the algorithms are tested on standard databases with supervised information; and finally the proposed methods are applied to the analysis of remote sensing images.
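
    The virtual-sample idea from the first objective can be illustrated with a short, hedged sketch. Here KernelPCA stands in for the thesis's kernel feature extractors (KECA and OKECA are not available in scikit-learn), and the additive jitter used to create virtual samples is a hypothetical invariance, not necessarily the one proposed in the Thesis.

```python
# Hedged illustration: augment labeled samples with virtual (perturbed)
# copies, then apply a nonlinear kernel feature extractor to the union.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)

# Hypothetical labeled spectra: 60 pixels x 120 spectral bands.
X = rng.normal(size=(60, 120))

# Virtual samples: jittered copies of the originals, encoding the prior
# knowledge that small radiometric perturbations should not change the class.
X_virtual = X + 0.05 * rng.normal(size=X.shape)
X_train = np.vstack([X, X_virtual])

# Nonlinear feature extraction fitted on original plus virtual samples.
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1e-3).fit(X_train)
features = kpca.transform(X)
print(features.shape)          # (60, 10)
```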

    Model-Based Problem Solving through Symbolic Regression via Pareto Genetic Programming.

    The Pareto genetic programming methodology is extended with additional generic model selection and generation strategies that (1) drive the modeling engine toward models of reduced non-linearity and increased generalization capability, and (2) improve the effectiveness of the search for robust models through goal softening and adaptive fitness evaluations. In addition to the new strategies for model development and model selection, this dissertation presents a new approach for the analysis, ranking, and compression of given multi-dimensional input-response data for the purpose of balancing the information content of undesigned data sets.
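
    The Pareto selection step at the core of Pareto genetic programming can be sketched as follows; the candidate models and their (error, complexity) values below are made up for illustration and are not taken from the dissertation.

```python
# Minimal sketch of Pareto (non-dominated) selection over two objectives:
# prediction error and model complexity. A candidate survives only if no
# other candidate is at least as good in both objectives and strictly
# better in one.
def pareto_front(models):
    """models: list of (error, complexity, name); returns non-dominated ones."""
    front = []
    for m in models:
        dominated = any(
            o[0] <= m[0] and o[1] <= m[1] and (o[0] < m[0] or o[1] < m[1])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front

candidates = [
    (0.10, 25, "deep tree"),
    (0.12, 9,  "medium tree"),
    (0.30, 3,  "linear term"),
    (0.12, 14, "redundant tree"),   # dominated by "medium tree"
]
print(pareto_front(candidates))
```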

    Applications


    Machine Learning in Discrete Molecular Spaces

    The past decade has seen an explosion of machine learning in chemistry. Whether it is in property prediction, synthesis, molecular design, or any other subdivision, machine learning seems poised to become an integral, if not a dominant, component of future research efforts. This extraordinary capacity rests on the interaction between machine learning models and the underlying chemical data landscape commonly referred to as chemical space. Chemical space has multiple incarnations, but is generally considered the space of all possible molecules. In this sense, it is one example of a molecular set: an arbitrary collection of molecules. This thesis is devoted to precisely these objects, and particularly to how they interact with machine learning models. This work is predicated on the idea that by better understanding the relationship between molecular sets and the models trained on them, we can improve models, achieve greater interpretability, and further break down the walls between data-driven and human-centric chemistry. The hope is that this enables the full predictive power of machine learning to be leveraged while continuing to build our understanding of chemistry. The first three chapters of this thesis introduce and review the necessary machine learning theory, particularly the tools that have been specially designed for chemical problems. This is followed by an extensive literature review in which the contributions of machine learning to multiple facets of chemistry over the last two decades are explored. Chapters 4-7 present the research conducted throughout this PhD: how we can meaningfully describe the properties of an arbitrary set of molecules through information theory; how we can determine the most informative data points in a set of molecules; how graph signal processing can be used to understand the relationship between the chosen molecular representation, the property, and the machine learning model; and finally how this approach can be brought to bear on protein space. Each of these sub-projects briefly explores the necessary mathematical theory before leveraging it to resolve the posed problems. We conclude with a summary of the contributions of this work and outline fruitful avenues for further exploration.
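
    As a hedged illustration of describing a molecular set information-theoretically, the sketch below computes the average per-bit Shannon entropy of binary fingerprints; the fingerprints are random stand-ins, and the measures actually developed in the thesis may differ.

```python
# Sketch: summarize a molecular set by the mean Shannon entropy of its
# fingerprint bits. Higher entropy suggests a more diverse set.
import numpy as np

rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(200, 1024))   # 200 molecules, 1024 bits

def set_entropy(fp):
    """Mean per-bit Shannon entropy (in bits) of a 0/1 fingerprint matrix."""
    p = fp.mean(axis=0)                                # P(bit = 1) per position
    p = np.clip(p, 1e-12, 1 - 1e-12)                   # avoid log(0)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return h.mean()

print("average per-bit entropy:", set_entropy(fingerprints))
```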

    Model Order Reduction

    The increasing complexity of models used to predict real-world systems leads to the need for algorithms that replace complex models with far simpler ones while preserving the accuracy of the predictions. This three-volume handbook covers methods as well as applications. This third volume focuses on applications in engineering, biomedical engineering, computational physics, and computer science.
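
    A representative model order reduction technique of the kind covered by such handbooks is proper orthogonal decomposition (POD); the sketch below builds a POD basis from synthetic snapshot data and is purely illustrative.

```python
# Illustrative POD: compute a reduced basis from the SVD of a snapshot
# matrix, then project a full-order state onto it and reconstruct.
import numpy as np

rng = np.random.default_rng(0)

# Snapshot matrix: 1000 spatial degrees of freedom x 50 time snapshots,
# generated here from only 3 underlying spatial modes.
modes = rng.normal(size=(1000, 3))
coeffs = rng.normal(size=(3, 50))
snapshots = modes @ coeffs

# POD basis from the SVD of the snapshot matrix; keep the r dominant modes.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
r = 3
basis = U[:, :r]

# Reduce a full-order state to r coordinates and reconstruct it.
x = snapshots[:, 0]
x_reduced = basis.T @ x                 # r reduced coordinates
x_approx = basis @ x_reduced            # back to full dimension
print("relative error:", np.linalg.norm(x - x_approx) / np.linalg.norm(x))
```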