827 research outputs found

    Maximin Designs for Computer Experiments.

    Get PDF
    Decision processes are nowadays often facilitated by simulation tools. In the field of engineering, for example, such tools are used to simulate the behavior of products and processes. Simulation runs, however, are often very time-consuming, and, hence, the number of simulation runs allowed is limited in practice. The problem then is to determine which simulation runs to perform such that the maximal amount of information about the product or process is obtained. This problem is addressed in the first part of the thesis. It is proposed to use so-called maximin Latin hypercube designs and many new results for this class of designs are obtained. In the second part, the case of multiple interrelated simulation tools is considered and a framework to deal with such tools is introduced. Important steps in this framework are the construction and the use of coordination methods and of nested designs in order to control the dependencies present between the various simulation tools

    Cross-validation based adaptive sampling for Gaussian process models

    Get PDF
    In many real-world applications, we are interested in approximating black-box, costly functions as accurately as possible with the smallest number of function evaluations. A complex computer code is an example of such a function. In this work, a Gaussian process (GP) emulator is used to approximate the output of complex computer code. We consider the problem of extending an initial experiment (set of model runs) sequentially to improve the emulator. A sequential sampling approach based on leave-one-out (LOO) cross-validation is proposed that can be easily extended to a batch mode. This is a desirable property since it saves the user time when parallel computing is available. After fitting a GP to training data points, the expected squared LOO (ES-LOO) error is calculated at each design point. ES-LOO is used as a measure to identify important data points. More precisely, when this quantity is large at a point it means that the quality of prediction depends a great deal on that point and adding more samples nearby could improve the accuracy of the GP. As a result, it is reasonable to select the next sample where ES-LOO is maximised. However, ES-LOO is only known at the experimental design and needs to be estimated at unobserved points. To do this, a second GP is fitted to the ES-LOO errors and where the maximum of the modified expected improvement (EI) criterion occurs is chosen as the next sample. EI is a popular acquisition function in Bayesian optimisation and is used to trade-off between local/global search. However, it has a tendency towards exploitation, meaning that its maximum is close to the (current) "best" sample. To avoid clustering, a modified version of EI, called pseudo expected improvement, is employed which is more explorative than EI yet allows us to discover unexplored regions. Our results show that the proposed sampling method is promising

    Multi-criteria Anomaly Detection using Pareto Depth Analysis

    Full text link
    We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria are not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.Comment: Removed an unnecessary line from Algorithm

    A Distance-preserving Matrix Sketch

    Full text link
    Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower dimensional space. This aspect is important because conclusions based on faulty visual reasoning can be harmful. Judging dissimilar points as similar or similar points as dissimilar on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce two new algorithms that respectively select a subset of rows and columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.Comment: 38 pages, 13 figure

    A Distributed Surrogate Methodology for Inverse Most Probable Point Searches in Reliability Based Design Optimization

    Get PDF
    Surrogate models are commonly used in place of prohibitively expensive computational models to drive iterative procedures necessary for engineering design and analysis such as global optimization. Additionally, surrogate modeling has been applied to reliability based design optimization which constrains designs to those which provide a satisfactory reliability against failure considering system parameter uncertainties. Through surrogate modeling the analysis time is significantly reduced when the total number of evaluated samples upon which the final model is built is less than the number which would have otherwise been required using the expensive model directly with the analysis algorithm. Too few samples will provide an inaccurate approximation while too many will add redundant information to an already sufficiently accurate region. With the prediction error having an impact on the overall uncertainty present in the optimal solution, care must be taken to only evaluate samples which decrease solution uncertainty rather than prediction uncertainty over the entire design domain. This work proposes a numerical approach to the surrogate based optimization and reliability assessment problem using solution confidence as the primary algorithm termination criterion. The surrogate uncertainty information provided is used to construct multiple distributed surrogates which represent individual realizations of a lager surrogate population designated by the initial approximation. When globally optimized upon, these distributed surrogates yield a solution distribution quantifying the confidence one can have in the optimal solution based on current surrogate uncertainty. Furthermore, the solution distribution provides insight for the placement of supplemental sample evaluations when solution confidence is insufficient. Numerical case studies are presented for comparison of the proposed methodology with existing methods for surrogate based optimization, such as expected improvement from the Efficient Global Optimization algorithm

    Automatic visual recognition using parallel machines

    Get PDF
    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation, will discussed both line invariants under perspective projection and parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines can be demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machines. For processing an object with a features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n2). The two applications, one for shape matching and the other for chain-code extraction, are used in order to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed in here. In contrast to the approach which uses the epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need of camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, called transfer. Then a hypothesis-generation-testing scheme is implemented on the hypercube parallel architecture

    Lossy Compression of Climate Data Using Convolutional Autoencoders

    Get PDF

    Algorithmic issues in visual object recognition

    Get PDF
    This thesis is divided into two parts covering two aspects of research in the area of visual object recognition. Part I is about human detection in still images. Human detection is a challenging computer vision task due to the wide variability in human visual appearances and body poses. In this part, we present several enhancements to human detection algorithms. First, we present an extension to the integral images framework to allow for constant time computation of non-uniformly weighted summations over rectangular regions using a bundle of integral images. Such computational element is commonly used in constructing gradient-based feature descriptors, which are the most successful in shape-based human detection. Second, we introduce deformable features as an alternative to the conventional static features used in classifiers based on boosted ensembles. Deformable features can enhance the accuracy of human detection by adapting to pose changes that can be described as translations of body features. Third, we present a comprehensive evaluation framework for cascade-based human detectors. The presented framework facilitates comparison between cascade-based detection algorithms, provides a confidence measure for result, and deploys a practical evaluation scenario. Part II explores the possibilities of enhancing the speed of core algorithms used in visual object recognition using the computing capabilities of Graphics Processing Units (GPUs). First, we present an implementation of Graph Cut on GPUs, which achieves up to 4x speedup against compared to a CPU implementation. The Graph Cut algorithm has many applications related to visual object recognition such as segmentation and 3D point matching. Second, we present an efficient sparse approximation of kernel matrices for GPUs that can significantly speed up kernel based learning algorithms, which are widely used in object detection and recognition. We present an implementation of the Affinity Propagation clustering algorithm based on this representation, which is about 6 times faster than another GPU implementation based on a conventional sparse matrix representation

    A Framework for the Verification and Validation of Artificial Intelligence Machine Learning Systems

    Get PDF
    An effective verification and validation (V&V) process framework for the white-box and black-box testing of artificial intelligence (AI) machine learning (ML) systems is not readily available. This research uses grounded theory to develop a framework that leads to the most effective and informative white-box and black-box methods for the V&V of AI ML systems. Verification of the system ensures that the system adheres to the requirements and specifications developed and given by the major stakeholders, while validation confirms that the system properly performs with representative users in the intended environment and does not perform in an unexpected manner. Beginning with definitions, descriptions, and examples of ML processes and systems, the research results identify a clear and general process to effectively test these systems. The developed framework ensures the most productive and accurate testing results. Formerly, and occasionally still, the system definition and requirements exist in scattered documents that make it difficult to integrate, trace, and test through V&V. Modern system engineers along with system developers and stakeholders collaborate to produce a full system model using model-based systems engineering (MBSE). MBSE employs a Unified Modeling Language (UML) or System Modeling Language (SysML) representation of the system and its requirements that readily passes from each stakeholder for system information and additional input. The comprehensive and detailed MBSE model allows for direct traceability to the system requirements. xxiv To thoroughly test a ML system, one performs either white-box or black-box testing or both. Black-box testing is a testing method in which the internal model structure, design, and implementation of the system under test is unknown to the test engineer. Testers and analysts are simply looking at performance of the system given input and output. White-box testing is a testing method in which the internal model structure, design, and implementation of the system under test is known to the test engineer. When possible, test engineers and analysts perform both black-box and white-box testing. However, sometimes testers lack authorization to access the internal structure of the system. The researcher captures this decision in the ML framework. No two ML systems are exactly alike and therefore, the testing of each system must be custom to some degree. Even though there is customization, an effective process exists. This research includes some specialized methods, based on grounded theory, to use in the testing of the internal structure and performance. Through the study and organization of proven methods, this research develops an effective ML V&V framework. Systems engineers and analysts are able to simply apply the framework for various white-box and black-box V&V testing circumstances
    • …
    corecore