    Constructive Approximation and Learning by Greedy Algorithms

    This thesis develops several kernel-based greedy algorithms for different machine learning problems and analyzes their theoretical and empirical properties. Greedy approaches have been extensively used in the past for tackling problems in combinatorial optimization where finding even a feasible solution can be a computationally hard problem (i.e., not solvable in polynomial time). A key feature of greedy algorithms is that a solution is constructed recursively from the smallest constituent parts. In each step of the constructive process a component is added to the partial solution from the previous step and, thus, the size of the optimization problem is reduced. The selected components are given by optimization problems that are simpler and easier to solve than the original problem. As such schemes are typically fast at constructing a solution they can be very effective on complex optimization problems where finding an optimal/good solution has a high computational cost. Moreover, greedy solutions are rather intuitive and the schemes themselves are simple to design and easy to implement. There is a large class of problems for which greedy schemes generate an optimal solution or a good approximation of the optimum. In the first part of the thesis, we develop two deterministic greedy algorithms for optimization problems in which a solution is given by a set of functions mapping an instance space to the space of reals. The first of the two approaches facilitates data understanding through interactive visualization by providing means for experts to incorporate their domain knowledge into otherwise static kernel principal component analysis. This is achieved by greedily constructing embedding directions that maximize the variance at data points (unexplained by the previously constructed embedding directions) while adhering to specified domain knowledge constraints. 
The second deterministic greedy approach is a supervised feature construction method capable of addressing the problem of kernel choice. The goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity: large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. The approach mimics functional gradient descent and constructs features by fitting squared error residuals. We show that the constructive process is consistent and provide conditions under which it converges to the optimal solution. In the second part of the thesis, we investigate two problems for which deterministic greedy schemes can fail to find an optimal solution or a good approximation of the optimum. This happens as a result of making a sequence of choices that take into account only the immediate reward without considering the consequences for future decisions. To address this shortcoming of deterministic greedy schemes, we propose two efficient randomized greedy algorithms which are guaranteed to find effective solutions to the corresponding problems. In the first of the two approaches, we provide a means to scale kernel methods to problems with millions of instances. An approach frequently used in practice for this type of problem is the Nyström method for low-rank approximation of kernel matrices. A crucial step in this method is the choice of landmarks, which determine the quality of the approximation. We tackle this problem with a randomized greedy algorithm based on the K-means++ cluster seeding scheme and provide a theoretical and empirical study of its effectiveness. In the second problem for which a deterministic strategy can fail to find a good solution, the goal is to find a set of objects from a structured space that are likely to exhibit an unknown target property.
This discrete optimization problem is of significant interest to cyclic discovery processes such as de novo drug design. We propose to address it with an adaptive Metropolis–Hastings approach that samples candidates from the posterior distribution of structures conditioned on them having the target property. The proposed constructive scheme defines a consistent random process and our empirical evaluation demonstrates its effectiveness across several different application domains.
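
The landmark-selection step for the Nyström method can be illustrated with the standard K-means++ seeding rule, in which each new landmark is drawn with probability proportional to its squared distance from the landmarks already chosen. The sketch below is only a minimal illustration under stated assumptions: it uses Euclidean distance in the input space (the thesis may work in the kernel feature space instead), and the names sq_dist and kmeanspp_landmarks are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_landmarks(points, k, rng=None):
    """Pick k distinct landmark indices by K-means++ (D^2) seeding.

    Assumption: distances are Euclidean in input space; this is an
    illustrative sketch, not the thesis's implementation.
    """
    rng = rng or random.Random(0)
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    first = rng.randrange(n)          # first landmark: uniform at random
    chosen = [first]
    # d2[i] is the squared distance from point i to its nearest landmark.
    d2 = [sq_dist(p, points[first]) for p in points]

    while len(chosen) < k:
        if sum(d2) == 0.0:
            # Every remaining point coincides with a landmark:
            # fall back to a uniform choice among unchosen indices.
            remaining = [i for i in range(n) if i not in chosen]
            nxt = rng.choice(remaining)
        else:
            # Sample proportionally to squared distance; already chosen
            # points have weight zero and cannot be drawn again.
            nxt = rng.choices(range(n), weights=d2, k=1)[0]
        chosen.append(nxt)
        d2 = [min(d, sq_dist(p, points[nxt])) for d, p in zip(d2, points)]
    return chosen
```

The returned indices would then select the columns of the kernel matrix used to build the low-rank Nyström approximation; that step is omitted here.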

    MCMC-Based Optimization and Application

    In the thesis, we study the theory of Markov Chain Monte Carlo (MCMC) and its application in statistical optimization. The MCMC method is a class of evolutionary algorithms for generating samples from given probability distributions. In the thesis, we first focus on the methods of slice sampling and simulated annealing. While slice sampling has the merit of generating samples from the underlying distribution with an adjustable step size, simulated annealing can help samples escape local optima and converge quickly to the global optimum. With this MCMC method, we then solve two practical optimization problems. The first problem is image transmission over varying channels. Existing work in media transmission generally assumes that the channel condition is stationary. However, communication channels often vary with time in practice. Adaptive design needs frequent feedback for channel updates, which is often impractical due to the complexity and delay. In this application, we design an unequal error protection scheme for image transmission over noisy varying channels based on MCMC. First, the problem cost function is mapped into a multi-variable probability distribution. Then, with the “detailed balance” condition, MCMC is used to generate samples from the mapped stationary distribution so that the optimal solution is the one that gives the lowest data distortion. We also show that the final rate allocation designed with this method works better than a conventional design that considers the mean value of the channel. In the second application, we consider a terminal-location-planning problem for intermodal transportation systems. Given a number of potential locations, the task is to find the most appropriate number of terminals and their locations to provide the economically most efficient operation when multiple service pairs exist simultaneously.
The problem also has an inherent issue that, for a particular plan, the optimal route paths must be determined for the co-existing service pairs. To solve this NP-hard problem, we design an MCMC-based two-layer method. The lower layer is an optimal routing design for all service pairs given a particular plan that considers both efficiency and fairness. The upper layer finds the optimal plan based on MCMC with the stationary distribution that is mapped from the cost function. The effectiveness of this method is demonstrated through computer simulations and comparison with one state-of-the-art method. The work of this thesis has shown that an MCMC method, consisting of both slice sampling and simulated annealing, can be successfully applied to solving practical optimization problems. In particular, the method has advantages in dealing with high-dimensional problems with large search spaces.
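
The idea of mapping a cost function to a probability distribution and sampling from it can be sketched with simulated annealing and a Metropolis acceptance rule, where the target at temperature t is proportional to exp(-cost(x)/t). This is a generic illustration, not the thesis's scheme: it omits slice sampling, assumes a symmetric proposal, and the function name, parameters and geometric cooling schedule are all assumptions.

```python
import math
import random


def simulated_annealing(cost, x0, propose, n_steps=5000,
                        t0=10.0, cooling=0.999, rng=None):
    """Minimize `cost` by annealed Metropolis sampling (illustrative sketch).

    Assumptions: `propose(x, rng)` is a symmetric proposal, and the
    temperature follows a simple geometric schedule t <- cooling * t.
    """
    rng = rng or random.Random(0)
    x, fx = x0, cost(x0)
    best, f_best = x, fx
    t = t0
    for _ in range(n_steps):
        y = propose(x, rng)
        fy = cost(y)
        # Metropolis rule for the target p(x) proportional to exp(-cost(x)/t):
        # always accept improvements, accept worse moves with
        # probability exp(-(fy - fx) / t).
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = x, fx
        t = max(t * cooling, 1e-12)   # cool down, keep t strictly positive
    return best, f_best
```

For example, with a hypothetical integer search space one could pass cost = lambda x: (x - 7) ** 2 and a proposal that moves one step left or right; at high temperature the chain accepts uphill moves, and as t decreases it concentrates on low-cost states.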

    Stochastic Optimization Models for Perishable Products

    For many years, researchers have focused on developing optimization models to design and manage supply chains. These models have helped companies in different industries to minimize costs and maximize performance while balancing their social and environmental impacts. There is an increasing interest in developing models which optimize supply chain decisions of perishable products. This is mainly because many of the products we use today are perishable, managing their inventory is challenging due to their short shelf life, and outdated products become waste. Therefore, these supply chain decisions impact profitability and sustainability of companies and the quality of the environment. Perishable products wastage is inevitable when demand is not known beforehand. A number of models in the literature use simulation and probabilistic models to capture supply chain uncertainties. However, when demand distribution cannot be described using standard distributions, probabilistic models are not effective. In this case, using stochastic optimization methods is preferred over obtaining approximate inventory management policies through simulation. This dissertation proposes models to help businesses and non-profit organizations make inventory replenishment, pricing and transportation decisions that improve the performance of their system. These models focus on perishable products which either deteriorate over time or have a fixed shelf life. The demand and/or supply for these products, and/or their remaining shelf life, are stochastic. Stochastic optimization models, including a two-stage stochastic mixed integer linear program, a two-stage stochastic mixed integer nonlinear program, and a chance-constrained program, are proposed to capture uncertainties. The objective is to minimize the total replenishment costs, which impact profits and service rate.
These models are motivated by applications in the vaccine distribution supply chain, and other supply chains used to distribute perishable products. This dissertation also focuses on developing solution algorithms to solve the proposed optimization models. The computational complexity of these models motivated the development of extensions to standard methods used to solve stochastic optimization problems. These algorithms use sample average approximation (SAA) to represent uncertainty. The algorithms proposed are extensions of the stochastic Benders decomposition algorithm, the L-shaped method (LS). These extensions use Gomory mixed integer cuts, mixed-integer rounding cuts, and piecewise linear relaxation of bilinear terms. These extensions lead to the development of linear approximations of the models developed. Computational results reveal that the solution approach presented here outperforms the standard LS method. Finally, this dissertation develops case studies using real-life data from the Demographic Health Surveys in Niger and Bangladesh to build predictive models to meet requirements for various childhood immunization vaccines. The results of this study provide support tools for policymakers to design vaccine distribution networks.
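
Sample average approximation replaces the expected cost by an average over a finite set of sampled scenarios and then optimizes that average. The sketch below applies the idea to a deliberately tiny single-period replenishment problem for a perishable item. It is not one of the dissertation's two-stage models and uses no Benders decomposition; the cost structure, the parameter names and the function saa_order_quantity are all illustrative assumptions.

```python
def saa_order_quantity(scenarios, order_cost, waste_cost,
                       shortage_cost, max_order):
    """Choose an integer order quantity minimizing the sample-average cost.

    Assumed toy cost per demand scenario d and order quantity q:
      order_cost * q                     (replenishment)
      + waste_cost * max(q - d, 0)       (unsold units perish)
      + shortage_cost * max(d - q, 0)    (unmet demand)
    """
    def average_cost(q):
        total = 0.0
        for d in scenarios:
            total += (order_cost * q
                      + waste_cost * max(q - d, 0)
                      + shortage_cost * max(d - q, 0))
        return total / len(scenarios)

    # Enumerate all candidate quantities; fine for a small toy problem.
    best_q = min(range(max_order + 1), key=average_cost)
    return best_q, average_cost(best_q)
```

In practice the scenarios would be drawn from historical or simulated demand, and a larger sample gives a closer approximation of the expected cost at the price of a larger optimization problem.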

    Calibrate, emulate, sample

    Many parameter estimation problems arising in applications can be cast in the framework of Bayesian inversion. This allows not only for an estimate of the parameters, but also for the quantification of uncertainties in the estimates. Often in such problems the parameter-to-data map is very expensive to evaluate, and computing derivatives of the map, or derivative-adjoints, may not be feasible. Additionally, in many applications only noisy evaluations of the map may be available. We propose an approach to Bayesian inversion in such settings that builds on the derivative-free optimization capabilities of ensemble Kalman inversion methods. The overarching approach is to first use ensemble Kalman sampling (EKS) to calibrate the unknown parameters to fit the data; second, to use the output of the EKS to emulate the parameter-to-data map; third, to sample from an approximate Bayesian posterior distribution in which the parameter-to-data map is replaced by its emulator. This results in a principled approach to approximate Bayesian inference that requires only a small number of evaluations of the (possibly noisy approximation of the) parameter-to-data map. It does not require derivatives of this map, but instead leverages the documented power of ensemble Kalman methods. Furthermore, the EKS has the desirable property that it evolves the parameter ensemble towards the regions in which the bulk of the parameter posterior mass is located, thereby locating them well for the emulation phase of the methodology. In essence, the EKS methodology provides a cheap solution to the design problem of where to place points in parameter space to efficiently train an emulator of the parameter-to-data map for the purposes of Bayesian inversion.
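
The three stages can be sketched for a scalar parameter and scalar data. The code below is a toy under several loudly stated assumptions and is not the paper's method: the calibration stage uses a basic perturbed-observation ensemble Kalman inversion update rather than the EKS, the emulator is a linear least-squares fit rather than a more flexible surrogate, the sampler is plain random-walk Metropolis, and the prior is Gaussian. All function names are hypothetical.

```python
import math
import random


def eki_step(ensemble, forward, y, noise_var, rng):
    """One perturbed-observation ensemble Kalman inversion update (scalar)."""
    g = [forward(u) for u in ensemble]
    u_bar = sum(ensemble) / len(ensemble)
    g_bar = sum(g) / len(g)
    c_ug = sum((u - u_bar) * (gi - g_bar) for u, gi in zip(ensemble, g)) / len(g)
    c_gg = sum((gi - g_bar) ** 2 for gi in g) / len(g)
    gain = c_ug / (c_gg + noise_var)
    noise_sd = math.sqrt(noise_var)
    updated = [u + gain * (y + rng.gauss(0.0, noise_sd) - gi)
               for u, gi in zip(ensemble, g)]
    return updated, list(zip(ensemble, g))


def fit_linear_emulator(pairs):
    """Least-squares fit g ~ a + b*u to (u, G(u)) pairs (assumed emulator)."""
    n = len(pairs)
    u_bar = sum(u for u, _ in pairs) / n
    g_bar = sum(g for _, g in pairs) / n
    var_u = sum((u - u_bar) ** 2 for u, _ in pairs)
    b = sum((u - u_bar) * (g - g_bar) for u, g in pairs) / var_u
    a = g_bar - b * u_bar
    return lambda u: a + b * u


def metropolis(log_post, start, n_samples, step, rng):
    """Random-walk Metropolis sampler for a scalar log-density."""
    x, lp = start, log_post(start)
    samples = []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_post(cand)
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        samples.append(x)
    return samples


def calibrate_emulate_sample(forward, y, noise_var, prior_mean, prior_var,
                             n_ens=30, n_iter=5, n_samples=5000, seed=0):
    rng = random.Random(seed)
    ensemble = [rng.gauss(prior_mean, math.sqrt(prior_var)) for _ in range(n_ens)]
    pairs = []
    for _ in range(n_iter):                       # 1) calibrate
        ensemble, evaluated = eki_step(ensemble, forward, y, noise_var, rng)
        pairs.extend(evaluated)
    emulator = fit_linear_emulator(pairs)         # 2) emulate

    def log_post(u):                              # 3) sample with the emulator
        return (-(y - emulator(u)) ** 2 / (2.0 * noise_var)
                - (u - prior_mean) ** 2 / (2.0 * prior_var))

    start = sum(ensemble) / len(ensemble)
    samples = metropolis(log_post, start, n_samples, 0.1, rng)
    return emulator, samples
```

Note that the expensive map `forward` is only evaluated n_ens * n_iter times, during calibration; the sampling stage evaluates the cheap emulator alone, which is the point of the design.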

    A hybrid, auto-adaptive, and rule-based multi-agent approach using evolutionary algorithms for improved searching

    Selecting the most appropriate heuristic for solving a specific problem is not easy, for many reasons. This article focuses on one of these reasons: traditionally, the solution search process has operated in a given manner regardless of the specific problem being solved, and the process has been the same regardless of the size, complexity and domain of the problem. To cope with this situation, search processes should mould the search into areas of the search space that are meaningful for the problem. This article builds on previous work in the development of a multi-agent paradigm using techniques derived from knowledge discovery (data-mining techniques) on databases of so-far visited solutions. The aim is to improve the search mechanisms, increase computational efficiency and use rules to enrich the formulation of optimization problems, while reducing the search space and catering to realistic problems.

    Online Predictive Optimization Framework for Stochastic Demand-Responsive Transit Services

    This study develops an online predictive optimization framework for dynamically operating a transit service in an area of crowd movements. The proposed framework integrates demand prediction and supply optimization to periodically redesign the service routes based on recently observed demand. To predict demand for the service, we use Quantile Regression to estimate the marginal distribution of movement counts between each pair of serviced locations. The framework then combines these marginals into a joint demand distribution by constructing a Gaussian copula, which captures the structure of correlation between the marginals. For supply optimization, we devise a linear programming model, which simultaneously determines the route structure and the service frequency according to the predicted demand. Importantly, our framework both preserves the uncertainty structure of future demand and leverages this for robust route optimization, while keeping both components decoupled. We evaluate our framework using a real-world case study of autonomous mobility in a university campus in Denmark. The results show that our framework often obtains the ground truth optimal solution, and can outperform conventional methods for route optimization, which do not leverage full predictive distributions.
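
Combining marginals through a Gaussian copula can be sketched for two origin-destination pairs: draw correlated standard normals, map them to uniforms with the normal CDF, and push each uniform through the inverse CDF of its marginal. In the sketch below each marginal is represented by a grid of quantile levels and values with linear interpolation in between, which is an assumption about how predicted quantiles might be stored; the function names and the two-dimensional restriction are likewise illustrative.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantile_from_grid(u, levels, values):
    """Piecewise-linear inverse CDF through the points (levels[i], values[i]).

    Assumes `levels` is strictly increasing; u outside the grid is clamped.
    """
    if u <= levels[0]:
        return values[0]
    if u >= levels[-1]:
        return values[-1]
    for i in range(1, len(levels)):
        if u <= levels[i]:
            w = (u - levels[i - 1]) / (levels[i] - levels[i - 1])
            return values[i - 1] + w * (values[i] - values[i - 1])


def sample_gaussian_copula_pair(rho, marginal_a, marginal_b, n, rng=None):
    """Draw n joint samples for two marginals linked by a Gaussian copula.

    `rho` is the copula correlation in [-1, 1]; each marginal is a
    (levels, values) tuple as expected by quantile_from_grid.
    """
    rng = rng or random.Random(0)
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlated second normal: corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u1, u2 = std_normal_cdf(z1), std_normal_cdf(z2)
        samples.append((quantile_from_grid(u1, *marginal_a),
                        quantile_from_grid(u2, *marginal_b)))
    return samples
```

The sampled joint demand scenarios keep each pair's predicted marginal while reproducing the dependence between pairs, and could then be fed to a downstream route optimization model.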
