Constructive Approximation and Learning by Greedy Algorithms
This thesis develops several kernel-based greedy algorithms for different machine learning problems and analyzes their theoretical and empirical properties. Greedy approaches have been extensively used in the past for tackling problems in combinatorial optimization where finding even a feasible solution can be a computationally hard problem (i.e., not solvable in polynomial time). A key feature of greedy algorithms is that a solution is constructed recursively from the smallest constituent parts. In each step of the constructive process a component is added to the partial solution from the previous step and, thus, the size of the optimization problem is reduced. The selected components are given by optimization problems that are simpler and easier to solve than the original problem. As such schemes are typically fast at constructing a solution they can be very effective on complex optimization problems where finding an optimal/good solution has a high computational cost. Moreover, greedy solutions are rather intuitive and the schemes themselves are simple to design and easy to implement. There is a large class of problems for which greedy schemes generate an optimal solution or a good approximation of the optimum. In the first part of the thesis, we develop two deterministic greedy algorithms for optimization problems in which a solution is given by a set of functions mapping an instance space to the space of reals. The first of the two approaches facilitates data understanding through interactive visualization by providing means for experts to incorporate their domain knowledge into otherwise static kernel principal component analysis. This is achieved by greedily constructing embedding directions that maximize the variance at data points (unexplained by the previously constructed embedding directions) while adhering to specified domain knowledge constraints. 
The second deterministic greedy approach is a supervised feature construction method capable of addressing the problem of kernel choice. The goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity: large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. The approach mimics functional gradient descent and constructs features by fitting squared error residuals. We show that the constructive process is consistent and provide conditions under which it converges to the optimal solution. In the second part of the thesis, we investigate two problems for which deterministic greedy schemes can fail to find an optimal solution or a good approximation of the optimum. This happens as a result of making a sequence of choices which take into account only the immediate reward without considering the consequences for future decisions. To address this shortcoming of deterministic greedy schemes, we propose two efficient randomized greedy algorithms which are guaranteed to find effective solutions to the corresponding problems. In the first of the two approaches, we provide a means to scale kernel methods to problems with millions of instances. An approach frequently used in practice for this type of problem is the Nyström method for low-rank approximation of kernel matrices. A crucial step in this method is the choice of landmarks, which determine the quality of the approximation. We tackle this problem with a randomized greedy algorithm based on the K-means++ cluster seeding scheme and provide a theoretical and empirical study of its effectiveness. In the second problem for which a deterministic strategy can fail to find a good solution, the goal is to find a set of objects from a structured space that are likely to exhibit an unknown target property.
This discrete optimization problem is of significant interest to cyclic discovery processes such as de novo drug design. We propose to address it with an adaptive Metropolis–Hastings approach that samples candidates from the posterior distribution of structures conditioned on their having the target property. The proposed constructive scheme defines a consistent random process, and our empirical evaluation demonstrates its effectiveness across several different application domains.
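The landmark-selection idea behind the randomized greedy Nyström approach can be illustrated with a short sketch. This is a minimal illustration assuming a K-means++-style D²-weighted seeding followed by the standard Nyström reconstruction; the function names are hypothetical and the thesis's exact algorithm may differ:

```python
import numpy as np

def kmeanspp_landmarks(X, m, rng=None):
    """Select m landmark rows of X via K-means++-style seeding: each new
    landmark is sampled with probability proportional to its squared
    distance to the nearest landmark chosen so far."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    landmarks = [rng.integers(n)]                     # first landmark uniformly
    d2 = np.sum((X - X[landmarks[0]]) ** 2, axis=1)
    for _ in range(m - 1):
        idx = rng.choice(n, p=d2 / d2.sum())          # D^2-weighted sampling
        landmarks.append(idx)
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.asarray(landmarks)

def nystrom_approx(K, landmarks):
    """Rank-m Nystrom approximation K ~ C W^+ C^T. Built from a full
    kernel matrix only for illustration; in practice one would compute
    just the needed kernel columns."""
    C = K[:, landmarks]
    W = K[np.ix_(landmarks, landmarks)]
    return C @ np.linalg.pinv(W) @ C.T
```

The quality of the low-rank approximation depends directly on how well the sampled landmarks cover the data, which is what the D²-weighting encourages.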
MCMC-Based Optimization and Application
In this thesis, we study the theory of Markov chain Monte Carlo (MCMC) and its application in statistical optimization. MCMC methods are a class of stochastic algorithms for generating samples from given probability distributions. In the thesis, we first focus on the methods of slice sampling and simulated annealing. While slice sampling has the merit of generating samples from the underlying distribution with an adjustable step size, simulated annealing helps samples escape local optima and converge quickly to the global optimum. With this MCMC method, we then solve two practical optimization problems. The first problem is image transmission over varying channels. Existing work in media transmission generally assumes that channel conditions are stationary. However, in practice communication channels often vary with time. Adaptive design needs frequent feedback for channel updates, which is often impractical due to complexity and delay. In this application, we design an unequal error protection scheme for image transmission over noisy varying channels based on MCMC. First, the problem cost function is mapped into a multi-variable probability distribution. Then, with the "detailed balance" condition, MCMC is used to generate samples from the mapped stationary distribution, so that the optimal solution is the one that gives the lowest data distortion. We also show that the final rate allocation designed with this method works better than a conventional design that considers the mean value of the channel. In the second application, we consider a terminal-location-planning problem for intermodal transportation systems. With a given number of potential locations, the task is to find the most appropriate number of terminals, and their locations, that provide the most economically efficient operation when multiple service pairs exist simultaneously.
The problem also has the inherent issue that, for a particular planning, the optimal route paths must be determined for the co-existing service pairs. To solve this NP-hard problem, we design an MCMC-based two-layer method. The lower layer computes an optimal routing design for all service pairs given a particular planning, considering both efficiency and fairness. The upper layer finds the optimal planning based on MCMC, with the stationary distribution mapped from the cost function. The effectiveness of this method is demonstrated through computer simulations and comparison with one state-of-the-art method. The work of this thesis has shown that an MCMC method, consisting of both slice sampling and simulated annealing, can be successfully applied to solving practical optimization problems. In particular, the method has advantages in dealing with high-dimensional problems with large search spaces.
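The cost-to-distribution mapping described above can be sketched in miniature: simulated annealing treats exp(-cost(x)/T) as an unnormalized stationary distribution and applies the Metropolis acceptance rule while cooling T. This is a generic sketch with a hypothetical 1-D cost, not the thesis's transmission or terminal-planning formulation:

```python
import math
import random

def simulated_annealing(cost, x0, propose, n_steps=5000, t0=1.0, t_min=1e-3):
    """Minimize `cost` by Metropolis sampling from exp(-cost(x)/T)
    while geometrically cooling the temperature T from t0 to t_min."""
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t, cooling = t0, (t_min / t0) ** (1.0 / n_steps)
    for _ in range(n_steps):
        y = propose(x)
        fy = cost(y)
        # Metropolis rule: always accept downhill moves; accept uphill
        # moves with probability exp(-(fy - fx) / T).
        if fy <= fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling
    return best, fbest

# A multimodal 1-D example: quadratic bowl plus an oscillation.
random.seed(0)
x_best, f_best = simulated_annealing(
    cost=lambda x: (x - 2.0) ** 2 + math.sin(5.0 * x),
    x0=-5.0,
    propose=lambda x: x + random.gauss(0.0, 0.5),
)
```

Early high temperatures let the chain jump out of local optima; late low temperatures concentrate the stationary distribution near the global minimum.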
Stochastic Optimization Models for Perishable Products
For many years, researchers have focused on developing optimization models to design and manage supply chains. These models have helped companies in different industries to minimize costs and maximize performance while balancing their social and environmental impacts. There is increasing interest in developing models which optimize supply chain decisions for perishable products. This is mainly because many of the products we use today are perishable, managing their inventory is challenging due to their short shelf life, and outdated products become waste. Therefore, these supply chain decisions impact the profitability and sustainability of companies and the quality of the environment. Wastage of perishable products is inevitable when demand is not known beforehand. A number of models in the literature use simulation and probabilistic models to capture supply chain uncertainties. However, when the demand distribution cannot be described using standard distributions, probabilistic models are not effective. In this case, using stochastic optimization methods is preferred over obtaining approximate inventory management policies through simulation.
This dissertation proposes models to help businesses and non-profit organizations make inventory replenishment, pricing, and transportation decisions that improve the performance of their systems. These models focus on perishable products which either deteriorate over time or have a fixed shelf life. The demand and/or supply for these products, and/or the remaining shelf life, are stochastic. Stochastic optimization models, including a two-stage stochastic mixed integer linear program, a two-stage stochastic mixed integer nonlinear program, and a chance constraint program, are proposed to capture uncertainties. The objective is to minimize the total replenishment costs, which impact profits and service rates. These models are motivated by applications in the vaccine distribution supply chain and other supply chains used to distribute perishable products.
This dissertation also focuses on developing solution algorithms to solve the proposed optimization models. The computational complexity of these models motivated the development of extensions to standard methods for stochastic optimization. These algorithms use sample average approximation (SAA) to represent uncertainty. The algorithms proposed extend the stochastic Benders decomposition algorithm, the L-shaped method (LS), with Gomory mixed-integer cuts, mixed-integer rounding cuts, and piecewise linear relaxations of bilinear terms. These extensions lead to linear approximations of the models developed. Computational results reveal that the solution approach presented here outperforms the standard LS method.
Finally, this dissertation develops case studies using real-life data from the Demographic and Health Surveys in Niger and Bangladesh to build predictive models of the requirements for various childhood immunization vaccines. The results of this study provide support tools for policymakers to design vaccine distribution networks.
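Sample average approximation, the workhorse of the solution algorithms above, replaces the expectation over uncertain demand with an average over sampled scenarios. A minimal single-period perishable-inventory (newsvendor) sketch, with hypothetical prices and costs, not the dissertation's multi-stage models:

```python
import numpy as np

def saa_newsvendor(demand_samples, unit_cost, price, salvage):
    """Sample average approximation of a single-period perishable
    stocking problem: choose the order quantity q maximizing average
    profit over the demand scenarios. For this problem the SAA optimum
    is the empirical demand quantile at the critical ratio."""
    d = np.sort(np.asarray(demand_samples, dtype=float))
    # Critical ratio: underage cost / (underage + overage cost).
    ratio = (price - unit_cost) / (price - salvage)
    q = np.quantile(d, ratio)
    sold = np.minimum(d, q)               # units sold in each scenario
    leftover = q - sold                   # perished units, salvaged
    profit = price * sold + salvage * leftover - unit_cost * q
    return q, profit.mean()
```

More demand samples give a better approximation of the true expected profit, at the cost of a larger optimization problem; that trade-off is what the decomposition extensions above are designed to manage.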
Calibrate, emulate, sample
Many parameter estimation problems arising in applications can be cast in the framework of Bayesian inversion. This allows not only for an estimate of the parameters, but also for the quantification of uncertainties in the estimates. Often in such problems the parameter-to-data map is very expensive to evaluate, and computing derivatives of the map, or derivative-adjoints, may not be feasible. Additionally, in many applications only noisy evaluations of the map may be available. We propose an approach to Bayesian inversion in such settings that builds on the derivative-free optimization capabilities of ensemble Kalman inversion methods. The overarching approach is to first use ensemble Kalman sampling (EKS) to calibrate the unknown parameters to fit the data; second, to use the output of the EKS to emulate the parameter-to-data map; third, to sample from an approximate Bayesian posterior distribution in which the parameter-to-data map is replaced by its emulator. This results in a principled approach to approximate Bayesian inference that requires only a small number of evaluations of the (possibly noisy approximation of the) parameter-to-data map. It does not require derivatives of this map, but instead leverages the documented power of ensemble Kalman methods. Furthermore, the EKS has the desirable property that it evolves the parameter ensemble towards the regions in which the bulk of the parameter posterior mass is located, thereby positioning the ensemble well for the emulation phase of the methodology. In essence, the EKS methodology provides a cheap solution to the design problem of where to place points in parameter space to efficiently train an emulator of the parameter-to-data map for the purposes of Bayesian inversion.
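One iteration of the derivative-free ensemble Kalman update underlying this family of methods can be sketched as follows. This is a generic ensemble Kalman inversion step with perturbed observations, assumed here for illustration; the paper's EKS variant differs in its dynamics and noise handling:

```python
import numpy as np

def eki_step(U, G, y, gamma, rng):
    """One derivative-free ensemble Kalman inversion update.
    U: (J, d) parameter ensemble, G: forward map R^d -> R^k,
    y: observed data (k,), gamma: observation noise covariance (k, k).
    Cross-covariances estimated from the ensemble itself stand in for
    derivatives of the forward map."""
    J = U.shape[0]
    Gu = np.array([G(u) for u in U])          # (J, k) forward evaluations
    dU, dG = U - U.mean(0), Gu - Gu.mean(0)
    Cug = dU.T @ dG / (J - 1)                 # (d, k) cross-covariance
    Cgg = dG.T @ dG / (J - 1)                 # (k, k) output covariance
    K = Cug @ np.linalg.inv(Cgg + gamma)      # (d, k) Kalman-style gain
    # Perturbed observations keep ensemble spread consistent with noise.
    Y = y + rng.multivariate_normal(np.zeros(len(y)), gamma, size=J)
    return U + (Y - Gu) @ K.T
```

Iterating this update drives the ensemble toward parameters consistent with the data using only forward evaluations of G, which is the property the calibrate-emulate-sample pipeline exploits.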
A hybrid, auto-adaptive, and rule-based multi-agent approach using evolutionary algorithms for improved searching
Selecting the most appropriate heuristic for solving a specific problem is not easy, for many reasons. This article focuses on one of these reasons: traditionally, the solution search process has operated in a given manner regardless of the specific problem being solved, and the process has been the same regardless of the size, complexity and domain of the problem. To cope with this situation, search processes should mould the search into areas of the search space that are meaningful for the problem. This article builds on previous work in the development of a multi-agent paradigm using techniques derived from knowledge discovery (data-mining techniques) on databases of so-far visited solutions. The aim is to improve the search mechanisms, increase computational efficiency and use rules to enrich the formulation of optimization problems, while reducing the search space and catering to realistic problems.
Izquierdo Sebastián, J.; Montalvo Arango, I.; Campbell, E.; Pérez García, R. (2015). A hybrid, auto-adaptive, and rule-based multi-agent approach using evolutionary algorithms for improved searching. Engineering Optimization, 1-13. doi:10.1080/0305215X.2015.1107434
Optimization for Probabilistic Machine Learning
We have access to a greater variety of datasets than at any time in history. Every day, more data is collected from various natural resources and digital platforms. Great advances in machine learning research in the past few decades have relied strongly on the availability of these datasets. However, analyzing them imposes significant challenges, mainly due to two factors. First, the datasets have complex structures with hidden interdependencies. Second, most of the valuable datasets are high dimensional and large in scale. The main goal of a machine learning framework is to design a model that is a valid representative of the observations and to develop a learning algorithm that makes inferences about unobserved or latent data based on the observations. Discovering hidden patterns and inferring latent characteristics in such datasets is one of the greatest challenges in machine learning research. In this dissertation, I will investigate some of the challenges in modeling and algorithm design, and present my research results on how to overcome these obstacles.
Analyzing data generally involves two main stages. The first stage is designing a model that is flexible enough to capture complex variation and latent structures in the data and robust enough to generalize well to unseen data. Designing an expressive and interpretable model is one of the crucial objectives in this stage. The second stage involves training a learning algorithm on the observed data and measuring the accuracy of the model and learning algorithm. This stage usually involves an optimization problem whose objective is to tune the model to the training data and learn the model parameters. Finding a global optimum, or a sufficiently good local optimum, is one of the main challenges in this step.
Probabilistic models are among the best-known models for capturing the data generating process and quantifying uncertainties in data using random variables and probability distributions. They are powerful models that have been shown to be adaptive and robust and can scale well to large datasets. However, most probabilistic models have a complex structure, and training them can become challenging, commonly due to the presence of intractable integrals in the calculation. To remedy this, they require approximate inference strategies that often result in non-convex optimization problems. The optimization part ensures that the model is the best representative of the data or the data generating process. The non-convexity of an optimization problem takes away the general guarantee of finding a global optimal solution. It will be shown later in this dissertation that inference for a significant number of probabilistic models requires solving a non-convex optimization problem.
One of the best-known methods for approximate inference in probabilistic modeling is variational inference. In the Bayesian setting, the target is to learn the true posterior distribution of the model parameters given the observations and prior distributions. The main challenge involves marginalizing over all the variables in the model except the variable of interest. This high-dimensional integral is generally computationally hard, and for many models there is no known polynomial-time algorithm for computing it exactly. Variational inference finds an approximate posterior distribution for Bayesian models where finding the true posterior distribution is analytically or numerically impossible. It assumes a family of distributions for the approximation and finds the member of that family closest to the true posterior distribution under a distance measure. For many models, though, this technique requires solving a non-convex optimization problem with no general guarantee of reaching a global optimal solution. This dissertation presents a convex relaxation technique for dealing with the hardness of the optimization involved in the inference.
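The "closest member of a family" idea can be made concrete in one dimension: enumerate a small Gaussian family and pick the member minimizing KL divergence to an unnormalized target evaluated on a grid. This is a toy illustration only; practical variational inference optimizes the same objective with gradient methods in high dimensions:

```python
import numpy as np

def fit_gaussian_vi(log_p, grid, mus, sigmas):
    """Variational inference in miniature: among Gaussians N(mu, s^2),
    return the (mu, s) minimizing KL(q || p). The expectation is
    evaluated numerically on a 1-D grid, and log_p may be unnormalized
    (the missing log-normalizer shifts all KL values equally, so the
    argmin is unchanged)."""
    dx = grid[1] - grid[0]
    best, best_kl = None, np.inf
    for mu in mus:
        for s in sigmas:
            q = np.exp(-0.5 * ((grid - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
            # KL(q||p) ~ sum q * (log q - log p) * dx  (grid quadrature)
            kl = np.sum(q * (np.log(q + 1e-300) - log_p(grid))) * dx
            if kl < best_kl:
                best, best_kl = (mu, s), kl
    return best, best_kl
```

For a standard normal target the minimizer is the family member with mean 0 and standard deviation 1, matching the closed-form optimum.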
The proposed convex relaxation technique is based on semidefinite optimization, which has general applicability to polynomial optimization problems. I will present the theoretical foundations and in-depth details of this relaxation in this work. Linear dynamical systems represent the functionality of many real-world physical systems. They can describe the dynamics of a linear time-varying observation controlled by a controller unit with quadratic cost objectives. Designing distributed and decentralized controllers is the goal of many of these systems, which, computationally, results in a non-convex optimization problem. In this dissertation, I will further investigate the issues arising in this area and develop a convex relaxation framework to deal with the optimization challenges.
Setting the correct number of model parameters is an important aspect of a good probabilistic model. With too few parameters, the model may fail to capture all the essential relations and components in the observations, while too many parameters may cause significant complications in learning or overfitting to the observations. Non-parametric models are suitable techniques for dealing with this issue. They allow the model to learn the appropriate number of parameters to describe the data and make predictions. In this dissertation, I will present my work on designing Bayesian non-parametric models as powerful tools for learning representations of data. Moreover, I will describe the algorithm that we derived to efficiently train the model on the observations and learn the number of model parameters.
Later in this dissertation, I will present my work on designing probabilistic models in combination with deep learning methods for representing sequential data. Sequential datasets comprise a significant portion of resources in machine learning research. Designing models to capture dependencies in sequential datasets is of great interest and has a wide variety of applications in engineering, medicine, and statistics. Recent advances in deep learning research have shown exceptional promise in this area. However, deep models lack interpretability in their general form. To remedy this, I will present my work on combining probabilistic models with neural network models, which results in better performance and more expressive results.
Online Predictive Optimization Framework for Stochastic Demand-Responsive Transit Services
This study develops an online predictive optimization framework for
dynamically operating a transit service in an area of crowd movements. The
proposed framework integrates demand prediction and supply optimization to
periodically redesign the service routes based on recently observed demand. To
predict demand for the service, we use Quantile Regression to estimate the
marginal distribution of movement counts between each pair of serviced
locations. The framework then combines these marginals into a joint demand
distribution by constructing a Gaussian copula, which captures the structure of
correlation between the marginals. For supply optimization, we devise a linear
programming model, which simultaneously determines the route structure and the
service frequency according to the predicted demand. Importantly, our framework
both preserves the uncertainty structure of future demand and leverages this
for robust route optimization, while keeping both components decoupled. We
evaluate our framework using a real-world case study of autonomous mobility in
a university campus in Denmark. The results show that our framework often
obtains the ground truth optimal solution, and can outperform conventional
methods for route optimization, which do not leverage full predictive
distributions.
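The marginals-plus-copula construction described above can be sketched as follows: correlated standard normals are pushed through the normal CDF to uniforms, then through each marginal's quantile function. Here hypothetical analytic quantile functions stand in for the marginals fitted by Quantile Regression:

```python
import numpy as np
from math import erf, sqrt

def sample_joint_demand(quantile_fns, corr, n, rng=None):
    """Draw n joint demand scenarios from a Gaussian copula:
    correlated standard normals -> uniforms via the normal CDF ->
    each marginal's quantile (inverse-CDF) function."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(corr)                       # correlation factor
    z = rng.standard_normal((n, len(quantile_fns))) @ L.T
    u = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))  # standard normal CDF
    return np.column_stack([q(u[:, i]) for i, q in enumerate(quantile_fns)])

# Two positively correlated Exponential(1) demands; analytic quantile
# functions replace fitted marginals purely for illustration.
demand = sample_joint_demand(
    [lambda u: -np.log1p(-u), lambda u: -np.log1p(-u)],
    corr=np.array([[1.0, 0.8], [0.8, 1.0]]),
    n=4000,
    rng=0,
)
```

Because the copula only couples the uniforms, each marginal keeps exactly its fitted distribution while the correlation structure of the normals carries over to the joint scenarios.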