49 research outputs found

    Sparsifying to optimize over multiple information sources: an augmented Gaussian process based algorithm

    Optimizing a black-box, expensive, and multi-extremal function, given multiple approximations, is a challenging task known as multi-information source optimization (MISO), where each source has a different cost and the level of approximation (also known as fidelity) of each source can change over the search space. While most current approaches fuse the Gaussian processes (GPs) modelling each source, we propose to use GP sparsification to select only "reliable" function evaluations performed over all the sources. These selected evaluations are used to create an augmented Gaussian process (AGP), so named because the evaluations on the most expensive source are augmented with the reliable evaluations over less expensive sources. A new acquisition function, based on a confidence bound, is also proposed; it includes both the cost of the next source to query and the location-dependent approximation of that source. This approximation is estimated through a model discrepancy measure and the prediction uncertainty of the GPs. MISO-AGP and its MISO-fused GP counterpart are compared on two test problems and on the hyperparameter optimization of a machine learning classifier on a large dataset.
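The abstract does not give the acquisition function itself, so the following is only a rough sketch of how a confidence bound could be combined with a source's cost and its location-dependent discrepancy. The lower confidence bound (mean minus a multiple of the standard deviation) is standard; the penalty form, the class `SourcePrediction`, the function names, and all numbers are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SourcePrediction:
    """GP prediction for one source at a candidate location (hypothetical values)."""
    mean: float         # GP posterior mean at the candidate location
    std: float          # GP posterior standard deviation at the candidate location
    cost: float         # cost of querying this source
    discrepancy: float  # estimated model discrepancy of this source at the location


def lower_confidence_bound(mean, std, beta=2.0):
    """Standard GP lower confidence bound for a minimization problem."""
    return mean - beta * std


def cost_aware_score(pred, best_observed, beta=2.0):
    """Optimistic improvement over the incumbent, penalized by cost and discrepancy.

    The penalty (division by cost * (1 + discrepancy)) is an assumption of this sketch.
    """
    lcb = lower_confidence_bound(pred.mean, pred.std, beta)
    improvement = max(0.0, best_observed - lcb)
    return improvement / (pred.cost * (1.0 + pred.discrepancy))


# Two hypothetical sources predicting the same mean and uncertainty at one location.
expensive = SourcePrediction(mean=1.0, std=0.5, cost=2.0, discrepancy=0.0)
cheap = SourcePrediction(mean=1.0, std=0.5, cost=1.0, discrepancy=0.5)
for name, pred in [("expensive", expensive), ("cheap", cheap)]:
    print(name, cost_aware_score(pred, best_observed=2.0))
```

Under this assumed penalty, a cheap source is preferred only while its estimated discrepancy at the candidate location stays small relative to its cost advantage.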

    Improving analytics in urban water management: a spectral clustering-based approach for leakage localization

    Growing water demand worldwide is forcing utilities to manage their costs carefully. In an era of tight budgets across most economic and social sectors, this pressure also affects Water Distribution Networks (WDNs). Efficient urban water management is therefore needed to strike a balance between consumer satisfaction and the infrastructural assets inherent to a WDN. A particular case is pipe networks, which suffer from frequent leaks, failures and service disruptions. The ensuing costs of inspection, repair and replacement are a significant part of operational expenses and give rise to difficult decisions. Recently, improving the traditional leakage management process through analytical leakage localization tools has come to the forefront, leading to the proposal of several approaches. All of these methods rely on the fact that leaks can be detected by correlating changes in flow with the output of a simulation model whose parameters are related to both the location and the severity of the leak. This paper, building on previous work by the authors, shows how the critical phases of leak localization can be accomplished through a combination of hydraulic simulation and clustering. The research examines the benefits of Spectral Clustering, which is usually adopted for network analysis tasks (e.g., community or sub-network discovery). A transformation is presented from a dataset of data points, consisting of leakage scenarios simulated through a hydraulic simulation model, to a similarity graph. Spectral Clustering is then applied to the similarity graph, and the results are compared with those provided by traditional clustering techniques on the original dataset.
The proposed spectral approach proved more effective than traditional clustering: it performs better at analytically localizing leaks in a water distribution network and consequently reduces the costs of intervention, inspection and rehabilitation.
Peer Reviewed. Postprint (published version).
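The step from simulated scenarios to a similarity graph and then to clusters can be illustrated with a minimal sketch. It assumes a Gaussian-kernel similarity and performs only a two-way split using the sign of the Fiedler vector of the unnormalized graph Laplacian, which is the simplest special case of spectral clustering and not necessarily the variant used by the authors. The scenario vectors, `sigma`, and the function names are hypothetical.

```python
import math


def similarity_matrix(points, sigma=1.0):
    """Gaussian-kernel similarity between every pair of scenario vectors."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def spectral_bipartition(w, iterations=500):
    """Split graph nodes in two by the sign of the Fiedler vector of L = D - W.

    The Fiedler vector is found by power iteration on (shift * I - L) while
    projecting out the constant vector, which is the trivial eigenvector of L.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # Gershgorin upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]
        # (shift * I - L) v = (shift - D) v + W v
        v = [(shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]


# Hypothetical scenarios: (pressure variation, flow variation) at one monitoring point.
scenarios = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = spectral_bipartition(similarity_matrix(scenarios))
print(labels)
```

For more than two clusters, the usual approach is to take several Laplacian eigenvectors and run k-means on the resulting embedding; a numerical library would normally replace the hand-written power iteration.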

    A Wasserstein distance based multiobjective evolutionary algorithm for the risk aware optimization of sensor placement

    In this paper we propose a new algorithm for identifying optimal "sensing spots" within a network for monitoring the spread of "effects" triggered by "events". This problem is referred to as "Optimal Sensor Placement", and many real-world problems fit into this general framework. Sensor placement (SP), i.e., the location of sensors at some nodes, for the early detection of contaminants in water distribution networks (WDNs) is used as a running example. Usually a trade-off between different objective functions has to be managed, so we are faced with a multi-objective optimization problem (MOP). The best trade-off between the objectives can be defined in terms of Pareto optimality. We model the sensor placement problem as a multi-objective optimization problem with Boolean decision variables and propose a Multi-Objective Evolutionary Algorithm (MOEA) for approximating and analyzing the Pareto set. Evaluating the objective functions requires executing a simulation model. To organize the simulation results in a computationally efficient way, we propose a data structure that collects simulation outcomes for every SP and is particularly suitable for visualizing the dynamics of contaminant concentration and for evolutionary optimization. This data structure enables the definition of information spaces, in which a candidate placement can be represented as a matrix or, in probabilistic terms, as a histogram. Introducing a distance between histograms, namely the Wasserstein (WST) distance, makes it possible to derive new genetic operators, indicators of the quality of the Pareto set, and criteria for choosing among the Pareto solutions. The new algorithm, MOEA/WST, has been tested on two benchmark water distribution networks and a real-world network.
Preliminary results are compared with NSGA-II and show better performance in terms of hypervolume and coverage, in particular for relatively large networks and low generation counts.
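To make the idea of a distance between histograms concrete, here is a minimal sketch of the 1-Wasserstein distance between two histograms defined over the same equally spaced bins. In one dimension it equals the area between the two cumulative distributions. The bin layout and the example histograms are hypothetical, and the abstract does not say which order of Wasserstein distance or which ground metric the authors use.

```python
def wasserstein_1d(hist_a, hist_b, bin_width=1.0):
    """1-Wasserstein distance between two histograms on the same equally spaced bins.

    Both histograms are normalized to unit mass first; the distance is the
    accumulated absolute gap between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    cdf_gap = 0.0
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        cdf_gap += a / total_a - b / total_b
        distance += abs(cdf_gap) * bin_width
    return distance


# Hypothetical histograms of detection times for two candidate placements.
placement_a = [1, 0, 0]
placement_b = [0, 0, 1]
print(wasserstein_1d(placement_a, placement_b))  # all mass moves two bins: 2.0
```

Unlike bin-by-bin measures, this distance grows with how far mass has to move, so two placements whose histograms differ only by a small shift are treated as close.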

    A New Evolutionary Approach to Optimal Sensor Placement in Water Distribution Networks

    The sensor placement problem is modeled as a multi-objective optimization problem with Boolean decision variables. A new multi-objective evolutionary algorithm (MOEA) is proposed for approximating and analyzing the set of Pareto optimal solutions. Evaluating the objective functions requires executing a hydraulic simulation model of the network. To organize the simulation results, a data structure is proposed that enables the dynamic representation of a sensor placement and its fitness as a heatmap. This allows the definition of information spaces, in which the fitness of a placement can be represented as a matrix or, in probabilistic terms, as a histogram. The key element of the new algorithm is this probabilistic representation, which is embedded in a space endowed with a metric based on a specific notion of distance. Among several distances between probability distributions, the Wasserstein (WST) distance has been selected: it makes it possible to derive new genetic operators, indicators of the quality of the Pareto set, and criteria for choosing among the Pareto solutions. The new algorithm has been tested on a benchmark water distribution network with two objective functions, showing an improvement over NSGA-II, in particular for low generation counts, which makes it a good candidate for expensive black-box multi-objective optimization.
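The notion of Pareto optimality over Boolean placements can be sketched as a simple non-dominated filter. The two objectives used here (number of sensors and a detection time, both minimized), the four-node placements, and the function names are hypothetical; the abstract does not name the two objective functions, and this filter is not the authors' evolutionary algorithm.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly
    better in at least one objective (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the (placement, objectives) pairs not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other[1], c[1]) for other in candidates)]


# Each placement is a Boolean vector: 1 means a sensor is installed at that node.
# Objectives are hypothetical: (number of sensors, detection time).
candidates = [
    ((1, 0, 0, 0), (1, 5.0)),
    ((1, 1, 0, 0), (2, 2.0)),
    ((1, 1, 1, 0), (3, 3.0)),  # dominated by the two-sensor placement
    ((1, 1, 1, 1), (4, 1.0)),
]
for placement, objectives in pareto_front(candidates):
    print(placement, objectives)
```

The pairwise filter costs a number of comparisons quadratic in the population size, which is acceptable for a sketch; algorithms such as NSGA-II use faster non-dominated sorting.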

    Spectral Clustering And Support Vector Classification For Localizing Leakages In Water Distribution Networks – The ICeWater Project Approach

    This paper presents a framework based on hydraulic simulation and machine learning for supporting Water Distribution Network (WDN) managers in localizing leakages while reducing the time and costs of investigation, intervention and rehabilitation. As a first step, hydraulic simulation is used to run different leakage scenarios by introducing a leak on each pipe in turn and varying its severity. For each scenario run, the pressure and flow variations at the actual monitoring points in the WDN, relative to the faultless model, are stored as output. Scenario clustering aims to group together leaks that generate similar effects in terms of observable pressure and flow variations. This analysis is performed by creating a similarity graph, where nodes are scenarios and edges are weighted by the similarity between pairs of scenarios. Spectral clustering, a graph-clustering technique, is proposed here because it usually performs better than traditional data-point clustering. Each scenario is then labeled with its cluster, yielding a labeled dataset on which a Support Vector Machine (SVM) with an RBF kernel is trained. When an actual leak is detected, the variations in measured pressure and flow with respect to the faultless hydraulic model are given as input to the trained SVM, which assigns them to a specific cluster; the pipes corresponding to that cluster are reported as the hydraulic components to check for leakage. Since spectral clustering induces a non-linear transformation from the Input Space (i.e., pressure and flow variations) to the Feature Space (i.e., the most relevant eigenvectors) where clusters are obtained, the SVM encodes the non-linear relationship between pressure and flow variations and the scenario clusters. The SVM can efficiently map the results of spectral clustering back to the Input Space, giving the probably leaky pipes even for pressure and flow variations not included in the simulated leakage scenarios.

    Classification of oncologic data with genetic programming

    Discovering the models that explain the hidden relationship between genetic material and tumor pathologies is one of the most important open challenges in biology and medicine. Given the large amount of data made available by the DNA Microarray technique, Machine Learning is becoming a popular tool for this kind of investigation. In the last few years, we have been particularly involved in the study of Genetic Programming for mining large sets of biomedical data. In this paper, we present a comparison between four variants of Genetic Programming for the classification of two different oncologic datasets: the first contains data from healthy colon tissues and colon tissues affected by cancer; the second contains data from patients affected by two kinds of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia). We report experimental results obtained using two different fitness criteria: the receiver operating characteristic and the percentage of correctly classified instances. These results, and their comparison with those obtained by three non-evolutionary Machine Learning methods (Support Vector Machines, MultiBoosting, and Random Forests) on the same data, suggest that Genetic Programming is a promising technique for this kind of classification.
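The two fitness criteria can be illustrated with a short sketch. It assumes that the receiver operating characteristic criterion is summarized by the area under the ROC curve (AUC), which the abstract does not state explicitly. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not data from the paper.

```python
def accuracy_percent(labels, predictions):
    """Percentage of correctly classified instances."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return 100.0 * correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive instance receives the higher score (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs: 1 = tumor tissue, 0 = healthy tissue.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy_percent(labels, predictions))  # 75.0
print(roc_auc(labels, scores))                # 0.75
```

Accuracy depends on the chosen threshold, whereas AUC measures how well the scores rank positive instances above negative ones across all thresholds, which is one reason the two criteria can favor different evolved classifiers.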