79,253 research outputs found

    Dynamic load balancing in parallel KD-tree k-means

    One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have largely been explored in two main directions. The first reduces the amount of computation significantly by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. The second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set, and a third solution incorporates a dynamic load balancing policy.
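The load imbalance the abstract describes can be illustrated with a small sketch. This is not the paper's method: the task costs, the worker count, and the function names `static_makespan` and `dynamic_makespan` are all hypothetical, and the "dynamic" policy here is a plain greedy rule (give the next task to whichever worker is free first), chosen only to show why irregular costs hurt a static split.

```python
import heapq

def static_makespan(costs, workers):
    """Static partitioning: contiguous blocks with an equal number of tasks."""
    n = len(costs)
    block = -(-n // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, n, block)]
    return max(loads)

def dynamic_makespan(costs, workers):
    """Greedy dynamic policy: each task goes to the worker that is idle first."""
    finish = [0.0] * workers  # current finish time of every worker
    heapq.heapify(finish)
    for c in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + c)
    return max(finish)

# Hypothetical irregular per-task costs (e.g. uneven KD-tree subtree work).
costs = [8, 7, 6, 5, 1, 1, 1, 1]
print(static_makespan(costs, 2))   # finish time of the slowest worker, static split
print(dynamic_makespan(costs, 2))  # finish time of the slowest worker, greedy policy
```

With these assumed costs, the static split places all the expensive tasks on one worker, while the greedy policy spreads them out. A real distributed-memory implementation would also have to account for the communication cost of moving work between nodes.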

    On the development of a stochastic optimisation algorithm with capabilities for distributed computing

    In this thesis, we devise a new stochastic optimisation method (the cascade optimisation algorithm) by incorporating concepts from Markov processes whilst eliminating their inherent sequential nature, which is the major deficit preventing the exploitation of advances in distributed computing infrastructures. This method introduces partitions and pools to store intermediate solutions and their corresponding objectives. A Markov process increases the population of partitions and pools. The population is distributed periodically following an external certain. With the use of partitions and pools, multiple Markov processes can be launched simultaneously for different partitions and pools. The cascade optimisation algorithm is suitable for parallel and distributed computing environments. In addition, this method has the potential to integrate knowledge acquisition techniques (e.g. data mining and ontology) to achieve effective knowledge-based decision making. Several features are extracted and studied in this thesis. The application problems include both small-scale and large-scale optimisation problems. Comparisons with the stochastic optimisation methods are made, and the results show that the cascade optimisation algorithm converges more quickly to optimal solutions that agree with those of other methods. The cascade optimisation algorithm is also studied in parallel and distributed computing environments in terms of the reduction in computation time.

    Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra

    Recent developments in the field of parallel and distributed computing have led to a proliferation of work on solving large and computationally intensive mathematical, scientific, or engineering problems that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of the parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered robust if it optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used to obtain resource allocations via a numerical analysis of performance models of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying Markov chain model from which performance measures are obtained. Further, a robustness analysis of the allocation techniques is performed to find a robust mapping from a set of initial mapping schemes. The numerical analysis of the performance models has confirmed similarity to the simulation results of earlier research available in the existing literature. When compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur any setup or installation costs, do not impose any prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries.
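As a rough illustration of how a performance measure can be read off a Markov chain, the sketch below computes the steady-state distribution of a tiny continuous-time chain. It does not use PEPA or any PEPA tool: the two-state generator matrix `Q`, its rates, and the function name `steady_state` are assumptions made for this example, and the solution method (uniformization followed by power iteration) is just one simple choice.

```python
def steady_state(Q, iterations=1000):
    """Approximate the stationary distribution of a CTMC with generator Q."""
    n = len(Q)
    # Uniformization: pick a rate above every exit rate so P is a valid,
    # aperiodic transition matrix with the same stationary distribution.
    rate = 1.1 * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical processor model: idle -> busy at rate 2.0, busy -> idle at rate 3.0.
Q = [[-2.0, 2.0],
     [3.0, -3.0]]
pi = steady_state(Q)
utilisation = pi[1]  # long-run fraction of time the processor is busy
print(utilisation)
```

For this assumed chain the balance equation gives a busy probability of 2 / (2 + 3), which the iteration should approach. Larger models would normally be solved with a linear-algebra library rather than plain Python lists.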

    Optimum allocation of distributed generation in multi-feeder systems using long term evaluation and assuming voltage-dependent loads

    The analysis of actual distribution systems with penetration of distributed generation requires powerful tools with capabilities that until very recently were not available in distribution software tools; for instance, probabilistic and time mode simulations. This paper presents the work done by the authors to expand some procedures previously implemented for using OpenDSS, a freely available software tool for distribution system studies, when it is driven as a COM DLL from MATLAB using a parallel computing environment. The paper details the application of parallel computing to the allocation of distributed generation for optimum reduction of energy losses in a multi-feeder distribution system when the system is evaluated over a long period (e.g., the target is to minimize energy losses for periods longer than one year) and voltage-dependent load models are used. The long term evaluation is carried out by assuming that the connection of the generation units is sequential, and using a divide and conquer approach to speed up calculations. The main goals are to check the viability of a Monte Carlo method in some studies for which parallel computing can be advantageously applied and to propose a procedure for quasi-optimum allocation of photovoltaic generation in a multi-feeder distribution system. © 2015 Elsevier Ltd.