64,785 research outputs found

    Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

    Full text link
    We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs

    Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

    Full text link
    We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularity, and we seek to leverage this structure to recover covariates relevant to each hierarchically-defined cluster of responses. We propose a tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree-penalty such that each regression coefficient is penalized in a balanced manner despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast data sets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS549 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Impact of different time series aggregation methods on optimal energy system design

    Full text link
    Modelling renewable energy systems is a computationally-demanding task due to the high fluctuation of supply and demand time series. To reduce the scale of these, this paper discusses different methods for their aggregation into typical periods. Each aggregation method is applied to a different type of energy system model, making the methods fairly incomparable. To overcome this, the different aggregation methods are first extended so that they can be applied to all types of multidimensional time series and then compared by applying them to different energy system configurations and analyzing their impact on the cost optimal design. It was found that regardless of the method, time series aggregation allows for significantly reduced computational resources. Nevertheless, averaged values lead to underestimation of the real system cost in comparison to the use of representative periods from the original time series. The aggregation method itself, e.g. k means clustering, plays a minor role. More significant is the system considered: Energy systems utilizing centralized resources require fewer typical periods for a feasible system design in comparison to systems with a higher share of renewable feed-in. Furthermore, for energy systems based on seasonal storage, currently existing models integration of typical periods is not suitable

    A clustering particle swarm optimizer for locating and tracking multiple optima in dynamic environments

    Get PDF
    This article is posted here with permission from the IEEE - Copyright @ 2010 IEEEIn the real world, many optimization problems are dynamic. This requires an optimization algorithm to not only find the global optimal solution under a specific environment but also to track the trajectory of the changing optima over dynamic environments. To address this requirement, this paper investigates a clustering particle swarm optimizer (PSO) for dynamic optimization problems. This algorithm employs a hierarchical clustering method to locate and track multiple peaks. A fast local search method is also introduced to search optimal solutions in a promising subregion found by the clustering method. Experimental study is conducted based on the moving peaks benchmark to test the performance of the clustering PSO in comparison with several state-of-the-art algorithms from the literature. The experimental results show the efficiency of the clustering PSO for locating and tracking multiple optima in dynamic environments in comparison with other particle swarm optimization models based on the multiswarm method.This work was supported by the Engineering and Physical Sciences Research Council of U.K., under Grant EP/E060722/1
    corecore