    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment
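
    The Pareto optimization mentioned in this abstract trades accuracy against pipeline complexity. The following is a minimal sketch of that idea only, not TPOT's actual implementation: the function `pareto_front` and the candidate tuples are hypothetical, and the numbers are made up for illustration.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates.

    Each candidate is (name, accuracy, n_operators). One candidate dominates
    another if it is at least as accurate and at least as small, and strictly
    better on one of the two criteria.
    """
    front = []
    for name, acc, size in candidates:
        dominated = any(
            (a >= acc and s <= size) and (a > acc or s < size)
            for _, a, s in candidates
        )
        if not dominated:
            front.append((name, acc, size))
    return front


# Hypothetical pipelines: (label, accuracy, number of operators).
candidates = [
    ("A", 0.90, 5),
    ("B", 0.90, 2),
    ("C", 0.85, 1),
    ("D", 0.80, 3),
]
front = pareto_front(candidates)
print(front)
```

    In this toy data, pipeline A is dropped because B reaches the same accuracy with fewer operators, which is the sense in which a Pareto criterion favours compact pipelines.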

    Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces

    The \emph{Chow parameters} of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ are its $n+1$ degree-0 and degree-1 Fourier coefficients. It has been known since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean functions, but until recently (O'Donnell and Servedio) nothing was known about efficient algorithms for \emph{reconstructing} $f$ (exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the \emph{Chow Parameters Problem.} Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$ and with high probability outputs a representation of an LTF $f'$ that is $\epsilon$-close to $f$. The only previous algorithm (O'Donnell and Servedio) had running time $\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}$. As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$, there is a linear threshold function $f'$ which is $\epsilon$-close to $f$ and has all weights that are integers at most $\sqrt{n} \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$. This significantly improves the best previous result of Diakonikolas and Servedio which gave a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ weight bound, and is close to the known lower bound of $\max\{\sqrt{n}, (1/\epsilon)^{\Omega(\log \log (1/\epsilon))}\}$ (Goldberg, Servedio). Our techniques also yield improved algorithms for related problems in learning theory.
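
    To make the definition concrete, the sketch below computes the Chow parameters of a small Boolean function by brute-force enumeration under the uniform distribution. It illustrates the definition only and is unrelated to the paper's reconstruction algorithm; the names `chow_parameters` and `majority3` are chosen here for illustration.

```python
from itertools import product


def chow_parameters(f, n):
    """Return the n+1 degree-0 and degree-1 Fourier coefficients of f.

    f maps a tuple in {-1, 1}^n to -1 or 1. The degree-0 coefficient is E[f(x)]
    and the i-th degree-1 coefficient is E[f(x) * x_i], with x uniform.
    This enumerates all 2^n inputs, so it is only usable for small n.
    """
    points = list(product((-1, 1), repeat=n))
    total = len(points)
    degree0 = sum(f(x) for x in points) / total
    degree1 = [sum(f(x) * x[i] for x in points) / total for i in range(n)]
    return [degree0] + degree1


def majority3(x):
    """Majority of three bits: a linear threshold function with unit weights."""
    return 1 if sum(x) > 0 else -1


chow = chow_parameters(majority3, 3)
print(chow)
```

    For the three-bit majority function the degree-0 coefficient is 0 and each degree-1 coefficient is 1/2, matching its Fourier expansion $(x_1 + x_2 + x_3)/2 - x_1 x_2 x_3 / 2$.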

    More on Gribov copies and propagators in Landau-gauge Yang-Mills theory

    Fixing a gauge in the non-perturbative domain of Yang-Mills theory is a non-trivial problem due to the presence of Gribov copies. In particular, there are different gauges in the non-perturbative regime which all correspond to the same definition of a gauge in the perturbative domain. Gauge-dependent correlation functions may differ in these gauges. Two such gauges are the minimal and absolute Landau gauge, both corresponding to the perturbative Landau gauge. These, and their numerical implementation, are described and presented in detail. Other choices will also be discussed. This investigation is performed, using numerical lattice gauge theory calculations, by comparing the propagators of gluons and ghosts for the minimal Landau gauge and the absolute Landau gauge in SU(2) Yang-Mills theory. It is found that the propagators are different in the far infrared and even at energy scales of the order of half a GeV. In particular, the finite-volume effects are also modified. This is observed in two and three dimensions. Some remarks on the four-dimensional case are provided as well. Comment: 23 pages, 16 figures, 6 tables; various changes throughout most of the paper; extended discussion on different possibilities to define the Landau gauge and connection to existing scenarios; in v3: Minor changes, error in eq. (3) & (4) corrected, version to appear in PR

    False-Name Manipulation in Weighted Voting Games is Hard for Probabilistic Polynomial Time

    False-name manipulation refers to the question of whether a player in a weighted voting game can increase her power by splitting into several players and distributing her weight among these false identities. Analogously to this splitting problem, the beneficial merging problem asks whether a coalition of players can increase their power in a weighted voting game by merging their weights. Aziz et al. [ABEP11] analyze the problem of whether merging or splitting players in weighted voting games is beneficial in terms of the Shapley-Shubik and the normalized Banzhaf index, and so do Rey and Rothe [RR10] for the probabilistic Banzhaf index. All these results provide merely NP-hardness lower bounds for these problems, leaving the question about their exact complexity open. For the Shapley-Shubik and the probabilistic Banzhaf index, we raise these lower bounds to hardness for PP, "probabilistic polynomial time", and provide matching upper bounds for beneficial merging and, whenever the number of false identities is fixed, also for beneficial splitting, thus resolving previous conjectures in the affirmative. It follows from our results that beneficial merging and splitting for these two power indices cannot be solved in NP, unless the polynomial hierarchy collapses, which is considered highly unlikely.
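
    A small worked example may help make beneficial splitting concrete. The sketch below computes the Shapley-Shubik and probabilistic Banzhaf indices by enumerating coalitions, using the standard textbook definitions; the function names and the example game with quota 4 and weights (2, 1, 1) are chosen here for illustration and do not come from the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def pivotal_coalitions(weights, quota, i):
    """Yield coalitions of other players that lose alone but win once i joins."""
    others = [j for j in range(len(weights)) if j != i]
    for size in range(len(others) + 1):
        for coalition in combinations(others, size):
            weight = sum(weights[j] for j in coalition)
            if weight < quota <= weight + weights[i]:
                yield coalition


def shapley_shubik(weights, quota, i):
    """Fraction of player orderings in which player i is pivotal."""
    n = len(weights)
    return sum(
        (Fraction(factorial(len(c)) * factorial(n - 1 - len(c)), factorial(n))
         for c in pivotal_coalitions(weights, quota, i)),
        Fraction(0),
    )


def probabilistic_banzhaf(weights, quota, i):
    """Fraction of coalitions of the other players for which i is pivotal."""
    n = len(weights)
    count = sum(1 for _ in pivotal_coalitions(weights, quota, i))
    return Fraction(count, 2 ** (n - 1))


quota = 4
before = [2, 1, 1]      # player 0 holds weight 2
after = [1, 1, 1, 1]    # player 0 splits into two false identities (indices 0, 1)

ss_before = shapley_shubik(before, quota, 0)
ss_after = shapley_shubik(after, quota, 0) + shapley_shubik(after, quota, 1)
bz_before = probabilistic_banzhaf(before, quota, 0)
bz_after = probabilistic_banzhaf(after, quota, 0) + probabilistic_banzhaf(after, quota, 1)
print(ss_before, ss_after, bz_before, bz_after)
```

    In this game the split raises the player's combined Shapley-Shubik power from 1/3 to 1/2, while the combined probabilistic Banzhaf power stays at 1/4. The enumeration takes time exponential in the number of players, so it is suitable only for small games.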

    A Dispersion Operator for Geometric Semantic Genetic Programming

    Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues on how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions that are placed outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying semantic genetic operators. Experiments on sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.
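
    The convex-hull limitation can be seen directly on semantics vectors (the outputs of each individual on the training cases). The sketch below assumes the common real-valued form of geometric semantic crossover, a convex combination with a constant `r` in [0, 1]; it does not implement the dispersion operator proposed in the paper, and `semantic_crossover` is a name chosen for illustration.

```python
def semantic_crossover(parent_a, parent_b, r):
    """Offspring semantics as the convex combination r * a + (1 - r) * b."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return [r * a + (1.0 - r) * b for a, b in zip(parent_a, parent_b)]


# Semantics of two parents on three training cases (illustrative values).
parent_a = [1.0, 4.0, -2.0]
parent_b = [3.0, 0.0, 2.0]
child = semantic_crossover(parent_a, parent_b, 0.25)

# Each offspring output lies between the corresponding parent outputs.
inside = all(
    min(a, b) <= c <= max(a, b)
    for a, b, c in zip(parent_a, parent_b, child)
)
print(child, inside)
```

    Because every offspring lies on the segment between its parents, repeated crossover alone cannot reach a target whose semantics fall outside the hull of the population.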

    Continuous extremal optimization for Lennard-Jones Clusters

    In this paper, we explore a general-purpose heuristic algorithm for finding high-quality solutions to continuous optimization problems. The method, called continuous extremal optimization (CEO), can be considered an extension of extremal optimization (EO) and consists of two components, one responsible for global search and the other for local search. With only one adjustable parameter, CEO's performance proves competitive with more elaborate stochastic optimization procedures. We demonstrate it on a well-known continuous optimization problem: the Lennard-Jones cluster optimization problem. Comment: 5 pages and 3 figure
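
    For readers unfamiliar with the benchmark, the objective being minimized is the total pairwise Lennard-Jones energy of a cluster. The sketch below evaluates that energy in reduced units (well depth and length scale both set to 1, an assumption made here); it shows only the objective function, not the CEO search itself, and `lennard_jones_energy` is a name chosen for illustration.

```python
from itertools import combinations
from math import dist, isclose, sqrt


def lennard_jones_energy(positions):
    """Total energy sum over pairs of 4 * (r**-12 - r**-6), in reduced units."""
    energy = 0.0
    for p, q in combinations(positions, 2):
        inv_r6 = 1.0 / dist(p, q) ** 6
        energy += 4.0 * (inv_r6 * inv_r6 - inv_r6)
    return energy


# Pair potential minimum: separation 2**(1/6), energy -1 per pair.
r_min = 2.0 ** (1.0 / 6.0)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
triangle = [
    (0.0, 0.0, 0.0),
    (r_min, 0.0, 0.0),
    (r_min / 2.0, r_min * sqrt(3.0) / 2.0, 0.0),
]
e_dimer = lennard_jones_energy(dimer)
e_triangle = lennard_jones_energy(triangle)
print(e_dimer, e_triangle)
```

    The dimer and the equilateral triangle at the optimal separation give energies of about -1 and -3. Larger clusters cannot place every pair at that separation, which is what makes the global optimization problem hard.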

    Optimal transport on supply-demand networks

    Transport networks have usually been treated as homogeneous networks, that is, every node has the same function, simultaneously providing and requiring resources. However, some real networks, such as power grids and supply chain networks, show a very different scenario in which the nodes fall into two categories: the supply nodes provide some kind of service, while the demand nodes require it. In this paper, we propose a general transport model for such supply-demand networks, together with a criterion to quantify their transport capacities. In a supply-demand network with a heterogeneous degree distribution, the transport capacity strongly depends on the locations of the supply nodes. We therefore design a simulated annealing algorithm to find the optimal configuration of supply nodes, which remarkably enhances the transport capacity and outperforms the degree target algorithm, the betweenness target algorithm, and the greedy method. This work provides a starting point for systematically analyzing and optimizing transport dynamics on supply-demand networks. Comment: 5 pages, 1 table and 4 figure

    Theoretical analysis of the role of chromatin interactions in long-range action of enhancers and insulators

    Long-distance regulatory interactions between enhancers and their target genes are commonplace in higher eukaryotes. Interposed boundaries or insulators are able to block these long-distance regulatory interactions. The mechanistic basis for insulator activity and how it relates to enhancer action-at-a-distance remain unclear. Here we explore the idea that topological loops could simultaneously account for regulatory interactions of distal enhancers and the insulating activity of boundary elements. We show that while loop formation is not in itself sufficient to explain action at a distance, incorporating transient non-specific and moderate attractive interactions between the chromatin fibers strongly enhances long-distance regulatory interactions and is sufficient to generate a euchromatin-like state. Under these same conditions, the subdivision of the loop into two topologically independent loops by insulators inhibits inter-domain interactions. The underlying cause of this effect is a suppression of crossings in the contact map at intermediate distances. Thus our model simultaneously accounts for regulatory interactions at a distance and the insulator activity of boundary elements. This unified model of the regulatory roles of chromatin loops makes several testable predictions that could be confronted with \emph{in vitro} experiments, as well as genomic chromatin conformation capture and fluorescent microscopic approaches. Comment: 10 pages, originally submitted to an (undisclosed) journal in May 201