
775 research outputs found

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Author: Banzhaf W.
Bergstra J.
Feurer M.
Hastie T. J.
Snoek J.
Urbanowicz R. J.
Publication date: 19/03/2016

As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment

arXiv.org e-Print Archive

Crossref

Scipedia

ChemGAPP: a tool for chemical genomics analysis and phenotypic profiling

Author: Banzhaf Manuel
Doherty Hannah M
Galardini Marco
Kritikos George
Moradigaravand Danesh
Publication venue: 'Oxford University Press (OUP)'
Publication date: 03/04/2023

University of Birmingham Research Portal

Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces

Author: Anindya De
Aziz H.
Banzhaf J.
Cheraghchi M.
de Keijzer B.
Dertouzos M.
Feldman V.
Felsenthal D.
Freixas J.
Ilias Diakonikolas
Muroga S.
Rocco A. Servedio
Takamiya K.
Tannenbaum M.
Vitaly Feldman
Winder R. O.
Publication date: 05/06/2012

The Chow parameters of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ are its $n+1$ degree-0 and degree-1 Fourier coefficients. It has been known since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean functions, but until recently (O'Donnell and Servedio) nothing was known about efficient algorithms for reconstructing $f$ (exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the Chow Parameters Problem. Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$ and with high probability outputs a representation of an LTF $f'$ that is $\epsilon$-close to $f$. The only previous algorithm (O'Donnell and Servedio) had running time $\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}$. As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$, there is a linear threshold function $f'$ which is $\epsilon$-close to $f$ and has all weights that are integers at most $\sqrt{n} \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$. This significantly improves the best previous result of Diakonikolas and Servedio which gave a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ weight bound, and is close to the known lower bound of $\max\{\sqrt{n}, (1/\epsilon)^{\Omega(\log \log (1/\epsilon))}\}$ (Goldberg, Servedio). Our techniques also yield improved algorithms for related problems in learning theory.
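As a rough illustration of the definition used in this abstract (not the paper's reconstruction algorithm), the Chow parameters of a small Boolean function can be computed by brute-force enumeration under the uniform distribution on {-1,1}^n. The helper names `chow_parameters` and `majority3` are hypothetical and chosen only for this sketch; the enumeration takes 2^n steps, so it is usable only for tiny n.

```python
from itertools import product


def chow_parameters(f, n):
    """Return [f_hat(0), f_hat(1), ..., f_hat(n)] for f: {-1,1}^n -> {-1,1}.

    f_hat(0) = E[f(x)] and f_hat(i) = E[f(x) * x_i], with x uniform on {-1,1}^n.
    """
    points = list(product((-1, 1), repeat=n))
    total = len(points)
    # Degree-0 Fourier coefficient: the mean of f.
    params = [sum(f(x) for x in points) / total]
    # Degree-1 Fourier coefficients: correlation of f with each coordinate.
    for i in range(n):
        params.append(sum(f(x) * x[i] for x in points) / total)
    return params


def majority3(x):
    """Majority of three +/-1 inputs, a simple linear threshold function."""
    return 1 if sum(x) > 0 else -1


chow_maj3 = chow_parameters(majority3, 3)
print(chow_maj3)  # [0.0, 0.5, 0.5, 0.5]
```

For the three-input majority function, the mean is 0 and each coordinate has correlation 1/2 with the output, which matches its Fourier expansion.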

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Explorer

More on Gribov copies and propagators in Landau-gauge Yang-Mills theory

Author: A. Cucchieri
A. Sternbeck
A. Yamaguchi
Axel Maas
G. Dell’Antonio
I. L. Bogolubsky
L. von Smekal
M. Bohm
R. J. Rivers
W. Banzhaf
Publication venue: 'American Physical Society (APS)'
Publication date: 09/01/2009

Fixing a gauge in the non-perturbative domain of Yang-Mills theory is a non-trivial problem due to the presence of Gribov copies. In particular, there are different gauges in the non-perturbative regime which all correspond to the same definition of a gauge in the perturbative domain. Gauge-dependent correlation functions may differ in these gauges. Two such gauges are the minimal and absolute Landau gauge, both corresponding to the perturbative Landau gauge. These, and their numerical implementation, are described and presented in detail. Other choices will also be discussed. This investigation is performed, using numerical lattice gauge theory calculations, by comparing the propagators of gluons and ghosts for the minimal Landau gauge and the absolute Landau gauge in SU(2) Yang-Mills theory. It is found that the propagators are different in the far infrared and even at energy scales of the order of half a GeV. In particular, the finite-volume effects are also modified. This is observed in two and three dimensions. Some remarks on the four-dimensional case are provided as well. Comment: 23 pages, 16 figures, 6 tables; various changes throughout most of the paper; extended discussion on different possibilities to define the Landau gauge and connection to existing scenarios; in v3: Minor changes, error in eq. (3) & (4) corrected, version to appear in PR

arXiv.org e-Print Archive

Crossref

Deployment of parallel linear genetic programming using GPUs on PC and video game console platforms

Author: F Musgrave
Garnett Wilson
J Andrews
J Koza
M Harris
W Banzhaf
Wolfgang Banzhaf
Publication venue: 'Springer Science and Business Media LLC'

Crossref

False-Name Manipulation in Weighted Voting Games is Hard for Probabilistic Polynomial Time

Author: D. Felsenthal
E. Elkind
H. Aziz
H. Hunt
J. Banzhaf III
J. Gill
K. Prasad
K. Wagner
L. Penrose
L. Shapley
L. Valiant
M. Littman
M. Mundhenk
P. Dubey
P. Faliszewski
S. Toda
Y. Bachrach
Publication date: 07/03/2013

False-name manipulation refers to the question of whether a player in a weighted voting game can increase her power by splitting into several players and distributing her weight among these false identities. Analogously to this splitting problem, the beneficial merging problem asks whether a coalition of players can increase their power in a weighted voting game by merging their weights. Aziz et al. [ABEP11] analyze the problem of whether merging or splitting players in weighted voting games is beneficial in terms of the Shapley-Shubik and the normalized Banzhaf index, and so do Rey and Rothe [RR10] for the probabilistic Banzhaf index. All these results provide merely NP-hardness lower bounds for these problems, leaving the question about their exact complexity open. For the Shapley-Shubik and the probabilistic Banzhaf index, we raise these lower bounds to hardness for PP, "probabilistic polynomial time", and provide matching upper bounds for beneficial merging and, whenever the number of false identities is fixed, also for beneficial splitting, thus resolving previous conjectures in the affirmative. It follows from our results that beneficial merging and splitting for these two power indices cannot be solved in NP, unless the polynomial hierarchy collapses, which is considered highly unlikely.
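To make the splitting question concrete, here is a minimal brute-force sketch of the probabilistic Banzhaf index, assuming the standard definition: the fraction of coalitions of the other players that lose on their own but win once the player joins. The function name `probabilistic_banzhaf` and the toy game are hypothetical illustrations, not taken from the paper, and the enumeration is exponential in the number of players.

```python
from itertools import combinations


def probabilistic_banzhaf(weights, quota, player):
    """Fraction of coalitions of the other players for which `player` is pivotal."""
    others = [w for j, w in enumerate(weights) if j != player]
    swings = 0
    for size in range(len(others) + 1):
        for coalition in combinations(others, size):
            total = sum(coalition)
            # Swing: the coalition loses alone but wins once `player` joins.
            if total < quota <= total + weights[player]:
                swings += 1
    return swings / 2 ** len(others)


# Toy game (assumed for illustration): weights [3, 1, 1] with quota 3.
before = probabilistic_banzhaf([3, 1, 1], quota=3, player=0)

# Player 0 splits her weight 3 into three false identities of weight 1 each;
# the identities occupy positions 0, 1 and 2 in the new game.
split_weights = [1, 1, 1, 1, 1]
after = sum(probabilistic_banzhaf(split_weights, quota=3, player=i) for i in range(3))

print(before, after)  # 1.0 1.125
```

In this toy game the three false identities together hold more power (1.125) than the original player (1.0), so the split is beneficial. Deciding whether such a split exists in general is the problem whose complexity the abstract addresses.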

arXiv.org e-Print Archive

Crossref

A Dispersion Operator for Geometric Semantic Genetic Programming

Author: Banzhaf W.
Botzheim J.
Castelli M.
Koza J. R.
Pawlak T. P.
Vanneschi L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues on how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions that are placed outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying semantic genetic operators. Experiments on sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.

Crossref

Kent Academic Repository

Continuous extremal optimization for Lennard-Jones Clusters

Author: Bing-Hong Wang
D. E. Goldberg
E. H. L. Aarts
J. Holland
L. T. Wille
Long-Jiu Cheng
M. R. Garey
S. Boettcher
Tao Zhou
W. Banzhaf
Wen-Jie Bai
Publication venue: 'American Physical Society (APS)'
Publication date: 16/11/2004

In this paper, we explore a general-purpose heuristic algorithm for finding high-quality solutions to continuous optimization problems. The method, called continuous extremal optimization (CEO), can be considered an extension of extremal optimization (EO) and consists of two components, one responsible for global search and the other for local search. With only one adjustable parameter, the CEO's performance proves competitive with more elaborate stochastic optimization procedures. We demonstrate it on a well-known continuous optimization problem: the Lennard-Jones cluster optimization problem. Comment: 5 pages and 3 figures

arXiv.org e-Print Archive

Crossref

Optimal transport on supply-demand networks

Author: Bing-Hong Wang
Changsong Zhou
D. E. Goldberg
E. H. L. Aarts
G. Caldarelli
J. Holland
K. Papagiannaki
Li-Chao Zhao
M. E. J. Newman
Tao Zhou
W. Banzhaf
W.-J. Bai
Yu-Han Chen
Publication venue: 'American Physical Society (APS)'
Publication date: 08/08/2009

Previously, transport networks have usually been treated as homogeneous networks; that is, every node has the same function, simultaneously providing and requiring resources. However, some real networks, such as power grids and supply chain networks, show a very different scenario in which the nodes fall into two categories: the supply nodes provide some kinds of services, while the demand nodes require them. In this paper, we propose a general transport model for those supply-demand networks, associated with a criterion to quantify their transport capacities. In a supply-demand network with a heterogeneous degree distribution, the transport capacity strongly depends on the locations of supply nodes. We therefore design a simulated annealing algorithm to find the optimal configuration of supply nodes, which remarkably enhances the transport capacity and outperforms the degree target algorithm, the betweenness target algorithm, and the greedy method. This work provides a starting point for systematically analyzing and optimizing transport dynamics on supply-demand networks. Comment: 5 pages, 1 table and 4 figures

arXiv.org e-Print Archive

Crossref

Theoretical analysis of the role of chromatin interactions in long-range action of enhancers and insulators

Author: A. M. Sengupta
Arya
Banzhaf
Blackwood
Bondarenko
Bushey
Bustin
Capelson
Carter
Chodaparambil
Corces
Cremer
Cui
Dekker
Dorsett
Gerasimova
Geyer
Hahnfeldt
Langowski
Luger
Moore
Nielsen
Nobrega
P. Schedl
Phair
Polikanov
Rochman
S. Mukhopadhyay
Savitskaya
Schermelleh
Simonis
Tolhuis
V. M. Studitsky
Valenzuela
van den Engh
Wallace
West
Zheng
Zhou
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 14/03/2011

Long-distance regulatory interactions between enhancers and their target genes are commonplace in higher eukaryotes. Interposed boundaries or insulators are able to block these long-distance regulatory interactions. The mechanistic basis for insulator activity and how it relates to enhancer action-at-a-distance remains unclear. Here we explore the idea that topological loops could simultaneously account for regulatory interactions of distal enhancers and the insulating activity of boundary elements. We show that while loop formation is not in itself sufficient to explain action at a distance, incorporating transient non-specific and moderate attractive interactions between the chromatin fibers strongly enhances long-distance regulatory interactions and is sufficient to generate a euchromatin-like state. Under these same conditions, the subdivision of the loop into two topologically independent loops by insulators inhibits inter-domain interactions. The underlying cause of this effect is a suppression of crossings in the contact map at intermediate distances. Thus our model simultaneously accounts for regulatory interactions at a distance and the insulator activity of boundary elements. This unified model of the regulatory roles of chromatin loops makes several testable predictions that could be confronted with in vitro experiments, as well as genomic chromatin conformation capture and fluorescent microscopic approaches. Comment: 10 pages, originally submitted to an (undisclosed) journal in May 201

arXiv.org e-Print Archive

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central