Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
As the field of data science continues to grow, there will be an
ever-increasing demand for tools that make machine learning accessible to
non-experts. In this paper, we introduce the concept of tree-based pipeline
optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement an open source Tree-based Pipeline
Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a
series of simulated and real-world benchmark data sets. In particular, we show
that TPOT can design machine learning pipelines that provide a significant
improvement over a basic machine learning analysis while requiring little to no
input or prior knowledge from the user. We also address the tendency for TPOT
to design overly complex pipelines by integrating Pareto optimization, which
produces compact pipelines without sacrificing classification accuracy. As
such, this work represents an important step toward fully automating machine
learning pipeline design.
Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet
made from reviewer comment
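The Pareto optimization mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two objectives, classification accuracy (to maximize) and pipeline size measured as a number of operators (to minimize). The abstract does not say how TPOT measures complexity, so the size measure, the candidate names, and the scores below are all hypothetical.

```python
from typing import List, Tuple

# (name, accuracy, number of operators); every value here is illustrative only.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    ("A", 0.90, 5),  # same accuracy as B but larger, so dominated by B
    ("B", 0.90, 2),
    ("C", 0.85, 1),
    ("D", 0.80, 3),  # less accurate and larger than B, so dominated
    ("E", 0.92, 7),
]
front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
```

Selecting from the front rather than by accuracy alone is what lets a compact pipeline such as B survive next to a larger pipeline of equal accuracy such as A.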
Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces
The \emph{Chow parameters} of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$
are its $n+1$ degree-0 and degree-1 Fourier coefficients. It has been known
since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of
any linear threshold function $f$ uniquely specify $f$ within the space of all
Boolean functions, but until recently (O'Donnell and Servedio) nothing was
known about efficient algorithms for \emph{reconstructing} $f$ (exactly or
approximately) from exact or approximate values of its Chow parameters. We
refer to this reconstruction problem as the \emph{Chow Parameters Problem.}
Our main result is a new algorithm for the Chow Parameters Problem which,
given (sufficiently accurate approximations to) the Chow parameters of any
linear threshold function $f$, runs in time $\tilde{O}(n^2)\cdot
(1/\eps)^{O(\log^2(1/\eps))}$ and with high probability outputs a
representation of an LTF $f'$ that is $\eps$-close to $f$. The only previous
algorithm (O'Donnell and Servedio) had running time $\poly(n) \cdot
2^{2^{\tilde{O}(1/\eps^2)}}$.
As a byproduct of our approach, we show that for any linear threshold
function $f$ over $\{-1,1\}^n$, there is a linear threshold function $f'$ which
is $\eps$-close to $f$ and has all weights that are integers at most $\sqrt{n}
\cdot (1/\eps)^{O(\log^2(1/\eps))}$. This significantly improves the best
previous result of Diakonikolas and Servedio which gave a $\poly(n) \cdot
2^{\tilde{O}(1/\eps^{2/3})}$ weight bound, and is close to the known lower
bound of $\max\{\sqrt{n}, (1/\eps)^{\Omega(\log \log (1/\eps))}\}$ (Goldberg,
Servedio). Our techniques also yield improved algorithms for related problems
in learning theory.
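To make the definition of Chow parameters concrete, the sketch below computes them by brute force for a small linear threshold function. It assumes the usual convention of inputs in {-1, 1}^n under the uniform distribution and a threshold of the form sign(w·x - theta); the helper names `ltf` and `chow_parameters` are hypothetical. The enumeration is exponential in n and is only the forward direction (function to parameters), not the reconstruction algorithm the abstract describes.

```python
from fractions import Fraction
from itertools import product


def ltf(weights, theta):
    """Return the linear threshold function x -> +1 if w.x >= theta else -1."""
    def f(x):
        total = sum(w * xi for w, xi in zip(weights, x))
        return 1 if total >= theta else -1
    return f


def chow_parameters(f, n):
    """Exact degree-0 and degree-1 Fourier coefficients of f over {-1, 1}^n.

    Index 0 holds E[f(x)]; index i (1-based) holds E[f(x) * x_i].
    Enumerates all 2^n points, so this is only usable for small n.
    """
    points = list(product((-1, 1), repeat=n))
    count = len(points)
    degree0 = Fraction(sum(f(x) for x in points), count)
    degree1 = [Fraction(sum(f(x) * x[i] for x in points), count)
               for i in range(n)]
    return [degree0] + degree1


# Majority on three bits is the LTF with weights (1, 1, 1) and threshold 0.
maj3 = ltf([1, 1, 1], 0)
params = chow_parameters(maj3, 3)
```

For majority on three bits this yields a degree-0 coefficient of 0 and three degree-1 coefficients of 1/2, matching its Fourier expansion.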
More on Gribov copies and propagators in Landau-gauge Yang-Mills theory
Fixing a gauge in the non-perturbative domain of Yang-Mills theory is a
non-trivial problem due to the presence of Gribov copies. In particular, there
are different gauges in the non-perturbative regime which all correspond to the
same definition of a gauge in the perturbative domain. Gauge-dependent
correlation functions may differ in these gauges. Two such gauges are the
minimal and absolute Landau gauge, both corresponding to the perturbative
Landau gauge. These, and their numerical implementation, are described and
presented in detail. Other choices will also be discussed.
This investigation is performed, using numerical lattice gauge theory
calculations, by comparing the propagators of gluons and ghosts for the minimal
Landau gauge and the absolute Landau gauge in SU(2) Yang-Mills theory. It is
found that the propagators are different in the far infrared and even at energy
scales of the order of half a GeV. In particular, the finite-volume effects
are also modified. This is observed in two and three dimensions. Some
remarks on the four-dimensional case are provided as well.
Comment: 23 pages, 16 figures, 6 tables; various changes throughout most of
the paper; extended discussion on different possibilities to define the
Landau gauge and connection to existing scenarios; in v3: Minor changes,
error in eq. (3) & (4) corrected, version to appear in PR
False-Name Manipulation in Weighted Voting Games is Hard for Probabilistic Polynomial Time
False-name manipulation refers to the question of whether a player in a
weighted voting game can increase her power by splitting into several players
and distributing her weight among these false identities. Analogously to this
splitting problem, the beneficial merging problem asks whether a coalition of
players can increase their power in a weighted voting game by merging their
weights. Aziz et al. [ABEP11] analyze the problem of whether merging or
splitting players in weighted voting games is beneficial in terms of the
Shapley-Shubik and the normalized Banzhaf index, and so do Rey and Rothe [RR10]
for the probabilistic Banzhaf index. All these results provide merely
NP-hardness lower bounds for these problems, leaving the question about their
exact complexity open. For the Shapley-Shubik and the probabilistic Banzhaf
index, we raise these lower bounds to hardness for PP, "probabilistic
polynomial time", and provide matching upper bounds for beneficial merging and,
whenever the number of false identities is fixed, also for beneficial
splitting, thus resolving previous conjectures in the affirmative. It follows
from our results that beneficial merging and splitting for these two power
indices cannot be solved in NP, unless the polynomial hierarchy collapses,
which is considered highly unlikely.
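The splitting question can be made concrete with a brute-force sketch on a tiny game. It assumes the standard definitions: a coalition wins when its total weight reaches the quota, the Shapley-Shubik index is the fraction of player orderings in which a player is pivotal, and the probabilistic Banzhaf index is the fraction of coalitions of the other players that the player turns from losing to winning. It also assumes that a split player's power is the sum of the indices of her false identities. The example game and the function names are hypothetical, and the enumeration is exponential, so it says nothing about the complexity results themselves.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def shapley_shubik(weights, quota, i):
    """Fraction of orderings in which player i is the pivotal player."""
    n = len(weights)
    pivotal = 0
    for order in permutations(range(n)):
        running = 0
        for p in order:
            running += weights[p]
            if running >= quota:  # p turns the growing coalition into a winner
                if p == i:
                    pivotal += 1
                break
    return Fraction(pivotal, factorial(n))


def probabilistic_banzhaf(weights, quota, i):
    """Fraction of coalitions of the other players for which i is critical."""
    others = [j for j in range(len(weights)) if j != i]
    swings = 0
    for size in range(len(others) + 1):
        for coalition in combinations(others, size):
            w = sum(weights[j] for j in coalition)
            if w < quota <= w + weights[i]:
                swings += 1
    return Fraction(swings, 2 ** len(others))


quota = 4
before = [2, 1, 1]     # player 0 holds weight 2
after = [1, 1, 1, 1]   # player 0 split into false identities 0 and 1

ss_before = shapley_shubik(before, quota, 0)
ss_after = shapley_shubik(after, quota, 0) + shapley_shubik(after, quota, 1)
bz_before = probabilistic_banzhaf(before, quota, 0)
bz_after = (probabilistic_banzhaf(after, quota, 0)
            + probabilistic_banzhaf(after, quota, 1))
```

In this unanimity-style game the split raises the Shapley-Shubik power from 1/3 to 1/2 but leaves the probabilistic Banzhaf power at 1/4, so whether splitting is beneficial depends on the index.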
A Dispersion Operator for Geometric Semantic Genetic Programming
Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues on how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions that are placed outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying semantic genetic operators. Experiments on sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.
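The convex-hull limitation can be seen in a small sketch that works directly on semantics, that is, on the vectors of outputs a program produces on the training cases. It assumes the common form of geometric semantic crossover for symbolic regression, where the offspring's semantics is a convex combination of the parents' semantics. The vectors and the factor 1.5 are hypothetical and chosen by hand; the abstract does not describe how the proposed dispersion operator picks its multiplicative factors.

```python
import random


def semantic_crossover(s1, s2, r):
    """Offspring semantics as a convex combination of the parents, r in [0, 1]."""
    return [r * a + (1.0 - r) * b for a, b in zip(s1, s2)]


# Semantics of two parents and the target on three training cases (illustrative).
parent1 = [1.0, 2.0, 3.0]
parent2 = [2.0, 4.0, 6.0]
target = [3.0, 6.0, 9.0]

rng = random.Random(0)
children = [semantic_crossover(parent1, parent2, rng.random())
            for _ in range(100)]

# Every offspring stays between its parents on every training case
# (a tiny tolerance guards against floating-point rounding).
tol = 1e-12
inside_hull = all(
    min(a, b) - tol <= c <= max(a, b) + tol
    for child in children
    for a, b, c in zip(parent1, parent2, child)
)

# The target lies outside the parents' range, so no crossover can reach it,
# whereas scaling a parent by a hand-picked factor does move it there.
scaled_parent2 = [1.5 * v for v in parent2]
```

With only these two parents, crossover can never produce the target, while a single multiplicative rescaling lands on it exactly, which is the kind of gap a dispersion step is meant to close.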
Continuous extremal optimization for Lennard-Jones Clusters
In this paper, we explore a general-purpose heuristic algorithm for finding
high-quality solutions to continuous optimization problems. The method, called
continuous extremal optimization (CEO), can be considered an extension of
extremal optimization (EO) and consists of two components: one responsible
for global search and the other for local search.
With only one adjustable parameter, the CEO's performance
proves competitive with more elaborate stochastic optimization procedures. We
demonstrate it on a well-known continuous optimization problem: the
Lennard-Jones cluster optimization problem.
Comment: 5 pages and 3 figure
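For orientation, the objective in a Lennard-Jones cluster problem is the total pairwise potential energy of the atom positions. The sketch below assumes the standard 12-6 form in reduced units (epsilon = sigma = 1), which the abstract does not spell out, and the function name `lj_energy` is hypothetical. It evaluates the objective only and is not an implementation of CEO.

```python
import math
from itertools import combinations


def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total 12-6 Lennard-Jones energy summed over all pairs of atoms."""
    energy = 0.0
    for a, b in combinations(coords, 2):
        r = math.dist(a, b)          # Euclidean distance (Python 3.8+)
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# The pair potential is minimal at r = 2^(1/6) * sigma, where it equals -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
e_dimer = lj_energy(dimer)

# An equilateral triangle with side r_min puts all three pairs at the minimum.
triangle = [
    (0.0, 0.0, 0.0),
    (r_min, 0.0, 0.0),
    (r_min / 2.0, r_min * math.sqrt(3.0) / 2.0, 0.0),
]
e_triangle = lj_energy(triangle)
```

Any optimizer for this problem, CEO included, searches over the 3N coordinates to minimize such an energy; the number of local minima grows quickly with the number of atoms, which is what makes it a common benchmark.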
Optimal transport on supply-demand networks
Previously, transport networks have usually been treated as homogeneous networks,
that is, every node has the same function, simultaneously providing and
requiring resources. However, some real networks, such as power grid and supply
chain networks, show a far different scenario in which the nodes are classified
into two categories: the supply nodes provide some kinds of services, while the
demand nodes require them. In this paper, we propose a general transport model
for those supply-demand networks, associated with a criterion to quantify their
transport capacities. In a supply-demand network with heterogeneous degree
distribution, its transport capacity strongly depends on the locations of
supply nodes. We therefore design a simulated annealing algorithm to find the
optimal configuration of supply nodes, which remarkably enhances the transport
capacity, and outperforms the degree target algorithm, the betweenness target
algorithm, and the greedy method. This work provides a starting point for
systematically analyzing and optimizing transport dynamics on supply-demand
networks.
Comment: 5 pages, 1 table and 4 figure
Theoretical analysis of the role of chromatin interactions in long-range action of enhancers and insulators
Long-distance regulatory interactions between enhancers and their target
genes are commonplace in higher eukaryotes. Interposed boundaries or insulators
are able to block these long distance regulatory interactions. The mechanistic
basis for insulator activity and how it relates to enhancer
action-at-a-distance remains unclear. Here we explore the idea that topological
loops could simultaneously account for regulatory interactions of distal
enhancers and the insulating activity of boundary elements. We show that while
loop formation is not in itself sufficient to explain action at a distance,
incorporating transient non-specific and moderate attractive interactions
between the chromatin fibers strongly enhances long-distance regulatory
interactions and is sufficient to generate a euchromatin-like state. Under
these same conditions, the subdivision of the loop into two topologically
independent loops by insulators inhibits inter-domain interactions. The
underlying cause of this effect is a suppression of crossings in the contact
map at intermediate distances. Thus our model simultaneously accounts for
regulatory interactions at a distance and the insulator activity of boundary
elements. This unified model of the regulatory roles of chromatin loops makes
several testable predictions that could be confronted with \emph{in vitro}
experiments, as well as genomic chromatin conformation capture and fluorescent
microscopic approaches.
Comment: 10 pages, originally submitted to an (undisclosed) journal in May
201