Gray-box optimization and factorized distribution algorithms: where two worlds collide
The concept of gray-box optimization, in juxtaposition to black-box optimization, revolves around the idea of exploiting the problem structure to
implement more efficient evolutionary algorithms (EAs). Work on factorized
distribution algorithms (FDAs), whose factorizations are directly derived from
the problem structure, has also contributed to showing how exploiting the problem structure produces important gains in the efficiency of EAs. In this paper we
analyze the general question of using problem structure in EAs focusing on
confronting work done in gray-box optimization with related research
accomplished in FDAs. This contrasted analysis helps us to identify, in current studies on the use of problem structure in EAs, two distinct analytical characterizations of how these algorithms work. Moreover, we claim that these two characterizations collide and compete when it comes to providing a coherent framework for investigating this class of algorithms. To illustrate this claim, we
present a contrasted analysis of formalisms, questions, and results produced in
FDAs and gray-box optimization. Common underlying principles in the two
approaches, which are usually overlooked, are identified and discussed.
Besides, an extensive review of previous research related to different uses of
the problem structure in EAs is presented. The paper also elaborates on some of
the questions that arise when extending the use of problem structure in EAs,
such as the question of evolvability, high cardinality of the variables and
large definition sets, constrained and multi-objective problems, etc. Finally,
emergent approaches that exploit neural models to capture the problem structure
are covered.
Comment: 33 pages, 9 tables, 3 figures. This paper covers some of the topics of the talk "When the gray box was opened, model-based evolutionary algorithms were already there" presented in the Model-Based Evolutionary Algorithms workshop on July 20, 2016, in Denver.
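The efficiency gains from gray-box optimization come largely from partial (delta) evaluation: when a k-bounded pseudo-Boolean function is a sum of subfunctions, flipping one bit only requires re-evaluating the subfunctions that contain that bit. A minimal Python sketch of this idea on a randomly generated toy problem (all names and structures here are illustrative, not taken from the paper):

```python
import numpy as np

def make_problem(n, k, rng):
    """Toy k-bounded pseudo-Boolean problem: n subfunctions, each over k variables."""
    masks = [rng.choice(n, size=k, replace=False) for _ in range(n)]
    tables = [rng.random(2 ** k) for _ in range(n)]
    # var_to_subfns[i] lists the subfunctions that contain variable i
    var_to_subfns = [[] for _ in range(n)]
    for j, mask in enumerate(masks):
        for i in mask:
            var_to_subfns[i].append(j)
    return masks, tables, var_to_subfns

def sub_value(x, mask, table):
    """Look up one subfunction's value from its truth table."""
    idx = 0
    for i in mask:
        idx = (idx << 1) | int(x[i])
    return table[idx]

def full_eval(x, masks, tables):
    """Black-box evaluation: touches every subfunction."""
    return sum(sub_value(x, m, t) for m, t in zip(masks, tables))

def delta_eval(x, i, masks, tables, var_to_subfns):
    """Gray-box delta: change in fitness from flipping bit i,
    touching only the subfunctions that contain i."""
    before = sum(sub_value(x, masks[j], tables[j]) for j in var_to_subfns[i])
    x[i] ^= 1
    after = sum(sub_value(x, masks[j], tables[j]) for j in var_to_subfns[i])
    x[i] ^= 1  # restore
    return after - before

rng = np.random.default_rng(0)
n, k = 20, 3
masks, tables, var_to_subfns = make_problem(n, k, rng)
x = rng.integers(0, 2, size=n)
f = full_eval(x, masks, tables)
d = delta_eval(x, 0, masks, tables, var_to_subfns)
x[0] ^= 1
# The delta matches a full re-evaluation, at a fraction of the cost.
assert abs(full_eval(x, masks, tables) - (f + d)) < 1e-9
```

The same bookkeeping underlies constant-time improving-move selection in gray-box local search.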
Directional Statistics on Permutations
Distributions over permutations arise in applications ranging from
multi-object tracking to ranking of instances. The difficulty of dealing with these distributions is caused by the size of their domain, which is factorial in the number of considered entities ($n$). This makes the direct definition of a multinomial distribution over the permutation space impractical for all but a very small $n$. In this work we propose an embedding of all permutations for a given $n$ on the surface of a hypersphere defined in $\mathbb{R}^{(n-1)^2}$. As a result of the embedding, we acquire the ability to define continuous distributions over a hypersphere with all the benefits of directional statistics. We provide polynomial-time projections between the continuous hypersphere representation and the $n$-element permutation space.
The framework provides a way to use continuous directional probability
densities and the methods developed thereof for establishing densities over
permutations. As a demonstration of the benefits of the framework we derive an
inference procedure for a state-space model over permutations. We demonstrate
the approach with applications
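The abstract does not spell out the construction, but one way to see why $(n-1)^2$ dimensions and a common hypersphere are natural: centering the $n \times n$ permutation matrices by subtracting $1/n$ from every entry places all $n!$ of them, at equal norm, inside a subspace of dimension $(n-1)^2$ (the dimension of the Birkhoff polytope). A numpy sketch of that observation (not the paper's actual embedding):

```python
import numpy as np
from itertools import permutations

def perm_matrix(p):
    n = len(p)
    P = np.zeros((n, n))
    P[np.arange(n), p] = 1.0
    return P

n = 4
# Centered matrices P - 1/n have zero row and column sums, so all of
# them live in a linear subspace of dimension (n-1)^2.
pts = np.array([(perm_matrix(p) - 1.0 / n).ravel()
                for p in permutations(range(n))])

# Every centered permutation matrix has squared norm n - 1, so all n!
# points sit on one common hypersphere.
norms = np.linalg.norm(pts, axis=1)
print(norms.min(), norms.max())  # both equal sqrt(n - 1)

# The rank of the span confirms the (n-1)^2 = 9 intrinsic dimension.
print(np.linalg.matrix_rank(pts))  # 9
```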
DeepChrome: Deep-learning for predicting gene expression from histone modifications
Motivation: Histone modifications are among the most important factors that
control gene regulation. Computational methods that predict gene expression
from histone modification signals are highly desirable for understanding their
combinatorial effects in gene regulation. This knowledge can help in developing
'epigenetic drugs' for diseases like cancer. Previous studies for quantifying
the relationship between histone modifications and gene expression levels
either failed to capture combinatorial effects or relied on multiple methods
that separate predictions and combinatorial analysis. This paper develops a
unified discriminative framework using a deep convolutional neural network to
classify gene expression using histone modification data as input. Our system,
called DeepChrome, allows automatic extraction of complex interactions among
important features. To simultaneously visualize the combinatorial interactions
among histone modifications, we propose a novel optimization-based technique
that generates feature pattern maps from the learnt deep model. This provides
an intuitive description of underlying epigenetic mechanisms that regulate
genes. Results: We show that DeepChrome outperforms state-of-the-art models such as Support Vector Machines and Random Forests on the gene expression classification task on 56 different cell types from the REMC database. The output
of our visualization technique not only validates the previous observations but
also allows novel insights about combinatorial interactions among histone
modification marks, some of which have recently been observed by experimental
studies.
Comment: This work will be published in the Bioinformatics journal (ECCB 2016).
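To make the setup concrete, here is a minimal numpy forward pass in the spirit of DeepChrome: a 1-D convolution over binned histone-mark signals, ReLU, global max-pooling, and a softmax over low/high expression. The input shape loosely follows the paper (5 marks, 100 bins around the transcription start site); the filter count and all weights are illustrative, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 histone marks x 100 bins; 16 convolutional filters of width 10
# (hypothetical hyperparameters for illustration only).
n_marks, n_bins, n_filters, width = 5, 100, 16, 10

W = rng.normal(0, 0.1, size=(n_filters, n_marks, width))  # conv filters
b = np.zeros(n_filters)
Wd = rng.normal(0, 0.1, size=(n_filters, 2))              # dense -> {low, high}

def forward(x):
    """x: (n_marks, n_bins) histone signal -> 2-class probabilities."""
    L = n_bins - width + 1
    conv = np.empty((n_filters, L))
    for f in range(n_filters):
        for t in range(L):
            # Each filter spans all marks, capturing combinatorial effects.
            conv[f, t] = np.sum(W[f] * x[:, t:t + width]) + b[f]
    h = np.maximum(conv, 0).max(axis=1)   # ReLU + global max-pool
    logits = h @ Wd
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # softmax

x = rng.random((n_marks, n_bins))
p = forward(x)
print(p, p.sum())  # two class probabilities summing to 1
```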
DAG-GNN: DAG Structure Learning with Graph Neural Networks
Learning a faithful directed acyclic graph (DAG) from samples of a joint
distribution is a challenging combinatorial problem, owing to the intractable
search space superexponential in the number of graph nodes. A recent
breakthrough formulates the problem as a continuous optimization with a
structural constraint that ensures acyclicity (Zheng et al., 2018). The authors
apply the approach to the linear structural equation model (SEM) and the
least-squares loss function that are statistically well justified but
nevertheless limited. Motivated by the widespread success of deep learning that
is capable of capturing complex nonlinear mappings, in this work we propose a
deep generative model and apply a variant of the structural constraint to learn
the DAG. At the heart of the generative model is a variational autoencoder
parameterized by a novel graph neural network architecture, which we coin
DAG-GNN. In addition to the richer capacity, an advantage of the proposed model
is that it naturally handles discrete variables as well as vector-valued ones.
We demonstrate that on synthetic data sets, the proposed method learns more
accurate graphs for nonlinearly generated samples; and on benchmark data sets
with discrete variables, the learned graphs are reasonably close to the global
optima. The code is available at \url{https://github.com/fishmoon1234/DAG-GNN}.
Comment: ICML 2019. Code is available at https://github.com/fishmoon1234/DAG-GNN
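The continuous acyclicity constraint of Zheng et al. (2018) referenced here can be stated compactly: $h(A) = \mathrm{tr}(e^{A \circ A}) - d$ is zero exactly when the weighted adjacency matrix $A$ of $d$ nodes is acyclic, because the matrix exponential of $A \circ A$ counts weighted closed walks (DAG-GNN itself uses a polynomial variant of the same idea). A small numerical check:

```python
import numpy as np
from scipy.linalg import expm

def h(A):
    """Zheng et al. (2018) acyclicity function: zero iff A is a DAG.
    A * A is the elementwise (Hadamard) square, which keeps all
    entries nonnegative so closed walks cannot cancel."""
    d = A.shape[0]
    return np.trace(expm(A * A)) - d

dag = np.array([[0., 1., 1.],
                [0., 0., 1.],
                [0., 0., 0.]])   # acyclic: strictly upper triangular
cyc = np.array([[0., 1., 0.],
                [0., 0., 1.],
                [1., 0., 0.]])   # contains the 3-cycle 0 -> 1 -> 2 -> 0
print(h(dag))  # ~0: no closed walks
print(h(cyc))  # > 0: the cycle contributes closed walks of length 3, 6, ...
```

Because h is differentiable, it can serve as an equality constraint inside the continuous (or variational) optimization the abstract describes.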
Relational inductive biases, deep learning, and graph networks
Artificial intelligence (AI) has undergone a renaissance recently, making
major progress in key domains such as vision, language, control, and
decision-making. This has been due, in part, to cheap data and cheap compute
resources, which have fit the natural strengths of deep learning. However, many
defining characteristics of human intelligence, which developed under much
different pressures, remain out of reach for current approaches. In particular,
generalizing beyond one's experiences--a hallmark of human intelligence from
infancy--remains a formidable challenge for modern AI.
The following is part position paper, part review, and part unification. We
argue that combinatorial generalization must be a top priority for AI to
achieve human-like abilities, and that structured representations and
computations are key to realizing this objective. Just as biology uses nature
and nurture cooperatively, we reject the false choice between
"hand-engineering" and "end-to-end" learning, and instead advocate for an
approach which benefits from their complementary strengths. We explore how
using relational inductive biases within deep learning architectures can
facilitate learning about entities, relations, and rules for composing them. We
present a new building block for the AI toolkit with a strong relational
inductive bias--the graph network--which generalizes and extends various
approaches for neural networks that operate on graphs, and provides a
straightforward interface for manipulating structured knowledge and producing
structured behaviors. We discuss how graph networks can support relational
reasoning and combinatorial generalization, laying the foundation for more
sophisticated, interpretable, and flexible patterns of reasoning. As a
companion to this paper, we have released an open-source software library for
building graph networks, with demonstrations of how to use them in practice.
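The graph network block described above updates edges, then nodes, then a global attribute, with each step conditioning on aggregates of the others. A minimal numpy sketch with linear-plus-tanh stand-ins for the learned update functions (shapes and names are illustrative; the authors' released library is the authoritative implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def gn_block(V, E, u, senders, receivers, phi_e, phi_v, phi_u):
    """One graph-network block: edge, node, then global update."""
    nE, nV = len(E), len(V)
    # 1. Edge update: each edge sees its own attribute, both endpoint
    #    node attributes, and the global attribute.
    E2 = phi_e(np.concatenate(
        [E, V[senders], V[receivers], np.repeat(u[None, :], nE, 0)], axis=1))
    # 2. Node update: sum incoming updated edges per receiver node.
    agg = np.zeros((nV, E2.shape[1]))
    np.add.at(agg, receivers, E2)
    V2 = phi_v(np.concatenate(
        [agg, V, np.repeat(u[None, :], nV, 0)], axis=1))
    # 3. Global update: aggregate all updated edges and nodes.
    u2 = phi_u(np.concatenate([E2.sum(0), V2.sum(0), u]))
    return V2, E2, u2

# Toy graph: 3 nodes (4-dim), 2 directed edges (3-dim), 2-dim global.
V = rng.random((3, 4)); E = rng.random((2, 3)); u = rng.random(2)
senders, receivers = np.array([0, 1]), np.array([1, 2])

# Linear stand-ins for the learned MLPs (input dims follow the concats).
We, Wv, Wu = (rng.normal(size=s) for s in [(13, 3), (9, 4), (9, 2)])
phi_e = lambda x: np.tanh(x @ We)
phi_v = lambda x: np.tanh(x @ Wv)
phi_u = lambda x: np.tanh(x @ Wu)

V2, E2, u2 = gn_block(V, E, u, senders, receivers, phi_e, phi_v, phi_u)
print(V2.shape, E2.shape, u2.shape)  # attribute shapes are preserved
```

Because the block maps a graph to a graph of the same shape, blocks compose, which is how deeper relational reasoning is built up.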
From Machine Learning to Machine Reasoning
A plausible definition of "reasoning" could be "algebraically manipulating
previously acquired knowledge in order to answer a new question". This
definition covers first-order logical inference or probabilistic inference. It
also includes much simpler manipulations commonly used to build large learning
systems. For instance, we can build an optical character recognition system by
first training a character segmenter, an isolated character recognizer, and a
language model, using appropriate labeled training sets. Adequately
concatenating these modules and fine tuning the resulting system can be viewed
as an algebraic operation in a space of models. The resulting model answers a
new question, that is, converting the image of a text page into a computer
readable text.
This observation suggests a conceptual continuity between algebraically rich
inference systems, such as logical or probabilistic inference, and simple
manipulations, such as the mere concatenation of trainable learning systems.
Therefore, instead of trying to bridge the gap between machine learning systems
and sophisticated "all-purpose" inference mechanisms, we can instead
algebraically enrich the set of manipulations applicable to training systems,
and build reasoning capabilities from the ground up.
Comment: 15 pages - fix broken pagination in v
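The OCR example can be miniaturized: pretrain two linear "modules" separately, concatenate them, and fine-tune the composition end to end by gradient descent, a small instance of treating concatenation and fine-tuning as operations in a space of models. A toy numpy sketch (entirely illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is a two-stage linear map, mirroring segmenter -> recognizer.
M1 = rng.normal(size=(3, 4)); M2 = rng.normal(size=(2, 3))
X = rng.normal(size=(200, 4))
Y = X @ M1.T @ M2.T

# Imperfect "pretrained" modules, as if each was trained on its own stage.
A = M1 + 0.1 * rng.normal(size=M1.shape)
B = M2 + 0.1 * rng.normal(size=M2.shape)

def loss(A, B):
    return np.mean((X @ A.T @ B.T - Y) ** 2)

before = loss(A, B)
# Fine-tune the concatenated model end to end by gradient descent.
lr = 0.01
for _ in range(200):
    R = X @ A.T @ B.T - Y                 # residuals of the composition
    gB = 2 * R.T @ (X @ A.T) / len(X)     # dL/dB (up to a constant factor)
    gA = 2 * (R @ B).T @ X / len(X)       # dL/dA
    A -= lr * gA; B -= lr * gB
after = loss(A, B)
print(before, "->", after)  # joint fine-tuning reduces the end-to-end error
```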
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys
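For the linear-assignment core that the survey keeps returning to, SciPy's Hungarian-style solver is the standard off-the-shelf tool. A toy track-to-detection matching (the affinity values are made up):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows = existing tracks, columns = new detections; entries are
# (made-up) association affinities. linear_sum_assignment minimizes
# cost, so pass maximize=True when the matrix holds affinities.
affinity = np.array([[0.9, 0.1, 0.2],
                     [0.2, 0.8, 0.1],
                     [0.1, 0.3, 0.7]])
rows, cols = linear_sum_assignment(affinity, maximize=True)
for r, c in zip(rows, cols):
    print(f"track {r} -> detection {c}")      # here: the diagonal pairing
print(round(affinity[rows, cols].sum(), 1))   # 2.4, the maximal total affinity
```

The MDAP discussed in the survey generalizes this to assignments across many frames at once, which is what makes the problem NP-hard.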
An Atomistic Machine Learning Package for Surface Science and Catalysis
We present workflows and a software module for machine learning model building in surface science and heterogeneous catalysis. This includes fingerprinting atomic structures from 3D structure and/or connectivity information, descriptor selection methods and benchmarks, and active learning frameworks for atomic structure optimization, acceleration of screening studies, and exploration of the structure space of nanoparticles, all of which are atomic structure problems relevant to surface science and heterogeneous catalysis. Our overall goal is to provide a repository to ease machine learning model building for catalysis, to advance the models beyond the chemical intuition of the user, and to increase autonomy for the exploration of chemical space.
Robust Continuous Co-Clustering
Clustering consists of grouping together samples according to their similar properties. The problem of simultaneously modeling groups of samples and features is known as Co-Clustering. This paper introduces ROCCO - a Robust Continuous Co-Clustering algorithm. ROCCO is a scalable, hyperparameter-free, easy-to-use algorithm for addressing Co-Clustering problems in practice
over massive cross-domain datasets. It operates by learning a graph-based
two-sided representation of the input matrix. The underlying proposed
optimization problem is non-convex, which assures a flexible pool of solutions.
Moreover, we prove that it can be solved in near-linear time in the input size. An exhaustive large-scale experimental testbed conducted with
both synthetic and real-world datasets demonstrates ROCCO's properties in
practice: (i) State-of-the-art performance in cross-domain real-world problems
including Biomedicine and Text Mining; (ii) very low sensitivity to
hyperparameter settings; (iii) robustness to noise and (iv) a linear empirical
scalability in practice. These results highlight ROCCO as a powerful
general-purpose co-clustering algorithm for cross-domain practitioners,
regardless of their technical background.
Comment: Under review
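ROCCO's own implementation is not shown here, but the co-clustering task itself can be illustrated with scikit-learn's spectral co-clustering on a matrix with a planted block structure (a generic stand-in, not ROCCO):

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

# Matrix with a planted 2 x 2 block structure plus noise: rows 0-4 are
# dense on columns 0-6, rows 5-9 on columns 7-11.
X = 0.1 * rng.random((10, 12))
X[:5, :7] += 5.0
X[5:, 7:] += 5.0

model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(X)
print(model.row_labels_)     # rows 0-4 share one label, rows 5-9 the other
print(model.column_labels_)  # columns split 0-6 vs 7-11
```

The output assigns every row and every column to a bicluster, which is exactly the "two-sided representation" a co-clustering method must produce.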
Inference in Graphical Models via Semidefinite Programming Hierarchies
Maximum A posteriori Probability (MAP) inference in graphical models amounts
to solving a graph-structured combinatorial optimization problem. Popular
inference algorithms such as belief propagation (BP) and generalized belief
propagation (GBP) are intimately related to linear programming (LP) relaxation
within the Sherali-Adams hierarchy. Despite the popularity of these algorithms,
it is well understood that the Sum-of-Squares (SOS) hierarchy based on
semidefinite programming (SDP) can provide superior guarantees. Unfortunately,
SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{\Theta(d)}$ variables, where $d$ is the degree in the hierarchy. In practice, for $d \ge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference
using the SOS hierarchy with two innovations focused on computational
efficiency. Firstly, in analogy to BP and its variants, we only introduce
decision variables corresponding to contiguous regions in the graphical model.
Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style
method, and develop a sequential rounding procedure. We demonstrate that the
resulting algorithm can solve problems with tens of thousands of variables
within minutes, and outperforms BP and GBP on practical problems such as image
denoising and Ising spin glasses. Finally, for specific graph types, we
establish a sufficient condition for the tightness of the proposed partial SOS
relaxation.
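The Burer-Monteiro idea mentioned above replaces the PSD matrix variable $X$ with a low-rank factorization $V V^T$ and optimizes $V$ directly. A toy sketch for MAP on a small Ising-style model, with randomized hyperplane rounding and a brute-force reference (illustrative only; the paper's region-based relaxation and sequential rounding are more elaborate):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, k = 10, 3

# Random symmetric couplings for a toy Ising MAP problem:
# maximize x^T J x over x in {-1, +1}^n.
J = rng.normal(size=(n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)

# Burer-Monteiro: replace the n x n PSD variable X (with diag(X) = 1) of
# the SDP relaxation max tr(J X) by X = V V^T, V having unit-norm rows.
V = rng.normal(size=(n, k))
V /= np.linalg.norm(V, axis=1, keepdims=True)
for _ in range(500):
    V += 0.05 * (2 * J @ V)                        # gradient of tr(J V V^T)
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # project rows to the sphere

# Randomized hyperplane rounding back to spin configurations.
best = max((np.sign(V @ rng.normal(size=k)) for _ in range(50)),
           key=lambda x: x @ J @ x)

# Brute-force optimum for reference (feasible at n = 10).
opt = max((np.array(s, dtype=float) for s in product([-1, 1], repeat=n)),
          key=lambda x: x @ J @ x)
print(best @ J @ best, "vs optimum", opt @ J @ opt)
```

The non-convex factorized problem has only n*k variables instead of n^2, which is what lets this style of method reach tens of thousands of variables.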