Set Cross Entropy: Likelihood-based Permutation Invariant Loss Function for Probability Distributions
We propose a permutation-invariant loss function designed for neural
networks that reconstruct a set of elements without regard to the order
within its vector representation. Unlike popular approaches for encoding
and decoding a set, our work relies neither on a carefully engineered
network topology nor on any additional sequential algorithm. The proposed
method, Set Cross Entropy,
has a natural information-theoretic interpretation and is related to the
metrics defined for sets. We evaluate the proposed approach in two object
reconstruction tasks and a rule learning task.
Comment: The source code will be available at
https://github.com/guicho271828/perminv . (Comment for the revision: the
result table was not correctly updated.)
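As a hedged illustration (not the paper's exact formulation), a permutation-invariant reconstruction loss in this spirit can be sketched in NumPy: each target element's log-likelihood is aggregated over all predicted elements with a logsumexp, so reordering the predictions leaves the loss unchanged. The Bernoulli parameterization and the function name are assumptions made for the sketch.

```python
import numpy as np

def set_cross_entropy(targets, preds, eps=1e-12):
    """Sketch of a permutation-invariant reconstruction loss.
    targets: (N, D) binary array; preds: (N, D) probabilities in (0, 1).
    Each target row's Bernoulli log-likelihood is averaged over all
    predicted rows via logsumexp, so permuting `preds` leaves the
    loss unchanged."""
    n = targets.shape[0]
    # ll[i, j] = log-likelihood of target i under predicted row j
    ll = (targets[:, None, :] * np.log(preds[None] + eps)
          + (1 - targets[:, None, :]) * np.log(1 - preds[None] + eps)).sum(-1)
    # numerically stable logsumexp over predictions, minus log N
    # for the uniform mixture over predicted elements
    m = ll.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(ll - m).sum(axis=1))
    return -(lse - np.log(n)).mean()
```

Because the aggregation over predictions is symmetric, no matching or sorting step is needed, which is the property that lets the decoder emit set elements in any order.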
Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation
We address talker-independent monaural speaker separation from the
perspectives of deep learning and computational auditory scene analysis (CASA).
Specifically, we decompose the multi-speaker separation task into the stages of
simultaneous grouping and sequential grouping. Simultaneous grouping is first
performed in each time frame by separating the spectra of different speakers
with a permutation-invariantly trained neural network. In the second stage, the
frame-level separated spectra are sequentially grouped to different speakers by
a clustering network. The proposed deep CASA approach optimizes frame-level
separation and speaker tracking in turn, and produces excellent results for
both objectives. Experimental results on the benchmark WSJ0-2mix database show
that the new approach achieves state-of-the-art results with a modest model
size.
Comment: 10 pages, 5 figures
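The permutation-invariant training (PIT) criterion used in the first stage can be sketched as follows: compute the separation error under every assignment of network outputs to reference speakers and train on the minimum. This is a generic PIT sketch with MSE as a stand-in objective, not the paper's exact deep CASA loss.

```python
import itertools
import numpy as np

def pit_mse(estimates, references):
    """Permutation-invariant MSE over speaker assignments.
    estimates, references: (S, T) arrays of S speakers' spectra.
    Returns the minimum mean-squared error over all S! output-to-speaker
    permutations, together with the best permutation."""
    S = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(S)):
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Exhaustive enumeration is fine for the two- or three-speaker case; the second-stage clustering network then resolves which frame-level permutation belongs to which speaker across time.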
Maximum Rank and Asymptotic Rank of Finite Dynamical Systems
A finite dynamical system is a system of multivariate functions over a finite
alphabet used to model a network of interacting entities. The main feature of a
finite dynamical system is its interaction graph, which indicates which local
functions depend on which variables; the interaction graph is a qualitative
representation of the interactions amongst entities on the network. The rank of
a finite dynamical system is the cardinality of its image; the periodic rank is
the number of its periodic points. In this paper, we determine the maximum rank
and the maximum periodic rank of a finite dynamical system with a given
interaction graph over any non-Boolean alphabet. We also obtain a similar
result for Boolean finite dynamical systems (also known as Boolean networks)
whose interaction graphs are contained in a given digraph. We then prove that
the average rank is relatively close (as the size of the alphabet is large) to
the maximum. The results mentioned above only deal with the parallel update
schedule. We finally determine the maximum rank over all block-sequential
update schedules and the supremum periodic rank over all complete update
schedules.
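The rank under the parallel update schedule can be computed directly for small systems by enumerating all states and counting distinct images; the function names below are illustrative.

```python
from itertools import product

def rank(local_functions, alphabet_size, n):
    """Rank of a finite dynamical system under the parallel update
    schedule: the number of distinct images F(x) over all states x
    in {0, ..., alphabet_size-1}^n.
    local_functions: list of n functions, each mapping a full state
    tuple to a value in range(alphabet_size)."""
    images = set()
    for state in product(range(alphabet_size), repeat=n):
        images.add(tuple(f(state) for f in local_functions))
    return len(images)
```

For example, the Boolean network F(x, y) = (x AND y, x OR y) maps the four states onto three images, so its rank is 3, while the identity system attains the maximum rank of 4.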
Seq2Slate: Re-ranking and Slate Optimization with RNNs
Ranking is a central task in machine learning and information retrieval. In
this task, it is especially important to present the user with a slate of items
that is appealing as a whole. This in turn requires taking into account
interactions between items, since intuitively, placing an item on the slate
affects the decision of which other items should be placed alongside it. In
this work, we propose a sequence-to-sequence model for ranking called
seq2slate. At each step, the model predicts the next `best' item to place on
the slate given the items already selected. The sequential nature of the model
allows complex dependencies between the items to be captured directly in a
flexible and scalable way. We show how to learn the model end-to-end from weak
supervision in the form of easily obtained click-through data. We further
demonstrate the usefulness of our approach in experiments on standard ranking
benchmarks as well as in a real-world recommendation system.
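The sequential decoding loop can be sketched as follows: at each step, score every remaining item conditioned on the items already placed and append the best one. The `score_fn` here is a hypothetical stand-in for the learned RNN decoder, which is where the model's actual item interactions live.

```python
import numpy as np

def greedy_slate(item_feats, score_fn, slate_size):
    """Greedy sequential slate construction in the seq2slate spirit.
    item_feats: (N, D) array of item features.
    score_fn(chosen_feats, candidate_feat): stand-in for the trained
    decoder; scores a candidate given the items already on the slate."""
    remaining = list(range(len(item_feats)))
    slate = []
    while remaining and len(slate) < slate_size:
        chosen = item_feats[slate] if slate else np.zeros((0, item_feats.shape[1]))
        scores = [score_fn(chosen, item_feats[i]) for i in remaining]
        best = remaining[int(np.argmax(scores))]
        slate.append(best)       # place the best next item
        remaining.remove(best)
    return slate
```

Because each score is conditioned on the partial slate, a learned `score_fn` can, for instance, demote items too similar to those already placed, which is exactly the kind of inter-item dependency a pointwise ranker cannot express.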
A Fast Image Encryption Scheme based on Chaotic Standard Map
In recent years, a variety of effective chaos-based image encryption schemes
have been proposed. The typical structure of these schemes has the permutation
and the diffusion stages performed alternatively. The confusion and diffusion
effect is solely contributed by the permutation and the diffusion stage,
respectively. As a result, more overall rounds than necessary are required to
achieve a certain level of security. In this paper, we suggest introducing
a diffusion effect into the confusion stage by simple sequential
add-and-shift operations. The purpose is to reduce the workload of the
time-consuming diffusion part so that fewer overall rounds and hence a shorter
encryption time is needed. Simulation results show that at a similar
performance level, the proposed cryptosystem needs less than one-third the
encryption time of an existing cryptosystem. The effective acceleration of the
encryption speed is thus achieved.
Comment: 16 pages, 7 figures
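The add-and-shift idea can be illustrated with a toy confusion step: each pixel is added (mod 256) to the previously processed cipher value, cyclically bit-shifted, and moved to its permuted position. The permutation array here is a placeholder for the positions generated by the chaotic standard map, and the scheme is a didactic sketch, not the paper's full cryptosystem.

```python
def confuse_with_diffusion(pixels, perm, shift=3):
    """Toy confusion stage with an embedded diffusion effect.
    pixels: list of byte values; perm: target position for each index
    (stand-in for the chaotic standard map's scan order)."""
    out = [0] * len(pixels)
    prev = 0
    for i, p in enumerate(pixels):
        v = (p + prev) % 256                             # add: mix in previous cipher byte
        v = ((v << shift) | (v >> (8 - shift))) & 0xFF   # cyclic left shift within a byte
        out[perm[i]] = v                                 # permute the pixel position
        prev = v
    return out
```

Even in this toy form, flipping one input pixel changes every subsequent output byte, which is the diffusion effect that lets the scheme spend fewer rounds in the dedicated diffusion stage.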
Introducing a Probabilistic Structure on Sequential Dynamical Systems, Simulation and Reduction of Probabilistic Sequential Networks
A probabilistic structure on sequential dynamical systems is introduced
here; the new model is called a Probabilistic Sequential Network (PSN). The
morphisms of Probabilistic Sequential Networks are defined using two
algebraic conditions. It is proved here that two homomorphic Probabilistic
Sequential Networks have the same equilibrium or steady-state probabilities
if the morphism is either an epimorphism or a monomorphism. Additionally,
it is proved that the set of PSNs with their morphisms forms the category
PSN, which has the category of sequential dynamical systems, SDS, as a full
subcategory. Several examples of morphisms, subsystems and simulations are
given.
Comment: 14 pages
Finding the Minimal DFA of Very Large Finite State Automata with an Application to Token Passing Networks
Finite state automata (FSA) are ubiquitous in computer science. Two of the
most important algorithms for FSA processing are the conversion of a
non-deterministic finite automaton (NFA) to a deterministic finite automaton
(DFA), and then the production of the unique minimal DFA for the original NFA.
We exhibit a parallel disk-based algorithm that uses a cluster of 29 commodity
computers to produce an intermediate DFA with almost two billion states and
then continues by producing the corresponding unique minimal DFA with fewer than
800,000 states. The largest previous such computation in the literature was
carried out on a 512-processor CM-5 supercomputer in 1996. That computation
produced an intermediate DFA with 525,000 states and an unreported number of
states for the corresponding minimal DFA. The work is used to provide strong
experimental evidence satisfying a conjecture on a series of token passing
networks. The conjecture concerns stack sortable permutations for a finite
stack and a 3-buffer. The origins of this problem lie in the work on restricted
permutations begun by Knuth and Tarjan in the late 1960s. The parallel
disk-based computation is also compared with both a single-threaded and
multi-threaded RAM-based implementation using a 16-core 128 GB large shared
memory computer.
Comment: 14 pages, 4 figures
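The minimization step itself can be sketched in RAM with classic partition refinement (Moore's algorithm): states start partitioned by acceptance and are repeatedly split until two states share a block only if every symbol sends them to the same blocks. This is a small illustrative sketch; the paper's contribution is carrying this kind of computation out at disk scale across a cluster.

```python
def minimize_dfa(states, alphabet, delta, accepting):
    """Number of states of the minimal DFA, via partition refinement.
    delta: dict mapping (state, symbol) -> state.
    Assumes a complete DFA whose states are all reachable."""
    partition = {s: (s in accepting) for s in states}
    while True:
        # two states stay together only if they are in the same block
        # and every symbol sends them to the same block
        signature = {s: (partition[s],
                         tuple(partition[delta[(s, a)]] for a in alphabet))
                     for s in states}
        blocks = {}
        for s in states:
            blocks.setdefault(signature[s], []).append(s)
        new_partition = {}
        for i, members in enumerate(blocks.values()):
            for s in members:
                new_partition[s] = i
        # refinement only ever splits blocks, so equal block counts
        # mean the partition is stable
        if len(set(new_partition.values())) == len(set(partition.values())):
            return len(set(new_partition.values()))
        partition = new_partition
```

On two billion states this table-based refinement would not fit in memory, which is exactly why the disk-based parallel formulation is needed.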
Simulation of Probabilistic Sequential Systems
In this paper we introduce the idea of probability in the definition of
Sequential Dynamical Systems, thus obtaining a new concept, Probabilistic
Sequential System. The introduction of a probabilistic structure on Sequential
Dynamical Systems is an important and interesting problem.
The notion of homomorphism in our new model is a natural extension of the
homomorphism of sequential dynamical systems introduced and developed by
Laubenbacher and Pareigis in several papers. Our model makes it possible to
describe the dynamics of the systems using Markov chains, with all the
advantages of stochastic theory. The notion of simulation is introduced
using the concept of homomorphisms, as usual. Several examples of
homomorphisms, subsystems and simulations are given.
Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling
When using stochastic gradient descent to solve large-scale machine learning
problems, a common practice of data processing is to shuffle the training data,
partition the data across multiple machines if needed, and then perform several
epochs of training on the re-shuffled (either locally or globally) data. The
above procedure makes the instances used to compute the gradients no longer
independently sampled from the training data set. Does the distributed SGD
method still have desirable convergence properties in this practical
situation? In this paper, we answer this question. First, we give a mathematical
formulation for the practical data processing procedure in distributed machine
learning, which we call data partition with global/local shuffling. We observe
that global shuffling is equivalent to without-replacement sampling if the
shuffling operations are independent. We prove that SGD with global shuffling
has convergence guarantees in both the convex and non-convex cases. An
interesting finding is that non-convex tasks such as deep learning benefit
more from shuffling than convex tasks do. Second, we conduct the
convergence analysis for SGD with local shuffling. The convergence rate for
local shuffling is slower than that for global shuffling, since information
is lost when there is no communication between partitions. Finally,
we consider the situation when the permutation after shuffling is not uniformly
distributed (insufficient shuffling), and discuss the condition under which
this insufficiency will not influence the convergence rate. Our theoretical
results provide important insights into large-scale machine learning,
especially into the selection of data processing methods needed to achieve
faster convergence and good speedup. Our theoretical findings are verified
by extensive experiments on logistic regression and deep neural networks.
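The global-shuffling procedure the analysis covers can be sketched on logistic regression: each epoch independently re-shuffles the data and visits every example exactly once, which is the without-replacement sampling the paper equates with global shuffling. Hyperparameters here are illustrative.

```python
import numpy as np

def sgd_global_shuffle(X, y, epochs=50, lr=0.5, seed=0):
    """Epoch-wise SGD with global shuffling on logistic regression:
    every epoch draws a fresh independent permutation of the data and
    takes one gradient step per example (without-replacement sampling)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):        # fresh shuffle each epoch
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]     # single-example gradient step
        # (in the distributed setting, each machine would run this inner
        # loop over its own globally or locally shuffled partition)
    return w
```

Local shuffling differs only in that each machine permutes its fixed partition in isolation, which is where the information loss in the slower local-shuffling rate comes from.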
Multi-Issue Social Learning
We consider social learning where agents can only observe part of the
population (modeled as neighbors on an undirected graph), face many decision
problems, and the arrival order of the agents is unknown. The central question we
pose is whether there is a natural observability graph that prevents the
information cascade phenomenon. We introduce the `celebrities graph' and prove
that indeed it allows for proper information aggregation in large populations
even when the order in which agents decide is random and even when
different issues are decided in different orders.
Comment: Accepted to the Mathematical Social Sciences journal