65 research outputs found
Optimal Assembly for High Throughput Shotgun Sequencing
We present a framework for the design of optimal assembly algorithms for
shotgun sequencing under the criterion of complete reconstruction. We derive a
lower bound on the read length and the coverage depth required for
reconstruction in terms of the repeat statistics of the genome. Building on
earlier works, we design a de Bruijn graph-based assembly algorithm which can
achieve very close to the lower bound for repeat statistics of a wide range of
sequenced genomes, including the GAGE datasets. The results are based on a set
of necessary and sufficient conditions on the DNA sequence and the reads for
reconstruction. The conditions can be viewed as the shotgun sequencing analogue
of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by
Hybridization.
Comment: 26 pages, 18 figures
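As a rough illustration of the data structure named in this abstract, the sketch below builds a de Bruijn graph from a set of reads. It is a minimal textbook construction, not the assembly algorithm of the paper; the function name `de_bruijn_graph` and the parameter `k` are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it.

    Every distinct k-mer seen in a read contributes one edge from its
    prefix (first k-1 letters) to its suffix (last k-1 letters).
    """
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


# Two overlapping reads of a hypothetical sequence "ACGTA".
graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
```

In this toy case every node has a single successor, so the path AC, CG, GT, TA spells out the sequence unambiguously. Repeats longer than k-1 letters would show up as nodes with several successors, which is where the repeat statistics mentioned in the abstract come into play.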
Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering
We consider an online model for recommendation systems, with each user being
recommended an item at each time-step and providing 'like' or 'dislike'
feedback. Each user may be recommended a given item at most once. A latent
variable model specifies the user preferences: both users and items are
clustered into types. All users of a given type have identical preferences for
the items, and similarly, items of a given type are either all liked or all
disliked by a given user. We assume that the matrix encoding the preferences of
each user type for each item type is randomly generated; in this way, the model
captures structure in both the item and user spaces, the amount of structure
depending on the number of each of the types. The measure of performance of the
recommendation system is the expected number of disliked recommendations per
user, defined as expected regret. We propose two algorithms inspired by
user-user and item-item collaborative filtering (CF), modified to explicitly
make exploratory recommendations, and prove performance guarantees in terms of
their expected regret. For two regimes of model parameters, with structure only
in item space or only in user space, we prove information-theoretic lower
bounds on regret that match our upper bounds up to logarithmic factors. Our
analysis elucidates system operating regimes in which existing CF algorithms
are nearly optimal.
Comment: 51 pages
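The latent variable model described in this abstract can be sketched in a few lines. The code below is only a toy simulation under stated assumptions: preferences are drawn uniformly at random, types are assigned uniformly, and the policy is a uniformly random baseline rather than either of the paper's algorithms. All function names are hypothetical.

```python
import random


def make_model(n_users, n_items, n_user_types, n_item_types, rng):
    """Draw a random instance of the two-sided cluster model."""
    # prefs[user type][item type] is +1 (like) or -1 (dislike).
    # Assumption: entries are independent fair coin flips.
    prefs = [[rng.choice((-1, 1)) for _ in range(n_item_types)]
             for _ in range(n_user_types)]
    user_type = [rng.randrange(n_user_types) for _ in range(n_users)]
    item_type = [rng.randrange(n_item_types) for _ in range(n_items)]
    return prefs, user_type, item_type


def feedback(model, user, item):
    """Return +1 if the user likes the item, -1 otherwise."""
    prefs, user_type, item_type = model
    return prefs[user_type[user]][item_type[item]]


def regret_of_random_policy(model, horizon, rng):
    """Empirical number of disliked recommendations per user.

    Each user receives `horizon` distinct items chosen uniformly at random,
    so no item is recommended to the same user twice.
    """
    _, user_type, item_type = model
    n_users, n_items = len(user_type), len(item_type)
    dislikes = 0
    for user in range(n_users):
        for item in rng.sample(range(n_items), horizon):
            if feedback(model, user, item) == -1:
                dislikes += 1
    return dislikes / n_users
```

A baseline like this gives a reference point against which an exploratory user-user or item-item scheme could be compared; it averages one random run and is not the expected regret analysed in the paper.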
Interference alignment for the MIMO interference channel
We study vector space interference alignment for the MIMO interference
channel with no time or frequency diversity, and no symbol extensions. We prove
both necessary and sufficient conditions for alignment. In particular, we
characterize the feasibility of alignment for the symmetric three-user channel
where all users transmit along d dimensions, all transmitters have M antennas
and all receivers have N antennas, as well as feasibility of alignment for the
fully symmetric (M=N) channel with an arbitrary number of users.
An implication of our results is that the total degrees of freedom available
in a K-user interference channel, using only spatial diversity from the
multiple antennas, is at most 2. This is in sharp contrast to the K/2 degrees
of freedom shown to be possible by Cadambe and Jafar with arbitrarily large
time or frequency diversity.
Moving beyond the question of feasibility, we additionally discuss
computation of the number of solutions using Schubert calculus in cases where
there are a finite number of solutions.
Comment: 16 pages, 7 figures, final submitted version
Information Storage in the Stochastic Ising Model
Most information systems store data by modifying the local state of matter,
in the hope that atomic (or sub-atomic) local interactions would stabilize the
state for a sufficiently long time, thereby allowing later recovery. In this
work we initiate the study of information retention in locally-interacting
systems. The evolution in time of the interacting particles is modeled via the
stochastic Ising model (SIM). The initial spin configuration serves as
the user-controlled input. The output configuration is produced by
running steps of the Glauber chain. Our main goal is to evaluate the
information capacity when the time
scales with the size of the system. For the zero-temperature SIM on the
two-dimensional grid and free boundary conditions, it
is easy to show that for . In addition, we show
that on the order of bits can be stored for infinite time in striped
configurations. The achievability is optimal when and
is fixed.
One of the main results of this work is an achievability scheme that stores
more than bits (in orders of magnitude) for superlinear (in )
times. The analysis of the scheme decomposes the system into
independent Z-channels whose crossover probability is found via the (recently
rigorously established) Lifshitz law of phase boundary movement. We also
provide results for the positive but small temperature regime. We show that an
initial configuration drawn according to the Gibbs measure cannot retain more
than a single bit for . On the other hand,
when scaling time with , the stripe-based coding scheme (that stores for
infinite time at zero temperature) is shown to retain its bits for time that is
exponential in
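To make the striped-configuration claim concrete, here is a minimal sketch of zero-temperature Glauber dynamics on a square grid with free boundary conditions. It assumes the usual zero-temperature rule (a uniformly chosen site adopts the majority sign of its neighbours, with ties broken by a fair coin), which the abstract does not spell out; the helper names and the stripe width of 2 are illustrative choices.

```python
import random


def glauber_step(grid, rng):
    """One zero-temperature Glauber update on a square grid (free boundary)."""
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    # Sum the spins of the nearest neighbours that exist (no wrap-around).
    field = sum(
        grid[a][b]
        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        if 0 <= a < n and 0 <= b < n
    )
    if field > 0:
        grid[i][j] = 1
    elif field < 0:
        grid[i][j] = -1
    else:
        grid[i][j] = rng.choice((-1, 1))  # tie: fair coin flip


def striped(n, width=2):
    """Horizontal stripes of alternating sign, each `width` rows thick."""
    return [[1 if (i // width) % 2 == 0 else -1] * n for i in range(n)]


rng = random.Random(0)
grid = striped(8)
start = [row[:] for row in grid]
for _ in range(10_000):
    glauber_step(grid, rng)
unchanged = grid == start
```

In a stripe of width at least 2, every site has a strict majority of neighbours with its own sign, so no update ever flips a spin and the configuration is frozen. Choosing the sign of each stripe independently is what lets such configurations carry information indefinitely at zero temperature.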
Hardness of parameter estimation in graphical models
We consider the problem of learning the canonical parameters specifying an
undirected graphical model (Markov random field) from the mean parameters. For
graphical models representing a minimal exponential family, the canonical
parameters are uniquely determined by the mean parameters, so the problem is
feasible in principle. The goal of this paper is to investigate the
computational feasibility of this statistical task. Our main result shows that
parameter estimation is in general intractable: no algorithm can learn the
canonical parameters of a generic pair-wise binary graphical model from the
mean parameters in time bounded by a polynomial in the number of variables
(unless RP = NP). Indeed, such a result has been believed to be true (see the
monograph by Wainwright and Jordan (2008)) but no proof was known.
Our proof gives a polynomial time reduction from approximating the partition
function of the hard-core model, known to be hard, to learning approximate
parameters. Our reduction entails showing that the marginal polytope boundary
has an inherent repulsive property, which validates an optimization procedure
over the polytope that does not use any knowledge of its structure (as required
by the ellipsoid method and others).
Comment: 15 pages. To appear in NIPS 201