65 research outputs found

Optimal Assembly for High Throughput Shotgun Sequencing

Author: Bresler Guy
Bresler Ma'ayan
Tse David
Publication date: 18/02/2013

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. Comment: 26 pages, 18 figures
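
As a rough illustration of the de Bruijn graph idea mentioned in the abstract, the sketch below builds a graph from the k-mers of a few error-free reads and spells a sequence along an Eulerian path. It is a textbook-style toy, not the algorithm of the paper: the function names (`de_bruijn_graph`, `eulerian_path`, `spell`), the example reads, and the choice k = 3 are all assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read.

    Each distinct k-mer contributes exactly one edge (prefix -> suffix).
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Concatenate a node path into the sequence it spells."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical reads covering the toy sequence ATGCGTCA.
reads = ["ATGCG", "GCGTC", "GTCA"]
graph = de_bruijn_graph(reads, 3)
print(spell(eulerian_path(graph)))  # ATGCGTCA
```

The toy sequence has no repeated 2-mers, so the Eulerian path is unique. When a (k-1)-mer repeats, several Eulerian paths can exist, which is one way to see why the reconstruction conditions in the abstract are phrased in terms of repeat statistics.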

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering

Author: Bresler Guy
Karzand Mina
Publication date: 07/05/2019

We consider an online model for recommendation systems, with each user being recommended an item at each time-step and providing 'like' or 'dislike' feedback. Each user may be recommended a given item at most once. A latent variable model specifies the user preferences: both users and items are clustered into types. All users of a given type have identical preferences for the items, and similarly, items of a given type are either all liked or all disliked by a given user. We assume that the matrix encoding the preferences of each user type for each item type is randomly generated; in this way, the model captures structure in both the item and user spaces, the amount of structure depending on the number of each of the types. The measure of performance of the recommendation system is the expected number of disliked recommendations per user, defined as expected regret. We propose two algorithms inspired by user-user and item-item collaborative filtering (CF), modified to explicitly make exploratory recommendations, and prove performance guarantees in terms of their expected regret. For two regimes of model parameters, with structure only in item space or only in user space, we prove information-theoretic lower bounds on regret that match our upper bounds up to logarithmic factors. Our analysis elucidates system operating regimes in which existing CF algorithms are nearly optimal. Comment: 51 pages
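
To make the latent-type model and the regret measure in the abstract concrete, here is a minimal simulation sketch. It draws a random like/dislike matrix over user types and item types, then measures the average number of disliked recommendations per user for a baseline that recommends uniformly random, never-repeated items. The baseline policy, the function name `simulate_random_policy`, and all parameter names are assumptions for illustration; this is not either of the paper's algorithms.

```python
import random


def simulate_random_policy(n_users, n_items, n_user_types, n_item_types,
                           horizon, seed=0):
    """Average number of disliked recommendations per user under a baseline
    that recommends uniformly random items, each at most once per user."""
    if horizon > n_items:
        raise ValueError("each item can be recommended to a user at most once")
    rng = random.Random(seed)
    # Preference of each user type for each item type: +1 like, -1 dislike.
    pref = [[rng.choice((1, -1)) for _ in range(n_item_types)]
            for _ in range(n_user_types)]
    user_type = [rng.randrange(n_user_types) for _ in range(n_users)]
    item_type = [rng.randrange(n_item_types) for _ in range(n_items)]

    dislikes = 0
    for u in range(n_users):
        # Sampling without replacement enforces the at-most-once constraint.
        for i in rng.sample(range(n_items), horizon):
            if pref[user_type[u]][item_type[i]] == -1:
                dislikes += 1
    return dislikes / n_users


regret = simulate_random_policy(n_users=50, n_items=40, n_user_types=3,
                                n_item_types=4, horizon=20, seed=1)
print(regret)
```

A policy that exploits the type structure would aim to push this number below what the random baseline achieves, which is the quantity the abstract's upper and lower bounds concern.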

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Interference alignment for the MIMO interference channel

Author: Bresler Guy
Cartwright Dustin
Tse David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2014

We study vector space interference alignment for the MIMO interference channel with no time or frequency diversity, and no symbol extensions. We prove both necessary and sufficient conditions for alignment. In particular, we characterize the feasibility of alignment for the symmetric three-user channel where all users transmit along d dimensions, all transmitters have M antennas and all receivers have N antennas, as well as feasibility of alignment for the fully symmetric (M=N) channel with an arbitrary number of users. An implication of our results is that the total degrees of freedom available in a K-user interference channel, using only spatial diversity from the multiple antennas, is at most 2. This is in sharp contrast to the K/2 degrees of freedom shown to be possible by Cadambe and Jafar with arbitrarily large time or frequency diversity. Moving beyond the question of feasibility, we additionally discuss computation of the number of solutions using Schubert calculus in cases where there are a finite number of solutions. Comment: 16 pages, 7 figures, final submitted version

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Information Storage in the Stochastic Ising Model

Author: Bresler Guy
Goldfeld Ziv
Polyanskiy Yury
Publication date: 08/05/2018

Most information systems store data by modifying the local state of matter, in the hope that atomic (or sub-atomic) local interactions would stabilize the state for a sufficiently long time, thereby allowing later recovery. In this work we initiate the study of information retention in locally-interacting systems. The evolution in time of the interacting particles is modeled via the stochastic Ising model (SIM). The initial spin configuration $X_0$ serves as the user-controlled input. The output configuration $X_t$ is produced by running $t$ steps of the Glauber chain. Our main goal is to evaluate the information capacity $I_n(t)\triangleq\max_{p_{X_0}}I(X_0;X_t)$ when the time $t$ scales with the size of the system $n$. For the zero-temperature SIM on the two-dimensional $\sqrt{n}\times\sqrt{n}$ grid and free boundary conditions, it is easy to show that $I_n(t) = \Theta(n)$ for $t=O(n)$. In addition, we show that on the order of $\sqrt{n}$ bits can be stored for infinite time in striped configurations. The $\sqrt{n}$ achievability is optimal when $t\to\infty$ and $n$ is fixed. One of the main results of this work is an achievability scheme that stores more than $\sqrt{n}$ bits (in orders of magnitude) for superlinear (in $n$) times. The analysis of the scheme decomposes the system into $\Omega(\sqrt{n})$ independent Z-channels whose crossover probability is found via the (recently rigorously established) Lifshitz law of phase boundary movement. We also provide results for the positive but small temperature regime. We show that an initial configuration drawn according to the Gibbs measure cannot retain more than a single bit for $t\geq e^{cn^{\frac{1}{4}+\epsilon}}$. On the other hand, when scaling time with $\beta$, the stripe-based coding scheme (that stores for infinite time at zero temperature) is shown to retain its bits for time that is exponential in $\beta$.
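
The claim that striped configurations persist at zero temperature can be checked on a small grid with the sketch below. It assumes a common convention for the zero-temperature Glauber chain: pick a site uniformly at random, set its spin to the majority of its neighbours, and break ties with a fair coin. The helper names (`glauber_step`, `striped`), the grid side of 8, and the stripe width of 2 are illustrative assumptions, not taken from the paper.

```python
import random


def glauber_step(grid, rng):
    """One zero-temperature Glauber update with free boundary conditions."""
    side = len(grid)
    i, j = rng.randrange(side), rng.randrange(side)
    total = 0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < side and 0 <= b < side:  # free boundary: skip missing sites
            total += grid[a][b]
    if total > 0:
        grid[i][j] = 1
    elif total < 0:
        grid[i][j] = -1
    else:
        grid[i][j] = rng.choice((-1, 1))  # tie: fair coin (assumed convention)


def striped(side, width):
    """Horizontal stripes of alternating sign, each `width` rows thick."""
    return [[1 if (i // width) % 2 == 0 else -1 for _ in range(side)]
            for i in range(side)]


rng = random.Random(0)
grid = striped(side=8, width=2)
before = [row[:] for row in grid]
for _ in range(5000):
    glauber_step(grid, rng)
print(grid == before)  # True: every site agrees with a strict majority of its neighbours
```

With stripes of width 2, every site has strictly more neighbours of its own sign than of the opposite sign, so no update ever changes a spin. With stripes of width 1, interior sites see ties and the configuration is no longer frozen under this rule.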

arXiv.org e-Print Archive

DSpace@MIT

Hardness of parameter estimation in graphical models

Author: Bresler Guy
Gamarnik David
Shah Devavrat
Publication date: 01/01/2014

We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pair-wise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known. Our proof gives a polynomial time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others). Comment: 15 pages. To appear in NIPS 201
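
For orientation, the sketch below computes the forward map from canonical parameters to mean parameters for a tiny pairwise binary model by brute-force enumeration over all $2^n$ configurations. The abstract concerns the inverse direction (recovering canonical parameters from mean parameters), which this code does not attempt. The parameterization with variables in {0, 1}, the function name `mean_parameters`, and the example graph are assumptions for illustration.

```python
import itertools
import math


def mean_parameters(n, theta_node, theta_edge):
    """Brute-force mean parameters of p(x) proportional to
    exp(sum_i theta_i x_i + sum_{(i,j)} theta_ij x_i x_j), x in {0,1}^n.

    Runs in time exponential in n; intended only for very small models.
    """
    z = 0.0
    mu_node = [0.0] * n
    mu_edge = {edge: 0.0 for edge in theta_edge}
    for x in itertools.product((0, 1), repeat=n):
        score = sum(theta_node[i] * x[i] for i in range(n))
        score += sum(w * x[i] * x[j] for (i, j), w in theta_edge.items())
        weight = math.exp(score)
        z += weight  # accumulate the partition function
        for i in range(n):
            mu_node[i] += weight * x[i]
        for (i, j) in theta_edge:
            mu_edge[(i, j)] += weight * x[i] * x[j]
    return [m / z for m in mu_node], {e: m / z for e, m in mu_edge.items()}


# Hypothetical 3-node chain with all canonical parameters set to zero,
# which gives the uniform distribution over {0,1}^3.
nodes, edges = mean_parameters(3, [0.0, 0.0, 0.0], {(0, 1): 0.0, (1, 2): 0.0})
print(nodes, edges)
```

Even this forward computation enumerates exponentially many configurations, which gives some intuition for why the partition function appears in the hardness reduction described in the abstract.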

arXiv.org e-Print Archive

DSpace@MIT