65 research outputs found

    Optimal Assembly for High Throughput Shotgun Sequencing

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization.
    Comment: 26 pages, 18 figures
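
As a minimal illustration of the de Bruijn graph construction this abstract builds on, the sketch below links overlapping (k-1)-mers taken from error-free reads. It is not the paper's assembly algorithm: the function name `build_de_bruijn`, the choice `k=3`, and the toy reads are assumptions made for the example, and k-mer multiplicities are discarded.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it.

    Hypothetical helper: every distinct k-mer seen in a read contributes one
    edge from its length-(k-1) prefix to its length-(k-1) suffix.
    """
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


# Toy reads (assumed data) that overlap along the sequence ACGTTAC.
reads = ["ACGTT", "CGTTA", "GTTAC"]
graph = build_de_bruijn(reads, k=3)
```

With these reads every node has a single successor, so the graph is one unambiguous path. Repeats longer than k-1 bases would create branching nodes, which is where read length and coverage start to matter for reconstruction.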

    Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering

    We consider an online model for recommendation systems, with each user being recommended an item at each time-step and providing 'like' or 'dislike' feedback. Each user may be recommended a given item at most once. A latent variable model specifies the user preferences: both users and items are clustered into types. All users of a given type have identical preferences for the items, and similarly, items of a given type are either all liked or all disliked by a given user. We assume that the matrix encoding the preferences of each user type for each item type is randomly generated; in this way, the model captures structure in both the item and user spaces, the amount of structure depending on the number of each of the types. The measure of performance of the recommendation system is the expected number of disliked recommendations per user, defined as expected regret. We propose two algorithms inspired by user-user and item-item collaborative filtering (CF), modified to explicitly make exploratory recommendations, and prove performance guarantees in terms of their expected regret. For two regimes of model parameters, with structure only in item space or only in user space, we prove information-theoretic lower bounds on regret that match our upper bounds up to logarithmic factors. Our analysis elucidates system operating regimes in which existing CF algorithms are nearly optimal.
    Comment: 51 pages
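
The latent-type preference model described in this abstract can be sketched in a few lines. The code below is an illustration only: it assumes the type-preference matrix has independent fair-coin entries and that types are assigned uniformly at random, neither of which is stated in the abstract. All names (`make_model`, `likes`, `dislikes_per_user`) are hypothetical, and the baseline recommends items at random, so it is not one of the paper's two algorithms.

```python
import random


def make_model(n_users, n_items, n_user_types, n_item_types, rng):
    """Draw a random instance: a type-by-type preference matrix plus type labels.

    Assumption: matrix entries are i.i.d. fair coins, types are uniform.
    """
    prefs = [[rng.choice((0, 1)) for _ in range(n_item_types)]
             for _ in range(n_user_types)]
    user_type = [rng.randrange(n_user_types) for _ in range(n_users)]
    item_type = [rng.randrange(n_item_types) for _ in range(n_items)]
    return prefs, user_type, item_type


def likes(model, user, item):
    """True if `user` likes `item`; depends only on the two types."""
    prefs, user_type, item_type = model
    return prefs[user_type[user]][item_type[item]] == 1


def dislikes_per_user(model, n_users, n_items, horizon, rng):
    """Disliked recommendations per user for a random, non-repeating policy.

    This is one realization, not the expectation that defines regret.
    """
    total = 0
    for user in range(n_users):
        # rng.sample picks distinct items, so no item is recommended twice.
        for item in rng.sample(range(n_items), horizon):
            total += not likes(model, user, item)
    return total / n_users


rng = random.Random(0)
model = make_model(n_users=20, n_items=50, n_user_types=3, n_item_types=4, rng=rng)
baseline = dislikes_per_user(model, n_users=20, n_items=50, horizon=10, rng=rng)
```

Averaging `dislikes_per_user` over many independently drawn models would estimate the expected regret of this naive baseline, which is the quantity an exploratory CF algorithm aims to reduce.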

    Interference alignment for the MIMO interference channel

    We study vector space interference alignment for the MIMO interference channel with no time or frequency diversity, and no symbol extensions. We prove both necessary and sufficient conditions for alignment. In particular, we characterize the feasibility of alignment for the symmetric three-user channel where all users transmit along d dimensions, all transmitters have M antennas and all receivers have N antennas, as well as feasibility of alignment for the fully symmetric (M=N) channel with an arbitrary number of users. An implication of our results is that the total degrees of freedom available in a K-user interference channel, using only spatial diversity from the multiple antennas, is at most 2. This is in sharp contrast to the K/2 degrees of freedom shown to be possible by Cadambe and Jafar with arbitrarily large time or frequency diversity. Moving beyond the question of feasibility, we additionally discuss computation of the number of solutions using Schubert calculus in cases where there are a finite number of solutions.
    Comment: 16 pages, 7 figures, final submitted version

    Information Storage in the Stochastic Ising Model

    Most information systems store data by modifying the local state of matter, in the hope that atomic (or sub-atomic) local interactions would stabilize the state for a sufficiently long time, thereby allowing later recovery. In this work we initiate the study of information retention in locally-interacting systems. The evolution in time of the interacting particles is modeled via the stochastic Ising model (SIM). The initial spin configuration $X_0$ serves as the user-controlled input. The output configuration $X_t$ is produced by running $t$ steps of the Glauber chain. Our main goal is to evaluate the information capacity $I_n(t)\triangleq\max_{p_{X_0}}I(X_0;X_t)$ when the time $t$ scales with the size of the system $n$. For the zero-temperature SIM on the two-dimensional $\sqrt{n}\times\sqrt{n}$ grid and free boundary conditions, it is easy to show that $I_n(t) = \Theta(n)$ for $t=O(n)$. In addition, we show that on the order of $\sqrt{n}$ bits can be stored for infinite time in striped configurations. The $\sqrt{n}$ achievability is optimal when $t\to\infty$ and $n$ is fixed. One of the main results of this work is an achievability scheme that stores more than $\sqrt{n}$ bits (in orders of magnitude) for superlinear (in $n$) times. The analysis of the scheme decomposes the system into $\Omega(\sqrt{n})$ independent Z-channels whose crossover probability is found via the (recently rigorously established) Lifshitz law of phase boundary movement. We also provide results for the positive but small temperature regime. We show that an initial configuration drawn according to the Gibbs measure cannot retain more than a single bit for $t\geq e^{cn^{\frac{1}{4}+\epsilon}}$. On the other hand, when scaling time with $\beta$, the stripe-based coding scheme (that stores for infinite time at zero temperature) is shown to retain its bits for time that is exponential in $\beta$.
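
To make the claim about striped configurations concrete, the following sketch runs zero-temperature Glauber dynamics on a small grid with free boundary conditions and checks that a configuration made of horizontal stripes of width two never changes. It assumes the common update rule in which a uniformly chosen site takes the majority spin of its existing neighbours and flips a fair coin on ties. The names `glauber_step` and `striped` and the grid size are illustrative; the paper's actual coding scheme is not specified in the abstract.

```python
import random


def glauber_step(grid, rng):
    """One zero-temperature Glauber update on a square grid, free boundary."""
    m = len(grid)
    i, j = rng.randrange(m), rng.randrange(m)
    # Sum the spins of the neighbours that exist (no wrap-around).
    total = 0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < m and 0 <= b < m:
            total += grid[a][b]
    if total > 0:
        grid[i][j] = 1
    elif total < 0:
        grid[i][j] = -1
    else:
        grid[i][j] = rng.choice((-1, 1))  # tie: fair coin


def striped(m, width):
    """m-by-m grid of horizontal stripes with alternating spins."""
    return [[1 if (i // width) % 2 == 0 else -1 for _ in range(m)]
            for i in range(m)]


rng = random.Random(0)
grid = striped(8, 2)               # n = 64 sites, side length sqrt(n) = 8
start = [row[:] for row in grid]   # copy of the initial configuration
for _ in range(10_000):
    glauber_step(grid, rng)
unchanged = grid == start
```

Every site in a stripe of width at least two has a strict majority of neighbours sharing its spin, so no update ever flips it and `unchanged` is true regardless of the random seed. A $\sqrt{n}\times\sqrt{n}$ grid holds on the order of $\sqrt{n}$ such stripes, which matches the order of the number of bits the abstract says can be stored for infinite time.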

    Hardness of parameter estimation in graphical models

    We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pair-wise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known. Our proof gives a polynomial time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others).
    Comment: 15 pages. To appear in NIPS 201