    Optimal Assembly for High Throughput Shotgun Sequencing

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier work, we design a de Bruijn graph based assembly algorithm that comes very close to achieving the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. Comment: 26 pages, 18 figures

    Generic Feasibility of Perfect Reconstruction with Short FIR Filters in Multi-channel Systems

    We study the feasibility of short finite impulse response (FIR) synthesis for perfect reconstruction (PR) in generic FIR filter banks. Among all PR synthesis banks, we focus on the one with the minimum filter length. For filter banks with oversampling factors of at least two, we provide prescriptions for the shortest filter length of the synthesis bank that would guarantee PR almost surely. The prescribed length is as short as or shorter than that of the analysis filters and has an approximate inverse relationship with the oversampling factor. Our results are in the form of necessary and sufficient statements that hold generically, and hence fail only for elaborately designed nongeneric examples. We provide extensive numerical verification of the theoretical results and demonstrate that the gap between the derived filter length prescriptions and the true minimum is small. The results have potential applications in synthesis filter bank (FB) design problems, where the analysis bank is given, and in the analysis of fundamental limitations in blind signal reconstruction from data collected by unknown subsampled multi-channel systems. Comment: Manuscript submitted to IEEE Transactions on Signal Processing

    Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering

    We consider an online model for recommendation systems, with each user being recommended an item at each time-step and providing 'like' or 'dislike' feedback. Each user may be recommended a given item at most once. A latent variable model specifies the user preferences: both users and items are clustered into types. All users of a given type have identical preferences for the items, and similarly, items of a given type are either all liked or all disliked by a given user. We assume that the matrix encoding the preferences of each user type for each item type is randomly generated; in this way, the model captures structure in both the item and user spaces, with the amount of structure depending on the number of types of each kind. The measure of performance of the recommendation system is the expected number of disliked recommendations per user, defined as expected regret. We propose two algorithms inspired by user-user and item-item collaborative filtering (CF), modified to explicitly make exploratory recommendations, and prove performance guarantees in terms of their expected regret. For two regimes of model parameters, with structure only in item space or only in user space, we prove information-theoretic lower bounds on regret that match our upper bounds up to logarithmic factors. Our analysis elucidates system operating regimes in which existing CF algorithms are nearly optimal. Comment: 51 pages
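
    The latent variable model described in this abstract can be illustrated with a small simulation. The sketch below is only a reading of the abstract, not the authors' code: the function names (`make_model`, `likes`, `random_policy_regret`), the uniform assignment of users and items to types, and the 0.5 like-probability for each entry of the type-by-type preference matrix are all assumptions made for illustration. The baseline policy recommends random, not-yet-recommended items; it is not one of the two CF-inspired algorithms the paper proposes.

```python
import random


def make_model(num_users, num_items, num_user_types, num_item_types, seed=0):
    """Draw a random instance of the user-type / item-type preference model.

    Assumption: each entry of the preference matrix is an independent fair
    coin flip, and types are assigned uniformly at random.
    """
    rng = random.Random(seed)
    # prefs[u][i] is True when user type u likes item type i.
    prefs = [
        [rng.random() < 0.5 for _ in range(num_item_types)]
        for _ in range(num_user_types)
    ]
    user_type = [rng.randrange(num_user_types) for _ in range(num_users)]
    item_type = [rng.randrange(num_item_types) for _ in range(num_items)]
    return prefs, user_type, item_type


def likes(model, user, item):
    """Feedback for one recommendation: True for 'like', False for 'dislike'."""
    prefs, user_type, item_type = model
    return prefs[user_type[user]][item_type[item]]


def random_policy_regret(model, horizon, seed=1):
    """Average number of disliked recommendations per user over `horizon` steps.

    Baseline only: each user receives `horizon` distinct items chosen uniformly
    at random, which respects the at-most-once constraint.
    Requires horizon <= number of items.
    """
    _, user_type, item_type = model
    rng = random.Random(seed)
    num_users, num_items = len(user_type), len(item_type)
    total_dislikes = 0
    for user in range(num_users):
        recommended = rng.sample(range(num_items), horizon)  # no repeats
        total_dislikes += sum(not likes(model, user, item) for item in recommended)
    return total_dislikes / num_users


if __name__ == "__main__":
    model = make_model(num_users=20, num_items=50, num_user_types=3, num_item_types=4)
    print(random_policy_regret(model, horizon=10))
```

    A single run gives an empirical per-user count for one random instance; averaging over many seeds would approximate the expected regret the abstract refers to.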