Optimal Assembly for High Throughput Shotgun Sequencing
We present a framework for the design of optimal assembly algorithms for
shotgun sequencing under the criterion of complete reconstruction. We derive a
lower bound on the read length and the coverage depth required for
reconstruction in terms of the repeat statistics of the genome. Building on
earlier works, we design a de Bruijn graph-based assembly algorithm which can
achieve very close to the lower bound for repeat statistics of a wide range of
sequenced genomes, including the GAGE datasets. The results are based on a set
of necessary and sufficient conditions on the DNA sequence and the reads for
reconstruction. The conditions can be viewed as the shotgun sequencing analogue
of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by
Hybridization. Comment: 26 pages, 18 figures
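As a rough illustration of the data structure named in this abstract, the following minimal sketch builds a de Bruijn graph from a set of reads. It is not the authors' assembly algorithm: the function name `de_bruijn_edges`, the choice of `k`, and the toy reads are all assumptions made for the example, and error-free reads are assumed.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration only; assumes error-free reads.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; each k-mer contributes
        # one edge from its (k-1)-length prefix to its (k-1)-length suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy input (assumed): two overlapping reads from the sequence "ACGTA".
reads = ["ACGT", "CGTA"]
graph = de_bruijn_edges(reads, k=3)
```

In this toy case the graph is a single path AC, CG, GT, TA, so the original sequence can be read off by walking it. Repeats longer than k-1 would create branching nodes, which is where conditions on read length and coverage of the kind described in the abstract come into play.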
Generic Feasibility of Perfect Reconstruction with Short FIR Filters in Multi-channel Systems
We study the feasibility of short finite impulse response (FIR) synthesis for
perfect reconstruction (PR) in generic FIR filter banks. Among all PR synthesis
banks, we focus on the one with the minimum filter length. For filter banks
with oversampling factors of at least two, we provide prescriptions for the
shortest filter length of the synthesis bank that would guarantee PR almost
surely. The prescribed length is as short as or shorter than the analysis filters
and has an approximately inverse relationship with the oversampling factor. Our
results are in the form of necessary and sufficient statements that hold
generically, and hence fail only for elaborately designed nongeneric examples. We
provide extensive numerical verification of the theoretical results and
demonstrate that the gap between the derived filter length prescriptions and
the true minimum is small. The results have potential applications in synthesis
filter bank design problems, where the analysis bank is given, and in the analysis of
fundamental limitations in blind signal reconstruction from data collected by
unknown subsampled multi-channel systems. Comment: Manuscript submitted to IEEE Transactions on Signal Processing
Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering
We consider an online model for recommendation systems, with each user being
recommended an item at each time-step and providing 'like' or 'dislike'
feedback. Each user may be recommended a given item at most once. A latent
variable model specifies the user preferences: both users and items are
clustered into types. All users of a given type have identical preferences for
the items, and similarly, items of a given type are either all liked or all
disliked by a given user. We assume that the matrix encoding the preferences of
each user type for each item type is randomly generated; in this way, the model
captures structure in both the item and user spaces, the amount of structure
depending on the number of each of the types. The measure of performance of the
recommendation system is the expected number of disliked recommendations per
user, defined as expected regret. We propose two algorithms inspired by
user-user and item-item collaborative filtering (CF), modified to explicitly
make exploratory recommendations, and prove performance guarantees in terms of
their expected regret. For two regimes of model parameters, with structure only
in item space or only in user space, we prove information-theoretic lower
bounds on regret that match our upper bounds up to logarithmic factors. Our
analysis elucidates system operating regimes in which existing CF algorithms
are nearly optimal. Comment: 51 pages
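The latent variable model described in this abstract can be sketched in a few lines. The following is a minimal simulation under stated assumptions, not the paper's algorithms: the function `simulate_regret`, the like-probability of 0.5 for each entry of the preference matrix, the uniform assignment of types, and the purely random recommendation policy are all hypothetical choices made for illustration.

```python
import random


def simulate_regret(n_users, n_items, n_user_types, n_item_types, horizon, seed=0):
    """Average number of disliked recommendations per user under a random policy.

    Hypothetical sketch: preference entries are drawn i.i.d. with probability
    0.5 (an assumption), and the policy ignores all feedback.
    """
    rng = random.Random(seed)
    # pref[a][b] is True when user type a likes item type b.
    pref = [[rng.random() < 0.5 for _ in range(n_item_types)]
            for _ in range(n_user_types)]
    # Each user and each item is assigned a type uniformly at random (assumed).
    user_type = [rng.randrange(n_user_types) for _ in range(n_users)]
    item_type = [rng.randrange(n_item_types) for _ in range(n_items)]

    dislikes = 0
    for u in range(n_users):
        # Sampling without replacement enforces the constraint that a user
        # is recommended any given item at most once.
        for item in rng.sample(range(n_items), horizon):
            if not pref[user_type[u]][item_type[item]]:
                dislikes += 1
    return dislikes / n_users


# Example run with assumed parameters: 20 users, 50 items, 10 time-steps.
regret = simulate_regret(n_users=20, n_items=50, n_user_types=3,
                         n_item_types=4, horizon=10)
```

A random policy like this one gives a baseline against which feedback-driven user-user or item-item strategies could be compared. Note that `horizon` must not exceed `n_items`, since each user sees an item at most once.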