Seven clusters in genomic triplet distributions
Motivation: Several recent papers have proposed new algorithms for detecting coding regions without requiring a learning dataset of already known genes. In this paper we studied the cluster structure of several genomes in the space of codon usage. This allowed us to interpret some of the results obtained in other studies and to propose a simpler method that is nevertheless fully functional.
Results: Several complete genomic sequences were analyzed, using visualization of tables of triplet counts in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters: six corresponding to protein-coding information in each of the three possible phases on either of the two complementary strands, and one corresponding to the non-coding regions. Awareness of the existence of this structure allows development of methods for the segmentation of sequences into regions with the same coding phase and non-coding regions.
This method may be completely unsupervised or use some external information. Since the method does not need extraction of ORFs, it can be applied even to unassembled genomes. Accuracy calculated at the base-pair level (both sensitivity and specificity) exceeds 90%. This is no worse than methods such as HMMs, while having the advantage of being much simpler and clearer
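The sliding-window triplet counts described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice to count non-overlapping triplets starting at the window's first base, and the handling of non-ACGT symbols are all assumptions.

```python
from itertools import product

# The 64 possible triplets over the DNA alphabet, in a fixed order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def triplet_frequencies(window):
    """Return a 64-dimensional frequency vector for one window.

    Assumption: triplets are read without overlap, starting at the
    first base of the window, so the vector depends on the phase.
    Triplets containing symbols other than A, C, G, T are skipped.
    """
    counts = [0] * 64
    for i in range(0, len(window) - 2, 3):
        idx = INDEX.get(window[i:i + 3].upper())
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 64


def sliding_vectors(seq, width, step):
    """Yield (start, frequency_vector) for each window position."""
    for start in range(0, len(seq) - width + 1, step):
        yield start, triplet_frequencies(seq[start:start + width])
```

Shifting the window by one base changes which triplets are counted, which is one way to see why vectors from the same coding region in different phases could fall into different clusters. Clustering the resulting vectors (for example with a standard clustering routine) is left out of this sketch.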
Maximum Likelihood Decoder for Index Coded PSK Modulation for Priority Ordered Receivers
Index coded PSK modulation over an AWGN broadcast channel, for a given index
coding problem (ICP) is studied. For a chosen index code and an arbitrary
mapping (of broadcast vectors to PSK signal points), we have derived a decision
rule for the maximum likelihood (ML) decoder. The message error performance of
a receiver at high SNR is characterized by a parameter called PSK Index Coding
Gain (PSK-ICG). The PSK-ICG of a receiver is determined by a metric called
minimum inter-set distance. For a given ICP with an order of priority among the
receivers, and a chosen -PSK constellation we propose an algorithm to find
(index code, mapping) pairs, each of which gives the best performance in terms
of PSK-ICG of the receivers. No other pair of index code (of length with
broadcast vectors) and mapping can give a better PSK-ICG for the highest
priority receiver. Also, given that the highest priority receiver achieves its
best performance, the next highest priority receiver achieves its maximum gain
possible and so on in the specified order of priority. Comment: 9 pages, 6 figures and 2 tables
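The abstract does not state the decision rule itself, so the following is only a generic sketch of maximum likelihood decoding over an AWGN channel when each candidate message corresponds to a set of PSK signal points. It assumes equiprobable signal points and complex noise of variance `n0`; the partition into sets, the function names, and the reading of "minimum inter-set distance" as the smallest Euclidean distance between points of different sets are assumptions made for illustration.

```python
import cmath
import math


def psk_points(m):
    """Unit-energy m-PSK constellation as complex numbers."""
    return [cmath.exp(2j * math.pi * k / m) for k in range(m)]


def ml_decode(y, sets, n0):
    """Pick the candidate whose signal set is most likely given y.

    sets: dict mapping each candidate message to the list of signal
    points consistent with it (a hypothetical partition; in practice
    it would follow from the index code, the mapping, and the
    receiver's side information).
    """
    def likelihood(points):
        # Sum of Gaussian likelihoods over the points in one set.
        return sum(math.exp(-abs(y - s) ** 2 / n0) for s in points)

    return max(sets, key=lambda msg: likelihood(sets[msg]))


def min_inter_set_distance(sets):
    """Smallest distance between points belonging to different sets."""
    keys = list(sets)
    return min(
        abs(a - b)
        for i, ka in enumerate(keys)
        for kb in keys[i + 1:]
        for a in sets[ka]
        for b in sets[kb]
    )
```

For example, with 8-PSK points split as `{0: [p[0], p[4]], 1: [p[2], p[6]]}`, a received value close to `p[2]` decodes to `1`, and the distance between the two sets is the chord between adjacent chosen points.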
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding the data
items whose distances to a query item are the smallest in a large database.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide the hashing
algorithms into two main categories: locality sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review them
from various aspects, including hash function design, distance measure, and
search scheme in the hash coding space
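As an illustration of the first category (hash functions designed without looking at the data), here is a minimal sketch of random-hyperplane hashing, a well-known locality sensitive scheme for angular similarity. It is not taken from the survey; the function names and parameters are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """One bit per hyperplane: which side of the plane vec falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))
```

Vectors separated by a small angle tend to agree on most bits, so the Hamming distance between codes can serve as a cheap proxy for angular distance when searching the hash coding space. A learning-to-hash method would instead fit the projections to the data.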
Designing structured tight frames via an alternating projection method
Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications, including communications, coding, and sparse approximation, require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm
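The most basic case mentioned above, a tight frame with prescribed vector norms, can be sketched as an alternation between two nearness problems. The code below is an illustrative sketch for unit-norm frames only, assuming NumPy is available; the function names, the random starting point, and the fixed iteration count are assumptions, and convergence is not guaranteed by this sketch.

```python
import numpy as np


def nearest_tight_frame(x, alpha):
    """Closest (Frobenius norm) matrix with x @ x.T = alpha**2 * I.

    Uses the SVD x = U S V^T and replaces the singular values by alpha.
    """
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    return alpha * (u @ vt)


def normalize_columns(x):
    """Closest matrix whose columns all have unit norm."""
    return x / np.linalg.norm(x, axis=0, keepdims=True)


def unit_norm_tight_frame(d, n, iters=2000, seed=0):
    """Alternate the two projections from a random starting frame."""
    rng = np.random.default_rng(seed)
    x = normalize_columns(rng.standard_normal((d, n)))
    alpha = np.sqrt(n / d)  # forced by n unit-norm vectors in R^d
    for _ in range(iters):
        x = normalize_columns(nearest_tight_frame(x, alpha))
    return x
```

The columns of the returned `d x n` matrix are the frame vectors. Other structural constraints (equiangularity, low PAR) would replace `normalize_columns` with a different nearness step, which is the flexibility the abstract emphasizes.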