
    Seven clusters in genomic triplet distributions

    Motivation: Several recent papers have proposed new algorithms for detecting coding regions without requiring a learning dataset of already known genes. In this paper we studied the cluster structure of several genomes in the space of codon usage. This allowed us to interpret some of the results obtained in other studies and to propose a simpler method which is, nevertheless, fully functional. Results: Several complete genomic sequences were analyzed by visualizing tables of triplet counts in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters, corresponding to protein-coding information in three possible phases in one of the two complementary strands and to the non-coding regions. Awareness of this structure allows the development of methods for segmenting sequences into regions with the same coding phase and non-coding regions. Such a method may be completely unsupervised or may use some external information. Since the method does not need extraction of ORFs, it can be applied even to unassembled genomes. Accuracy calculated at the base-pair level (both sensitivity and specificity) exceeds 90%. This is no worse than methods such as HMM, and the method has the advantage of being much simpler and clearer.
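
As a rough illustration of the "triplet counts in a sliding window" described in the abstract above, the following minimal sketch turns a DNA string into a list of 64-dimensional triplet-frequency vectors. The function names, the window width of 300, and the step of 12 are assumptions chosen for illustration, not values taken from the paper, and the clustering step itself is not shown.

```python
from collections import Counter
from itertools import product

# All 64 possible triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(window, phase=0):
    """Return a 64-dimensional vector of non-overlapping triplet frequencies.

    `phase` (0, 1 or 2) is the offset at which triplet reading starts.
    Triplets containing symbols other than A, C, G, T are ignored.
    """
    counts = Counter(
        window[i:i + 3] for i in range(phase, len(window) - 2, 3)
    )
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * 64
    return [counts[t] / total for t in TRIPLETS]


def sliding_window_vectors(sequence, width=300, step=12):
    """Compute one frequency vector per window position.

    `width` and `step` are illustrative defaults (assumptions).
    """
    sequence = sequence.upper()
    return [
        triplet_frequencies(sequence[start:start + width])
        for start in range(0, len(sequence) - width + 1, step)
    ]
```

Each resulting vector is one point in the 64-dimensional space the abstract refers to; any standard clustering or projection method could then be applied to the list of vectors.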

    Maximum Likelihood Decoder for Index Coded PSK Modulation for Priority Ordered Receivers

    Index coded PSK modulation over an AWGN broadcast channel, for a given index coding problem (ICP), is studied. For a chosen index code and an arbitrary mapping (of broadcast vectors to PSK signal points), we have derived a decision rule for the maximum likelihood (ML) decoder. The message error performance of a receiver at high SNR is characterized by a parameter called PSK Index Coding Gain (PSK-ICG). The PSK-ICG of a receiver is determined by a metric called minimum inter-set distance. For a given ICP with an order of priority among the receivers, and a chosen $2^N$-PSK constellation, we propose an algorithm to find (index code, mapping) pairs, each of which gives the best performance in terms of PSK-ICG of the receivers. No other pair of index code (of length $N$ with $2^N$ broadcast vectors) and mapping can give a better PSK-ICG for the highest priority receiver. Also, given that the highest priority receiver achieves its best performance, the next highest priority receiver achieves its maximum possible gain, and so on in the specified order of priority. Comment: 9 pages, 6 figures and 2 tables.

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
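
To make the idea of a data-independent hash function concrete, here is a minimal sketch of one well-known locality sensitive hashing family, random-hyperplane (sign random projection) hashing for cosine similarity. It is offered as an illustration only and is not claimed to be the survey's own presentation; the function names and the choice of a seeded Gaussian generator are assumptions.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in `dim` dimensions.

    The hyperplanes are chosen without looking at the data, which is what
    makes this scheme data-independent.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, hyperplanes):
    """Return one bit per hyperplane: 1 if the vector lies on its
    non-negative side, 0 otherwise."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))
```

Vectors separated by a small angle tend to agree on most bits, so the Hamming distance between codes serves as a cheap proxy for angular distance when searching in the hash coding space.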

    Designing structured tight frames via an alternating projection method

    Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications - including communications, coding, and sparse approximation - require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm.