    Quantification of miRNAs and Their Networks in the light of Integral Value Transformations

    MicroRNAs (miRNAs), which are on average only 21-25 nucleotides long, are key post-transcriptional regulators of gene expression in metazoans and plants. A proper quantitative understanding of miRNAs is required to comprehend their structure, function and evolution. In this paper, the nucleotide strings of miRNAs of three organisms, namely Homo sapiens (hsa), Macaca mulatta (mml) and Pan troglodytes (ptr), have been quantified and classified based on some characterizing features. A network has been built among the miRNAs of these three organisms through a class of discrete transformations, namely Integral Value Transformations (IVTs), proposed by Sk. S. Hassan et al. [1, 2]. Through this study we have been able to accept or reject a given nucleotide string as a miRNA. This study will help us to recognize a given nucleotide string as a probable miRNA without requiring any conventional biological experiment. The method can be combined with existing analysis pipelines for small RNA sequencing data (designed for finding novel miRNAs). It would provide more confidence and would make the current analysis pipeline more efficient at predicting probable miRNA candidates for biological validation and filtering out improbable ones.

    Artificial Sequences and Complexity Measures

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. In particular, we introduce a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind them. As a case study we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures
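The idea of a compression-based distance between two character sequences can be illustrated with a minimal sketch. It uses the normalized compression distance with `zlib` as the compressor, which is a related, widely known construction and an assumption made here for illustration; it is not the specific remoteness measure defined in the paper. The function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Assumption: zlib stands in for the compressor; the paper's own
    measure is built differently and is not reproduced here.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # cost of describing both sequences together
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    english = "the quick brown fox jumps over the lazy dog " * 20
    latin = "lorem ipsum dolor sit amet consectetur adipiscing elit " * 20
    print("same text:     ", ncd(english, english))
    print("different text:", ncd(english, latin))
```

A sequence compared with itself should score lower than two unrelated sequences, because the compressor can reuse what it has already seen.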

    An Efficient Rank Based Approach for Closest String and Closest Substring

    This paper presents a new genetic approach that uses rank distance to solve two known NP-hard problems, and compares rank distance with other distance measures for strings. The two NP-hard problems are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance achieve the best results.
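The two simplest distance measures named in this abstract can be sketched as follows. The rank distance below follows the definition as it is commonly stated in the literature (each character is annotated with its occurrence number, and positions are counted from 1); whether it matches the paper's exact variant is an assumption, and the genetic algorithm itself is not reproduced.

```python
from collections import defaultdict


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def _annotated_positions(s: str) -> dict:
    """Map (character, occurrence number) to its 1-based position in s."""
    counts = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, start=1):
        counts[ch] += 1
        positions[(ch, counts[ch])] = pos
    return positions


def rank_distance(a: str, b: str) -> int:
    """Rank distance under the commonly stated definition (an assumption here).

    Shared annotated characters contribute the absolute difference of their
    positions; characters present in only one string contribute their position.
    """
    pa, pb = _annotated_positions(a), _annotated_positions(b)
    total = 0
    for key in pa.keys() | pb.keys():
        if key in pa and key in pb:
            total += abs(pa[key] - pb[key])
        else:
            total += pa.get(key, 0) + pb.get(key, 0)
    return total


if __name__ == "__main__":
    print(hamming_distance("ACGT", "ACGA"))
    print(rank_distance("ACGT", "ACGA"))
```

Unlike Hamming distance, this rank distance is defined for strings of different lengths, which is one reason it can serve as a fitness function for closest substring.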

    Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

    This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to deal efficiently with insertions and deletions in various communication settings:
    • We give a deterministic, linear time synchronization string construction, improving over an $O(n^5)$ time randomized construction. Independently of this work, a deterministic $O(n\log^2\log n)$ time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.
    • This paper also introduces a generalized notion we call long-distance synchronization strings that allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols is required to decode any index.
    We give several applications for these results:
    • For any $\delta < 1$, $\epsilon > 0$ we provide an insdel correcting code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n\log^3 n)$ time. This near linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from a $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $O(\delta/\log n)$ fraction of block transpositions and replications.
    • We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near linear time interactive coding scheme for insdel errors.
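For context on the remark about sub-quadratic time, the following is a minimal sketch of the textbook quadratic dynamic program for the insertion-deletion distance between two strings, computed through the longest common subsequence. It only illustrates the distance being discussed; it is not part of the paper's constructions, and the function name is hypothetical.

```python
def insdel_distance(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions and deletions turning a into b.

    Uses the identity distance = len(a) + len(b) - 2 * LCS(a, b), with the
    LCS length computed by the standard O(len(a) * len(b)) dynamic program.
    """
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


if __name__ == "__main__":
    print(insdel_distance("kitten", "sitting"))
```

The table has one cell per pair of prefixes, which is where the quadratic running time comes from.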

    An output-sensitive algorithm for the minimization of 2-dimensional String Covers

    String covers are a powerful tool for analyzing the quasi-periodicity of 1-dimensional data and find applications in automata theory, computational biology, coding and the analysis of transactional data. A cover of a string $T$ is a string $C$ for which every letter of $T$ lies within some occurrence of $C$. String covers have been generalized in many ways, leading to k-covers, $\lambda$-covers and approximate covers, and have been studied in different contexts such as indeterminate strings. In this paper we generalize string covers to the context of 2-dimensional data, such as images. We show how they can be used for the extraction of textures from images and the identification of primitive cells in lattice data. This has interesting applications in image compression, procedural terrain generation and crystallography.
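The 1-dimensional definition of a cover can be checked directly with a short brute-force sketch. The function names are hypothetical, the approach is deliberately naive, and the paper's 2-dimensional, output-sensitive algorithm is not attempted here.

```python
def occurrences(text: str, pattern: str) -> list:
    """Start indices of all (possibly overlapping) occurrences of pattern."""
    starts = []
    i = text.find(pattern)
    while i != -1:
        starts.append(i)
        i = text.find(pattern, i + 1)
    return starts


def is_cover(text: str, candidate: str) -> bool:
    """True if every letter of text lies within some occurrence of candidate."""
    if not candidate or len(candidate) > len(text):
        return False
    covered_up_to = 0  # letters at indices below this are already covered
    for start in occurrences(text, candidate):
        if start > covered_up_to:
            return False  # the letter at index covered_up_to is in no occurrence
        covered_up_to = start + len(candidate)
    return covered_up_to == len(text)


def shortest_cover(text: str) -> str:
    """Shortest cover of text; a cover is always a prefix of the text."""
    for length in range(1, len(text) + 1):
        if is_cover(text, text[:length]):
            return text[:length]
    return text


if __name__ == "__main__":
    print(shortest_cover("abaababaaba"))
```

Only prefixes need to be tried, because the first letter of the text can only be covered by an occurrence that starts at index 0.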

    Secret Key Agreement from Correlated Data, with No Prior Information

    A fundamental question that has been studied in cryptography and in information theory is whether two parties can communicate confidentially using exclusively an open channel. We consider the model in which the two parties hold inputs that are correlated in a certain sense. This model has been studied extensively in information theory, and communication protocols have been designed which exploit the correlation to extract from the inputs a shared secret key. However, none of the existing protocols is universal, in the sense that they all require the two parties to know some attributes of the correlation. In other words, they require that each party knows something about the other party's input. We present a protocol that does not require any prior additional information. It uses space-bounded Kolmogorov complexity to measure correlation, and it allows the two legal parties to obtain a common key that looks random to an eavesdropper who observes the communication and is restricted to a bounded amount of space for the attack. Thus the protocol achieves complexity-theoretical security, but it does not use any unproven result from computational complexity. On the negative side, the protocol is not efficient, in the sense that the computation of the two legal parties uses more space than the space allowed to the adversary. Comment: Several small errors have been fixed and the presentation has been improved, following the reviewers' observations