Search CORE

7,522 research outputs found

Quantification of miRNAs and Their Networks in the light of Integral Value Transformations

Author: Arunava Goswami
Navonil De Sarkar
Pabitra Pal Choudhury
Sk. Sarif Hassan
Vrushali Fangal
Publication date: 01/01/2011

MicroRNAs (miRNAs), which are on average only 21-25 nucleotides long, are key post-transcriptional regulators of gene expression in metazoans and plants. A proper quantitative understanding of miRNAs is required to comprehend their structure, function, and evolution. In this paper, the nucleotide strings of miRNAs from three organisms, namely Homo sapiens (hsa), Macaca mulatta (mml), and Pan troglodytes (ptr), have been quantified and classified based on a set of characterizing features. A network has been built among the miRNAs of these three organisms through a class of discrete transformations, namely Integral Value Transformations (IVTs), proposed by Sk. S. Hassan et al. [1, 2]. Through this study we are able to rule out or support a given nucleotide string as a miRNA. This will help to recognize a given nucleotide string as a probable miRNA without requiring any conventional biological experiment. The method can be integrated into existing analysis pipelines for small RNA sequencing data (designed for finding novel miRNAs). It would provide more confidence and make current pipelines more efficient at predicting probable miRNA candidates for biological validation and at filtering out improbable candidates.
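The abstract does not spell out the characterizing features or the exact IVT definition, so the following is only a minimal sketch under stated assumptions. It reads an RNA string as a base-4 integer (the digit assignment A=0, C=1, G=2, U=3 is our assumption), computes a few illustrative composition features, and applies a digit-wise map in the spirit of an integral value transformation. All function names are hypothetical, and the transformation in [1, 2] may be defined differently.

```python
from collections import Counter
from typing import Callable

# Assumed digit assignment for RNA nucleotides (hypothetical; not taken from the paper).
BASE4 = {"A": 0, "C": 1, "G": 2, "U": 3}


def encode_base4(seq: str) -> int:
    """Read an RNA string as a base-4 integer, most significant digit first."""
    value = 0
    for nt in seq.upper():
        value = value * 4 + BASE4[nt]
    return value


def digitwise_transform(seq: str, f: Callable[[int], int]) -> int:
    """Apply f to every base-4 digit of seq and re-encode the result.

    Illustrative assumption only: a one-dimensional, digit-wise map,
    not the IVT definition from [1, 2].
    """
    value = 0
    for nt in seq.upper():
        value = value * 4 + f(BASE4[nt]) % 4
    return value


def features(seq: str) -> dict:
    """A few illustrative (hypothetical) characterizing features of a nucleotide string."""
    seq = seq.upper()
    counts = Counter(seq)
    return {
        "length": len(seq),
        "gc_fraction": (counts["G"] + counts["C"]) / len(seq),
        "composition": {nt: counts[nt] for nt in "ACGU"},
        "base4_value": encode_base4(seq),
    }


def plausible_mirna_length(seq: str, lo: int = 21, hi: int = 25) -> bool:
    """Cheap pre-filter using the 21-25 nt length range quoted in the abstract."""
    return lo <= len(seq) <= hi
```

For example, the digit map `lambda d: 3 - d` swaps A with U and C with G, so `digitwise_transform("ACGU", lambda d: 3 - d)` equals `encode_base4("UGCA")`. Working on the fixed-length sequence rather than on the integer keeps leading A's (zero digits) from being silently dropped.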

Crossref

Nature Precedings

Artificial Sequences and Complexity Measures

In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. In particular, we introduce a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of the dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind them. As a case study we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution, and self-consistent classification.

Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures
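The paper's own measure is built on relative information content estimated with compressors. As a simpler, related stand-in (not the authors' exact method), the sketch below uses the normalized compression distance (NCD) with Python's standard `zlib` module and attributes an unknown text to the closest reference corpus. The helper names and toy texts are illustrative assumptions.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(unknown: bytes, references: dict) -> str:
    """Return the label of the reference corpus closest to `unknown` under NCD."""
    return min(references, key=lambda label: ncd(unknown, references[label]))


if __name__ == "__main__":
    refs = {
        "fox": b"the quick brown fox jumps over the lazy dog " * 20,
        "lorem": b"lorem ipsum dolor sit amet consectetur adipiscing elit " * 20,
    }
    print(attribute(b"the quick brown fox jumps over the lazy dog " * 3, refs))
```

The intuition is that concatenating two texts written in the same language or style compresses almost as well as either one alone, because the compressor reuses the first text's patterns when encoding the second. Real experiments would use much larger corpora than these toy strings.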

arXiv.org e-Print Archive

City Research Online

Crossref

Archivio della ricerca- Università di Roma La Sapienza

An Efficient Rank Based Approach for Closest String and Closest Substring

Author: A Ben-Dor
A Dinu
AS Fraser
AWC Liew
C de la Higuera
Chuhsing Kate Hsiao
DJ States
EV Koonin
F Nicolas
J Gramm
J Palmer
JC Wooley
K Lanctot
L Schmitt
L Wang
Liviu P. Dinu
LP Dinu
M Chimani
M Frances
M Karpovsky
M Li
P Diaconis
R Holmquist
Radu Ionescu
S Roman
VI Levenshtein
VY Popov
W Banzhaf
X Deng
X Liu
Publication venue: Public Library of Science
Publication date: 01/01/2011

This paper aims to present a new genetic approach that uses rank distance to solve two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance give the best results.
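To make the fitness function concrete, here is a minimal sketch of rank distance between two strings as we understand its usual definition: each occurrence of a symbol is indexed (a1, a2, ...), and the distance sums the position differences of indexed symbols present in both strings plus the positions of indexed symbols present in only one. The two fitness functions take the worst case over the input strings (lower is better). The genetic operators are omitted, and all function names are hypothetical.

```python
from collections import defaultdict


def indexed_positions(s: str) -> dict:
    """Map each indexed symbol (ch, k), i.e. the k-th occurrence of ch, to its 1-based position."""
    seen = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = pos
    return positions


def rank_distance(u: str, v: str) -> int:
    """Position shifts of shared indexed symbols plus positions of unshared ones."""
    pu, pv = indexed_positions(u), indexed_positions(v)
    total = 0
    for sym in pu.keys() | pv.keys():
        if sym in pu and sym in pv:
            total += abs(pu[sym] - pv[sym])
        else:
            total += pu.get(sym, 0) + pv.get(sym, 0)
    return total


def closest_string_fitness(candidate: str, strings: list) -> int:
    """Worst-case rank distance from the candidate to the inputs (lower is better)."""
    return max(rank_distance(candidate, s) for s in strings)


def closest_substring_fitness(candidate: str, strings: list) -> int:
    """Best-matching substring of len(candidate) in each input, then the worst case over inputs.

    Assumes every input string is at least as long as the candidate.
    """
    m = len(candidate)
    return max(
        min(rank_distance(candidate, s[i:i + m]) for i in range(len(s) - m + 1))
        for s in strings
    )
```

For instance, `rank_distance("abc", "bca")` is 4 (a1 moves by 2, b1 and c1 by 1 each), whereas the Hamming distance between the same strings is 3: rank distance grades how far symbols have moved rather than only whether positions match.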

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

Author: An
Fast
Guruswami Venkatesan
Haeupler Bernhard
Hemenway Brett
Sherstov Alexander A
Publication date: 09/11/2017

This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to deal efficiently with insertions and deletions in various communication settings:

• We give a deterministic, linear time synchronization string construction, improving over an $O(n^5)$ time randomized construction. Independently of this work, a deterministic $O(n\log^2\log n)$ time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.

• This paper also introduces a generalized notion we call long-distance synchronization strings, which allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols are required to decode any index.

We give several applications for these results:

• For any $\delta > 0$ we provide an insdel correcting code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n\log^3 n)$ time. This near linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from a $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $O(\delta/\log n)$ fraction of block transpositions and replications.

• We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near linear time interactive coding scheme for insdel errors.
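The abstract does not restate the definition itself. In the synchronization-string literature, a string $S$ of length $n$ is called an $\varepsilon$-synchronization string if $ED(S[i,j), S[j,k)) > (1-\varepsilon)(k-i)$ for all $1 \le i < j < k \le n+1$, where $ED$ counts insertions and deletions only; treat this as our paraphrase rather than a quotation of the paper. The brute-force checker below tests that condition directly. It runs in roughly $O(n^5)$ time, so it is only for small examples and has nothing to do with the paper's linear-time constructions; the function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def insdel_distance(a: str, b: str) -> int:
    """Edit distance allowing only insertions and deletions."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def is_sync_string(s: str, eps: float) -> bool:
    """Brute-force test of the epsilon-synchronization property.

    The 1-based condition over 1 <= i < j < k <= n+1 is translated to
    0-based Python slices s[i:j] and s[j:k]; k - i is unchanged.
    """
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                if insdel_distance(s[i:j], s[j:k]) <= (1 - eps) * (k - i):
                    return False
    return True
```

For example, "abab" fails for $\varepsilon = 0.5$ because the adjacent substrings "ab" and "ab" are identical, while any string of pairwise distinct symbols passes for every $\varepsilon > 0$, since two disjoint substrings then share no symbols and their insertion/deletion distance is exactly $k - i$.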

arXiv.org e-Print Archive

Crossref

Swiftly Computing Center Strings

Author: B Ma
C Meneses
F Nicolas
Franziska Hufsky
I Yanai
J Davila
J Gramm
Jens Stoye
JK Lanctot
Katharina Jahn
L Wang
Léon Kuchenbecker
M Frances
S Böcker
S Faro
S Rahmann
Sebastian Böcker
T Kelsey
X Liu
Y Wang
ZZ Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

Hufsky F, Kuchenbecker L, Jahn K, Stoye J, Böcker S. Swiftly Computing Center Strings. BMC Bioinformatics. 2011;12(1): 106

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

MPG.PuRe

An output-sensitive algorithm for the minimization of 2-dimensional String Covers

Author: A Apostolico
A Bacciotti
A Katok
A Muchnik
A Tychonoff
A Wlodawer
AK Brodzik
AV Aho
DE Knuth
J Kopf
JR Searle
K Perlin
L Bursill
R Middlestead
RS Bird
S Havlin
WA Sethares
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/05/2019

String covers are a powerful tool for analyzing the quasi-periodicity of 1-dimensional data and find applications in automata theory, computational biology, coding, and the analysis of transactional data. A cover of a string $T$ is a string $C$ for which every letter of $T$ lies within some occurrence of $C$. String covers have been generalized in many ways, leading to k-covers, $\lambda$-covers, and approximate covers, and have been studied in different contexts such as indeterminate strings. In this paper we generalize string covers to the context of 2-dimensional data, such as images. We show how they can be used for the extraction of textures from images and the identification of primitive cells in lattice data. This has interesting applications in image compression, procedural terrain generation, and crystallography.
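The cover definition translates directly into code. The sketch below is a naive check of that definition in one and two dimensions; for the 2-D case we assume a cover is a rectangular block whose axis-aligned occurrences must jointly cover every cell. It is not the paper's output-sensitive minimization algorithm, and the function names are hypothetical. Because position 0 of $T$ can only be covered by an occurrence starting there, any 1-D cover must be a prefix of $T$, so the shortest cover can be found by trying prefixes in order of length.

```python
def covers(text: str, c: str) -> bool:
    """True if every position of text lies inside some occurrence of c."""
    m, n = len(c), len(text)
    if m == 0 or m > n:
        return False
    covered_until = 0  # first position not yet covered by an occurrence
    for start in range(n - m + 1):
        if text.startswith(c, start):
            if start > covered_until:  # a gap no occurrence can fill
                return False
            covered_until = start + m
    return covered_until == n


def shortest_cover(text: str) -> str:
    """Shortest 1-D cover; only prefixes of text need to be tried."""
    for length in range(1, len(text) + 1):
        if covers(text, text[:length]):
            return text[:length]
    return text


def covers_2d(grid: list, block: list) -> bool:
    """True if axis-aligned occurrences of block cover every cell of grid (rows as strings)."""
    rows, cols = len(grid), len(grid[0])
    h, w = len(block), len(block[0])
    covered = [[False] * cols for _ in range(rows)]
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(grid[r + dr][c:c + w] == block[dr] for dr in range(h)):
                for dr in range(h):
                    for dc in range(w):
                        covered[r + dr][c + dc] = True
    return all(all(row) for row in covered)
```

For example, "aba" covers "abaababa" through its overlapping occurrences at positions 0, 3, and 5, and it is the shortest such cover. In two dimensions, the block with rows "ab" and "cd" covers the grid with rows "abab" and "cdcd", a toy version of finding a primitive cell in lattice data.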

arXiv.org e-Print Archive

Crossref

Secret Key Agreement from Correlated Data, with No Prior Information

Author: Zimand Marius
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Publication date: 01/01/2020

A fundamental question that has been studied in cryptography and in information theory is whether two parties can communicate confidentially using only an open channel. We consider the model in which the two parties hold inputs that are correlated in a certain sense. This model has been studied extensively in information theory, and communication protocols have been designed that exploit the correlation to extract a shared secret key from the inputs. However, none of the existing protocols is universal: they all require that the two parties also know some attributes of the correlation. In other words, they require that each party knows something about the other party's input. We present a protocol that does not require any additional prior information. It uses space-bounded Kolmogorov complexity to measure correlation, and it allows the two legal parties to obtain a common key that looks random to an eavesdropper who observes the communication and is restricted to a bounded amount of space for the attack. Thus the protocol achieves complexity-theoretic security, but it does not use any unproven result from computational complexity. On the negative side, the protocol is not efficient, in the sense that the computation of the two legal parties uses more space than the space allowed to the adversary.

Comment: Several small errors have been fixed and the presentation has been improved, following the reviewers' observations

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server