Quantification of miRNAs and Their Networks in the light of Integral Value Transformations
MicroRNAs (miRNAs), which are on average only 21-25 nucleotides long, are key post-transcriptional regulators of gene expression in metazoans and plants. A proper quantitative understanding of miRNAs is required to comprehend their structure, function, evolution, and so on. In this paper, the nucleotide strings of the miRNAs of three organisms, namely Homo sapiens (hsa), Macaca mulatta (mml) and Pan troglodytes (ptr), are quantified and classified based on several characterizing features. A network is built among the miRNAs of these three organisms through a class of discrete transformations, namely the Integral Value Transformations (IVTs) proposed by Sk. S. Hassan et al. [1, 2]. Through this study we have been able to reject or support a given nucleotide string as a miRNA. This study will help us recognize a given nucleotide string as a probable miRNA without requiring any conventional biological experiment. The method can be combined with existing analysis pipelines for small RNA sequencing data (designed for finding novel miRNAs); it would add confidence, make current pipelines more efficient at predicting probable miRNA candidates for biological validation, and filter out improbable candidates.
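The actual Integral Value Transformations of Hassan et al. [1, 2] are defined in their cited papers; purely as a hypothetical toy sketch of the general flavor (a nucleotide string mapped to base-4 digits and transformed digit-wise), one might write:

```python
# Toy sketch only: NOT the IVT family of Hassan et al. [1, 2].
# It merely illustrates encoding a nucleotide string as quaternary
# digits and applying a digit-wise bijection, the kind of discrete
# transformation from which a network of sequences could be built.

NUC_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}  # U treated like T

def encode(seq: str) -> list[int]:
    """Map a nucleotide string to its quaternary digit sequence."""
    return [NUC_TO_DIGIT[n] for n in seq.upper()]

def digitwise_transform(digits: list[int], perm=(2, 3, 0, 1)) -> list[int]:
    """Apply a fixed bijection on {0,1,2,3} to every digit.
    The permutation here is an arbitrary illustrative choice."""
    return [perm[d] for d in digits]

digits = encode("UGAGGUAGUAGGUUGUAUAGUU")  # an example miRNA-like sequence
print(digitwise_transform(digits))
```

The permutation and the encoding table are illustrative assumptions, not the authors' definitions.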
Artificial Sequences and Complexity Measures
In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. In particular, we introduce a class of methods which use data compression techniques in a crucial way to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of an Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpora of character strings independently of the type of coding behind them. As a case study we consider linguistically motivated problems and present results for automatic language recognition, authorship attribution and self-consistent classification.
Comment: Revised version, with major changes, of the previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures.
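The abstract's compression-based remoteness measure is defined in the paper itself; as an illustrative stand-in, the closely related normalized compression distance (NCD) can be computed with any off-the-shelf compressor:

```python
import os
import zlib

def c(data: bytes) -> int:
    """Compressed size of data: a computable proxy for information content."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for related strings,
    close to 1 for strings sharing no structure."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

english = b"the quick brown fox jumps over the lazy dog " * 50
related = english[:-100] + b"and then the fox ran away into the woods "
unrelated = os.urandom(len(english))  # incompressible, shares no structure

print(ncd(english, related) < ncd(english, unrelated))  # related texts are closer
```

This is not the paper's exact measure, only a sketch of the shared underlying idea: the extra compressed size of a concatenation estimates relative information content.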
An Efficient Rank Based Approach for Closest String and Closest Substring
This paper presents a new genetic approach that uses rank distance for solving two known NP-hard problems, closest string and closest substring, and compares rank distance with other distance measures for strings. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance achieve the best results.
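The abstract does not spell out the rank distance used in the fitness function; one common formulation for strings, due to Dinu, annotates each letter with its occurrence number and sums position differences. A sketch under that assumption (1-indexed positions):

```python
def rank_distance(s: str, t: str) -> int:
    """Dinu-style rank distance between two strings.

    Each letter is annotated with its occurrence index ((a,1), (a,2), ...),
    turning a string into a ranking of annotated symbols. The distance
    sums |position difference| over shared annotated symbols, plus the
    position of every annotated symbol present in only one string."""
    def ranking(u: str) -> dict:
        seen, pos = {}, {}
        for i, ch in enumerate(u, start=1):
            seen[ch] = seen.get(ch, 0) + 1
            pos[(ch, seen[ch])] = i
        return pos

    rs, rt = ranking(s), ranking(t)
    total = 0
    for sym in rs.keys() | rt.keys():
        if sym in rs and sym in rt:
            total += abs(rs[sym] - rt[sym])
        else:
            total += rs[sym] if sym in rs else rt[sym]
    return total

print(rank_distance("abc", "bca"))  # each shared symbol shifts position
```

A genetic algorithm for closest string would then minimize the sum of such distances from a candidate to all input strings.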
Synchronization Strings: Explicit Constructions, Local Decoding, and Applications
This paper gives new results for synchronization strings, a powerful combinatorial object that makes it possible to deal efficiently with insertions and deletions in various communication settings:

We give a deterministic, linear time synchronization string construction, improving over the previously known randomized construction. Independently of this work, a deterministic construction was recently put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., any given symbol can be computed quickly.

This paper also introduces a generalized notion we call long-distance synchronization strings, which allow for local and very fast decoding. In particular, only access to logarithmically many symbols is required to decode any index.

We give several applications of these results:

For any desired error fraction, we provide an insdel correcting code that corrects that fraction of insdel errors in near-linear time. This near-linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from insdel errors but, similarly to [Schulman, Zuckerman; TransInf'99], also from block transpositions and replications.

We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near-linear time interactive coding scheme for insdel errors.
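As a brute-force illustration of the underlying definition (a string S is an ε-synchronization string if for every i < j < k, the edit distance between S[i..j) and S[j..k) exceeds (1−ε)(k−i)), one can write a naive checker. This is only a sketch of the property for small strings, not of the paper's efficient constructions:

```python
def edit_distance(a: str, b: str) -> int:
    """Standard dynamic-programming edit (Levenshtein) distance."""
    n = len(b)
    d = list(range(n + 1))
    for i in range(1, len(a) + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            prev, d[j] = d[j], min(d[j] + 1,        # delete from a
                                   d[j - 1] + 1,    # insert into a
                                   prev + (a[i - 1] != b[j - 1]))  # substitute
    return d[n]

def is_sync_string(s: str, eps: float) -> bool:
    """Check the epsilon-synchronization property by brute force:
    every pair of adjacent intervals must have large edit distance."""
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                if edit_distance(s[i:j], s[j:k]) <= (1 - eps) * (k - i):
                    return False
    return True

print(is_sync_string("abcd", 0.6), is_sync_string("aaaa", 0.5))
```

A string of all-distinct symbols trivially satisfies the property for large enough ε, while a repetitive string fails immediately; the point of the paper is achieving the property over a constant-size alphabet with explicit, fast constructions.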
Swiftly Computing Center Strings
Hufsky F, Kuchenbecker L, Jahn K, Stoye J, Böcker S. Swiftly Computing Center Strings. BMC Bioinformatics. 2011;12(1):106.
An output-sensitive algorithm for the minimization of 2-dimensional String Covers
String covers are a powerful tool for analyzing the quasi-periodicity of 1-dimensional data and find applications in automata theory, computational biology, coding, and the analysis of transactional data. A cover of a string S is a string C for which every letter of S lies within some occurrence of C. String covers have been generalized in many ways, leading to k-covers, approximate covers and other variants, and have been studied in different contexts such as indeterminate strings.
In this paper we generalize string covers to the context of 2-dimensional data, such as images. We show how they can be used for the extraction of textures from images and the identification of primitive cells in lattice data. This has interesting applications in image compression, procedural terrain generation and crystallography.
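In the 1-dimensional case, every cover of S must be a border of S (both a prefix and a suffix), which yields a simple quadratic check. The sketch below illustrates the definition only, not the output-sensitive 2-dimensional algorithm of the paper:

```python
def covers(s: str) -> list[str]:
    """All covers of s, checked naively.

    A candidate cover must be a border of s (prefix and suffix);
    it is a cover if its occurrences jointly touch every position."""
    result = []
    n = len(s)
    for length in range(1, n + 1):
        cand = s[:length]
        if not s.endswith(cand):
            continue  # not a border, so it cannot be a cover
        covered_up_to = 0  # rightmost covered position (exclusive)
        ok = True
        for start in range(n - length + 1):
            if s[start:start + length] == cand:
                if start > covered_up_to:  # gap between occurrences
                    ok = False
                    break
                covered_up_to = start + length
        if ok and covered_up_to == n:
            result.append(cand)
    return result

print(covers("abababab"))  # ['ab', 'abab', 'ababab', 'abababab']
```

Occurrences may overlap, as the example shows; the 2-dimensional generalization replaces substrings with sub-arrays whose occurrences must tile every cell of the array.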
Secret Key Agreement from Correlated Data, with No Prior Information
A fundamental question studied in both cryptography and information theory is whether two parties can communicate confidentially using exclusively an open channel. We consider the model in which the two parties hold inputs that are correlated in a certain sense. This model has been studied extensively in information theory, and communication protocols have been designed which exploit the correlation to extract a shared secret key from the inputs. However, none of the existing protocols is universal, in the sense that they all require the two parties to know some attributes of the correlation; in other words, each party must know something about the other party's input. We present a protocol that does not require any additional prior information. It uses space-bounded Kolmogorov complexity to measure correlation, and it allows the two legitimate parties to obtain a common key that looks random to an eavesdropper who observes the communication and is restricted to a bounded amount of space for the attack. Thus the protocol achieves complexity-theoretic security without using any unproven result from computational complexity. On the negative side, the protocol is not efficient, in the sense that the computation of the two legitimate parties uses more space than the space allowed to the adversary.
Comment: Several small errors have been fixed and the presentation has been improved, following the reviewers' observations.