Higher accuracy protein Multiple Sequence Alignment by Stochastic Algorithm

Author: Alpana Dey
Justin Jose
Krishna Kant
M. S. Jeevitesh
Narayan Behera
Publication date: 03/03/2010

Multiple Sequence Alignment gives insight into evolutionary, structural and functional relationships among proteins. Here, a novel Protein Alignment by Stochastic Algorithm (PASA) is developed. The evolutionary operators of a genetic algorithm, namely mutation and selection, are used to combine the output of the two most important sequence alignment programs and then to develop an optimized new algorithm. The efficiency of protein alignments is evaluated in terms of the Total Column score, which is equal to the number of correctly aligned columns between a test alignment and the reference alignment divided by the total number of columns in the reference alignment. The PASA optimizer achieves, on average, significantly better alignment than the well-known individual bioinformatics tools. PASA is statistically the most accurate protein alignment method today. It has potential applications in drug discovery processes in the biotechnology industry.
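The Total Column score described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: alignments are assumed to be lists of equal-length gapped strings over the same sequences in the same order, `-` is assumed to be the gap character, and the function names `columns` and `total_column_score` are hypothetical.

```python
def columns(alignment, gap="-"):
    """Return each alignment column as a tuple of residue indices (None for a gap)."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    cols = []
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == gap:
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def total_column_score(test, reference, gap="-"):
    """Fraction of reference columns that appear unchanged in the test alignment."""
    ref_cols = columns(reference, gap)
    test_cols = set(columns(test, gap))
    correct = sum(1 for col in ref_cols if col in test_cols)
    return correct / len(ref_cols)


# Illustrative toy alignments (made up for this sketch).
reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
score = total_column_score(test, reference)
```

Identifying a column by the residue indices it contains, rather than by its characters, means that two columns count as the same only when they align exactly the same residues.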

Nature Precedings

Edit Distance: Sketching, Streaming and Document Exchange

Author: Belazzougui Djamal
Zhang Qin
Publication date: 14/07/2016

We show that in the document exchange problem, where Alice holds $x \in \{0,1\}^n$ and Bob holds $y \in \{0,1\}^n$, Alice can send Bob a message of size $O(K(\log^2 K+\log n))$ bits such that Bob can recover $x$ using the message and his input $y$ if the edit distance between $x$ and $y$ is no more than $K$, and output "error" otherwise. Both the encoding and decoding can be done in time $\tilde{O}(n+\mathsf{poly}(K))$. This result significantly improves the previous communication bounds under polynomial encoding/decoding time. We also show that in the referee model, where Alice and Bob hold $x$ and $y$ respectively, they can compute sketches of $x$ and $y$ of sizes $\mathsf{poly}(K \log n)$ bits (the encoding), and send to the referee, who can then compute the edit distance between $x$ and $y$ together with all the edit operations if the edit distance is no more than $K$, and output "error" otherwise (the decoding). To the best of our knowledge, this is the first result for sketching edit distance using $\mathsf{poly}(K \log n)$ bits. Moreover, the encoding phase of our sketching algorithm can be performed by scanning the input string in one pass. Thus our sketching algorithm also implies the first streaming algorithm for computing edit distance and all the edits exactly using $\mathsf{poly}(K \log n)$ bits of space.

Comment: Full version of an article to be presented at the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016).
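The "answer if the distance is at most K, otherwise report an error" contract in this abstract can be illustrated with a classical banded dynamic program. This sketch is not the sketching or document exchange protocol of the paper: it sees both strings in full and only shows the thresholded output behaviour. The function name `bounded_edit_distance` is hypothetical, and `None` stands in for the "error" output.

```python
def bounded_edit_distance(x, y, k):
    """Return the edit distance of x and y if it is at most k, else None."""
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # the length difference alone already exceeds k
    inf = k + 1  # every value above k is capped here
    # prev[j] = capped distance between the empty prefix of x and y[:j]
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        if i <= k:
            cur[0] = i
        # Only cells with |i - j| <= k can hold a value of at most k.
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # substitution or match
                         prev[j] + 1,         # deletion from x
                         cur[j - 1] + 1,      # insertion into x
                         inf)
        prev = cur
    return prev[m] if prev[m] <= k else None


# Illustrative binary inputs (made up for this sketch).
distance = bounded_edit_distance("010101", "011101", 2)
```

Restricting the table to a band of width 2k+1 around the diagonal gives O(nk) time instead of O(n^2), which is the usual reason to exploit a known bound on the distance.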

arXiv.org e-Print Archive

Crossref

Suffix Tree of Alignment: An Efficient Index for Similar Data

Author: A. Amir
D. Gusfield
E. Ukkonen
E.M. McCreight
G. Navarro
H.H. Do
J. Ziv
K. Sadakane
M. Crochemore
M. Farach-Colton
P. Bille
R. Grossi
R.A. Baeza-Yates
S. Huang
S. Karlin
S. Kuruppu
V. Levenshtein
V. Mäkinen
V. Mäkinen
Publication date: 01/01/2013

We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity, which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves, where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time, where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time, where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time.

Comment: 12 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

Accelerating exhaustive pairwise metagenomic comparisons

Author: A Alyass
B Nichols
BD Ondov
CD Polychronopoulos
G Benoit
G Jing
H Li
JA Hanley
MLV Pitteway
O Gotoh
O Torreno
SF Altschul
Y Liu
Y Liu
Publication venue: Springer, Cham
Publication date: 01/01/2017

In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation exceeds 80% while retaining scalability as the number of parallel cores increases. Moreover, we show that sequential optimizations yield up to 8x speedup for scenarios with larger data.

Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
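The abstract does not spell out how parallel efficiency is measured. Assuming the standard definition (speedup divided by the number of cores), an efficiency figure can be computed as in the sketch below. The function name and the timings are hypothetical and are not taken from the paper.

```python
def parallel_efficiency(t_serial, t_parallel, cores):
    """Standard parallel efficiency: speedup divided by the number of cores used."""
    speedup = t_serial / t_parallel
    return speedup / cores


# Made-up timings in seconds, for illustration only.
efficiency = parallel_efficiency(t_serial=100.0, t_parallel=15.0, cores=8)
above_threshold = efficiency > 0.8
```

An efficiency of 1.0 corresponds to perfect linear scaling, so a value above 0.8 on 8 cores means a speedup of more than 6.4x over the serial run.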

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Aligning Multiple Sequences with Genetic Algorithm

Author: Adebiyi E. F.
Akinyemi I. O.
Fatumo S.
Publication date: 01/06/2009

The alignment of biological sequences is a crucial tool in molecular biology and genome analysis. It helps to build a phylogenetic tree of related DNA sequences and to predict the function and structure of unknown protein sequences by aligning them with other sequences whose function and structure are already known. However, finding an optimal multiple sequence alignment requires time and space that grow exponentially as the length or number of sequences increases. Genetic Algorithms (GAs) are random search strategies that optimize an objective function, here a measure of alignment quality (distance), and are capable of both exploratory search through the solution space and exploitation of current results.

Covenant University Repository

Sparse Long Blocks and the Micro-Structure of the Longest Common Subsequences

Author: Amsalu S.
Houdré C.
Matzinger H.
Publication date: 01/01/2014

Consider two random strings having the same length and generated by an iid sequence taking its values uniformly in a fixed finite alphabet. Artificially place a long constant block into one of the strings, where a constant block is a contiguous substring consisting only of one type of symbol. The long block replaces a segment of equal size and its length is smaller than the length of the strings, but larger than its square root. We show that for sufficiently long strings the optimal alignment corresponding to a Longest Common Subsequence (LCS) treats the inserted block very differently depending on the size of the alphabet. For two-letter alphabets, the long constant block gets mainly aligned with the same symbol from the other string, while for three or more letters the opposite is true and the block gets mainly aligned with gaps. We further provide simulation results on the proportion of gaps in blocks of various lengths. In our simulations, the blocks are "regular blocks" in an iid sequence, and are not artificially inserted. Nonetheless, we observe for these natural blocks a phenomenon similar to the one shown in the case of artificially inserted blocks: with two letters, the long blocks get aligned with a smaller proportion of gaps; for three or more letters, the opposite is true. It thus appears that the microscopic nature of two-letter optimal alignments is entirely different from that of three-letter optimal alignments.

Comment: To appear: Journal of Statistical Physics
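The quantity studied here, the proportion of a block that an LCS-optimal alignment leaves aligned with gaps, can be measured with a textbook dynamic program. The sketch below is only an illustration and not the authors' simulation code: the function names are hypothetical, and because an optimal LCS alignment is generally not unique, the traceback returns just one of them.

```python
def lcs_pairs(x, y):
    """Return index pairs (i, j) of one Longest Common Subsequence alignment."""
    n, m = len(x), len(y)
    # table[i][j] = LCS length of the suffixes x[i:] and y[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs = []
    i = j = 0
    while i < n and j < m:
        if x[i] == y[j]:
            pairs.append((i, j))  # matching equal leading symbols is always optimal
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # x[i] is aligned with a gap
        else:
            j += 1  # y[j] is aligned with a gap
    return pairs


def gap_fraction(x, y, start, end):
    """Fraction of positions in x[start:end] left unmatched by the alignment."""
    matched = {i for i, _ in lcs_pairs(x, y)}
    block = range(start, end)
    return sum(1 for i in block if i not in matched) / len(block)
```

Calling `gap_fraction` on the index range of a constant block gives the per-block gap proportion for one optimal alignment; averaging over many random string pairs would be needed to approximate the kind of statistic the abstract reports.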

arXiv.org e-Print Archive

CiteSeerX

JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition

Author: Le Canyu
Li Xin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/09/2018

This paper proposes a novel algorithm to reassemble an arbitrarily shredded image to its original state. Existing reassembly pipelines commonly consist of a local matching stage and a global composition stage. In the local stage, a key challenge in fragment reassembly is to reliably compute and identify correct pairwise matches, for which most existing algorithms use handcrafted features and hence cannot reliably handle complicated puzzles. We build a deep convolutional neural network to detect the compatibility of a pairwise stitching, and use it to prune computed pairwise matches. To improve the network efficiency and accuracy, we transfer the calculation of the CNN to the stitching region and apply a boost training strategy. In the global composition stage, we replace the commonly adopted greedy edge selection strategies with two new loop-closure-based search algorithms. Extensive experiments show that our algorithm significantly outperforms existing methods on various puzzles, especially challenging ones with many fragment pieces.

arXiv.org e-Print Archive

Louisiana State University