Search CORE

14 research outputs found

Optimization of the Maximum Subarray Problem for Specific Data (OPTYMALIZACJA PROBLEMU NAJWIĘKSZEJ PODTABLICY DLA SPECYFICZNYCH DANYCH)

Author: Rojek Tomasz
Publication venue: 'Index Copernicus'
Publication date: 01/01/2017

The maximum subarray problem (MSP) is to find the contiguous subarray with the maximum sum. This paper describes an optimization of Kadane's algorithm (the state of the art) for specific data (continuous sequences of zeros or negative real numbers). When the data are unfavourable, the modification causes an insignificant performance loss (a decrease of less than 1%). The modification does not improve the time complexity but reduces the number of elementary operations. Various experimental data sets were used to evaluate the possible improvement in time efficiency. For the most favourable data sets, an efficiency increase of 25% can be achieved.
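For orientation, the baseline that the abstract above refers to can be sketched as follows. This is a minimal sketch of the textbook form of Kadane's algorithm only; the paper's modification for runs of zeros or negative numbers is not described in the abstract and is not reproduced here. The function name `max_subarray_sum` is an illustrative choice, and the sketch assumes a non-empty input and a non-empty result slice.

```python
from typing import Sequence


def max_subarray_sum(values: Sequence[float]) -> float:
    """Return the largest sum of a non-empty contiguous slice (textbook Kadane)."""
    if not values:
        raise ValueError("values must be non-empty")
    best = current = values[0]
    for x in values[1:]:
        # Either extend the running slice with x or start a new slice at x.
        current = max(x, current + x)
        best = max(best, current)
    return best
```

The loop performs a constant amount of work per element, which is the linear time bound that the abstract says the modification leaves unchanged.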

Biblioteka Nauki - repozytorium artykułów

Lublin University of Technology Journals

Locating regions in a sequence under density constraints

Author: Benjamin A. Burton
Boztaş S.
Greenberg R. I.
Huang X.
Lin Y.-L.
Mathias Hiron
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2013

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples include the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.

Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computing
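To make the problem statement concrete, here is a quadratic-time reference sketch of the "longest substring within a density range" task. It is not the O(n) algorithm from the paper; it only pins down the definition. The function name, the `(start, length)` return convention, and the use of `Fraction` bounds (to avoid floating-point comparisons) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import accumulate
from typing import Optional, Tuple


def longest_substring_in_density_range(
    bits: str, low: Fraction, high: Fraction
) -> Optional[Tuple[int, int]]:
    """Return (start, length) of a longest non-empty substring of `bits`
    whose proportion of '1' characters lies in [low, high], or None."""
    # ones[k] = number of '1' characters in bits[:k]
    ones = [0, *accumulate(1 if c == "1" else 0 for c in bits)]
    n = len(bits)
    # Try lengths from longest to shortest so the first hit is a longest one.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            count = ones[start + length] - ones[start]
            # Compare count/length against the bounds without dividing.
            if low * length <= count <= high * length:
                return (start, length)
    return None
```

For example, on `"0011101000"` with bounds 1/2 and 3/4 the sketch returns `(0, 8)`, since the first eight characters contain four 1s and no substring of length nine or ten reaches density 1/2.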

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Queensland eSpace

An Optimal Algorithm for the Maximum-Density Segment Problem

Author: Holmquist G. P.
Hsueh-I Lu
Huang X.
Kai-min Chung
Scotto L.
Sueoka N.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 17/11/2003

We address a fundamental problem arising from analysis of biomolecular sequences. The input consists of two numbers $w_{\min}$ and $w_{\max}$ and a sequence $S$ of $n$ number pairs $(a_i,w_i)$ with $w_i>0$. Let segment $S(i,j)$ of $S$ be the consecutive subsequence of $S$ between indices $i$ and $j$. The density of $S(i,j)$ is $d(i,j)=(a_i+a_{i+1}+\cdots+a_j)/(w_i+w_{i+1}+\cdots+w_j)$. The maximum-density segment problem is to find a maximum-density segment over all segments $S(i,j)$ with $w_{\min}\leq w_i+w_{i+1}+\cdots+w_j \leq w_{\max}$. The best previously known algorithm for the problem, due to Goldwasser, Kao, and Lu, runs in $O(n\log(w_{\max}-w_{\min}+1))$ time. In the present paper, we solve the problem in $O(n)$ time. Our approach bypasses the complicated right-skew decomposition, introduced by Lin, Jiang, and Chao. As a result, our algorithm has the capability to process the input sequence in an online manner, which is an important feature for dealing with genome-scale sequences. Moreover, for a type of input sequences $S$ representable in $O(m)$ space, we show how to exploit the sparsity of $S$ and solve the maximum-density segment problem for $S$ in $O(m)$ time.

Comment: 15 pages, 12 figures, an early version of this paper was presented at 11th Annual European Symposium on Algorithms (ESA 2003), Budapest, Hungary, September 15-20, 2003
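The density definition and width bounds in the abstract above can be checked against a brute-force reference. The sketch below is quadratic in the worst case and is not the linear-time algorithm of the paper; it uses prefix sums only to evaluate each candidate segment in constant time. The function name and the 0-based, inclusive `(density, i, j)` return convention are assumptions made for illustration.

```python
from itertools import accumulate
from typing import Optional, Sequence, Tuple


def max_density_segment_bruteforce(
    pairs: Sequence[Tuple[float, float]], w_min: float, w_max: float
) -> Optional[Tuple[float, int, int]]:
    """Return (density, i, j) for a densest segment S(i, j) whose total
    width lies in [w_min, w_max], or None if no segment qualifies.
    Indices are 0-based and inclusive; every width w_i must be > 0."""
    a_prefix = [0.0, *accumulate(a for a, _ in pairs)]
    w_prefix = [0.0, *accumulate(w for _, w in pairs)]
    best = None
    n = len(pairs)
    for i in range(n):
        for j in range(i, n):
            width = w_prefix[j + 1] - w_prefix[i]
            if width > w_max:
                break  # widths only grow with j because every w_i > 0
            if width >= w_min:
                density = (a_prefix[j + 1] - a_prefix[i]) / width
                if best is None or density > best[0]:
                    best = (density, i, j)
    return best
```

With unit widths, `max_density_segment_bruteforce([(1, 1), (4, 1), (2, 1), (-3, 1)], 2, 3)` returns `(3.0, 1, 2)`: the segment covering the values 4 and 2 has density 6/2.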

arXiv.org e-Print Archive

CiteSeerX

Crossref

Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

Author: Alexandrov
Bentley
Bernardi
Bernardi
Charlesworth
Chung
Duret
Eyre-Walker
Eyre-Walker
Fields
Filipski
Francino
Fullerton
Greenberg
Guldberg
Hardison
Henke
Holmquist
Hsueh-I Lu
Huang
Ikehara
Inman
Jin
Kim
Lin
Macaya
Madsen
Michael H. Goldwasser
Ming-Yang Kao
Murata
Nekrutenko
Rice
Scotto
Sellers
Sharp
Soriano
Stojanovic
Sueoka
Wang
Wolfe
Wu
Zoubak
Publication venue: 'Elsevier BV'
Publication date: 04/11/2002

We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A of pairs (a_i,w_i) for i = 1,...,n and w_i>0, a segment A(i,j) is a consecutive subsequence of A starting with index i and ending with index j. The width of A(i,j) is w(i,j) = sum_{i <= k <= j} w_k, and the density is (sum_{i <= k <= j} a_k)/w(i,j). The maximum-density segment problem takes A and two values L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. When U is unbounded, we provide a relatively simple, O(n)-time algorithm, improving upon the O(n \log L)-time algorithm by Lin, Jiang and Chao. When both L and U are specified, there are no previous nontrivial results. We solve the problem in O(n) time if w_i=1 for all i, and more generally in O(n+n\log(U-L+1)) time when w_i>=1 for all i.

Comment: 23 pages, 13 figures. A significant portion of these results appeared under the title, "Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics," in Proceedings of the Second Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield editors, 2002, pp. 157--17

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

National Taiwan University Repository

A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage

Author: Huang Xiaoqiu
Wang Jianmin
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The allele frequencies of single-nucleotide polymorphisms (SNPs) are needed to select an optimal subset of common SNPs for use in association studies. Sequence-based methods for finding SNPs with allele frequencies may need to handle thousands of sequences from the same genome location (sequences of deep coverage). RESULTS: We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method enhances a widely used program named PolyBayes in several aspects. We present results from our method and PolyBayes on eighteen data sets of human expressed sequence tags (ESTs) with deep coverage. The results indicate that our method used almost all single-pass sequences in computation of the allele frequencies of SNPs. CONCLUSION: The new method is able to handle single-pass sequences of deep coverage efficiently. Our work shows that it is possible to analyze sequences of deep coverage by using pairwise alignments of the sequences with the finished genome sequence, instead of multiple sequence alignments

Digital Repository @ Iowa State University (ISU)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A comparative study of sequence analysis tools in computational biology

Author: Chuang Wei-Jen
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1999

A biomolecular object, such as a deoxyribonucleic acid (DNA), ribonucleic acid (RNA) or protein molecule, is made up of a long chain of subunits. A protein is represented as a sequence made from 20 different amino acids, each represented as a letter. There are a vast number of ways in which similar structural domains can be generated in proteins by different amino acid sequences. By contrast, the structure of DNA, made up of only four different nucleotide building blocks that occur in two pairs, is relatively simple, regular, and predictable. Biomolecular sequence alignment, or string search, is the most important and challenging task in many areas of science and information processing. It involves identifying one-to-one correspondences between subunits of different sequences. An efficient algorithm or tool depends on many important factors, including scoring systems, alignment statistics, database redundancy and sequence repetitiveness. Sequence motifs are derived from multiple alignments and can be used to examine individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that may not be demonstrable based on comparisons of primary sequences alone. A more comprehensive approach to efficient string search is to build a small, representative set of motifs and use it as a screening database with automatic masking of matching query subsequences. This technology is still under development, but recent studies indicate that a representative set of only 1,000 - 3,000 sequences may suffice and that such a database can be searched in seconds.

Digital Commons @ New Jersey Institute of Technology (NJIT)

Fast and Space-Efficient Location of Heavy or Dense Segments in Run-Length Encoded Sequences

Author: A. Nekrutenko
F. Larsen
M. Gardiner-Garden
R.C. Hardison
S. Hannenhalli
X. Huang
Y. Lin Ling
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2003

This paper considers several variations of an optimization problem with potential applications in such areas as biomolecular sequence analysis and image processing. Given a sequence of items, each with a weight and a length, the goal is to find a subsequence of consecutive items of optimal value, where value is either total weight or total weight divided by total length. There may also be a specified lower and/or upper bound on the acceptable length of subsequences. This paper shows that all the variations of the problem are solvable in linear time and space even with non-uniform item lengths and divisible items, implying that run-length encoded sequences can be handled in time and space linear in the number of runs. Furthermore, some problem variations can be solved in constant space. Also, these time and space bounds suffice for certain problem variations in which we call for reporting of many “good” subsequences
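The claim that run-length encoded sequences can be handled in time linear in the number of runs can be illustrated for the simplest variation: maximum total weight with no length bounds. The sketch below is not the paper's algorithm. It assumes each run is given as a hypothetical `(value, length)` pair meaning `length` copies of `value`, and it allows the empty segment (total weight 0). Under those assumptions an optimal segment never cuts a run in the middle, because a positive run is best taken whole and a negative run at either end is best dropped, so Kadane's recurrence can be applied to run totals.

```python
from typing import Iterable, Tuple


def max_weight_segment_rle(runs: Iterable[Tuple[float, int]]) -> float:
    """Maximum total weight of a contiguous segment of the expanded
    sequence, given runs of (value, length); the empty segment counts as 0.
    Runs in time linear in the number of runs and uses constant extra space."""
    best = current = 0
    for value, length in runs:
        if length <= 0:
            raise ValueError("run lengths must be positive")
        # Treat the whole run as one item with total weight value * length.
        current = max(0, current + value * length)
        best = max(best, current)
    return best
```

For the runs `[(2, 3), (-1, 4), (5, 1), (-3, 2)]`, which expand to `[2, 2, 2, -1, -1, -1, -1, 5, -3, -3]`, the sketch returns 7, matching the best slice of the expanded sequence (its first eight elements).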

Crossref

Loyola eCommons