25 research outputs found

Recommended from our members

A fast look-up algorithm for detecting repetitive DNA sequences

Author: Guan X.
Uberbacher E.C.
Publication venue: Oak Ridge National Laboratory
Publication date: 31/12/1996

We have presented a fast, linear-time algorithm for recognizing tandem repeats. Our algorithm is a one-pass algorithm. No information about the periodicity of tandem repeats is needed. The use of indices calculated from non-continuous and overlapping κ-tuples allows tandem repeats with insertions and deletions to be recognized.

UNT Digital Library
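
The lookup idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes contiguous k-tuples and exact repeats only, whereas the abstract describes non-continuous and overlapping tuples that tolerate insertions and deletions. The function name `find_tandem_repeats` and the parameters `k` and `min_run` are hypothetical. In one pass, a table records where each k-tuple was last seen; a stretch of consecutive positions whose k-tuples all recur at the same distance is reported as a tandem repeat with that period, so the period does not need to be known in advance.

```python
def find_tandem_repeats(seq, k=3, min_run=4):
    """Return a list of (start, end, period) for exact tandem repeats.

    Simplified illustration (assumption: contiguous k-tuples, no indels).
    `end` is exclusive, so seq[start:end] is the repeated region.
    """
    last_seen = {}      # k-tuple -> most recent start position
    hits = []
    run_start = None    # position where the current run began
    run_period = None   # recurrence distance shared by the current run
    run_len = 0         # number of consecutive k-tuples in the run

    def flush():
        # Report the current run if it is long enough.
        if run_len >= min_run:
            start = run_start - run_period
            end = run_start + run_len - 1 + k
            hits.append((start, end, run_period))

    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        prev = last_seen.get(word)
        period = i - prev if prev is not None else None

        if period is not None and period == run_period:
            run_len += 1                     # same distance: extend the run
        else:
            flush()                          # distance changed: close the run
            if period is not None:
                run_start, run_period, run_len = i, period, 1
            else:
                run_start, run_period, run_len = None, None, 0

        last_seen[word] = i                  # single table update per position

    flush()
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("TTACGACGACGACGTT"))
```

Each position costs one table lookup and one update, which is what keeps the pass linear in the sequence length for a fixed `k`. Raising `min_run` trades sensitivity for fewer chance matches.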

A new method for modeling and solving the protein fold recognition problem

Author: Uberbacher E.C.
Xu Dong
Xu Ying
Publication venue: Lockheed Martin Energy Systems, Inc., Oak Ridge, TN (United States)
Publication date: 31/12/1998

Computational recognition of native-like folds from a protein fold database is considered a promising alternative to ab initio fold prediction. We present a new and effective method for protein fold recognition through optimally aligning (threading) an amino acid sequence and a protein fold (template). A protein fold, in our database, is represented as a series of core secondary structures, and the alignment quality is determined by three factors: (1) the fitness between each amino acid and the environment of its assigned (aligned) template position; (2) pairwise interaction preferences between amino acids that are spatially close; and (3) alignment gap penalties. Our threading algorithm constructs an optimum alignment between an amino acid sequence of size n and a protein fold template of size m in O((m + n^{1+0.5C} - M log(n)) n^{C+1}) time and O(nm + n^{C+2}) space, where M is the number of core secondary structures in the fold, and C is a (small) nonnegative integer, determined by a mathematical property of the pairwise interactions in the fold. C is less than or equal to 3 for about 90% of the 296 unique folds in our database, when pairwise interactions are restricted to amino acids 3, when threading requires too much memory and time to be practical on a typical workstation

UNT Digital Library

Unsupervised learning of multiple motifs in biopolymers using expectation maximization

Author: A. Bairoch
A.P. Dempster
B. Crombrugghe de
C.B. Harley
C.E. Lawrence
Charles Elkan
D. Haussler
E.C. Uberbacher
G.D. Stormo
G.Z. Hertz
J.M. Varley
J.R. Quinlan
L. Breiman
L.F. Kolakowski
L.R. Cardon
O.G. Berg
T.L. Bailey
Timothy L. Bailey
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1995

Crossref

Solving Globally-Optimal Threading Problems in "Polynomial-Time"

Author: Uberbacher E.C.
Publication date: 04/02/2008

Computational protein threading is a powerful technique for recognizing native-like folds of a protein sequence from a protein fold database. In this paper, we present an improved algorithm (over our previous work) for solving the globally-optimal threading problem, and illustrate how the computational complexity and the fold recognition accuracy of the algorithm change as the cutoff distance for pairwise interactions changes. For a given fold of m residues and M core secondary structures (or simply cores) and a protein sequence of n residues, the algorithm is guaranteed to find a sequence-fold alignment (threading) that is globally optimal, measured collectively by (1) the singleton match fitness, (2) pairwise interaction preference, and (3) alignment gap penalties, in O(mn + MnN^{1.5C-1}) time and O(mn + nN^{C-1}) space. C, which we term the topological complexity of a fold, is a value that characterizes the overall structure of the considered pairwise interactions in the fold, which are typically determined by a specified cutoff distance between the beta carbon atoms of a pair of amino acids in the fold. C is typically a small positive integer. N represents the maximum number of possible alignments between an individual core of the fold and the protein sequence when its neighboring cores are already aligned, and its value is significantly less than n. When interacting amino acids are required to see each other, C is bounded from above by a small integer no matter how large the cutoff distance is. This indicates that the protein threading problem is polynomial-time solvable if the condition of seeing each other between interacting amino acids is sufficient for accurate fold recognition. A number of extensions have been made to our basic threading algorithm to allow finding a globally-optimal threading under various constraints, which include consistencies with (1) specified secondary structures (both cores and loops), (2) disulfide bonds, (3) active sites, etc.

SciTec Connect (Office of Scientific and Technical Information - OSTI, U.S. Department of Energy)

Reference-based gene model prediction on DNA contigs

Author: Uberbacher E.C.
Xu Y.
Publication venue: Oak Ridge National Laboratory
Publication date: 01/01/1997

This paper presents an algorithm for constructing multiple gene models on a set of contigs of a large genomic clone. The algorithm first uses pattern recognition-based methods to locate exons or partial exons in each contig, and then applies protein homology or EST information from the databases, as reference models, to parse the predicted exons into gene models. In the gene model construction phase, the algorithm uses a unified framework for genes ranging from situations with homologous proteins/ESTs to situations with no homologous protein/EST in the database. By exploiting protein homology or EST information, the algorithm is able to (1) parse exons into multiple gene models over a set of DNA contigs (possibly unoriented and unordered); (2) remove falsely predicted exons; and (3) identify and locate exons missed by the initial exon prediction.

UNT Digital Library

Gene prediction by pattern recognition and homology search

Author: Uberbacher E.C.
Xu Y.
Publication venue: Oak Ridge National Laboratory
Publication date: 01/05/1996

This paper presents an algorithm for combining pattern recognition-based exon prediction and database homology search in gene model construction. The goal is to use homologous genes or partial genes existing in the database as reference models while constructing (multiple) gene models from exon candidates predicted by pattern recognition methods. A unified framework for gene modeling is used for genes ranging from situations with strong homology to no homology in the database. To make maximal use of the available homology information, the algorithm applies homology on three levels: (1) exon candidate evaluation, (2) gene-segment construction with a reference model, and (3) (complete) gene modeling. Preliminary testing has been done on the algorithm. Test results show that (a) perfect gene modeling can be expected when the initial exon predictions are reasonably good and a strong homology exists in the database; (b) homology (not necessarily strong) in general helps improve the accuracy of gene modeling; (c) multiple gene modeling becomes feasible when homology exists in the database for the involved genes.

UNT Digital Library

Solving Globally-Optimal Threading Problems in "Polynomial-Time"

Author: Uberbacher E.C.
Xu D.
Xu Y.
Publication venue: Oak Ridge National Laboratory
Publication date: 12/04/1999

UNT Digital Library

An iterative algorithm for correcting sequencing errors in DNA coding regions

Author: Mural R.J.
Uberbacher E.C.
Xu Ying
Publication venue: Argonne National Laboratory
Publication date: 31/12/1995

Insertion and deletion (indel) sequencing errors in DNA coding regions disrupt DNA-to-protein translation frames, and hence make most frame-sensitive coding recognition approaches fail. This paper extends the authors' previous work on indel detection and 'correction' algorithms, and presents a more effective algorithm for localizing indels that appear in DNA coding regions and 'correcting' the located indels by inserting or deleting DNA bases. The algorithm localizes indels by discovering changes of the preferred translation frames within presumed coding regions, and then 'corrects' the indel errors to restore a consistent translation frame within each coding region. An iterative strategy is exploited to repeatedly localize and 'correct' indel errors until no more indels can be found. Test results have shown that the algorithm can accurately locate the positions of indels. The technology presented here has proved to be very useful for single pass EST/cDNA or genomic sequences, and is also often beneficial for higher quality sequences from large genomic clones.

UNT Digital Library

Image exploitation using multi-sensor/neural network systems

Author: Lee R.W.
Uberbacher E.C.
Xu Y.
Publication venue: Oak Ridge National Laboratory
Publication date: 31/12/1995

We have developed and evaluated a tool for change detection and other analysis tasks relevant to image exploitation. The tool, visGRAIL, integrates three key elements: (1) the use of multiple algorithms to extract information from images (feature extractors or "sensors"), (2) an algorithm to fuse the information, presently a neural network, and (3) empirical estimation of the fusion parameters based on a representative set of images. The system was applied to test images in the RADIUS Common Development Environment (RCDE). In a task designed to distinguish natural scenes from those containing various amounts of human-made objects and structure, the system correctly classified 95% of 350 images in a test set. This paper describes details of the feature extractors, and presents analyses of the discriminatory characteristics of the features. visGRAIL has been integrated into the RCDE.

UNT Digital Library

Optimal reconstruction of a surface using a reference library

Author: Olman V.
Uberbacher E.C.
Xu Ying
Publication venue: Oak Ridge National Laboratory
Publication date: 01/09/1997

Reconstructing (approximating) an arbitrary surface in an optimal way from subsurfaces (patches) drawn from a library of surfaces is an interesting algorithmic problem with many applications in image processing. This paper presents an efficient algorithm for an optimal reconstruction of a query surface using patches from a reference library of surfaces, under the constraint that the smallest patch size is above some specified value. In this algorithm, a surface is given as an integer function f(x, y) over a finite 2-D grid. The algorithm partitions a query surface into patches in such a way that each patch is represented by a similar patch from a library surface, and the total difference between the query surface and the representing (composite) surface is minimized, where the boundary of a patch is not pre-determined but solely determined by the optimization process. By using a minimum spanning tree-based data structure, this optimization problem can be solved efficiently. An application of this technique in computational forensics is outlined.

UNT Digital Library