9 research outputs found

    Translation conditional models for protein coding sequences

    A coding sequence is defined as a DNA sequence coding the primary structure of a protein (a polypeptide). Such a sequence must satisfy a specific constraint: it must code a functional protein. As the genetic code is degenerate, there exists, for a given polypeptide, a set of synonymous sequences which would code the same polypeptide. Translation conditional models are defined on such sets. The aim of this paper is to give a common formalism. Besides the codon bias model, a few other conditional models will be defined. Statistical estimators and comparison methods will be briefly presented. These models can be used for gene classification, or to find remarkable features in a real sequence. An example will be presented on Escherichia coli genes.
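    The set of synonymous sequences mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and estimates a simple codon bias as the frequency of each codon among the codons for the same amino acid. The function names (`translate`, `synonymous_count`, `synonymous_sequences`, `codon_bias`) are hypothetical and are not taken from the paper, whose estimators may differ.

```python
from collections import Counter, defaultdict
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard table applies; other organisms may use a different code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS[aa].append(codon)


def translate(seq):
    """Translate a DNA coding sequence (complete codons only) to a peptide."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def synonymous_count(peptide):
    """Number of DNA sequences that code exactly this peptide."""
    return prod(len(AA_TO_CODONS[aa]) for aa in peptide)


def synonymous_sequences(peptide):
    """Yield every DNA sequence that translates to the given peptide."""
    for combo in product(*(AA_TO_CODONS[aa] for aa in peptide)):
        yield "".join(combo)


def codon_bias(sequences):
    """Frequency of each codon, conditional on the amino acid it codes."""
    counts = Counter()
    for seq in sequences:
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    freqs = {}
    for aa, codons in AA_TO_CODONS.items():
        total = sum(counts[c] for c in codons)
        if total:
            for c in codons:
                freqs[c] = counts[c] / total
    return freqs
```

    The count grows multiplicatively with peptide length, so enumerating `synonymous_sequences` is only practical for short peptides; a probability model over the set is the natural alternative for whole genes.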

    Potential Energy Function for Continuous State Models of Globular Proteins

    One of the approaches to protein structure prediction is to obtain energy functions which can recognize the native conformation of a given sequence among a zoo of conformations. The discrimination can be done by assigning the lowest energy to the native conformation, with the guarantee that the native is in the zoo. Well-adjusted functions, then, can be used in the search for other (near-) natives. Here the aim is the discrimination at relatively high resolution (RMSD difference between the native and the closest nonnative is around 1 Å) by pairwise energy potentials. The potential is trained using the experimentally determined native conformation of only one protein, instead of the usual large survey over many proteins. The novel feature is that the native structure is compared to a vastly wider and more challenging array of nonnative structures found not only by the usual threading procedure, but by wide-ranging local minimization of the potential. Because of this extremely demanding search, the native is very close to the apparent global minimum of the potential function. The global minimum property holds up for one other protein having 60% sequence identity, but its performance on completely dissimilar proteins is of course much weaker.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/63101/1/106652700750050835.pd

    Bounds on edit metric codes with combinatorial DNA constraints

    The design of a large and reliable DNA codeword library is a key problem in DNA based computing. DNA codes, namely sets of fixed length edit metric codewords over the alphabet {A, C, G, T}, satisfy certain combinatorial constraints with respect to biological and chemical restrictions of DNA strands. The primary constraints that we consider are the reverse-complement constraint and the fixed GC-content constraint, as well as the basic edit distance constraint between codewords. We focus on exploring the theory underlying DNA codes and discuss several approaches to searching for optimal DNA codes. We use Conway's lexicode algorithm and an exhaustive search algorithm to produce provably optimal DNA codes for codes with small parameter values. A genetic algorithm is proposed to search for some sub-optimal DNA codes with relatively large parameter values, whose sizes can be taken as reasonable lower bounds on the sizes of DNA codes. Furthermore, we provide tables of bounds on sizes of DNA codes with length from 1 to 9 and minimum distance from 1 to 9.
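    The constraints named in this abstract can be made concrete with a small Python sketch of a greedy lexicode-style search. It is an illustration only and not the authors' implementation. The exact form of the reverse-complement constraint is an assumption here: every codeword must be at edit distance at least `d` from the reverse complement of every codeword, itself included. All function names are hypothetical.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement each base, then reverse the strand."""
    return word.translate(COMPLEMENT)[::-1]


def gc_content(word):
    """Number of G or C bases in the word."""
    return sum(base in "GC" for base in word)


def greedy_lexicode(n, d, gc=None, rc=False):
    """Scan length-n words in lexicographic order and keep each word that
    stays at edit distance >= d from all words kept so far.

    gc: if given, only words with exactly this many G/C bases are considered.
    rc: if True, also enforce the (assumed) reverse-complement constraint.
    """
    code = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if gc is not None and gc_content(word) != gc:
            continue
        if rc and edit_distance(word, reverse_complement(word)) < d:
            continue
        if all(
            edit_distance(word, c) >= d
            and (not rc or edit_distance(word, reverse_complement(c)) >= d)
            for c in code
        ):
            code.append(word)
    return code
```

    For example, `greedy_lexicode(4, 2, gc=2, rc=True)` returns a set of length-4 words satisfying all three constraints. A greedy scan gives a valid code and therefore a lower bound on the maximum code size; it does not by itself prove optimality.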

    Bioinformatics

    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.

    Radial Distance Weighted Discrimination

    Motivated by the challenge of using DNA-seq data to identify viruses in human blood samples, we propose a novel classification algorithm called "Radial Distance Weighted Discrimination" (or Radial DWD). This classifier is designed for binary classification, assuming one class is surrounded by the other class in very diverse radial directions, which is seen to be typical for our virus detection data. This separation of the two classes in multiple radial directions naturally motivates the development of Radial DWD. While classical machine learning methods such as the Support Vector Machine and linear Distance Weighted Discrimination can sometimes give reasonable answers for a given data set, their generalizability is severely compromised because of the linear separating boundary. Radial DWD addresses this challenge by using a much more appropriate (in this particular case) spherical separating boundary. Simulations show that for appropriate radial contexts, this gives much better generalizability than linear methods, and also much better than conventional kernel based (nonlinear) Support Vector Machines, because the latter methods essentially use much of the information in the data for determining the shape of the separating boundary. Real virus detection data also demonstrate the effectiveness of Radial DWD.
    Doctor of Philosophy