333 research outputs found

    "Multiple Sequence Alignment Using External Sources Of Information"

    Multiple sequence alignment is an alignment of three or more protein or nucleic acid sequences. Alignment has long been of interest to researchers because many scientific workflows depend on sequence alignments, so high-quality alignments are important. Much work has been done, and is still being carried out, to improve alignment quality. Many approaches have been developed for pairwise and multiple sequence alignment, yet most of them rely on the sequences to be aligned as their only input. Recently, some approaches have begun to incorporate additional sources of information into the alignment process; this external data can come from user knowledge or from online databases. When integrated into the workflow of an alignment program, such data may add new constraints to the produced alignment and improve its quality by making it biologically more meaningful. In this thesis, I introduce new approaches to multiple sequence alignment that use the alignment software DIALIGN together with external information from databases, from which useful information is extracted and then integrated into the alignment process. By testing these approaches on benchmark databases, I show that using additional data during alignment produces better results than using DIALIGN alone, with no input other than the sequences to be aligned.

    New algorithms and methods for protein and DNA sequence comparison

    The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation

    SeqFEATURE, a tool for protein function annotation, models protein functions described by sequence motifs using a structural representation. The tool shows significantly improved performance over other methods when sequence and structural similarity are low

    A tool for reconstructing phylogenies from the composition of protein motifs

    The aim of this work was the development of a tool for phylogenetic analysis. In particular, the tool implements an alignment-free approach that treats biological signals as vector units. We called it TBP, for Trees from Biologically significant Patterns. Preliminary experiments hint that some evolutionary signal may indeed be encoded in the presence or absence of biologically significant patterns.
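The presence/absence idea in the abstract above can be illustrated with a minimal sketch. It is not the TBP implementation: the motif set, the toy sequences, the function names, and the choice of Jaccard distance are all assumptions made for illustration. Each taxon is reduced to a 0/1 vector over a fixed motif list, and pairwise distances between those vectors could then feed any distance-based tree method.

```python
import re
from itertools import combinations

# Hypothetical motifs written as regular expressions (illustrative only).
MOTIFS = {
    "m1": r"C..C",
    "m2": r"G[KR]S",
    "m3": r"N[^P][ST]",
}


def presence_vector(proteins, motifs=MOTIFS):
    """Return a tuple of 0/1 flags, one per motif: 1 if any protein contains it."""
    return tuple(
        int(any(re.search(rx, seq) for seq in proteins))
        for rx in motifs.values()
    )


def jaccard_distance(u, v):
    """Jaccard distance between two 0/1 vectors (assumed metric, not from the source)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(taxa):
    """Map each unordered pair of taxon names to the distance between their vectors."""
    vectors = {name: presence_vector(seqs) for name, seqs in taxa.items()}
    return {
        (a, b): jaccard_distance(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
```

The resulting pairwise distances are the kind of input a neighbor-joining or UPGMA routine expects; which tree-building method TBP actually uses is not stated in the abstract.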

    A structural study for the optimisation of functional motifs encoded in protein sequences

    BACKGROUND: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one or more functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues that are structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. RESULTS: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and, in some cases, selectivity and sensitivity as well. In some of the cases considered, the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. CONCLUSION: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) that is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of structurally conserved residues is already available on request and will soon be accessible on our web server. The procedure is intended for use by pattern database curators and by scientists interested in a specific protein family for which no specific or selective patterns are yet available.
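Step 3 of the procedure above, scoring a base pattern against an extended one, can be sketched as follows. This is an illustration, not the authors' code: the converter handles only a simplified subset of PROSITE syntax (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, leading `<`, trailing `>`), the function names and toy sequences are hypothetical, and sensitivity and specificity use the textbook definitions TP/(TP+FN) and TN/(TN+FP), which may differ from those used in the paper.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        core, lo, hi = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def evaluate(pattern, positives, negatives):
    """Return (sensitivity, specificity) of a pattern on labelled sequences."""
    rx = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for s in positives if rx.search(s))
    fp = sum(1 for s in negatives if rx.search(s))
    sensitivity = tp / len(positives) if positives else 0.0
    specificity = (len(negatives) - fp) / len(negatives) if negatives else 0.0
    return sensitivity, specificity


# Toy data (hypothetical): the extension rejects a false positive the base pattern accepts.
positives = ["ACAACGT", "CAAAACAS"]
negatives = ["ACAACPT", "GGGG"]
base_scores = evaluate("C-x(2,4)-C", positives, negatives)
extended_scores = evaluate("C-x(2,4)-C-{P}-[ST]", positives, negatives)
```

On this toy set the extended pattern keeps both true positives while excluding the false positive matched by the base pattern, which is the kind of gain the abstract reports for seven of the eight PROSITE patterns.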

    Plant protein-coding gene families: emerging bioinformatics approaches

    Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, with similar biological functions. In plants, the size and role of gene families have been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted, and how useful the extracted information is.