"Multiple Sequence Alignment Using External Sources Of Information"
Multiple sequence alignment is an alignment of three or more protein
or nucleic acid sequences. Alignment has long been of great
interest to researchers because many scientific studies
depend on sequence alignments in their workflow. High-quality alignments
are therefore important. Much work has been done, and is
still being carried out, to improve the quality of alignments. Many
approaches have been developed for pairwise and multiple
sequence alignment, yet most of them rely on the
sequences to be aligned as their only input. Recently, some approaches have begun
to incorporate additional sources of information into the alignment process; this
external data can come from user knowledge or online databases.
When integrated into the workflow of alignment programs, such data may
add new constraints to the produced alignment and improve its quality
by making it biologically more meaningful. In this thesis, I introduce
new approaches to multiple sequence alignment that combine the alignment
software DIALIGN with external information from databases:
useful information is extracted and then integrated into the alignment process.
By testing these approaches on benchmark databases, I show that
using additional data during alignment produces better results than using
DIALIGN alone, with no external input other than the sequences to be
aligned
The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation
SeqFEATURE, a tool for protein function annotation, models protein functions described by sequence motifs using a structural representation. The tool shows significantly improved performance over other methods when sequence and structural similarity are low
A tool for reconstructing phylogenies from the composition of protein motifs
The aim of this work was to develop a tool for phylogenetic analysis. In particular, the tool implements an alignment-free approach that treats biological signals as vector units. We called it TBP, for Trees from Biologically significant Patterns. Preliminary experiments hint that some evolutionary signal may indeed be encoded in the presence/absence of biologically significant patterns
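The presence/absence encoding mentioned in this abstract can be illustrated with a minimal sketch. It is not the TBP implementation: the motif regular expressions, the function names, and the choice of Jaccard distance are all assumptions made for illustration only.

```python
import re

# Hypothetical motifs written as regular expressions (not taken from the source).
MOTIFS = {
    "n_glyc": r"N[^P][ST][^P]",
    "p_loop": r"[AG].{4}GK[ST]",
}

def presence_vector(seq, motifs=MOTIFS):
    """Return one 0/1 flag per motif, ordered by sorted motif name."""
    return tuple(
        int(re.search(motifs[name], seq) is not None)
        for name in sorted(motifs)
    )

def jaccard_distance(u, v):
    """Jaccard distance between two presence/absence vectors (assumed metric)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Two vectors with no motifs at all are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either
```

Pairwise distances computed this way could then be passed to any distance-based tree-building method; which method TBP actually uses is not stated in the abstract.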
A structural study for the optimisation of functional motifs encoded in protein sequences
BACKGROUND: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one or more functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. RESULTS: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and, in some cases, both selectivity and sensitivity. In some of the cases considered, the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. CONCLUSION: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available.
The computational technique for the identification of structurally conserved residues is already available on request and will be soon accessible on our web server. The procedure is intended for the use of pattern database curators and of scientists interested in a specific protein family for which no specific or selective patterns are yet available
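The evaluation step in this abstract (scoring a pattern for sensitivity and specificity against labelled sequences) can be sketched as follows. This is an illustration, not the authors' procedure: the converter handles only a simplified subset of PROSITE syntax, the function names are hypothetical, and the standard definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) are assumed, which may differ from those used in the paper.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simplified PROSITE pattern to a Python regular expression.

    Handles x (any residue), [..] alternatives, {..} exclusions,
    (n) / (n,m) repeats, and < / > terminal anchors outside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeats
        element = element.replace("x", ".")                     # wildcard
        element = element.replace("<", "^").replace(">", "$")   # anchors
        parts.append(element)
    return "".join(parts)

def evaluate(pattern, positives, negatives):
    """Return (sensitivity, specificity) of a pattern on labelled sequences."""
    rx = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for s in positives if rx.search(s))
    fp = sum(1 for s in negatives if rx.search(s))
    fn = len(positives) - tp
    tn = len(negatives) - fp
    sensitivity = tp / (tp + fn) if positives else 0.0
    specificity = tn / (tn + fp) if negatives else 0.0
    return sensitivity, specificity
```

Under these assumptions, an original pattern and an extended candidate could be compared by calling `evaluate` on each with the same sets of known true-positive and true-negative sequences.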
Plant protein-coding gene families: emerging bioinformatics approaches
Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, similar biological functions. In plants, the size and role of gene families have been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted and how useful the extracted information is