6 research outputs found

    Reproducing the manual annotation of multiple sequence alignments using a SVM classifier

    Motivation: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, even the best alignment is expected to contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to remove them manually. Although considered necessary in some cases, manual editing is time-consuming and not reproducible. We present here an automated editing method based on the classification of ‘valid’ and ‘invalid’ sites.

    AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses.
    Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms; the choice of algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded to the user's disk in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
    Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.
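
    The entropy-based scoring mentioned in the AlignMiner abstract can be illustrated with a minimal sketch. It assumes plain per-column Shannon entropy with gaps counted as an ordinary symbol; the names `column_entropy` and `divergent_regions`, the `threshold` and `min_length` parameters, and the toy alignment are hypothetical and are not taken from AlignMiner, whose actual scoring may differ.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gaps are treated as just another symbol.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def divergent_regions(alignment, threshold=1.0, min_length=2):
    """Return half-open (start, end) column ranges whose entropy >= threshold.

    `threshold` and `min_length` are illustrative parameters, not AlignMiner's.
    """
    n_cols = len(alignment[0])
    scores = [column_entropy([seq[i] for seq in alignment]) for i in range(n_cols)]
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a divergent run begins here
        elif score < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and n_cols - start >= min_length:
        regions.append((start, n_cols))
    return regions


# Toy alignment: columns 3 and 4 vary, the rest are fully conserved.
alignment = ["ACGTAC", "ACGAGC", "ACGCTC", "ACGGCC"]
print(divergent_regions(alignment))
```

    Raising `threshold` makes the selection more restrictive, which loosely mirrors the different levels of selectivity the abstract attributes to its scoring methods.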

    H2r: Identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments

    BACKGROUND: A multiple sequence alignment (MSA) generated for a protein can be used to characterise residues by means of a statistical analysis of single columns. In addition to the examination of individual positions, the investigation of co-variation of amino acid frequencies offers insights into the function and evolution of the protein and its residues. RESULTS: We introduce conn(k), a novel parameter for the characterisation of individual residues. For each residue k, conn(k) is the number of most extreme signals of co-evolution in which k is involved. These signals were deduced from a normalised mutual information (MI) value U(k, l) computed for all pairs of residues k, l. We demonstrate that conn(k) is a more robust indicator than an individual MI value for the prediction of residues most plausibly important for the evolution of a protein. This proposition was inferred by means of statistical methods and further confirmed by the analysis of several proteins. A server that computes conn(k) values is available at http://www-bioinf.uni-regensburg.de. CONCLUSION: The algorithm H2r, which analyses MSAs and computes conn(k) values, characterises a specific class of residues. In contrast to strictly conserved ones, these residues possess some flexibility in the composition of side chains. However, their allocation is sensibly balanced with several other positions, as indicated by conn(k).
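
    A rough sketch of the idea behind conn(k) follows. The abstract does not state how U(k, l) is normalised or how many signals count as "most extreme", so both are assumptions here: the sketch uses the symmetric form 2(H(k) + H(l) - H(k, l)) / (H(k) + H(l)) and a hypothetical `top_n` cut-off. It should not be read as the H2r implementation.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of a list of hashable symbols."""
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in Counter(symbols).values())


def normalised_mi(col_k, col_l):
    """Assumed normalisation: U = 2 * (H(k) + H(l) - H(k, l)) / (H(k) + H(l))."""
    h_k, h_l = entropy(col_k), entropy(col_l)
    h_kl = entropy(list(zip(col_k, col_l)))  # joint entropy of the column pair
    if h_k + h_l == 0:
        return 0.0  # two invariant columns carry no co-variation signal
    return 2 * (h_k + h_l - h_kl) / (h_k + h_l)


def conn(alignment, top_n):
    """For each column k, count how many of the top_n highest U(k, l) pairs include k.

    `top_n` is a hypothetical cut-off standing in for "most extreme signals".
    """
    n_cols = len(alignment[0])
    cols = [[seq[i] for seq in alignment] for i in range(n_cols)]
    pairs = [
        (normalised_mi(cols[k], cols[l]), k, l)
        for k in range(n_cols)
        for l in range(k + 1, n_cols)
    ]
    pairs.sort(reverse=True)
    counts = [0] * n_cols
    for _, k, l in pairs[:top_n]:
        counts[k] += 1
        counts[l] += 1
    return counts


# Toy MSA: columns 1 and 3 co-vary perfectly (K with E, R with D).
msa = ["AKLE", "AKLE", "ARLD", "ARLD"]
print(conn(msa, top_n=1))
```

    Counting memberships in the strongest pairs, rather than reading off a single U(k, l) value, is what makes the measure a per-residue summary.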

    Automatic extraction of reliable regions from multiple sequence alignments-1

    Copyright information: Taken from "Automatic extraction of reliable regions from multiple sequence alignments". http://www.biomedcentral.com/1471-2105/8/S5/S9. BMC Bioinformatics 2007;8(Suppl 5):S9-S9. Published online 24 May 2007. PMCID: PMC1892097.
    n to the cumulative running time of the alignment programs used to generate the input alignments. The running times of Mumsa were multiplied by 100 to be visible in the plot. The sequence files were generated by ROSE [16] using an average sequence length of 500 residues and an average evolutionary distance of 250. It is clear that the running time of Mumsa is at least two orders of magnitude lower than that required by the alignment programs.

    Automatic extraction of reliable regions from multiple sequence alignments-0

    Copyright information: Taken from "Automatic extraction of reliable regions from multiple sequence alignments". http://www.biomedcentral.com/1471-2105/8/S5/S9. BMC Bioinformatics 2007;8(Suppl 5):S9-S9. Published online 24 May 2007. PMCID: PMC1892097.
    d Dialign alignment of the Balibase 3.0 test case BB20007. The parameter was chosen to be two, requiring that residues in the output alignment appear in at least two input alignments. Each residue is colored according to the average occurrence of the POARs it is involved in. Regions that appear in red are identically aligned in all 5 input alignments, while green and blue regions are aligned identically in fewer and fewer cases. It is clear that all alignment programs find conserved motifs in the sequences but disagree on how the residues in between should be aligned.
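
    The caption's notion of keeping residues that appear aligned in at least two input alignments can be sketched as follows. The sketch assumes that a POAR is a pair of aligned residues, identified by (sequence index, ungapped residue index), and that all input alignments contain the same sequences in the same order; the function names and the `min_support` parameter are hypothetical and are not Mumsa's interface.

```python
from collections import Counter
from itertools import combinations


def poars(alignment):
    """Set of aligned residue pairs ((seq_i, res_a), (seq_j, res_b)) in one alignment."""
    n_seqs = len(alignment)
    positions = [0] * n_seqs  # next ungapped residue index for each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s in range(n_seqs):
            if alignment[s][col] != "-":
                present.append((s, positions[s]))
                positions[s] += 1
        # Every two residues sharing a column form one pair.
        pairs.update(combinations(present, 2))
    return pairs


def poar_support(alignments):
    """Count in how many input alignments each pair occurs."""
    support = Counter()
    for aln in alignments:
        support.update(poars(aln))
    return support


def reliable_poars(alignments, min_support=2):
    """Keep pairs found in at least `min_support` alignments (illustrative parameter)."""
    return {p for p, n in poar_support(alignments).items() if n >= min_support}


# Three alternative alignments of the same two sequences, "ACG" and "AG".
inputs = [["ACG", "A-G"], ["ACG", "AG-"], ["ACG", "-AG"]]
print(sorted(reliable_poars(inputs)))
```

    Averaging the support of the pairs a residue takes part in would give a per-residue score of the kind the caption uses for coloring.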