119 research outputs found

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Alignment accuracy is therefore crucial to a vast range of analyses, often in ways that are difficult to assess within those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Revie

    Process-oriented Iterative Multiple Alignment for Medical Process Mining

    Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N^2 L^2) time. These algorithms are heavily dependent on the selected guide-tree metric, often return sum-of-pairs-score-reducing errors that interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving better sum-of-pairs score than existing trace alignment algorithms in only O(N L^2) time. We applied PIMA to analyzing medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization. Comment: accepted at ICDMW 201
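
    The sum-of-pairs score mentioned in the abstract above can be illustrated with a minimal sketch. This is not PIMA and not the authors' scoring scheme: the function name `sum_of_pairs` and the match, mismatch, and gap values are assumptions chosen only for illustration, and gap-gap pairs are assumed to score zero.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    The scoring values are illustrative assumptions, not taken from any paper.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    # Walk the alignment column by column and score every pair of rows.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == GAP and b == GAP:
                continue  # gap-gap pairs contribute nothing in this sketch
            if a == GAP or b == GAP:
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AB-", "ABC", "A-C"]))
```

    Each row could equally be a list of activity labels from a workflow trace rather than a string of residues; the column-wise pairing is the same.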

    Multiple sequence alignment based on set covers

    We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches

    A methodology for determining amino-acid substitution matrices from set covers

    We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration

    Pre-column derivatization of amino acids from Nigella sativa L. seed hydrolysates by reversed phase HPLC

    A rapid and sensitive method for the analysis of amino acid hydrolysates of Nigella sativa L. seed has been developed using o-phthaldialdehyde (OPA) as a pre-column derivatizing agent. In the presence of mercaptoethanol, OPA reacts rapidly with primary amino acids (in less than 60 s) to form isoindole derivatives, which are easily separated with good selectivity on an ODS column. The amino acid derivatives are resolved with a methanol gradient in 0.01 M aqueous sodium acetate, pH 7.1. Quantitation of the amino acid derivatives is reproducible within an average relative deviation of ±1.4%; the linearity for most amino acids was greater than 0.9993, with a detection limit of 0.2 ppm. Fifteen amino acids were detected in the analysis of the seed protein hydrolysate. Glutamic acid, alanine, leucine, cystine, phenylalanine, and aspartic acid were present in large quantities. The commonly separated amino acids were detected by UV at 338 nm within 21 minutes

    Comparison of Sequence Alignment Algorithms

    The fact that biological sequences can be represented as strings belonging to a finite alphabet (A, C, G, and T for DNA) plays an important role in connecting biology to computer science. String representation allows researchers to apply various string comparison techniques available in computer science. As a result, various applications have been developed that facilitate the task of sequence alignment. The problem of finding sequence alignments consists of finding the best match between two biological sequences. A best match can indicate an evolutionary relationship and functional similarity. However, there is a lack of research on how reliable and efficient these applications are, especially when it comes to comparing two sequences that might not be highly similar (but could have common patterns that are small yet biologically significant). This study compares two biological sequence comparison packages, namely WuBlast2 and Fasta3, which implement the Blast and FastA algorithms, respectively. To do so, a framework was developed to facilitate the task of data collection and create meaningful reports. Amino acid sequences corresponding to related proteins, as well as the DNA sequences encoding these proteins, were analyzed with matching parameters for each application. Observations showed a trend of increasing variation between the matches produced by the two applications with decreasing sequence similarity
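
    As a rough illustration of "finding the best match between two biological sequences" treated as strings, the sketch below computes a Smith-Waterman local alignment score by dynamic programming. It is not the heuristic used by Blast or FastA, which approximate this kind of search for speed; the function name and the match, mismatch, and gap values are assumptions made for the example.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings s and t (Smith-Waterman).

    Uses a linear gap penalty; all scoring values are illustrative assumptions.
    """
    prev = [0] * (len(t) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            # Extend the diagonal with a match or mismatch...
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # ...or open a gap in either sequence, never dropping below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTT", "GGACGGG"))
```

    Because the score is floored at zero, a short shared pattern inside two otherwise dissimilar sequences still yields a positive score, which is the situation the abstract highlights.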

    AlignStat: a web-tool and R package for statistical comparison of alternative multiple sequence alignments

    Background: Alternative sequence alignment algorithms yield different results. It is therefore useful to quantify the similarities and differences between alternative alignments of the same sequences. These measurements can identify regions of consensus that are likely to be most informative in downstream analysis. They can also highlight systematic differences between alignments that relate to differences in the alignment algorithms themselves. Results: Here we present a simple method for aligning two alternative multiple sequence alignments to one another and assessing their similarity. Differences are categorised into merges, splits or shifts in one alignment relative to the other. A set of graphical visualisations allows for intuitive interpretation of the data. Conclusions: AlignStat enables easy one-off online MSA similarity comparisons as well as integration into R pipelines. The web-tool is available at AlignStat.Science.LaTrobe.edu.au. The R package, readme and example data are available on CRAN and GitHub.com/TS404/AlignStat

    Applying a User-centred Approach to Interactive Visualization Design

    Analysing users in their context of work and finding out how and why they use different information resources is essential to provide interactive visualisation systems that match their goals and needs. Designers should actively involve the intended users throughout the whole process. This chapter presents a user-centred approach for the design of interactive visualisation systems. We describe three phases of the iterative visualisation design process: the early envisioning phase, the global specification phase, and the detailed specification phase. The whole design cycle is repeated until some criterion of success is reached. We discuss different techniques for the analysis of users, their tasks and domain. Subsequently, the design of prototypes and evaluation methods in visualisation practice are presented. Finally, we discuss the practical challenges in design and evaluation of collaborative visualisation environments. Our own case studies and those of others are used throughout the whole chapter to illustrate various approaches