119 research outputs found
Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique
in bioinformatics used to infer related residues among biological sequences.
Thus alignment accuracy is crucial to a vast range of analyses, often in ways
difficult to assess in those analyses. To compare the performance of different
aligners and help detect systematic errors in alignments, a number of
benchmarking strategies have been pursued. Here we present an overview of the
main strategies--based on simulation, consistency, protein structure, and
phylogeny--and discuss their different advantages and associated risks. We
outline a set of desirable characteristics for effective benchmarking, and
evaluate each strategy in light of them. We conclude that there is currently no
universally applicable means of benchmarking MSA, and that developers and users
of alignment tools should choose a benchmark according to the
context of application--with a keen awareness of the assumptions underlying
each benchmarking strategy. Comment: Revie
Process-oriented Iterative Multiple Alignment for Medical Process Mining
Adapted from biological sequence alignment, trace alignment is a process
mining technique used to visualize and analyze workflow data. Any analysis done
with this method, however, is affected by the alignment quality. The best
existing trace alignment techniques use progressive guide-trees to
heuristically approximate the optimal alignment in O(N^2 L^2) time. These
algorithms are heavily dependent on the selected guide-tree metric, often
return sum-of-pairs-score-reducing errors that interfere with interpretation,
and are computationally intensive for large datasets. To alleviate these
issues, we propose process-oriented iterative multiple alignment (PIMA), which
contains specialized optimizations to better handle workflow data. We
demonstrate that PIMA is a flexible framework capable of achieving a better
sum-of-pairs score than existing trace alignment algorithms in only O(N L^2)
time. We applied PIMA to analyzing medical workflow data, showing how iterative
alignment can better represent the data and facilitate the extraction of
insights from data visualization. Comment: accepted at ICDMW 201
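The sum-of-pairs score mentioned in this abstract can be illustrated with a minimal sketch. It assumes an alignment given as equal-length strings with "-" as the gap symbol; the function name and the match, mismatch, and gap values are hypothetical choices for illustration and are not taken from the PIMA paper.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    The scoring values are illustrative assumptions, not PIMA's parameters.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    # Walk the alignment column by column and score every pair of rows.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total
```

For workflow traces, each character would stand for one activity code; a higher score indicates that more activities are placed in agreeing columns.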
Multiple sequence alignment based on set covers
We introduce a new heuristic for the multiple alignment of a set of
sequences. The heuristic is based on a set cover of the residue alphabet of the
sequences, and also on the determination of a significant set of blocks
comprising subsequences of the sequences to be aligned. These blocks are
obtained with the aid of a new data structure, called a suffix-set tree, which
is constructed from the input sequences with the guidance of the
residue-alphabet set cover and generalizes the well-known suffix tree of the
sequence set. We provide performance results on selected BAliBASE amino-acid
sequences and compare them with those yielded by some prominent approaches.
A methodology for determining amino-acid substitution matrices from set covers
We introduce a new methodology for the determination of amino-acid
substitution matrices for use in the alignment of proteins. The new methodology
is based on a pre-existing set cover on the set of residues and on the
undirected graph that describes residue exchangeability given the set cover.
For fixed functional forms indicating how to obtain edge weights from the set
cover and, after that, substitution-matrix elements from weighted distances on
the graph, the resulting substitution matrix can be checked for performance
against some known set of reference alignments and for given gap costs. Finding
the appropriate functional forms and gap costs can then be formulated as an
optimization problem that seeks to maximize the performance of the substitution
matrix on the reference alignment set. We give computational results on the
BAliBASE suite using a genetic algorithm for optimization. Our results indicate
that it is possible to obtain substitution matrices whose performance is either
comparable to or surpasses that of several others, depending on the particular
scenario under consideration.
Pre-column derivatization of amino acids from Nigella sativa L. seed hydrolysates by reversed phase HPLC
A rapid and sensitive method for the analysis of amino acid hydrolysates of Nigella sativa L. seed has been developed using o-phthaldialdehyde (OPA) as a pre-column derivatizing agent. OPA reagents in the presence of mercaptoethanol react rapidly with primary amino acids (in less than 60 s) to form isoindole derivatives, which are easily separated with good selectivity on an ODS column.
Resolution of the amino acid derivatives is carried out with a methanol gradient in 0.01 M aqueous sodium acetate, pH 7.1.
The quantitation of amino acid derivatives is reproducible within an average relative deviation of ±1.4%, and the linearity for most amino acids was greater than 0.9993, with a detection limit of 0.2 ppm. Fifteen amino acids were detected in the analysis of the seed protein hydrolysate. Glutamic acid, alanine, leucine, cystine, phenylalanine, and aspartic acid were present in large quantities. The commonly separated amino acids were detected by UV at 338 nm within 21 minutes.
Comparison of Sequence Alignment Algorithms
The fact that biological sequences can be represented as strings belonging to a finite alphabet (A, C, G, and T for DNA) plays an important role in connecting biology to computer science. String representation allows researchers to apply various string comparison techniques available in computer science. As a result, various applications have been developed that facilitate the task of sequence alignment. The problem of finding sequence alignments consists of finding the best match between two biological sequences. A best match can suggest an evolutionary relationship and functional similarity. However, there is a lack of research on how reliable and efficient these applications are, especially when it comes to comparing two sequences that might not be highly similar (but could have common patterns that are small yet biologically significant). This study compares two biological sequence comparison packages, namely WuBlast2 and Fasta3, which implement the Blast and FastA algorithms, respectively. In order to do so, a framework was developed to facilitate the task of data collection and create meaningful reports. Amino acid sequences corresponding to related proteins, as well as the DNA sequences encoding these proteins, were analyzed with matching parameters for each application. Observations showed a trend of increasing variation between the matches produced by the two applications with decreasing sequence similarity.
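To make "finding the best match between two biological sequences" concrete, here is a minimal sketch of exact local alignment scoring by dynamic programming (the Smith-Waterman recurrence). Blast and FastA are faster heuristics for this kind of search, so this is not what WuBlast2 or Fasta3 implement; the function name and scoring values are assumptions chosen for illustration.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings s and t.

    Uses the Smith-Waterman recurrence with a linear gap penalty.
    Scoring values are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(t) + 1)  # previous row of the DP table
    for i in range(1, len(s) + 1):
        curr = [0]  # first column is always 0 for local alignment
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Because scores are clamped at zero, a short shared pattern inside otherwise dissimilar sequences still yields a positive score, which is the situation the abstract describes as small yet biologically significant.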
AlignStat: a web-tool and R package for statistical comparison of alternative multiple sequence alignments
Background: Alternative sequence alignment algorithms yield different results. It is therefore useful to quantify the similarities and differences between alternative alignments of the same sequences. These measurements can identify regions of consensus that are likely to be most informative in downstream analysis. They can also highlight systematic differences between alignments that relate to differences in the alignment algorithms themselves.
Results: Here we present a simple method for aligning two alternative multiple sequence alignments to one another and assessing their similarity. Differences are categorised into merges, splits or shifts in one alignment relative to the other. A set of graphical visualisations allow for intuitive interpretation of the data.
Conclusions: AlignStat enables easy one-off online use of MSA similarity comparisons, as well as their integration into R pipelines. The web-tool is available at AlignStat.Science.LaTrobe.edu.au. The R package, readme and example data are available on CRAN and GitHub.com/TS404/AlignStat
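One simple way to quantify how similar two alternative alignments of the same sequences are is to count the residue pairs that both place in the same column. The sketch below is an assumption-laden illustration of that idea only; it is not AlignStat's method, which classifies differences into merges, splits, and shifts, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_index, symbol in enumerate(column):
            if symbol != "-":
                residues.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def pair_agreement(reference, test):
    """Fraction of the reference's aligned pairs that also occur in test."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

A value of 1.0 means every residue pair aligned in the reference is also aligned in the other alignment; lower values point to regions where the two disagree.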
Applying a User-centred Approach to Interactive Visualization Design
Analysing users in their context of work and finding out how and why they use different information resources is essential to provide interactive visualisation systems that match their goals and needs. Designers should actively involve the intended users throughout the whole process. This chapter presents a user-centered approach for the design of interactive visualisation systems. We describe three phases of the iterative visualisation design process: the early envisioning phase, the global specification phase, and the detailed specification phase. The whole design cycle is repeated until some criterion of success is reached. We discuss different techniques for the analysis of users, their tasks and domain. Subsequently, the design of prototypes and evaluation methods in visualisation practice are presented. Finally, we discuss the practical challenges in design and evaluation of collaborative visualisation environments. Our own case studies and those of others are used throughout the whole chapter to illustrate various approaches.