Dependence Cluster Visualization
Large clusters of mutual dependence have long been regarded as
a problem impeding comprehension, testing, maintenance, and reverse
engineering. An effective visualization can aid an engineer
in addressing the presence of large clusters. Such a visualization is
presented. It allows a program’s dependence clusters to be considered
from an abstract high level down through to a concrete source level.
At the highest level of abstraction, the visualization uses a heat-map
(a color scheme) to efficiently overview the clusters found in an entire
system. Other levels include three source code views that allow
a user to “zoom” in on the clusters starting from the high-level system
view, down through a file view, and then onto the actual source
code where each cluster can be studied in detail.
Also presented are two case studies: the first is the open-source
calculator bc and the second is the industrial program copia, which
performs signal processing. The studies consider qualitative evaluations
of the visualization. The results show that the visualization
reveals the high-level structure of programs and the interactions
between their components. They also show that the visualization
highlights potential candidates (functions/files) for refactoring
in bc and finds dependence pollution in copia.
Seven clusters in genomic triplet distributions
Motivation: Several recent papers have proposed new algorithms for detecting coding regions without requiring a learning dataset of already known genes. In this paper we study the cluster structure of several genomes in the space of codon usage. This allows us to interpret some of the results obtained in other studies and to propose a simpler method that is nevertheless fully functional.
Results: Several complete genomic sequences were analyzed, using visualization of tables of triplet counts in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters, corresponding to protein-coding information in three possible phases in one of the two complementary strands and in the non-coding regions. Awareness of the existence of this structure allows development of methods for the segmentation of sequences into regions with the same coding phase and non-coding regions.
This method may be completely unsupervised or may use some external information. Since the method does not need extraction of ORFs, it can be applied even to unassembled genomes. Accuracy calculated at the base-pair level (both sensitivity and specificity) exceeds 90%. This is no worse than methods such as HMMs, while the method has the advantage of being much simpler and clearer.
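The sliding-window triplet counting described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window width of 300, the step of 12, and the choice to count non-overlapping triplets starting at the window's first base are all assumptions made for illustration.

```python
from itertools import product

# All 64 possible triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Return a 64-dimensional frequency vector of non-overlapping triplets.

    Assumption: triplets are read in frame with the first base of the window.
    """
    counts = [0] * 64
    for i in range(0, len(window) - 2, 3):
        idx = INDEX.get(window[i:i + 3])
        if idx is not None:  # skip triplets with ambiguous bases such as N
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 64
    return [c / total for c in counts]


def sliding_window_vectors(sequence, width=300, step=12):
    """Yield (start, vector) for each window position along the sequence.

    The width and step defaults are hypothetical, not taken from the paper.
    """
    sequence = sequence.upper()
    for start in range(0, len(sequence) - width + 1, step):
        yield start, triplet_frequencies(sequence[start:start + width])
```

Each window yields one point in the 64-dimensional space; a collection of such points could then be projected or clustered to look for the structure the abstract describes. Any particular clustering algorithm used for that step would be a further assumption.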
- …