79,574 research outputs found

    Dependence Cluster Visualization

    Get PDF
    Large clusters of mutual dependence have long been regarded as a problem impeding comprehension, testing, maintenance, and reverse engineering. An effective visualization can aid an engineer in addressing the presence of large clusters. Such a visualization is presented. It allows a program’s dependence clusters to be considered from an abstract high level down thru a concrete source-level. At the highest level of abstraction, the visualization uses a heat-map (a color scheme) to efficiently overview the clusters found in an entire system. Other levels include three source code views that allow a user to “zoom” in on the clusters starting from the high-level system view, down through a file view, and then onto the actual source code where each cluster can be studied in detail. Also presented are two case studies, the first is the open-source calculator bc and the second is the industrial program copia, which performs signal processing. The studies consider qualitative evaluations of the visualization. From the results, it is seen that the visualization reveals high-level structure of programs and interactions between its components. The results also show that the visualization highlights potential candidates (functions/files) for re-factoring in bc and finds dependence pollution in copia

    Seven clusters in genomic triplet distributions

    Get PDF
    Motivation: In several recent papers new algorithms were proposed for detecting coding regions without requiring learning dataset of already known genes. In this paper we studied cluster structure of several genomes in the space of codon usage. This allowed to interpret some of the results obtained in other studies and propose a simpler method, which is, nevertheless, fully functional. Results: Several complete genomic sequences were analyzed, using visualization of tables of triplet counts in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters, corresponding to protein-coding information in three possible phases in one of the two complementary strands and in the non-coding regions. Awareness of the existence of this structure allows development of methods for the segmentation of sequences into regions with the same coding phase and non-coding regions. This method may be completely unsupervised or use some external information. Since the method does not need extraction of ORFs, it can be applied even for unassembled genomes. Accuracy calculated on the base-pair level (both sensitivity and specificity) exceeds 90%. This is not worse as compared to such methods as HMM, however, has the advantage to be much simpler and clear
    • …
    corecore