3,128 research outputs found

    Identifying Cloned Navigational Patterns in Web Applications

    Get PDF
    Web Applications are subject to continuous and rapid evolution. Often programmers indiscriminately duplicate Web pages without considering systematic development and maintenance methods. This practice creates code clones that make Web Applications hard to maintain and reuse. We present an approach to identify duplicated functionalities in Web Applications through cloned navigational pattern analysis. Cloned patterns can be generalized in a reengineering process, thus to simplify the structure and future maintenance of the Web Applications. The proposed method first identifies pairs of cloned pages by analyzing similarity at structure, content, and scripting code. Two pages are considered clones if their similarity is greater than a given threshold. Cloned pages are then grouped into clusters and the links connecting pages of two clusters are grouped too. An interconnection metric has been defined on the links between two clusters to express the effort required to reengineer them as well as to select the patterns of interest. To further reduce the comprehension effort, we filter out links and nodes of the clustered navigational schema that do not contribute to the identification of cloned navigational patterns. A tool supporting the proposed approach has been developed and validated in a case study

    An Approach and an Eclipse Based Environment for Enhancing the Navigation Structure of Web Sites

    Get PDF
    This paper presents an approach based on information retrieval and clustering techniques for automatically enhancing the navigation structure of a Web site for improving navigability. The approach increments the set of navigation links provided in each page of the site with a semantic navigation map, i.e., a set of links enabling navigating from a given page to other pages of the site showing similar or related content. The approach uses Latent Semantic Indexing to compute a dissimilarity measure between the pages of the site and a graph-theoretic clustering algorithm to group pages showing similar or related content according to the calculated dissimilarity measure. AJAX code is finally used to extend each Web page with an associated semantic navigation map. The paper also presents a prototype of a tool developed to support the approach and the results from a case study conducted to assess the validity and feasibility of the proposal

    Metric Selection and Metric Learning for Matching Tasks

    Get PDF
    A quarter of a century after the world-wide web was born, we have grown accustomed to having easy access to a wealth of data sets and open-source software. The value of these resources is restricted if they are not properly integrated and maintained. A lot of this work boils down to matching; finding existing records about entities and enriching them with information from a new data source. In the realm of code this means integrating new code snippets into a code base while avoiding duplication. In this thesis, we address two different such matching problems. First, we leverage the diverse and mature set of string similarity measures in an iterative semisupervised learning approach to string matching. It is designed to query a user to make a sequence of decisions on specific cases of string matching. We show that we can find almost optimal solutions after only a small amount of such input. The low labelling complexity of our algorithm is due to addressing the cold start problem that is inherent to Active Learning; by ranking queries by variance before the arrival of enough supervision information, and by a self-regulating mechanism that counteracts initial biases. Second, we address the matching of code fragments for deduplication. Programming code is not only a tool, but also a resource that itself demands maintenance. Code duplication is a frequent problem arising especially from modern development practice. There are many reasons to detect and address code duplicates, for example to keep a clean and maintainable codebase. In such more complex data structures, string similarity measures are inadequate. In their stead, we study a modern supervised Metric Learning approach to model code similarity with Neural Networks. We find that in such a model representing the elementary tokens with a pretrained word embedding is the most important ingredient. Our results show both qualitatively (by visualization) that relatedness is modelled well by the embeddings and quantitatively (by ablation) that the encoded information is useful for the downstream matching task. As a non-technical contribution, we unify the common challenges arising in supervised learning approaches to Record Matching, Code Clone Detection and generic Metric Learning tasks. We give a novel account to string similarity measures from a psychological standpoint and point out and document one longstanding naming conflict in string similarity measures. Finally, we point out the overlap of latest research in Code Clone Detection with the field of Natural Language Processing

    Determinants of anti-PD-1 response and resistance in clear cell renal cell carcinoma

    Get PDF
    ADAPTeR is a prospective, phase II study of nivolumab (anti-PD-1) in 15 treatment-naive patients (115 multiregion tumor samples) with metastatic clear cell renal cell carcinoma (ccRCC) aiming to understand the mechanism underpinning therapeutic response. Genomic analyses show no correlation between tumor molecular features and response, whereas ccRCC-specific human endogenous retrovirus expression indirectly correlates with clinical response. T cell receptor (TCR) analysis reveals a significantly higher number of expanded TCR clones pre-treatment in responders suggesting pre-existing immunity. Maintenance of highly similar clusters of TCRs post-treatment predict response, suggesting ongoing antigen engagement and survival of families of T cells likely recognizing the same antigens. In responders, nivolumab-bound CD8+ T cells are expanded and express GZMK/B. Our data suggest nivolumab drives both maintenance and replacement of previously expanded T cell clones, but only maintenance correlates with response. We hypothesize that maintenance and boosting of a pre-existing response is a key element of anti-PD-1 mode of action

    Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

    Full text link
    The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina datasets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve within standard 97% similarity OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S tags differing by as little as 1 nucleotide (99.2% similarity). A comparative analysis of oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly.Comment: Updated to match the published version. 12 pages, 5 figures + supplement. Significantly revised for clarity, references added, results not change

    Find Unique Usages: Helping Developers Understand Common Usages

    Full text link
    When working in large and complex codebases, developers face challenges using \textit{Find Usages} to understand how to reuse classes and methods. To better understand these challenges, we conducted a small exploratory study with 4 participants. We found that developers often wasted time reading long lists of similar usages or prematurely focused on a single usage. Based on these findings, we hypothesized that clustering usages by the similarity of their surrounding context might enable developers to more rapidly understand how to use a function. To explore this idea, we designed and implemented \textit{Find Unique Usages}, which extracts usages, computes a diff between pairs of usages, generates similarity scores, and uses these scores to form usage clusters. To evaluate this approach, we conducted a controlled experiment with 12 participants. We found that developers with Find Unique Usages were significantly faster, completing their task in 35% less time
    corecore