6 research outputs found

    A Comparative Study of Code Query Technologies

    Full text link
    When analyzing software systems we are faced with the challenge of how to implement a particular analysis for different programming languages. A solution for this problem is to write a single analysis using a code query language abstracting from the specificities of languages being analyzed. Over the past ten years many code query technologies have been developed, based on different formalisms. Each technology comes with its own query language and set of features. To determine the state of the art of code querying we compare the languages and tools for seven code query technologies: Grok, Rscript, JRelCal, SemmleCode, JGraLab, CrocoPat and JTransformer. The specification of a package stability metric is used as a running example to compare the languages. The comparison involves twelve criteria, some of which are concerned with properties of the query language (paradigm, types, parametrization, polymorphism, modularity, and libraries), and some of which are concerned with the tool itself (output formats, interactive interface, API support, interchange formats, extraction support, and licensing). We contextualize the criteria in two usage scenarios: interactive and tool integration. We conclude that there is no particularly weak or dominant tool. As important improvement points, we identify the lack of library mechanisms, interchange formats, and possibilities for integration with source code extraction components

    Querying Source Code using an Algebraic Query Language

    No full text
    Querying and analyzing source code interactively is a critical task in reverse engineering and program understanding. Current source code query systems lack sufficient formalism and offer limited query capabilities. In this paper, we introduce the formal framework of Source Code Algebra (SCA), and outline a source code query system based on it. SCA provides a formal data model for source code, an algebraic expression-based query language, and opportunities for query optimization. An algebraic model of source code addresses the issues of conceptual integrity, expressive power, and performance of a source code query system within a unified framework. Keywords: Reverse engineering, source code query, query language, many-sorted algebra. 1 Overview The answers are in the source code. Mark Weiser, in Source Code [28]. Software reverse engineering, code re-engineering, and program understanding have begun to emerge as the latest challenges in the field of software engineering. Interest in ..

    Abstraction : a notion for reverse engineering.

    Get PDF

    Graph database management systems: storage, management and query processing

    Get PDF
    The proliferation of graph data, generated from diverse sources, have given rise to many research efforts concerning graph analysis. Interactions in social networks, publication networks, protein networks, software code dependencies and transportation systems are all examples of graph-structured data originating from a variety of application domains and demonstrating different characteristics. In recent years, graph database management systems (GDBMS) have been introduced for the management and analysis of graph data. Motivated by the growing number of real-life applications making use of graph database systems, this thesis focuses on the effectiveness and efficiency aspects of such systems. Specifically, we study the following topics relevant to graph database systems: (i) modeling large-scale applications in GDBMS; (ii) storage and indexing issues in GDBMS, and (iii) efficient query processing in GDBMS. In this thesis, we adopt two different application scenarios to examine how graph database systems can model complex features and perform relevant queries on each of them. Motivated by the popular application of social network analytics, we selected Twitter, a microblogging platform, to conduct our detailed analysis. Addressing limitations of existing models, we pro- pose a data model for the Twittersphere that proactively captures Twitter-specific interactions. We examine the feasibility of running analytical queries on GDBMS and offer empirical analysis of the performance of the proposed approach. Next, we consider a use case of modeling software code dependencies in a graph database system, and investigate how these systems can support capturing the evolution of a codebase overtime. We study a code comprehension tool that extracts software dependencies and stores them in a graph database. On a versioned graph built using a very large codebase, we demonstrate how existing code comprehension queries can be efficiently processed and also show the benefit of running queries across multiple versions. Another important aspect of this thesis is the study of storage aspects of graph systems. Throughput of many graph queries can be significantly affected by disk I/O performance; therefore graph database systems need to focus on effective graph storage for optimising disk operations. We observe that the locality of edges plays an important role and we address the edge-labeling problem which aims to label both incoming and outgoing edges of a graph maximizing the ‘edge-consecutiveness’ metric. By achieving a better layout and locality of edges on disk, we show that our proposed algorithms result in significantly improved disk I/O performance leading to faster execution of neighbourhood queries. Some applications require the integrated processing of queries from graph and the textual domains within a graph database system. Aggregation of these dimensions facilitates gaining key insights in several application scenarios. For example, in a social network setting, one may want to find the closest k users in the network (graph traversal) who talk about a particular topic A (textual search). Motivated by such practical use cases, in this thesis we study the top-k social-textual ranking query that essentially requires efficient combination of a keyword search query with a graph traversal. We propose algorithms that leverage graph partitioning techniques, based on the premise that socially close users will be placed within the same partition, allowing more localised computations. We show that our proposed approaches are able to achieve significantly better results compared to standard baselines and demonstrating robust behaviour under changing parameters
    corecore