
    Lightweight Multilingual Software Analysis

    Developer preferences, language capabilities, and the persistence of older languages contribute to the trend that large software codebases are often multilingual, that is, written in more than one computer language. While developers can leverage monolingual software development tools to build software components, companies face the problem of managing the resulting large, multilingual codebases to address issues of security, efficiency, and quality. The key challenge is the opaque nature of the language interoperability interface: one language calls procedures in a second (which may call a third, or even call back into the first), resulting in a potentially tangled, inefficient, and insecure codebase. An architecture for lightweight static analysis of large multilingual codebases is proposed: the MLSA architecture. Its modular, table-oriented structure addresses the open-ended nature of multiple languages and language interoperability APIs. As an application, we focus on the construction of call graphs that capture both inter-language and intra-language calls. The algorithms for extracting multilingual call graphs from codebases are presented, and several examples of multilingual software engineering analysis are discussed. The state of the implementation and testing of MLSA is presented, and implications for future work are discussed.
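
    To make the table-oriented linking step concrete, here is a minimal sketch of merging per-language call tables into one multilingual call graph. The table layout, the py:/c: name prefixes, and the binding table are illustrative assumptions, not the MLSA schema itself.

        # A minimal sketch of table-oriented multilingual call-graph linking,
        # in the spirit of MLSA's modular design (names and table layout are
        # illustrative assumptions, not the paper's actual schema).

        # Each language front end emits a table of the calls it can see:
        # (caller, callee_name) pairs, where callee_name may be unresolved.
        python_calls = [("py:main", "py:load"), ("py:load", "c_parse")]
        c_calls = [("c:c_parse", "c:tokenize")]

        # A separate binding table records interoperability-API bindings,
        # e.g. a Python C extension exporting c_parse as the C symbol c:c_parse.
        bindings = {"c_parse": "c:c_parse"}

        def link(call_tables, bindings):
            """Merge per-language call tables into one multilingual call graph,
            resolving cross-language names through the binding table."""
            graph = {}
            for table in call_tables:
                for caller, callee in table:
                    resolved = bindings.get(callee, callee)
                    graph.setdefault(caller, set()).add(resolved)
            return graph

        graph = link([python_calls, c_calls], bindings)
        # {'py:main': {'py:load'}, 'py:load': {'c:c_parse'},
        #  'c:c_parse': {'c:tokenize'}}
        print(graph)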

    Analyzing Solidity smart contracts

    Master's thesis in informatics, INF399KMAMN-IN

    An Approach to Developing an Information System for Extracting Data from the Web

    Today, the Internet contains a huge number of information sources that we use constantly in our daily lives. Information that is similar in meaning is often presented in different forms on different resources (for example, electronic libraries, online stores, and news sites). In this paper, we analyze the extraction of user-required information from web sources of a certain type. The data extraction problem was analyzed, and in reviewing the main approaches to data extraction, the strengths and weaknesses of each were identified. The main aspects of extracting web knowledge were formulated, and approaches and information technologies for solving syntactic analysis problems were examined against existing information systems. Based on this analysis, the task of developing models and software components for extracting data from certain types of web resources was formulated. A conceptual model of data extraction was developed that treats the web space as an external data source. A requirements specification for the software component was created, which allows work on the project to continue with a clear understanding of the requirements and constraints on the implementation. During software modeling, class, activity, sequence, and deployment diagrams were developed, which will then be used to create the finished software application. For further development of the software, a programming platform and types of testing (load and unit) were defined. The results obtained indicate that the proposed design solution, to be implemented as a prototype of the software system, can perform the task of extracting data from different sources on the basis of a single semantic template.
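
    To illustrate the single-semantic-template idea, the sketch below maps semantic fields to per-source selectors so that one extraction routine serves heterogeneous sources. The field names, selectors, and source names are hypothetical, and the bs4 dependency is a choice made for the sketch; the thesis prototype's actual design may differ.

        # An illustrative sketch of extraction driven by one semantic template:
        # the template maps semantic fields to per-site CSS selectors, so the
        # same code extracts comparable records from different web sources.
        from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

        # One shared semantic template, with per-source selector variants
        # (all names here are invented for illustration).
        TEMPLATE = {
            "title":  {"shop.example": "h1.product-name", "lib.example": "h1.book-title"},
            "author": {"shop.example": "span.brand",      "lib.example": "p.author"},
        }

        def extract(source, html):
            """Apply the shared semantic template to one source's HTML."""
            soup = BeautifulSoup(html, "html.parser")
            record = {}
            for field, selectors in TEMPLATE.items():
                node = soup.select_one(selectors[source])
                record[field] = node.get_text(strip=True) if node else None
            return record

        html = '<h1 class="book-title">Kobzar</h1><p class="author">T. Shevchenko</p>'
        print(extract("lib.example", html))
        # {'title': 'Kobzar', 'author': 'T. Shevchenko'}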

    Tool comparison of semantic parsers

    Natural Language Processing (NLP) is vital if artificial intelligence systems are to integrate into human lives, which has long been a goal for researchers in the field. While NLP covers an array of problems, this paper focuses specifically on semantic parsing. These parsers have been a significant target for improvement in the scientific community, and demand for semantic parsers that achieve high accuracy has increased. Many approaches have been developed for this purpose, and in this paper a deep analysis was performed to compare the performance of semantic parsing systems. The comparison provides a view of how semantic parsers from different eras perform on a set of shared metrics.
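
    A shared-metric comparison of the kind described might look like the harness below, which scores every parser on the same utterances with the same metric (exact-match accuracy on logical forms). The toy parsers and GeoQuery-style examples are stand-ins, not the systems or datasets the paper evaluates.

        # A minimal sketch of a shared-metric evaluation harness for
        # semantic parsers: all systems are scored on identical data
        # with identical metrics, so results are directly comparable.

        gold = {  # utterance -> gold logical form (invented examples)
            "what rivers are in texas": "answer(river(loc_2(stateid('texas'))))",
            "name the longest river":   "answer(longest(river(all)))",
        }

        def parser_a(utterance):          # placeholder for a real parser
            return gold.get(utterance)    # pretends to be perfect

        def parser_b(utterance):          # placeholder for an older parser
            return "answer(river(all))"   # always emits the same parse

        def exact_match_accuracy(parser, dataset):
            """Fraction of utterances parsed to exactly the gold form."""
            hits = sum(parser(u) == lf for u, lf in dataset.items())
            return hits / len(dataset)

        for name, parser in [("A", parser_a), ("B", parser_b)]:
            print(name, exact_match_accuracy(parser, gold))  # A 1.0, B 0.0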

    Generating Accurate Dependencies for Large Software

    Dependencies between program elements can reflect the architecture, design, and implementation of a software project. According to an industry report, intra- and inter-module dependencies can be a significant source of latent threats to software maintainability in long-term software development, especially when the software has millions of lines of code. This thesis introduces the design and implementation of an accurate and scalable analysis tool that extracts code dependencies from large C/C++ software projects. The tool analyzes both symbol-level and module-level dependencies of a software system and provides a utilization-based dependency model. The accurate dependencies generated by the tool can be provided as input to other software analysis suites; the results alone can help developers identify potentially underutilized and inconsistent dependencies in the software. Such information points to potential refactoring opportunities and assists developers with large-scale refactoring tasks.
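
    One plausible reading of the utilization-based dependency model is sketched below: a module-level dependency is weighted by the fraction of the target module's interface that clients actually use, so a low score flags a potentially underutilized dependency. The module and symbol names are invented, and the thesis tool targets C/C++ rather than Python tables.

        # A sketch of utilization-based dependency weighting: how much of a
        # target module's public interface does each client actually use?
        # Low utilization suggests an underutilized (perhaps removable)
        # dependency. All names below are invented for illustration.

        # symbol-level dependencies: (client_symbol, target_module, target_symbol)
        symbol_deps = [
            ("app.run",  "netlib", "connect"),
            ("app.run",  "netlib", "send"),
            ("app.init", "loglib", "open_log"),
        ]

        # public interface size of each target module
        interface_size = {"netlib": 40, "loglib": 3}

        def utilization(symbol_deps, interface_size):
            """Fraction of each target module's interface used by clients."""
            used = {}
            for _, module, symbol in symbol_deps:
                used.setdefault(module, set()).add(symbol)
            return {m: len(s) / interface_size[m] for m, s in used.items()}

        print(utilization(symbol_deps, interface_size))
        # {'netlib': 0.05, 'loglib': 0.333...} -> netlib looks underutilized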

    What to Fix? Distinguishing between design and non-design rules in automated tools

    Technical debt, the design shortcuts taken to optimize for delivery speed, is a critical part of long-term software costs. Consequently, automatically detecting technical debt is a high priority for software practitioners. Software quality tool vendors have responded to this need by positioning their tools to detect and manage technical debt. While these tools bundle a number of rules, it is hard for users to understand which rules identify design issues as opposed to syntactic quality. This matters because previous studies have shown that the most significant technical debt is related to design issues. Other research has compared these tools on open source projects, but those comparisons did not examine whether the rules were relevant to design. We conducted an empirical study using a structured categorization approach and manually classified 466 software quality rules from three industry tools: CAST, SonarQube, and NDepend. We found that most of these rules were easily labeled as either not design (55%) or design (19%); the remainder (26%) resulted in disagreements among the labelers. Our results are a first step toward formalizing a definition of a design rule, in order to support automatic detection. (Long version of a short paper accepted at the International Conference on Software Architecture 2017, Gothenburg, SE.)
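
    The categorization step can be made concrete with a small sketch: each rule receives one label per rater, and only unanimously labeled rules land in the design or not-design buckets, with everything else counted as a disagreement. The rules and votes below are invented examples, not the study's data.

        # A small sketch of bucketing manually labeled quality rules by
        # rater agreement: unanimous rules keep their label, split votes
        # go to a "disagreement" bucket (as in the 55%/19%/26% breakdown).
        from collections import Counter

        labels = {  # rule -> labels from three raters (invented)
            "avoid-cyclic-dependencies": ["design", "design", "design"],
            "line-too-long":             ["not-design", "not-design", "not-design"],
            "too-many-parameters":       ["design", "not-design", "design"],
        }

        def bucket(labels):
            buckets = Counter()
            for rule, votes in labels.items():
                buckets[votes[0] if len(set(votes)) == 1 else "disagreement"] += 1
            return buckets

        print(bucket(labels))
        # Counter({'design': 1, 'not-design': 1, 'disagreement': 1})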

    Extracting Social Network from Literary Prose

    This thesis develops an approach to extract social networks from literary prose, namely Jane Austen's published novels from the eighteenth and nineteenth centuries. Dialogue interaction plays a key role in deriving the networks, so our technique relies on our ability to determine when two characters are in conversation. Our process involves encoding plain literary text into the Text Encoding Initiative's (TEI) XML format, character name identification, conversation and co-occurrence detection, and social network construction. Previous work on social network construction for literature has focused on drama, specifically manually TEI-encoded Shakespearean plays, in which character interactions are much easier to track due to their dialogue-driven narrative structure. In contrast, prose is structured quite differently; character speeches are not clearly formatted, making it more difficult to assign specific dialogue to each character. We implement two parsing strategies based on context size (chapter scope and paragraph scope) to detect character interactions. To check the accuracy of our methods, we conduct one evaluation based on network statistics and another that measures similarity (edit distance) between networks constructed from manually encoded novels and our constructed graphs. Our findings suggest that the choice of context size is non-trivial and can have a substantial influence on the resulting networks. In general, the paragraph-level interaction approach appeared to be more accurate.
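
    The paragraph-scope strategy can be sketched as follows: two characters receive a weighted edge whenever their names co-occur in the same paragraph. The real pipeline works on TEI-encoded text with proper character name identification; the plain substring matching and the invented Pride and Prejudice snippets here are simplifications.

        # A minimal sketch of paragraph-scope co-occurrence detection:
        # every pair of character names appearing in the same paragraph
        # adds one to that pair's edge weight in the social network.
        from itertools import combinations
        from collections import Counter

        characters = ["Elizabeth", "Darcy", "Jane", "Bingley"]
        paragraphs = [
            "Elizabeth watched Darcy across the room.",
            "Jane smiled at Bingley; Elizabeth noticed.",
        ]

        def cooccurrence_network(paragraphs, characters):
            """Weighted edges from per-paragraph character co-occurrence."""
            edges = Counter()
            for para in paragraphs:
                present = sorted(name for name in characters if name in para)
                for pair in combinations(present, 2):
                    edges[pair] += 1
            return edges

        print(cooccurrence_network(paragraphs, characters))
        # Counter({('Darcy', 'Elizabeth'): 1, ('Bingley', 'Elizabeth'): 1, ...})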