
    Automated identification and qualitative characterization of safety concerns reported in UAV software platforms

    Unmanned Aerial Vehicles (UAVs) are nowadays used in a variety of applications. Given the cyber-physical nature of UAVs, software defects in these systems can cause issues with safety-critical implications. An important aspect of the lifecycle of UAV software is to minimize the possibility of harming humans or damaging property through a continuous process of hazard identification and safety risk management. Specifically, safety-related concerns typically emerge during the operation of UAV systems and are reported by end-users and developers in the form of issue reports and pull requests. However, popular UAV systems receive tens or hundreds of reports of varying types and quality every day. To help developers identify and triage safety-critical UAV issues in a timely manner, we (i) experiment with automated approaches (previously used for issue classification) for detecting the safety-related matters appearing in the titles and descriptions of issues and pull requests reported in UAV platforms, and (ii) propose a categorization of the main hazards and accidents discussed in such issues. Our results (i) show that shallow machine-learning-based approaches can identify safety-related sentences with precision, recall, and F-measure values of about 80%; and (ii) provide a categorization and description of the relationships between safety issue hazards and accidents.
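As a toy illustration of the kind of shallow text classification the study experiments with, the sketch below trains a minimal multinomial Naive Bayes over bag-of-words features to flag safety-related sentences. The training sentences and labels are invented for this example; the actual study relies on established issue-classification approaches and real issue data.

```python
# Minimal sketch of a shallow classifier for flagging safety-related
# sentences in UAV issue reports. The tiny training set is hypothetical;
# this multinomial Naive Bayes is only an illustration of the approach.
import math
from collections import Counter

def tokenize(text):
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def train(samples):
    """samples: list of (sentence, label) pairs. Returns a model dict."""
    counts = {}         # label -> Counter of word frequencies
    priors = Counter()  # label -> number of training sentences
    vocab = set()
    for text, label in samples:
        priors[label] += 1
        bag = counts.setdefault(label, Counter())
        for tok in tokenize(text):
            bag[tok] += 1
            vocab.add(tok)
    return {"counts": counts, "priors": priors, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    total = sum(model["priors"].values())
    v = len(model["vocab"])
    best, best_score = None, float("-inf")
    for label, prior in model["priors"].items():
        bag = model["counts"][label]
        n = sum(bag.values())
        score = math.log(prior / total)
        for tok in tokenize(text):
            score += math.log((bag[tok] + 1) / (n + v))
        if score > best_score:
            best, best_score = label, score
    return best

train_set = [
    ("drone crashed after motor failure", "safety"),
    ("vehicle lost altitude and hit the ground", "safety"),
    ("fix typo in documentation", "other"),
    ("update build script dependencies", "other"),
]
model = train(train_set)
print(predict(model, "copter crashed into the ground"))  # -> safety
```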

    Hybridization of Plagiarism Detection Methods in Program Code

    Relevance of the topic. We live in an age of rapid development of computer technology. Software products appear and evolve at an incredible pace, and technology trends change daily. At the same time, problems of plagiarism arise, where plagiarism in program code means the partial or complete duplication of fragments of a program's code. Duplication covers both identical code fragments within a single program and plagiarism across different programs. Plagiarism detection has always been, and remains, a relevant task in software development, as a tool for metric analysis and refactoring. Today there are many software products that help identify plagiarism in program code, and a significant number of integrated development environments include extensions that ease the development, maintenance, and deployment of software products. This matters because accumulating large amounts of duplicated code can cause serious problems: when a defect is detected in one code fragment, all duplicated fragments must be modified as well, which can significantly complicate deployment and maintenance. Thanks to plagiarism detection, duplicated code can be abstracted away and, as a result, the overall size of the code base can be reduced. In addition, plagiarism detection in program code can be used for refactoring projects, shrinking the code base, and finding copyright violations. The object of the study is plagiarism detection in program code. The subject of the study is the hybridization of methods for identifying software clones in input programs. The goal of the work is to develop a new hybrid approach that detects plagiarism more effectively than its analogues. Research methods: methods of discrete mathematics and the experimental method.
The scientific novelty is as follows: a hybrid method for plagiarism detection in program code is proposed that enables detection for a wide range of programming languages while improving on the results of existing approaches. The practical value of the results is that the proposed method is a universal solution for this field: it can detect plagiarism across a wide range of programming languages without loss of effectiveness, can be applied as a tool that eases the software development process, and can be integrated with other software. Approbation of the work. The main provisions and results were presented and discussed at: 1. the XIV scientific conference of master's and postgraduate students "Applied Mathematics and Computing" PMK-2021 (Kyiv, 17-19 November 2021); 2. the VIII International Scientific and Technical Internet Conference "Modern Methods, Information, Software and Technical Support of Control Systems for Organizational-Technical and Technological Complexes" (Kyiv, 26 November 2021). Structure and scope of the work. The master's thesis consists of an introduction, four chapters, and conclusions. The introduction gives a general description of the work, assesses the current state of the problem, substantiates the relevance of the research direction, formulates the goal and objectives of the research, and shows the practical value of the work. The first chapter reviews the main plagiarism detection techniques, the software that implements them, and their advantages and disadvantages. The second chapter presents the methods, algorithms, and tools used to process program code in this work. The third chapter describes the developed method and the stages of its implementation. The fourth chapter covers the testing of the software package to confirm that it works correctly. The conclusions summarize the results of the work. The thesis is presented on 80 pages and contains 3 appendices, 36 figures, 4 tables, and references to a list of 20 used literature sources.
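The abstract does not detail the hybrid method itself. As a rough illustration of one classic ingredient such detectors combine, the sketch below normalizes tokens (so renamed copies coincide) and compares winnowing fingerprints, a technique popularized by MOSS. The tiny keyword set and all names are assumptions for the example, not the thesis's actual approach.

```python
# Illustrative sketch of token normalization + winnowing fingerprinting,
# one common building block of hybrid code-plagiarism detectors.
import re

# Deliberately tiny, hypothetical keyword set kept verbatim so control
# structure still distinguishes programs.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "in"}

def normalize(code):
    """Blind identifiers and literals so renamed copies become identical."""
    code = re.sub(r'"[^"]*"', "STR", code)
    code = re.sub(r"\b\d+\b", "NUM", code)
    code = re.sub(r"\b[A-Za-z_]\w*\b",
                  lambda m: m.group(0) if m.group(0) in KEYWORDS | {"STR", "NUM"}
                  else "ID",
                  code)
    return "".join(code.split())          # drop all whitespace

def fingerprints(text, k=5, w=4):
    """Winnowing: hash every k-gram, keep the minimum hash of each window."""
    hashes = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

def similarity(a, b):
    """Jaccard similarity of the two fragments' fingerprint sets."""
    fa, fb = fingerprints(normalize(a)), fingerprints(normalize(b))
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

original = "def add(x, y):\n    return x + y"
renamed  = "def plus(a, b):\n    return a + b"
print(similarity(original, renamed))      # 1.0: identical after normalization
```

A hybrid detector would typically combine such fingerprint scores with other signals (e.g., structural or metric-based comparison) before deciding that two fragments are clones.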

    A User-aware Intelligent Refactoring for Discrete and Continuous Software Integration

    Successful software products evolve through a process of continual change. However, this process may weaken the design of the software and make it unnecessarily complex, leading to significantly reduced productivity and increased fault-proneness. Refactoring improves the software design while preserving overall functionality and behavior, and is an important technique in managing the growing complexity of software systems. Most of the existing work on software refactoring uses either an entirely manual or a fully automated approach. Manual refactoring is time-consuming, error-prone, and unsuitable for large-scale, radical refactoring. Furthermore, fully automated refactoring yields a static list of refactorings which, when applied, leads to a new and often hard-to-comprehend design. In addition, it is challenging to merge these refactorings with other changes performed in parallel by developers. In this thesis, we propose a refactoring recommendation approach that dynamically adapts and interactively suggests refactorings to developers and takes their feedback into consideration. Our approach uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to find a set of good refactoring solutions that improve software quality while minimizing the deviation from the initial design. These refactoring solutions are then analyzed to extract interesting common features between them, such as the refactorings that occur frequently in the best non-dominated solutions. We combined our interactive approach with unsupervised learning to reduce the developer's interaction effort when refactoring a system. The unsupervised learning algorithm clusters the different trade-off solutions, called the Pareto front, to guide developers in selecting their region of interest and reduce the number of refactoring options to explore.
To reduce the interaction effort further, we propose an approach that converts the multi-objective search into a mono-objective one after interacting with the developer to identify a good refactoring solution based on their preferences. Since developers may want to focus on specific code locations, the "Decision Space" is also important. Therefore, our interactive approach enables developers to pinpoint their preferences simultaneously in the objective (quality metrics) and decision (code location) spaces. Due to an urgent need for refactoring tools that can support continuous integration and recent development processes such as DevOps that are based on rapid releases, we propose, for the first time, an intelligent software refactoring bot, called RefBot. Our bot continuously monitors the software repository and finds the best sequence of refactorings to fix quality issues in Continuous Integration/Continuous Delivery (CI/CD) environments, delivered as a set of pull requests generated after mining previous code changes to understand the profiles of developers. We quantitatively and qualitatively evaluated the performance and effectiveness of our proposed approaches via a set of studies conducted with experienced developers who used our tools on both open-source and industry projects.
Ph.D. dissertation, College of Engineering & Computer Science, University of Michigan-Dearborn. https://deepblue.lib.umich.edu/bitstream/2027.42/154775/1/Vahid Alizadeh Final Dissertation.pdf
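As a rough sketch of what collapsing the multi-objective search around a developer's preference can look like, the snippet below scores hypothetical Pareto-front refactoring solutions by a weighted Chebyshev distance to a developer-chosen reference point and picks the closest trade-off. The objective values and the scalarization choice are illustrative assumptions, not the dissertation's exact formulation.

```python
# Sketch: turn a multi-objective choice into a mono-objective one by
# scoring each Pareto-front solution against a developer-chosen
# reference point. All numbers below are hypothetical.

def chebyshev(point, reference, weights):
    """Weighted Chebyshev distance: minimizing it steers the search
    toward the developer's preferred region of the Pareto front."""
    return max(w * abs(p - r) for p, r, w in zip(point, reference, weights))

# Hypothetical non-dominated solutions as
# (quality improvement, deviation from initial design) pairs.
pareto_front = [(0.9, 0.8), (0.7, 0.5), (0.5, 0.2), (0.3, 0.1)]
preferred = (0.7, 0.4)   # trade-off the developer indicated
weights = (1.0, 1.0)

best = min(pareto_front, key=lambda s: chebyshev(s, preferred, weights))
print(best)  # (0.7, 0.5): the trade-off closest to the stated preference
```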

    Identifying developers’ habits and expectations in copy and paste programming practice

    Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos. Both novice and experienced developers rely more and more on external sources of code, copying and pasting code snippets into their programs. This behavior differs from the traditional software design approach, where cohesion was achieved via a conscious design effort. It is therefore essential to know how copy-and-paste programming is actually carried out, so that IDEs (Integrated Development Environments) and code recommenders can be designed to fit developer expectations and habits.

    Code similarity and clone search in large-scale source code data

    Software development has benefited tremendously from the Internet, with online code corpora that enable instant sharing of source code, online developer guides, and documentation. Nowadays, duplicated code (i.e., code clones) exists not only within or across software projects but also between online code repositories and websites. We call these "online code clones." Like classic code clones between software systems, they can lead to license violations, bug propagation, and reuse of outdated code. Unfortunately, they are difficult to locate and fix since the search space in online code corpora is large and no longer confined to a local repository. This thesis presents a combined study of code similarity and online code clones. We empirically show that many code snippets on Stack Overflow are cloned from open source projects. Several of them have become outdated or violate their original license and are potentially harmful to reuse. To develop a solution for finding online code clones, we study various code similarity techniques to gain insights into their strengths and weaknesses. A framework, called OCD, for evaluating code similarity and clone search tools is introduced and used to compare 34 state-of-the-art techniques on pervasively modified code and boiler-plate code. We also found that clone detection techniques can be enhanced by compilation and decompilation. Using the knowledge from the comparison of code similarity analysers, we create and evaluate Siamese, a scalable token-based clone search technique via multiple code representations. Our evaluation shows that Siamese scales to large-scale source code data of 365 million lines of code and offers high search precision and recall. Its clone search precision is comparable to seven state-of-the-art clone detection tools on the OCD framework.
Finally, we demonstrate the usefulness of Siamese by applying the tool to find online code clones, automatically analyse clone licenses, and recommend tests for reuse.
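The "multiple code representations" idea can be sketched as comparing each fragment at several normalization levels, so that exact, renamed, and structurally similar clones all become matchable. The three levels below are illustrative assumptions, not Siamese's actual representations.

```python
# Sketch: three increasingly blinded token-level views of a code
# fragment, and the strictest level at which two fragments coincide.
# The representation levels are hypothetical, not Siamese's own.
import re

TOKEN = re.compile(r"\w+|[^\w\s]")
KEYWORDS = {"def", "return", "if", "else", "for", "while"}

def representations(code):
    """Literal tokens, identifier-blinded tokens, and pure token shape."""
    toks = TOKEN.findall(code)
    literal = toks                                             # exact tokens
    renamed = ["ID" if t.isidentifier() and t not in KEYWORDS else t
               for t in toks]                                  # identifiers blinded
    shape = ["T" if t.isidentifier() else t for t in toks]     # keywords too
    return literal, renamed, shape

def clone_level(a, b):
    """0 = exact clone, 1 = clone up to renaming, 2 = same token shape,
    -1 = no match at any level."""
    for level, (ra, rb) in enumerate(zip(representations(a),
                                         representations(b))):
        if ra == rb:
            return level
    return -1

print(clone_level("return x + y", "return a + b"))   # 1: a renamed clone
```

A clone search engine built on this idea would index all levels and report the strictest one at which a query matches, trading precision against recall.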

    Succinct Data Structures for Parameterized Pattern Matching and Related Problems

    Let T be a fixed text string of length n and P be a varying pattern string of length |P| ≤ n. Both T and P contain characters from a totally ordered alphabet Σ of size σ ≤ n. The suffix tree is the ubiquitous data structure for answering a pattern matching query: report all positions i in T such that T[i + k - 1] = P[k], 1 ≤ k ≤ |P|. Compressed data structures support pattern matching queries using much less space than the suffix tree, mainly by relying on a crucial property of the leaves in the tree. Unfortunately, in many suffix tree variants (such as the parameterized suffix tree, order-preserving suffix tree, and 2-dimensional suffix tree), this property does not hold. Consequently, compressed representations of these suffix tree variants have been elusive. We present the first compressed data structures for two important variants of the pattern matching problem: (1) Parameterized Matching -- report a position i in T if T[i + k - 1] = f(P[k]), 1 ≤ k ≤ |P|, for a one-to-one function f that renames the characters in P to the characters in T[i, i + |P| - 1], and (2) Order-preserving Matching -- report a position i in T if T[i + j - 1] and T[i + k - 1] have the same relative order as that of P[j] and P[k], 1 ≤ j < k ≤ |P|. For each of these two problems, the existing suffix tree variant requires O(n log n) bits of space and answers a query in O(|P| log σ + occ) time, where occ is the number of starting positions where a match exists. We present data structures that require O(n log σ) bits of space and answer a query in O((|P| + occ) poly(log n)) time. As a byproduct, we obtain compressed data structures for a few other variants, as well as introduce two new techniques (of independent interest) for designing compressed data structures for pattern matching.
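The order-preserving matching query defined above can be pinned down with a brute-force reference implementation: a window of T matches when its rank signature equals the pattern's. Distinct values are assumed for simplicity; the compressed structures of the thesis answer the same query in far less space.

```python
# Naive reference for order-preserving matching: report every position
# where the text window has the same relative order as the pattern.
# Brute force only, to make the definition concrete.

def order_signature(seq):
    """Rank of each element within the sequence (assumes distinct values)."""
    ranks = {v: r for r, v in enumerate(sorted(seq))}
    return [ranks[v] for v in seq]

def op_match(T, P):
    m = len(P)
    sig = order_signature(P)
    return [i for i in range(len(T) - m + 1)
            if order_signature(T[i:i + m]) == sig]

T = [10, 20, 15, 30, 25, 40]
P = [1, 3, 2]            # shape: rise, then dip back above the start
print(op_match(T, P))    # [0, 2]: windows [10,20,15] and [15,30,25]
```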

    SmartInspect: Smart Contract Inspection Technical Report

    Smart contracts are embedded procedures stored with the data they act upon. Debugging deployed smart contracts is a difficult task: once deployed, the code cannot be re-executed, and inspecting even a simple attribute is not easily possible because the data is encoded. In this technical report, we present SmartInspect to address the lack of inspectability of a deployed contract. Our solution analyses the contract state by using decompilation techniques and a mirror-based architecture to represent the object responsible for interpreting the contract state. SmartInspect allows developers, and also end-users of a contract, to better visualize and understand the contract's stored state without needing to redeploy or develop any ad-hoc code.
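To see why a deployed contract's raw state is opaque without source-level type information, the sketch below unpacks one 32-byte Solidity storage slot, in which small statically-sized fields are packed lowest-order bytes first. The field layout and values are a hypothetical example, not SmartInspect's actual mirror format.

```python
# Sketch: decoding one 32-byte storage slot given a source-level layout.
# Solidity packs small static fields right-aligned, first-declared field
# in the lowest-order bytes. Layout and values below are hypothetical.

def decode_slot(slot_hex, layout):
    """layout: list of (name, size_in_bytes) in declaration order."""
    raw = bytes.fromhex(slot_hex.removeprefix("0x"))
    fields, offset = {}, 0
    for name, size in layout:
        # each field occupies `size` bytes, counted from the low end
        chunk = raw[len(raw) - offset - size : len(raw) - offset]
        fields[name] = int.from_bytes(chunk, "big")
        offset += size
    return fields

# A slot holding `uint128 balance; uint64 updatedAt; bool frozen;`
slot = ("0x" + "00" * 7 + "01"                      # frozen = true
        + "0000000061ae22f0"                        # updatedAt timestamp
        + "0000000000000000000000000000002a")       # balance = 42
print(decode_slot(slot, [("balance", 16), ("updatedAt", 8), ("frozen", 1)]))
```

Without the layout argument, the same 64 hex characters admit many plausible decodings, which is precisely the inspectability gap a mirror-based tool has to bridge.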

    pBWT: Achieving succinct data structures for parameterized pattern matching and related problems

    The fields of succinct data structures and compressed text indexing have seen quite a bit of progress over the last two decades. An important achievement, primarily using techniques based on the Burrows-Wheeler Transform (BWT), was obtaining the full functionality of the suffix tree in the optimal number of bits. A crucial property that allows the use of the BWT for designing compressed indexes is order-preserving suffix links. Specifically, the relative order between two suffixes in the subtree of an internal node is the same as that of the suffixes obtained by truncating the first character of the two suffixes. Unfortunately, in many variants of the text-indexing problem, e.g., parameterized pattern matching, 2D pattern matching, and order-isomorphic pattern matching, this property does not hold. Consequently, the compressed indexes based on the BWT do not directly apply. Furthermore, a compressed index for any of these variants has been elusive throughout the advancement of the field of succinct data structures. We achieve a positive breakthrough on one such problem, namely the Parameterized Pattern Matching problem. Let T be a text that contains n characters from an alphabet Σ, which is the union of two disjoint sets: Σs, containing static characters (s-characters), and Σp, containing parameterized characters (p-characters). A pattern P (also over Σ) matches an equal-length substring S of T iff the s-characters match exactly and there exists a one-to-one function that renames the p-characters in S to those in P. The task is to find the starting positions (occurrences) of all such substrings S. The previous index [Baker, STOC 1993], known as the Parameterized Suffix Tree, requires Θ(n log n) bits of space and can find all occ occurrences in time O(|P| log σ + occ), where σ = |Σ|. We introduce an n log σ + O(n)-bit index with O(|P| log σ + occ · log n log σ) query time. At its core lies a new BWT-like transform, which we call the Parameterized Burrows-Wheeler Transform (pBWT).
The techniques are extended to obtain a succinct index for the Parameterized Dictionary Matching problem of Idury and Schäffer [CPM, 1994].
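The p-matching condition above has a classic brute-force reference via Baker's prev-encoding: replace every p-character by the distance to its previous occurrence (0 for the first one) and keep s-characters verbatim; two strings p-match exactly when their encodings are equal. The pBWT index answers the same query in compressed space; this sketch only fixes the definition.

```python
# Naive reference for parameterized matching via Baker's prev-encoding.
# s-characters must match literally; p-characters match up to a
# one-to-one renaming, which prev-encoding captures exactly.

def prev_encode(s, s_chars):
    last, out = {}, []
    for i, c in enumerate(s):
        if c in s_chars:
            out.append(c)                    # static characters kept verbatim
        else:
            out.append(i - last[c] if c in last else 0)
            last[c] = i
    return out

def p_match(T, P, s_chars):
    m, enc_p = len(P), prev_encode(P, s_chars)
    return [i for i in range(len(T) - m + 1)
            if prev_encode(T[i:i + m], s_chars) == enc_p]

# 'a'/'b'/'x'/'y' are parameterized; '(' and ')' are static
print(p_match("(abab)(xyxy)", "(baba)", s_chars="()"))  # [0, 6]
```

Both "(abab)" and "(xyxy)" encode to the same sequence as "(baba)", so both positions are reported even though no window matches the pattern literally.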