
    String processing model for knowledge-driven systems

    The purpose of this work is to experimentally confirm the theoretical estimates of the time complexity of the operations of a string processing model linked with a metric space, intended for solving data processing problems in knowledge-driven systems, including a study comparing the characteristics of these operations with those of similar operations on the most relevant data structures. Integration and unit testing were used to obtain the results of the computational experiments and to verify their correctness. The C/C++ implementation of the model's operations was tested. The paper defines the concepts needed to compute metric features over strings. The experiments confirmed the theoretical estimates of the computational complexity of the implemented operations and validated the choice of parameters of the data structures used, which ensures near-optimal throughput and operation times. According to the results, the key advantage is the ability to guarantee an upper bound on the time complexity of the string processing operations at all stages of the life cycle of the data structures used to represent strings, from creation to destruction, which allows for high data-processing throughput and responsiveness in systems built on the implemented operations. When a particular string processing problem is better served by a more specialized data structure such as vector or map, the implemented operations are at a disadvantage: they are inferior in the amount of data processed per unit of time. The string processing model is aimed at application in knowledge-driven systems at the data management level.
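The throughput comparison the abstract describes can be illustrated with a small timing harness. The sketch below is not the paper's benchmark: the model's own data structure is not available here, so std::vector and std::map (the baseline structures named in the abstract) stand in, and the workload shape and size are arbitrary assumptions.

    #include <chrono>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // Time a single batch of operations in milliseconds.
    template <typename F>
    static double time_ms(F&& op) {
        auto start = std::chrono::steady_clock::now();
        op();
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(end - start).count();
    }

    int main() {
        const std::size_t n = 100000;  // arbitrary workload size
        double v_ms = time_ms([n] {
            std::vector<std::string> v;
            for (std::size_t i = 0; i < n; ++i) v.push_back("key" + std::to_string(i));
        });
        double m_ms = time_ms([n] {
            std::map<std::string, std::size_t> m;
            for (std::size_t i = 0; i < n; ++i) m["key" + std::to_string(i)] = i;
        });
        // "Amount of data processed per unit of time", as the abstract puts it.
        std::printf("vector: %.0f ops/ms, map: %.0f ops/ms\n", n / v_ms, n / m_ms);
    }

Running the same loop against the model's own string structure alongside these baselines is the kind of comparison the experiments report.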

    Towards Incorporation of Software Security Testing Framework in Software Development

    The aim of this paper is to produce secure software using a security testing approach. The researchers reviewed and analyzed software testing frameworks and software security testing frameworks in order to incorporate the two efficiently. They then proposed to fully utilize the acceptance testing stage of the software testing framework by incorporating it into the software security testing framework. This incorporation improves the security attributes captured during the requirements stage of the software development process. The advantage of acceptance testing is that it exposes the system to real situations, including vulnerabilities, risks, impacts and intruders, which provides a varied set of security attributes for the requirements stage. This finding is recommended as a baseline for formulating test patterns that achieve effective test prioritization.

    Towards a framework for differential unit testing of object-oriented programs

    Software developers often face the task of determining how the behaviors of one version of a program unit differ from (or are the same as) the behaviors of a (slightly) different version of the same program unit. In such situations, developers would like to generate tests that exhibit the behavioral differences between the two versions, if any differences exist. We call this type of testing differential unit testing. Some examples of differential unit testing include regression testing, N-version testing, and mutation testing. We propose a framework, called Diffut, that enables differential unit testing of object-oriented programs. Diffut enables “simultaneous” execution of the pairs of corresponding methods from the two versions: methods can receive the same inputs (consisting of the object graph reachable from the receiver and method arguments), and Diffut compares their outputs (consisting of the object graph reachable from the receiver and method return values). Given two versions of a Java class, Diffut automatically synthesizes annotations (in the form of preconditions and postconditions) in the Java Modeling Language (JML) and inserts them into the unit under test to allow the simultaneous execution of the corresponding methods.
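The core idea is easy to sketch. Diffut itself targets Java classes and synthesizes JML annotations, so the following C++ fragment is only a conceptual analogue (the two `join` functions are invented for illustration): both versions of a unit receive identical inputs, and any output mismatch is a behavioral difference worth capturing as a test.

    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <vector>

    namespace v1 {  // original version of the unit
    std::string join(const std::vector<std::string>& parts, char sep) {
        std::string out;
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i > 0) out += sep;
            out += parts[i];
        }
        return out;
    }
    }  // namespace v1

    namespace v2 {  // the (slightly) changed version under comparison
    std::string join(const std::vector<std::string>& parts, char sep) {
        std::string out;
        for (const std::string& p : parts) {
            if (!out.empty()) out += sep;
            out += p;
        }
        return out;
    }
    }  // namespace v2

    int main() {
        // Each input is fed to both versions; a mismatch is a behavioral
        // difference worth turning into a regression test.
        std::vector<std::vector<std::string>> inputs = {
            {}, {"a"}, {"a", "b"}, {"", "x", ""}};
        for (const auto& in : inputs) {
            std::string a = v1::join(in, ','), b = v2::join(in, ',');
            if (a != b)
                std::printf("difference: \"%s\" vs \"%s\"\n", a.c_str(), b.c_str());
        }
    }

On the inputs above, the refactored v2 silently drops the leading separator when the first element is an empty string; this is exactly the kind of difference that simultaneous execution surfaces.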

    Mining Software Repositories for Release Engineers - Empirical Studies on Integration and Infrastructures-as-Code

    Release engineering (Releng) is the process of delivering integrated work from developers as a complete product to end users.
This process comprises the phases of Integration, Building and Testing, Deployment, and Release, to finally reach the market. While traditional software engineering takes several months to deliver a complete software product to customers, modern Release engineering aims to bring value to customers more quickly, within days or weeks, to receive useful feedback faster and to reduce the time wasted on unsuccessful features. A wealth of tools and techniques has emerged to support Release engineering. They basically aim to automate the phases of the Release engineering pipeline, reducing manual labor and making the procedure repeatable and fast. For example, Puppet is one of the most popular Infrastructure-as-Code (IaC) tools; it automates the process of setting up a new infrastructure (e.g., a virtual machine or a container in which an application can be compiled, tested and deployed) according to textual specifications. Infrastructure-as-Code has evolved rapidly due to the growth of cloud computing. However, many problems remain. For example, while many Release engineering tools gain popularity, choosing the most suitable technique requires practitioners to empirically evaluate its performance in a realistic setting, with data mimicking their own setup. Even worse, at a higher level, release engineers need to understand the progress of each release engineering phase in order to know whether the next release deadline can be met or where bottlenecks occur. Again, they have no clear methodology for doing this. To help practitioners better analyze their Release engineering process, we explore how mining software repositories (MSR) can be applied to two critical phases of Releng in large open-source projects. Software repositories such as version control systems, bug repositories and code reviewing environments are used daily by developers, testers and reviewers to record information about the development process, such as code changes, bug reports or code reviews. By analyzing these data, one can recreate the process by which the software is built and analyze how each phase of Releng applies in a given project. Many repositories of open-source software projects are publicly available, which offers opportunities for empirical research on Release engineering. Therefore, we conducted a series of empirical studies mining the software repositories of popular open-source projects, to understand the progress of Release engineering and to evaluate the performance of state-of-the-art tools. We mainly focus on two phases, Integration and Provisioning (Infrastructure-as-Code), because these two phases are the most critical in Release engineering and ample data is available for them. In our empirical study of the Integration process, we evaluated how well MSR techniques based on version control and review data explain the major factors impacting the probability that, and the time after which, a patch is successfully integrated into an upcoming release. We selected the Linux kernel, one of the most popular OSS projects, with a long history and a strict integration hierarchy, as our case study. We collected data from the reviewing and integration tools of the Linux kernel (mailing lists and Git, respectively) and extracted characteristics covering six dimensions. We then built models with acceptance/time as output and analyzed which characteristics impact the reviewing and integration processes. We found that reviewing and integration are impacted by different factors.
Our findings suggest that developers manage to get their patches through the review phase faster by changing fewer subsystems at a time and by splitting a large patch into multiple smaller ones. Developers can also get patches accepted more easily and sooner by participating more in the community and by gaining more experience with similar patches. In this study of the Linux kernel, we found that one major challenge of MSR is to link different repositories. For example, the Linux kernel and the Apache project both use mailing lists for the reviewing process. Contributors submit and maintainers review patches entirely by email, where usually an email thread collects all the different versions of a patch. However, a new email thread, with a different subject, is often used to submit a new patch revision, which means that no strong physical links exist between all the revisions of a patch. On top of that, the accepted patch has no physical link to the resulting commit in the version control system either, unless a commit identifier is posted in an email. Especially when a patch has been revised multiple times and has evolved far from the original version, tracing it back to its very first version is difficult. We proposed three approaches of different granularity and strictness, aiming to recover the physical links between patch revisions. In the study, we found that a line-based technique works best for linking emails between threads, while the combination of the line-based and checksum-based techniques achieves the best performance for linking the emails in a thread with the final, accepted commit (a sketch of the line-based heuristic appears after this abstract). Being able to reconstruct the full history of a patch allowed us to analyze the performance of the reviewing phase. We found that 25% of commits have a reviewing history longer than four weeks. To evaluate the ability of MSR to analyze the performance of Releng tools, we evaluated, in a commercial project, a hybrid integration approach that combines branching and feature-toggling techniques. Branching allows developers to work on different features in parallel, in isolation, while toggling enables developers on the same branch to work on different tasks by hiding unfinished features behind conditions in the source code. Instead of revising their whole integration process to drop branching and move to toggling, hybrid toggling is a compromise that tries to minimize the risks of branching while enjoying the benefits of toggling. We compared the performance before and after adopting hybrid toggling, and found that this hybrid structure can reduce integration effort and improve productivity. Hence, hybrid toggling seems a worthwhile practice (a minimal toggle sketch also follows after this abstract). In the Provisioning phase, we focused on evaluating the usage of, and the effort required for, the popular Infrastructure-as-Code (IaC) tools used in modern Release engineering. We empirically studied the IaC tools of OpenStack and MediaWiki, two projects with huge code bases that adopt the two currently most popular IaC languages: Puppet and Chef. First, we studied the maintenance effort related to the regular development and testing process of OpenStack, then compared this to the IaC-related effort in both case studies. We found that IaC code makes up a large proportion of both projects and changes frequently, with large churn. Meanwhile, IaC code changes are tightly coupled with source code changes, which implies that changes to source or test code require accompanying changes to the IaC code, possibly leading to increased complexity and maintenance effort. Furthermore, the most common reason for such coupling is “Integration of new modules or service”.
However, we also observed that IaC code is only lightly coupled with IaC test cases and test data, which are new kinds of artifacts in the IaC domain. Hence, IaC may take more effort than engineers expect, and further empirical studies should be considered. Modern Release engineering has developed rapidly, and many new techniques and tools have emerged to support it from different perspectives. However, the lack of means to understand the current progress of Release engineering and the performance of these techniques makes it difficult for practitioners to sustain a high-quality Releng approach in practice. In this thesis, we conducted a series of empirical studies mining the software repositories of large open-source projects, which show that, despite some challenges, MSR technology can help release engineers better understand their progress and evaluate the cost of Release engineering tools and activities. We are glad to see that our work has inspired other researchers to further analyze the integration process as well as the quality of IaC code.
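The line-based linking heuristic mentioned above can be sketched as follows. The thesis does not spell out its exact measure or parameters here, so the Jaccard similarity over changed-line sets, the 0.8 threshold, and the helper names are illustrative assumptions only.

    #include <cstdio>
    #include <set>
    #include <sstream>
    #include <string>

    // Collect the added/removed lines of a unified diff (lines starting with
    // '+' or '-', excluding the "+++" / "---" file headers).
    static std::set<std::string> changedLines(const std::string& patch) {
        std::set<std::string> lines;
        std::istringstream in(patch);
        std::string line;
        while (std::getline(in, line)) {
            if (line.size() > 1 && (line[0] == '+' || line[0] == '-') &&
                line.compare(0, 3, "+++") != 0 && line.compare(0, 3, "---") != 0)
                lines.insert(line.substr(1));
        }
        return lines;
    }

    // Jaccard similarity of the two patches' changed-line sets.
    static double lineSimilarity(const std::string& a, const std::string& b) {
        std::set<std::string> la = changedLines(a), lb = changedLines(b);
        if (la.empty() && lb.empty()) return 0.0;
        std::size_t common = 0;
        for (const auto& l : la) common += lb.count(l);
        return static_cast<double>(common) / (la.size() + lb.size() - common);
    }

    // Two revisions are considered linked when they share most changed lines;
    // 0.8 is an arbitrary illustrative threshold.
    static bool linked(const std::string& a, const std::string& b) {
        return lineSimilarity(a, b) >= 0.8;
    }

    int main() {
        std::string rev1 = "--- a/f.c\n+++ b/f.c\n-int x = 0;\n+int x = 1;\n";
        std::string rev2 = "--- a/f.c\n+++ b/f.c\n-int x = 0;\n+int x = 1;\n+int y = 2;\n";
        std::printf("similarity: %.2f, linked: %s\n", lineSimilarity(rev1, rev2),
                    linked(rev1, rev2) ? "yes" : "no");
    }

A checksum-based variant, per the abstract, would instead match exact patch content, which is stricter but fails once a revision changes at all; combining the two is what performed best for linking threads to accepted commits.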
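The feature-toggle half of the hybrid integration approach can likewise be sketched in a few lines. The toggle name and configuration scheme below are hypothetical; the point is only that unfinished work stays integrated on the shared branch but dormant behind a condition.

    #include <map>
    #include <string>

    // Hypothetical toggle registry; real systems read these from configuration.
    static const std::map<std::string, bool> kToggles = {
        {"new_checkout_flow", false},  // unfinished feature, integrated but off
    };

    static bool isEnabled(const std::string& name) {
        auto it = kToggles.find(name);
        return it != kToggles.end() && it->second;
    }

    void checkout() {
        if (isEnabled("new_checkout_flow")) {
            // new code path under development, hidden until toggled on
        } else {
            // stable legacy path keeps working on the shared branch
        }
    }

    int main() { checkout(); }

Flipping the configuration value releases the feature without any branch merge, which is where the reduced integration effort reported above comes from.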