13 research outputs found

    The Seed is Strong: Seeding Strategies in Search-Based Software Testing

    Full text link
    Abstract—Search-based techniques have been shown useful for the task of generating tests, for example in the case of object-oriented software. But, as for any meta-heuristic search, the efficiency is heavily dependent on many different factors; seeding is one such factor that may strongly influence this efficiency. In this paper, we evaluate new and typical strategies to seed the initial population as well as to seed values introduced during the search when generating tests for object-oriented code. We report the results of a large empirical analysis carried out on 20 Java projects (for a total of 1,752 public classes). Our experiments show with strong statistical confidence that, even for a testing tool that is already able to achieve high coverage, the use of appropriate seeding strategies can further improve performance. Keywords-test case generation; search-based testing; testing classes; search-based software engineering I

    Automated test data generation using a scatter search approach

    Get PDF
    The techniques for the automatic generation of test cases try to efficiently find a small set of cases that allow a given adequacy criterion to be fulfilled, thus contributing to a reduction in the cost of software testing. In this paper we present and analyze two versions of an approach based on the Scatter Search metaheuristic technique for the automatic generation of software test cases using a branch coverage adequacy criterion. The first test case generator, called TCSS, uses a diversity property to extend the search of test cases to all branches of the program under test in order to generate test cases that cover these. The second, called TCSS-LS, is an extension of the previous test case generator which combines the diversity property with a local search method that allows the intensification of the search for test cases that cover the difficult branches. We present the results obtained by our generators and carry out a detailed comparison with many other generators, showing a good performance of our approac

    Search-based Multi-Vulnerability Testing of XML Injections in Web Applications

    Get PDF
    Modern web applications often interact with internal web services, which are not directly accessible to users. However, malicious user inputs can be used to exploit security vulnerabilities in web services through the application front-ends. Therefore, testing techniques have been proposed to reveal security flaws in the interactions with back-end web services, e.g., XML Injections (XMLi). Given a potentially malicious message between a web application and web services, search-based techniques have been used to find input data to mislead the web application into sending such a message, possibly compromising the target web service. However, state-of-the-art techniques focus on (search for) one single malicious message at a time. Since, in practice, there can be many different kinds of malicious messages, with only a few of them which can possibly be generated by a given front-end, searching for one single message at a time is ineffective and may not scale. To overcome these limitations, we propose a novel co-evolutionary algorithm (COMIX) that is tailored to our problem and uncover multiple vulnerabilities at the same time. Our experiments show that COMIX outperforms a single-target search approach for XMLi and other multi-target search algorithms originally defined for white-box unit testing

    Heuristic Splitting of Source Code Identifiers

    Get PDF
    RÉSUMÉ La maintenance regroupe l’ensemble des activités effectuées pour modifier un logiciel après sa mise en opérations. La maintenance est la phase la plus coûteuse du développement logiciel. La compréhension de programmes est une activité cognitive qui repose sur la construction de représentations mentales à partir des artefacts logiciels. Les développeurs passent un temps considérable à lire et comprendre leurs programmes avant d’effectuer des changements. Une documentation claire et concise peut aider les développeurs à inspecter et à comprendre leurs programmes. Mais, l'un des problèmes majeurs que rencontrent les développeurs durant la maintenance est que la documentation est souvent obsolète ou tout simplement pas disponible. Par conséquent, il est important de rendre le code source plus lisible, par exemple en insistant auprès des développeurs pour qu’ils ajoutent des commentaires dans leur code et respectent des règles syntaxiques et sémantiques en écrivant les identificateurs des concepts dans leurs programmes. Mais certains identificateurs sont constitués de termes des mots qui sont abrégés ou transformés. La reconnaissance des termes composants les identificateurs n'est pas une tâche facile, surtout lorsque la convention de nommage n'est pas respectée. À notre connaissance deux familles d’approches existent pour décomposer les identificateurs : la plus simple considère l’utilisation du renommage et la présence des séparateurs explicites. La stratégie la plus complète est implémentée par l’outil Samurai (Enslen, Hill et al. 2009), elle se base sur le lexique et utilise les algorithmes gloutons pour identifier les mots qui constitue les identificateurs. Samurai est une technique qui considère que si un identificateur est utilisé dans une partie du code, il est probablement utilisé dans le même contexte que son code d’origine (root). Toutefois, les approches mentionnées ci-dessus ont leurs limites. Premièrement,elles sont pour la plus part incapables d’associer des sous chaînes d’identifiants à des mots ou des termes; comme par exemple, des termes spécifiques à un domaine ou des mots de la langue anglais. Ces associations pourrait être utile pour comprendre le degré d’expressivité des termes décrit dans le code source par rapport aux artefacts de haut niveau qui leurs sont associés (De Lucia, Di Penta et al. 2006). Deuxièmement, ils sont incapables de prendre en compte les transformations de mots, tel que l’abréviation de pointeur en pntr. Notre approche est inspirée de la technique de reconnaissance de la parole. La décomposition que nous proposons est basée sur une version modifiée de l’algorithme Dynamic Time Warping (DTW) proposé par Herman Ney pour la reconnaissance de la parole (Ney 1984) et sur une métrique qui est la distance de Levenshtein (Levenshtein 1966). Elle a été développée dans le but de traiter les limitations des approches existantes surtout celles qui consistent en la segmentation des identificateurs contenant des abréviations et à la gestion des transformations des mots du dictionnaire. L’approche proposée a été appliquée à des identificateurs extraits de deux programmes différents : JHotDraw et Lynx. Les résultats obtenus ont été comparés aux oracles construits manuellement et également à ceux d’un algorithme de "splitting" basé sur la casse. Les résultats obtenus ont révélé que notre approche a un aspect nondéterministe relatif à l’établissent des méthodes de transformation appliquées et aux mots du dictionnaire et aux choix des mots du dictionnaire qui subissent ces transformations. Ils montrent que l'approche proposée à de meilleurs résultats que celle basée sur la casse. En particulier, pour le programme Lynx, le Camel Case Splitting n’a été en mesure de décomposer correctement qu’environ 18% des identificateurs, contrairement à notre approche qui a été capable de décomposer 93% des identificateurs. En ce qui concerne JHotDraw, le Camel Case splitter a montré une exactitude de 91% tandis que notre approche a assuré 96% de résultats corrects.----------ABSTRACT Maintenance is the most costly phase of software life cycle. In industry, the maintenance cost of a program is estimated at over 50% of its total life cycle costs (Sommerville 2000). Practical experience with large projects has shown that developers still face difficulties in maintaining their program (Pigoski 1996). Studies (Corbi 1989) have shown that over half of this maintenance is devoted to understanding the program itself. Program comprehension is therefore essential. Program comprehension is a cognitive activity that relies on the construction of mental representations from software artifacts. Comprehension is more difficult for source code (Takang, Grubb et al. 1996). Several tools for nderstanding have been developed (Storey 2006); these tools range from simple visual inspection of the text (such as the explorers of code) to the dynamic analysis of program performance through program execution. While many efforts focus on automating the understanding of programs, a significant part of this work must still be done manually, such as: analyzing the source code, technical reports, and documentation. A clear and concise documentation can help developers to inspect and understand their programs. Unfortunately, one of the major problems faced by developers, during maintenance, is that documentation is often outdated, or not available. Indeed, developers are often concerned about time and costs constraints,neglecting to update the documentation of different versions of their programs. In the source code, identifiers and comments are key means to support developers during their understanding and maintenance activities. Indeed, identifiers are often composed of terms reflecting domain concepts. Usually, identifiers are built by considering a set of rules for choosing the character sequence. Some identifiers are composed of terms that are abbreviated and transformed of the words. Recognizing the terms in the identifiers is not an easy task when naming convention is not used. In this thesis we will use a technique inspired from speech recognition, Dynamic Time Warping and meta-heuristic algorithms, to split identifiers into component terms. We propose a novel approach to identify terms composing identifiers that is organized in the following steps: A dictionary of English words is built and will be our source of words. We take an identifier and look through the dictionary to find terms that are exactly contained in the identifier. For each word of a dictionary, we will compute the distance between word and the input identifier. For terms that exactly exist in both dictionary and identifier, the distance is zero and we obtain an exact splitting of the identifier and the process terminate successfully. Other words of the dictionary with non-zero distance may indicate that the identifier is built from terms that are not exactly in the dictionary and some modification should be applied on the words. Some words of the dictionary have more characters than the terms in the identifier. Some transformations such as deleting all vowels or deleting some characters will be applied on the words of the dictionary. The modification of the words is applied in the context of a Hill Climbing search. For each new transformed word, we will calculate its distance to the input identifier via Dynamic Time Warping (DTW). If the recently created word reduces the global minimum distance then we add that word to the current dictionary otherwise another transformation is applied. We will continue these steps until we reach to the distance of zero or the character number of the dictionary word become less than three or all the possible transformation have been applied. The identifier is split with words such that their distances are zero or have the lowest distance between other words of the dictionary. To analyze the proposed identifier splitting approach, with the purpose of evaluating its ability to adequately identify dictionary words composing identifiers, even in presence of word transformations, we carried out a case study on two software systems, JHotDraw and Lynx. Results based on manually-built oracles indicate that the proposed approach outperforms a simple Camel Case splitter. In particular, for Lynx, the Camel Case splitter was able to correctly split only about 18% of identifiers versus 93% with our approach, while on JHotDraw, the Camel Case splitter exhibited a correctness of 91%, while our approach ensured 96% of correct results. Our approach was also able to map abbreviations to dictionary words, in 44% and 70% of cases for JHotDraw and Lynx, respectively. We conclude that DTW, Hill Climbing and transformations are useful to split identifiers into words and propose future directions or research

    Automatic Generation of Tests to Exploit XML Injection Vulnerabilities in Web Applications

    Get PDF
    Modern enterprise systems can be composed of many web services (e.g., SOAP and RESTful). Users of such systems might not have direct access to those services, and rather interact with them through a single-entry point which provides a GUI (e.g., a web page or a mobile app). Although the interactions with such entry point might be secure, a hacker could trick such systems to send malicious inputs to those internal web services. A typical example is XML injection targeting SOAP communications. Previous work has shown that it is possible to automatically generate such kind of attacks using search-based techniques. In this paper, we improve upon previous results by providing more efficient techniques to generate such attacks. In particular, we investigate four different algorithms and two different fitness functions. A large empirical study, involving also two industrial systems, shows that our technique is effective at automatically generating XML injection attacks
    corecore