
    Empirical Evaluation of Mutation-based Test Prioritization Techniques

    We propose a new test case prioritization technique that combines both mutation-based and diversity-based approaches. Our diversity-aware mutation-based technique relies on the notion of mutant distinguishment, which aims to distinguish one mutant's behavior from another, rather than from the original program. We empirically investigate the relative cost and effectiveness of mutation-based prioritization techniques (i.e., using both the traditional mutant kill and the proposed mutant distinguishment) with 352 real faults and 553,477 developer-written test cases. The empirical evaluation considers both the traditional and the diversity-aware mutation criteria in various settings: single-objective greedy, hybrid, and multi-objective optimization. The results show that there is no single dominant technique across all the studied faults. To this end, we show when and why each of the mutation-based prioritization criteria performs poorly, using a graphical model called the Mutant Distinguishment Graph (MDG) that shows the distribution of fault-detecting test cases with respect to mutant kills and distinguishment
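    A minimal sketch of the single-objective greedy setting mentioned above (hypothetical Python, not the authors' implementation; the kill matrix and names are assumed toy data): each step selects the test that kills the most mutants not yet killed by the tests chosen so far.

```python
def greedy_prioritise(tests, killed_by):
    """Order tests so each next test kills the most not-yet-killed mutants.

    tests     -- iterable of test identifiers
    killed_by -- dict: test id -> set of mutants it kills (for the
                 diversity-aware variant, sets of distinguished mutant
                 pairs could be used instead)
    """
    remaining = set(tests)
    covered = set()   # mutants already killed by the selected prefix
    order = []
    while remaining:
        # pick the test contributing the most newly killed mutants
        best = max(remaining, key=lambda t: len(killed_by[t] - covered))
        order.append(best)
        covered |= killed_by[best]
        remaining.remove(best)
    return order


kill_matrix = {"t1": {"m1", "m2"}, "t2": {"m2", "m3", "m4"}, "t3": {"m4"}}
print(greedy_prioritise(kill_matrix, kill_matrix))  # ['t2', 't1', 't3']
```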

    Practical Combinatorial Interaction Testing: Empirical Findings on Efficiency and Early Fault Detection

    Combinatorial interaction testing (CIT) is important because it tests the interactions between the many features and parameters that make up the configuration space of software systems. Simulated Annealing (SA) and Greedy Algorithms have been widely used to find CIT test suites. From the literature, there is a widely-held belief that SA is slower but produces more effective test suites than Greedy, and that SA cannot scale to higher-strength coverage. We evaluated both algorithms on seven real-world subjects for the well-studied two-way up to the rarely-studied six-way interaction strengths. Our findings present evidence to challenge this current orthodoxy: real-world constraints allow SA to achieve higher strengths. Furthermore, there was no evidence that Greedy was less effective (in terms of time to fault revelation) than SA; the results for the greedy algorithm are actually slightly superior. However, the results are critically dependent on the approach adopted to constraint handling. Moreover, we have also evaluated a genetic algorithm (GA) for constrained CIT test suite generation. This is the first time that strengths higher than 3 and constraint handling have been used to evaluate a GA. Our results show that the GA is competitive only for pairwise testing on subjects with a small number of constraints
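    To make the greedy flavour of CIT generation concrete, here is a small, hypothetical Python sketch (not one of the evaluated tools, and it ignores constraints): it builds a pairwise (2-way) test suite by repeatedly adding the candidate configuration that covers the most uncovered parameter-value pairs.

```python
from itertools import combinations, product

def pairwise_suite(parameters):
    """Greedy construction of a 2-way (pairwise) covering array.

    parameters -- dict: parameter name -> list of possible values.
    Constraints are ignored here; real CIT tools (greedy, SA, GA) must
    also handle forbidden combinations.
    """
    names = list(parameters)
    # every parameter-value pair that must co-occur in some test
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    suite = []
    while uncovered:
        best_row, best_gain = None, set()
        # exhaustive candidate enumeration: fine for small spaces only
        for values in product(*(parameters[n] for n in names)):
            row = dict(zip(names, values))
            gain = {((a, row[a]), (b, row[b]))
                    for a, b in combinations(names, 2)} & uncovered
            if len(gain) > len(best_gain):
                best_row, best_gain = row, gain
        suite.append(best_row)
        uncovered -= best_gain
    return suite


print(pairwise_suite({"os": ["linux", "win"], "db": ["pg", "mysql"], "tls": [True, False]}))
```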

    The development of 'for experts' systems as heuristic reasoning platforms in risk decision support: a consideration of tool design, technology transfer and compatibility with Bayesian decision analysis

    This work considers the creation of two risk and decision support systems, one for the National Air Traffic Services of the UK and one for Unilever, a multi-national. Their development contributes to risk decision science, in particular in the area of decision support. This contribution is based on the development of real-life systems and has three key elements. One, it addresses the fact that, for practical environments like these, the science of risk and decisions is insufficiently resolved to be accepted and easily used. Two, the systems share an arena with subjective Bayesian decision analysis; the benefits of a hybrid form of the two approaches in generating higher levels of user acceptance and organisational transfer are discussed. Three, they take the unique approach of being 'for experts' systems rather than 'expert systems'. This approach offers a number of benefits to applied user communities, including: a decision support system which remains grounded within the reasoning world view of the decision makers; an expansion and refinement of the 'natural heuristics' that decision makers currently use; and a scoring and visualisation environment which is both fast and flexible and allows for previously unavailable levels of reasoning transparency and comparison. Taken together, the combination of the tool design, the heuristic artefacts within the tools, and their influence on the host organisations shows that the two systems can provide an effective and valued 'heuristic reasoning platform' for risks and issues. A future research direction is to explore ways in which the highly transferable heuristic artefacts in these systems, particularly for measurement and data manipulation, might be strengthened via hybridisation with more powerful, but less transferable, formal systems such as Bayesian decision analysis

    Improvements to Test Case Prioritisation considering Efficiency and Effectiveness on Real Faults

    Despite the best efforts of programmers and component manufacturers, software does not always work perfectly. In order to guard against this, developers write test suites that execute parts of the code and compare the expected result with the actual result. Over time, test suites become expensive to run for every change, which has led to optimisation techniques such as test case prioritisation. Test case prioritisation reorders test cases within the test suite with the goal of revealing faults as soon as possible. It has received a great deal of research attention indicating that prioritised test suites can reveal faults faster, but due to a lack of real fault repositories available for research, prior evaluations have often been conducted on artificial faults. This thesis aims to investigate whether the use of artificial faults represents a threat to the validity of previous studies, and proposes new strategies for test case prioritisation that increase its effectiveness on real faults. The thesis conducts an empirical evaluation of existing test case prioritisation strategies on real and artificial faults, which establishes that artificial faults provide unreliable results for real faults. The study found four occasions on which a strategy for test case prioritisation would be considered no better than the baseline when using one fault type, but a significant improvement over the baseline when using the other. Moreover, this evaluation reveals that existing test case prioritisation strategies perform poorly on real faults, with no strategy significantly outperforming the baseline. Given the need to improve test case prioritisation strategies for real faults, this thesis proceeds to consider other techniques that have been shown to be effective on real faults. One such technique is defect prediction, which estimates how likely a class is to contain a fault. This thesis proposes a test case prioritisation strategy, called G-Clef, that leverages defect prediction estimates to reorder test suites. While the evaluation of G-Clef indicates that it outperforms existing test case prioritisation strategies, the average predicted location of a faulty class is 13% of the way through all classes in a system, which shows potential for improvement. Finally, this thesis conducts an investigative study of whether sentiments expressed in commit messages could be used to improve the defect prediction element of G-Clef. Throughout the course of this PhD, I created Kanonizo, an open-source tool for performing test case prioritisation on Java programs. All of the experiments and strategies used in this thesis were implemented in Kanonizo
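    The core idea behind a defect-prediction-guided ordering such as the G-Clef strategy described above can be illustrated with a short, hypothetical Python sketch (names, scores and the aggregation are assumptions, not the thesis implementation): tests covering the most fault-prone classes run first.

```python
def prioritise_by_defect_prediction(tests, covers, defect_score):
    """Order tests so those covering the most fault-prone classes run first.

    tests        -- iterable of test identifiers
    covers       -- dict: test id -> set of class names the test executes
    defect_score -- dict: class name -> predicted probability of containing a fault
    """
    def suspicion(test):
        # simple aggregate: the highest defect score among covered classes
        return max((defect_score.get(c, 0.0) for c in covers[test]), default=0.0)

    return sorted(tests, key=suspicion, reverse=True)


scores = {"PaymentService": 0.82, "Logger": 0.05, "Cart": 0.40}
coverage = {"testPay": {"PaymentService", "Logger"},
            "testLog": {"Logger"},
            "testCart": {"Cart"}}
print(prioritise_by_defect_prediction(coverage, coverage.keys() and coverage, scores) if False else
      prioritise_by_defect_prediction(coverage.keys(), coverage, scores))
# ['testPay', 'testCart', 'testLog']
```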

    Evaluation of a fuzzy-expert system for fault diagnosis in power systems

    A major problem with alarm processing and fault diagnosis in power systems is the reliance on circuit alarm status. If too much information is available, and the time of arrival of that information is random due to weather conditions and similar factors, the alarm activity is not easily interpreted by system operators. To address these problems, this thesis sets out the work carried out to design and evaluate a diagnostic tool that assists power system operators with condition monitoring during periods of heavy alarm activity. The aim of employing this diagnostic tool is to monitor and raise uncertain alarm information for the system operators, which serves as a proposed solution for restoring such faults. The diagnostic system uses elements of AI, namely expert systems and fuzzy logic, that incorporate abductive reasoning. The objective of employing abductive reasoning is to optimise the interpretation of uncertain Supervisory Control and Data Acquisition (SCADA) messages when those messages cannot be resolved with simple logic alone. The method uses object-oriented programming, which offers reusability, polymorphism, and readability; the principle behind employing object-oriented techniques is to provide better insights and solutions compared to conventional artificial intelligence (AI) programming languages. The work involves the development and evaluation of a fuzzy-expert system which tries to manage the uncertainty in a 16-line, 12-bus sample power system. The performance of this diagnostic tool is assessed in terms of consistent data acquisition, readability, adaptability, and maintainability on a PC. The tool enables operators to obtain and present more appropriate interpretations rather than relying on mathematically precise fault identification when mathematical modelling fails and alarm activity is high. This research contributes to the field of power system control; in particular, Scottish Hydro-Electric PLC has shown interest and supplied all the necessary information and data. The AI-based power system is presented as a sample application for Scottish Hydro-Electric and KEPCO (Korea Electric Power Corporation)
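    As a rough illustration of how fuzzy logic can attach a confidence value to an uncertain burst of SCADA alarms, here is a minimal, hypothetical Python sketch; the rule, membership functions and thresholds are assumptions and not the thesis system.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fault_confidence(alarm_count, arrival_spread_s):
    """Fuzzy confidence that a burst of alarms indicates a genuine fault.

    alarm_count      -- number of alarms received in the window
    arrival_spread_s -- spread of their arrival times in seconds
    """
    many_alarms = tri(alarm_count, 3, 10, 30)     # 'many alarms' fuzzy set
    clustered = tri(arrival_spread_s, 0, 2, 10)   # 'arrived close together'
    # rule: IF many alarms AND clustered arrivals THEN likely genuine fault
    return min(many_alarms, clustered)

print(fault_confidence(alarm_count=12, arrival_spread_s=3))  # ~0.875
```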

    Prioritisation of requests, bugs and enhancements pertaining to apps for remedial actions. Towards solving the problem of which app concerns to address initially for app developers

    Useful app reviews contain information related to the bugs reported by the app’s end-users along with the requests or enhancements (i.e., suggestions for improvement) pertaining to the app. App developers expend exhaustive manual effort identifying the useful reviews within a vast pool of reviews and converting them into actionable knowledge by means of prioritisation. By doing so, app developers can resolve the critical bugs and simultaneously address the prominent requests or enhancements within the short intervals of apps’ maintenance and evolution cycles. That said, the manual identification and prioritisation of useful reviews has limitations, the most common being: the high cognitive load required to perform manual analysis, the lack of scalability of limited human resources processing voluminous reviews, extensive time requirements, and the error-proneness of manual effort. While prior work in the app domain has proposed prioritisation approaches to convert reviews pertaining to an app into actionable knowledge, these studies have limitations and lack benchmarking of the prioritisation performance. Thus, the problem of prioritising numerous useful reviews still persists. In this study, we initially conducted a systematic mapping study of the requirements prioritisation domain to explore the existing knowledge on prioritisation and to seek inspiration from eminent empirical studies to solve the problem of prioritising numerous useful reviews. Findings of the systematic mapping study inspired us to develop automated approaches for filtering useful reviews, and then to facilitate their subsequent prioritisation. To filter useful reviews, this work developed six variants of the Multinomial Naïve Bayes method. Next, to prioritise the order in which useful reviews should be addressed, we proposed a group-based prioritisation method which initially classified the useful reviews into specific groups using an automatically generated taxonomy, and later prioritised these reviews using a multi-criteria heuristic function. Subsequently, we developed an individual prioritisation method that directly prioritised the useful reviews after filtering, using the same multi-criteria heuristic function. Some of the findings of the systematic mapping study not only provided the necessary inspiration for the development of automated filtering and prioritisation approaches but also revealed crucial dimensions, such as accuracy and time, that could be utilised to benchmark the performance of a prioritisation method. With regard to the proposed automated filtering approach, we observed that the performance of the Multinomial Naïve Bayes variants varied based on their algorithmic structure and the nature of the labelled reviews (i.e., balanced or imbalanced) made available for training. The automated taxonomy generation approach for classifying useful reviews into specific groups showed a substantial match with the manual taxonomy generated from domain knowledge. Finally, we validated the performance of the group-based and individual prioritisation methods, and found that the individual prioritisation method was superior to the group-based method when outcomes were assessed on the accuracy and time dimensions.
In addition, we performed a full-scale evaluation of the individual prioritisation method which showed promising results. Given the outcomes, it is anticipated that our individual prioritisation method could assist app developers in filtering and prioritising numerous useful reviews to support app maintenance and evolution cycles. Beyond app reviews, the utility of our proposed prioritisation solution can be evaluated on software repositories tracking bugs and requests such as Jira, GitHub and so on
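    The review-filtering step described above can be illustrated with a small, hypothetical scikit-learn sketch: a plain Multinomial Naïve Bayes text classifier (not one of the six variants developed in the thesis, and trained here on assumed toy data) that labels reviews as useful or not before any prioritisation takes place.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy labelled reviews: 1 = useful (bug report / request), 0 = not actionable
reviews = [
    "app crashes when I open the settings page",   # bug
    "please add a dark mode option",                # request
    "love it", "five stars", "great app",           # not actionable
    "login fails after the latest update",          # bug
]
labels = [1, 1, 0, 0, 0, 1]

# bag-of-words features fed into a Multinomial Naive Bayes classifier
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(reviews, labels)

new_reviews = ["the export button does nothing", "awesome!!!"]
# predicted labels; with such a tiny toy corpus the output may vary
print(clf.predict(new_reviews))
```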

    Contributions to the Optimisation of aircraft noise abatement procedures

    Despite the substantial reduction of emitted aircraft noise in recent decades, the noise impact on communities located near airports is a problem that still lingers. Containing the sound generated by aircraft operations, while meeting the increasing demand for air transportation, is one of the major challenges that airport authorities, air traffic service providers and aircraft operators must deal with. Aircraft noise can be reduced by improving the aerodynamics of the aircraft and the engine noise emissions, but also by designing new, optimised flight procedures. These procedures are generally called Noise Abatement Procedures (NAP) and may include preferential routings (in order to avoid populated areas) as well as optimised vertical flight path profiles. Present noise abatement procedures are far from optimal with regard to minimising noise nuisance. In general, their optimisation is not possible due to the limitations of current navigation methods and avionic equipment and the complexity present at some terminal airspaces. Moreover, NAP are often designed manually by a group of experts, and several iterations are needed. However, in the forthcoming years new avionic systems and new Air Traffic Management concepts are expected to significantly improve the design of flight procedures, making them more flexible and therefore more environmentally friendly. Furthermore, in the few cases where NAP are optimised, an acoustic metric is usually used when building the different optimisation functions, so the actual noise annoyance is not taken into account in the optimisation process. Annoyance is a subjective, complex and context-dependent concept; even though sophisticated noise annoyance models are already available today, their integration into a trajectory optimisation framework is still something to be further explored. This dissertation is based on the premise that such precise and more flexible trajectories will enable the definition of flight procedures that are optimal with respect to noise annoyance, especially in the arrival and departure phases of flight. One can then conceive a situation where these kinds of procedures are designed automatically or semi-automatically by an expert system based on optimisation techniques and approximate reasoning, serving as a decision-making tool for airspace planners and procedure designers. A complete framework for computing optimal NAP is developed in this work. This includes a set of nonlinear models that take into account aircraft dynamics, trajectory constraints and objective functions. The noise annoyance is modelled using fuzzy logic techniques as a function of the perceived maximum sound level, the hour of the day and the type of over-flown zone. The problem is formally identified and formulated as a multi-criteria optimal control problem, and a direct transcription method is used to transform it into a Non-Linear Programming problem. Then, an assessment of different multi-objective optimisation techniques is presented. Among these techniques, scalarisation methods are identified as the most widely used in the present-day literature; nevertheless, several alternative techniques are explored in order to overcome some of scalarisation's known drawbacks. In this context, lexicographic, hierarchical, egalitarian (or min-max) and goal optimisation strategies are presented and tested. From this analysis some conclusions arise that allow the best features of each optimisation technique to be combined into a final compound multi-objective optimisation strategy. Finally, this strategy is applied successfully to a complex, real scenario, where the East departures of runway 02 at the airport of Girona (Catalonia, Spain) are optimised. Two aircraft types are simulated at different periods of the day, obtaining different optimal trajectories
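    As a toy illustration of the kind of fuzzy annoyance model described in the abstract, the following hypothetical Python sketch (membership functions, weights and zone sensitivities are assumptions, not the thesis model) combines the maximum perceived sound level, the hour of the day and the over-flown zone type into a single annoyance score.

```python
def ramp(x, low, high):
    """Linear membership rising from 0 at 'low' to 1 at 'high'."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def annoyance(lamax_db, hour, zone):
    """Toy fuzzy annoyance score in [0, 1].

    lamax_db -- maximum perceived sound level of the flyover (dB(A))
    hour     -- local hour of day, 0..23
    zone     -- 'residential', 'industrial' or 'rural'
    """
    loud = ramp(lamax_db, 55, 85)                            # 'loud event' fuzzy set
    night = max(ramp(hour, 20, 23), 1 - ramp(hour, 5, 8))    # 'night-time' fuzzy set
    sensitivity = {"residential": 1.0, "industrial": 0.3, "rural": 0.6}[zone]
    # annoyance grows with loudness, amplified at night and over sensitive zones
    return loud * (0.5 + 0.5 * night) * sensitivity

print(round(annoyance(78, 22, "residential"), 2))  # ~0.64
```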

    Trident: Controlling Side Effects in Automated Program Repair

    The goal of program repair is to eliminate a bug in a given program by automatically modifying its source code. The majority of real-world software is written in imperative programming languages, where each function or expression may have side effects: observable effects beyond returning a value. Existing program repair approaches have a limited ability to handle side effects. Previous test-driven semantic repair approaches only synthesise patches without side effects, while heuristic repair approaches generate patches with side effects only if suitable code fragments exist in the program or in a database of repair patterns, or can be derived from training data. This work introduces Trident, the first test-driven program repair approach that synthesises patches with side effects without relying on the plastic surgery hypothesis, a database of patterns, or training data. Trident relies on the interplay of several parts. First, it infers a specification for synthesising side-effecting patches using symbolic execution with a custom state merging strategy that alleviates path explosion due to side effects. Second, it uses a novel component-based patch synthesis approach that supports lvalues, values that appear on the left-hand side of assignments. In an evaluation on open-source projects, Trident successfully repaired 6 out of 10 real bugs that require insertion of new code with side effects, bugs that previous techniques therefore cannot repair. Evaluated on the ManyBugs benchmark, Trident successfully repaired two new bugs that previous approaches could not. Adding patches with side effects to the search space can exacerbate test-overfitting; we experimentally demonstrate that the simple heuristic of preferring patches with the fewest side effects alleviates the problem. An evaluation on a large number of smaller programs shows that this strategy reduces test-overfitting caused by side effects, increasing the rate of correct patches from 33.3% to 58.3%
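    The anti-overfitting heuristic mentioned at the end of the abstract, preferring plausible patches with the fewest side effects, can be sketched in a few lines of hypothetical Python (the patch representation and counting function are assumptions, not Trident's implementation).

```python
def pick_patch(candidate_patches, passes_all_tests, count_side_effects):
    """Among test-passing (plausible) patches, prefer the one with fewest side effects.

    candidate_patches  -- iterable of candidate patch objects
    passes_all_tests   -- predicate: does the patched program pass the test suite?
    count_side_effects -- function: number of side-effecting operations
                          (assignments, effectful calls) in the patch
    """
    plausible = [p for p in candidate_patches if passes_all_tests(p)]
    if not plausible:
        return None
    # ties broken arbitrarily; a real tool might fall back to patch size, etc.
    return min(plausible, key=count_side_effects)
```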