
    Optimization Techniques for Automated Software Test Data Generation

    This thesis proposes a variety of contributions to the field of evolutionary testing. We have covered a wide range of aspects of program testing: procedural and object-oriented source code, structural and functional paradigms, single-objective and multi-objective problems, isolated test cases and test sequences, and both theoretical and experimental work. Regarding the analyses carried out, we have emphasized the statistical analysis of the results in order to assess their practical significance. In summary, the main contributions of the thesis are:

    Definition of a new distance measure for the instanceof operator in object-oriented programs: in this work we focused on one aspect of object-oriented software, inheritance, and proposed approaches that can help guide the search for test data in the context of evolutionary testing. In particular, we proposed a distance measure to compute the branch distance in the presence of the instanceof operator in Java programs. We also proposed two mutation operators that modify candidate solutions based on the defined distance measure.

    Definition of a new complexity measure called ``Branch Coverage Expectation'': in this work we approached testing complexity from an original point of view: a program is more complex if it is harder to test automatically. Accordingly, we defined the ``Branch Coverage Expectation'' to provide insight into the difficulty of testing programs. The measure is founded on a Markov model of the program, which also provides its theoretical basis. Our analysis indicates that this measure is more strongly correlated with branch coverage than other static code measures, which makes it a good way to estimate how difficult a program is to test.

    Theoretical prediction of the number of test cases needed to cover a given percentage of a program: our Markov model of the program can be used to estimate the number of test cases needed to cover a given percentage of the program. We compared our theoretical prediction with the average of real executions of a test data generator. This model can help predict the progress of the testing phase, which in turn can save time and cost for the whole project. The theoretical prediction could also be very useful for determining the percentage of the program covered given a number of test cases.

    Approaches to the multi-objective test data generation problem: in this chapter we studied multi-objective test data generation in order to compare the performance of a direct multi-objective approach against applying a single-objective algorithm followed by test case selection. We evaluated four multi-objective algorithms (MOCell, NSGA-II, SPEA2, and PAES), two single-objective algorithms (GA and ES), and two random algorithms. In terms of convergence towards the optimal Pareto front, GA and MOCell were the best solvers in our comparison. We highlight that the single-objective approach, in which each branch is targeted separately, is more effective when the program has a high nesting degree.

    Comparison of different prioritization strategies for product lines and classification trees: in the context of functional testing we addressed test case prioritization with two different representations, feature models representing software product lines and classification trees. We compared five approaches for the classification tree method and two for product lines, four of them proposed by us. The results indicate that, for both representations, the proposals based on a genetic algorithm outperform the rest in most experimental scenarios and are the best option under time or cost constraints.

    Definition of an extension of the classification tree method for test sequence generation: we formally defined this extension for the generation of test sequences, which can be useful both for industry and for the research community. Its benefits are clear, since the cost of bringing the artifact under test into the next state is no longer needed, while the sequence length is significantly reduced using metaheuristic techniques. In particular, our ant-colony-based proposal is the best algorithm in the comparison, being the only one that reaches maximum coverage for all models and coverage types.

    Exploration of the effect of different seeding strategies on the computation of optimal Pareto fronts in product lines: we studied the behaviour of classical multi-objective evolutionary algorithms applied to pairwise testing of software product lines. The group of algorithms was selected to cover a wide and diverse range of techniques. Our evaluation clearly indicates that seeding strategies decisively help the search process: the more information available to create the initial population, the better the results. Moreover, thanks to the use of multi-objective techniques we can provide a larger or smaller test suite, in short, one that best fits the user's economic or technological constraints.

    An exact technique for computing the optimal Pareto front in software product lines: we proposed an exact approach for this computation in the multi-objective case with pairwise coverage. We defined a 0-1 linear program and a SAT-solver-based algorithm to obtain the true Pareto front. The evaluation indicates that, despite being an excellent method for computing optimal solutions, it has the drawback of scalability, since for large models the execution time grows considerably. After a correlation study, we confirmed our suspicion: there is a high correlation between the execution time and the number of products denoted by the program's feature model.
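The thesis bases its prediction of how many test cases are needed for a given coverage level on a Markov model of the program. As a rough, self-contained illustration of that idea only, the sketch below assumes each branch i has a known probability p_i of being exercised by one random test case (in the thesis these probabilities would come from the program's Markov model) and computes the expected coverage after n test cases. It is a simplified stand-in, not the model defined in the thesis.

```python
# Illustrative sketch (not the thesis's model): predict expected branch coverage
# from assumed per-branch execution probabilities, and search for the number of
# test cases needed to reach a target coverage level.

def expected_coverage(branch_probs, n_tests):
    """Expected fraction of branches covered after n_tests independent random test cases."""
    covered = sum(1.0 - (1.0 - p) ** n_tests for p in branch_probs)
    return covered / len(branch_probs)

def tests_needed(branch_probs, target, max_tests=100_000):
    """Smallest number of test cases whose expected coverage reaches the target."""
    for n in range(1, max_tests + 1):
        if expected_coverage(branch_probs, n) >= target:
            return n
    return None  # target not reachable within max_tests

# Four easy branches and one hard-to-reach branch (illustrative values only).
probs = [0.9, 0.8, 0.5, 0.5, 0.05]
print(expected_coverage(probs, 10))  # expected coverage after 10 test cases
print(tests_needed(probs, 0.95))     # test cases needed for 95% expected coverage
```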

    Estimating the effort in the early stages of software development.

    Estimates of the costs involved in the development of a software product and of the likely risk are two of the main components associated with the evaluation of software projects and their approval for development. They are essential before development starts, since the investment made early in software development determines the overall cost of the system. When these estimates are made, however, the unknown obscures the known and high uncertainty is embedded in the process. This is the essence of the estimator's dilemma and the concern of this thesis. The thesis offers an Effort Estimation Model (EEM), a support system to assist the process of project evaluation early in development, when the project is about to start. The estimates are based on preliminary data and on the judgement of the estimators. They are developed for the early stages of software building, in which the requirements are defined and the gross design of the software product is specified. At this point only coarse estimates of the total development effort are feasible; these coarse estimates are updated as uncertainty is reduced. The basic element common to all frameworks for software building is the activity. The EEM therefore uses a knowledge base that decomposes the software development process down to the activity level. Components that contribute to the effort associated with the activities carried out early in the development process are identified; they are the size metrics used by the EEM. The data incorporated in the knowledge base for each activity, together with the rules for assessing the complexity and risk perceived in the development, allow the estimation process to take place and form the infrastructure of a 'process model' for effort estimation. The processes of estimating the effort and of developing the software are linked: assumptions made throughout are recorded, which helps explain deviations between estimates and actual effort and enables a feedback mechanism to be incorporated into the software development process. These estimates support the decision process associated with the overall management of software development and facilitate management involvement; they are thus considered critical success factors for the management of software projects.
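As a purely hypothetical illustration of the activity-level bookkeeping the abstract describes (effort estimated per activity and adjusted by judged complexity and risk), the sketch below defines an activity record and aggregates coarse estimates. The activity names, multipliers, and scaling formula are assumptions made for illustration, not the EEM's actual knowledge base or rules.

```python
# Hypothetical sketch of activity-level effort aggregation; the scaling formula
# and all numbers are illustrative assumptions, not the EEM's actual rules.
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    base_effort: float   # person-days derived from size metrics
    complexity: float    # judged multiplier (1.0 = nominal)
    risk: float          # judged multiplier (1.0 = nominal)

    def estimate(self) -> float:
        return self.base_effort * self.complexity * self.risk

def total_effort(activities) -> float:
    """Coarse total estimate: sum of per-activity estimates."""
    return sum(a.estimate() for a in activities)

plan = [
    Activity("requirements definition", base_effort=20, complexity=1.2, risk=1.1),
    Activity("gross design", base_effort=35, complexity=1.0, risk=1.3),
]
print(f"coarse estimate: {total_effort(plan):.1f} person-days")
```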

    Lessons to be learned by comparing integrated fisheries stock assessment models (SAMs) with integrated population models (IPMs)

    AEP was partially funded by the Cooperative Institute for Climate, Ocean, & Ecosystem Studies (CICOES) under NOAA Cooperative Agreement NA15OAR4320063, Contribution No. 2023-1331. Integrated fisheries stock assessment models (SAMs) and integrated population models (IPMs) are used in biological and ecological systems to estimate abundance and demographic rates. The approaches are fundamentally very similar, but historically have been considered as separate endeavors, resulting in a loss of shared vision, practice and progress. We review the two approaches to identify similarities and differences, with a view to identifying key lessons that would benefit more generally the overarching topic of population ecology. We present a case study for each of SAM (snapper from the west coast of New Zealand) and IPM (woodchat shrikes from Germany) to highlight differences and similarities. The key differences between SAMs and IPMs appear to be the objectives and parameter estimates required to meet these objectives, the size and spatial scale of the populations, and the differing availability of various types of data. In addition, up to now, typical SAMs have been applied in aquatic habitats, while most IPMs stem from terrestrial habitats. SAMs generally aim to assess the level of sustainable exploitation of fish populations, so absolute abundance or biomass must be estimated, although some estimate only relative trends. Relative abundance is often sufficient to understand population dynamics and inform conservation actions, which is the main objective of IPMs. IPMs are often applied to small populations of conservation concern, where demographic uncertainty can be important, which is more conveniently implemented using Bayesian approaches. IPMs are typically applied at small to moderate spatial scales (1 to 10⁴ km²), with the possibility of collecting detailed longitudinal individual data, whereas SAMs are typically applied to large, economically valuable fish stocks at very large spatial scales (10⁴ to 10⁶ km²) with limited possibility of collecting detailed individual data. There is a sense in which a SAM is more data- (or information-) hungry than an IPM because of its goal to estimate absolute biomass or abundance, and data at the individual level to inform demographic rates are more difficult to obtain in the (often marine) systems where most SAMs are applied. SAMs therefore require more 'tuning' or assumptions than IPMs, where the 'data speak for themselves', and consequently techniques such as data weighting and model evaluation are more nuanced for SAMs than for IPMs. SAMs would benefit from being fit to more disaggregated data to quantify spatial and individual variation and allow richer inference on demographic processes. IPMs would benefit from more attempts to estimate absolute abundance, for example by using unconditional models for capture-recapture data.
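Both SAMs and IPMs share an age-structured demographic core in which abundance is projected forward with survival and recruitment. The sketch below is a generic Leslie-matrix projection included only to make that shared core concrete; the rates and initial abundances are placeholder values, not data from the snapper or woodchat shrike case studies.

```python
# Generic age-structured projection (Leslie matrix), illustrating the demographic
# process shared by SAMs and IPMs. All numbers are placeholders.
import numpy as np

survival = np.array([0.5, 0.7, 0.8])    # survival from age class a to a+1 (2+ is a plus-group)
fecundity = np.array([0.0, 1.2, 2.0])   # recruits produced per individual in age class a
n0 = np.array([100.0, 60.0, 40.0])      # initial abundance at ages 0, 1, 2+

def project(n, survival, fecundity, years=5):
    """Project abundance-at-age forward for a number of years."""
    A = np.zeros((3, 3))
    A[0, :] = fecundity        # recruitment into age class 0
    A[1, 0] = survival[0]      # age 0 -> 1
    A[2, 1] = survival[1]      # age 1 -> 2+
    A[2, 2] = survival[2]      # 2+ individuals surviving within the plus-group
    trajectory = [n]
    for _ in range(years):
        n = A @ n
        trajectory.append(n)
    return np.array(trajectory)

traj = project(n0, survival, fecundity)
print(traj.sum(axis=1))        # total abundance in each projected year
```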

    Low-Pressure EGR in Spark-Ignition Engines: Combustion Effects, System Optimization, Transients & Estimation Algorithms

    Low-displacement turbocharged spark-ignition engines have become the dominant choice of auto makers in the effort to meet the increasingly stringent emission regulations and fuel efficiency targets. Low-Pressure cooled Exhaust Gas Recirculation introduces important efficiency benefits and complements the shortcomings of highly boosted engines. The main drawback of these configurations is the long air-path, which may cause over-dilution limitations during transient operation. The pulsating exhaust environment and the low available pressure differential to drive the recirculation impose additional challenges with respect to feed-forward EGR estimation accuracy. For these reasons, these systems are currently implemented through calibration with less-than-optimum EGR dilution in order to ensure stable operation under all conditions. However, this technique introduces efficiency penalties. Aiming to exploit the full potential of this technology, the goal is to address these challenges and allow operation with near-optimum EGR dilution. This study is focused on three major areas regarding the implementation of Low-Pressure EGR systems: (i) combustion effects, benefits and constraints; (ii) system optimization and transient operation; and (iii) estimation and adaptation. Results from system optimization show that fuel efficiency benefits range from 2%–3% over drive cycles through pumping and heat loss reduction, and up to 16% or more at higher loads through knock mitigation and fuel enrichment elimination. Soot emissions are also significantly reduced with cooled EGR. Regarding the transient challenges, a methodology that correlates experimental data with simulation results is developed to identify over-dilution limitations related to the engine’s dilution tolerance. Different strategies are proposed to mitigate these issues, including a Neural Network-actuated VVT that controls the internal residual and increases the over-dilution tolerance by 3% of absolute EGR. Physics-based estimation algorithms are also developed, including an exhaust pressure/temperature model which is validated through real-time transient experiments and eliminates the need for exhaust sensors. Furthermore, the installation of an intake oxygen sensor is investigated and an adaptation algorithm based on an Extended Kalman Filter is created. This algorithm delivers short-term and long-term corrections to feed-forward EGR models, achieving a final estimation error of less than 1%. The combination of the proposed methodologies, strategies and algorithms allows the implementation of near-optimum EGR dilution and translates to fuel efficiency benefits ranging from 1% at low-load up to 10% at high-load operation over the current state-of-the-art.
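The adaptation algorithm in this work is an Extended Kalman Filter built on physics-based models; the sketch below is a deliberately simplified, linear, scalar Kalman filter that fuses a feed-forward EGR-rate estimate with an intake-O2-derived measurement by tracking a slowly varying model bias. The single bias state and the noise values are illustrative assumptions, not the thesis's algorithm.

```python
# Simplified scalar Kalman filter (illustrative only): estimate the bias of a
# feed-forward EGR-rate model using an intake-O2-derived EGR measurement.
class EgrBiasFilter:
    def __init__(self, q=1e-4, r=1e-2):
        self.bias = 0.0   # estimated additive error of the feed-forward model
        self.p = 1.0      # variance of the bias estimate
        self.q = q        # process noise: how fast the bias may drift
        self.r = r        # measurement noise of the O2-derived EGR rate

    def update(self, egr_feedforward, egr_measured):
        self.p += self.q                                # predict: bias assumed slowly varying
        innovation = egr_measured - (egr_feedforward + self.bias)
        k = self.p / (self.p + self.r)                  # Kalman gain
        self.bias += k * innovation                     # correct the bias estimate
        self.p *= (1.0 - k)
        return egr_feedforward + self.bias              # corrected EGR estimate

f = EgrBiasFilter()
for ff, meas in [(0.10, 0.12), (0.11, 0.13), (0.12, 0.14)]:
    print(round(f.update(ff, meas), 4))  # corrected estimates move toward the measurements
```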

    Big Code Applications and Approaches

    The availability of a huge amount of source code from code archives and open-source projects opens up the possibility of merging the machine learning, programming languages, and software engineering research fields. This area is often referred to as Big Code: programming languages are treated much like natural languages, and different features and patterns of code can be exploited to perform many useful tasks and build supportive tools. Among all the possible applications that can be developed within the area of Big Code, the work presented in this thesis mainly focuses on two tasks: Programming Language Identification (PLI) and Software Defect Prediction (SDP) for source code. Programming language identification is commonly needed for program comprehension and is usually performed directly by developers. However, at large scale, such as in widely used archives (GitHub, Software Heritage), automation of this task is desirable. To accomplish this aim, the problem is analyzed from different points of view (text- and image-based learning approaches) and different models are created, paying particular attention to their scalability. Software defect prediction is a fundamental step in software development for improving quality and assuring the reliability of software products. In the past, defects were found by manual inspection or by automatic static and dynamic analyzers. Now, this task can be automated using learning approaches that speed up and improve the related procedures. Here, two models have been built and analyzed to detect some of the most common bugs and errors at different code granularity levels (file and method level). The data used and the models' architectures are analyzed and described in detail. Quantitative and qualitative results are reported for both the PLI and SDP tasks, and differences and similarities with respect to other related works are discussed.
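As one minimal instance of the text-based learning approaches the abstract mentions for programming language identification (not the thesis's actual architecture), the sketch below trains a character n-gram classifier with scikit-learn; the toy snippets and labels are illustrative only.

```python
# Minimal text-based PLI sketch: character n-grams + logistic regression.
# Toy training data; a real system would train on large labeled corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

snippets = [
    ("def add(a, b):\n    return a + b", "Python"),
    ("print('hello')", "Python"),
    ("public static void main(String[] args) {}", "Java"),
    ("System.out.println(\"hello\");", "Java"),
    ("fn main() { println!(\"hello\"); }", "Rust"),
    ("let x: i32 = 5;", "Rust"),
]
texts, labels = zip(*snippets)

# Character n-grams are robust to identifier names and whitespace conventions.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["public int getId() { return id; }"]))  # expected to lean towards 'Java'
```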