5,552 research outputs found

    Evaluating an automated procedure of machine learning parameter tuning for software effort estimation

    Get PDF
    Software effort estimation requires accurate prediction models. Machine learning algorithms have been used to create more accurate estimation models; however, these algorithms are sensitive to factors such as the choice of hyper-parameters. To reduce this sensitivity, automated approaches for hyper-parameter tuning have recently been investigated, and further research is needed on their effectiveness in the context of software effort estimation. Such evaluations could help clarify which hyper-parameter settings can be adjusted to improve model accuracy, and in which specific contexts tuning benefits model performance. The goal of this work is to develop an automated procedure for machine learning hyper-parameter tuning in the context of software effort estimation. The automated procedure builds and evaluates software effort estimation models to determine the most accurate evaluation schemes. The methodology consists of first performing a systematic mapping study to characterize existing hyper-parameter tuning approaches in software effort estimation, then developing the procedure to automate the evaluation of hyper-parameter tuning, and finally conducting controlled quasi-experiments to evaluate the automated procedure. From the systematic mapping study we found that the effort estimation literature has favored grid search. The results of our quasi-experiments show that fast, less exhaustive tuners are viable alternatives to grid search: randomly evaluating 60 hyper-parameter settings can be as good as grid search, and multiple state-of-the-art tuners were more effective than this random search in only 6% of the evaluated dataset-model combinations. We endorse random search, genetic algorithms, FLASH, differential evolution, tabu search, and harmony search as effective tuners.
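    For readers unfamiliar with the comparison, the sketch below contrasts an exhaustive grid with a random search limited to 60 sampled settings, the budget highlighted in the abstract. It is a minimal illustration using scikit-learn on synthetic data; the regressor, parameter ranges, and data are placeholders, not the study's actual setup.

# Minimal sketch: random search over 60 sampled settings vs. an exhaustive grid.
# The regressor, parameter ranges, and synthetic "effort" data are illustrative
# placeholders, not the study's setup.
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.random((100, 5))                                          # placeholder project features
y = X @ np.array([3.0, 1.5, 0.5, 2.0, 0.1]) + rng.normal(0, 0.1, 100)  # placeholder effort

model = RandomForestRegressor(random_state=0)

# Exhaustive grid: every combination is evaluated.
grid = GridSearchCV(
    model,
    param_grid={"n_estimators": [50, 100, 200, 400],
                "max_depth": [2, 4, 8, None],
                "min_samples_leaf": [1, 2, 4, 8]},
    scoring="neg_mean_absolute_error", cv=3).fit(X, y)

# Random search: only 60 sampled settings, often competitive with the full grid.
rand = RandomizedSearchCV(
    model,
    param_distributions={"n_estimators": randint(50, 400),
                         "max_depth": randint(2, 16),
                         "min_samples_leaf": randint(1, 8)},
    n_iter=60, scoring="neg_mean_absolute_error", cv=3, random_state=0).fit(X, y)

print("grid  :", grid.best_score_, grid.best_params_)
print("random:", rand.best_score_, rand.best_params_)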

    Nuclear halo of a 177 MeV proton beam in water: theory, measurement and parameterization

    Full text link
    The dose distribution of a monoenergetic pencil beam in water consists of an electromagnetic "core", a "halo" from charged nuclear secondaries, and a much larger "aura" from neutral secondaries. These regions overlap, but each has distinct spatial characteristics. We have measured the core/halo using a 177 MeV test beam offset in a water tank. The beam monitor was a fluence-calibrated plane-parallel ionization chamber (IC) and the field chamber a dose-calibrated Exradin T1, so the dose measurements are absolute (MeV/g/p). We performed depth-dose scans at ten displacements from the beam axis ranging from 0 to 10 cm. The dose spans five orders of magnitude, and the transition from halo to aura is clearly visible. We have performed model-dependent (MD) and model-independent (MI) fits to the data. The MD fit separates the dose into core, elastic/inelastic nuclear, nonelastic nuclear and aura terms, and achieves a global rms measurement/fit ratio of 15%. The MI fit uses cubic splines, and the same ratio is 9%. We review the literature, in particular the use of Pedroni's parameterization of the core/halo. Several papers improve on his Gaussian transverse distribution of the halo, but all retain his T(w), the radial integral of the depth-dose, multiplying both the core and halo terms and motivating measurements with large "Bragg peak chambers" (BPCs). We argue that this use of T(w), which by its definition includes energy deposition by nuclear secondaries, is incorrect. T(w) should be replaced in the core term, and in at least part of the halo, by a purely electromagnetic mass stopping power. BPC measurements are unnecessary, and irrelevant to parameterizing the pencil beam.Comment: 55 pages, 4 tables, 29 figures
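    To make the argument about T(w) concrete, the following is a schematic of the widely used two-Gaussian core/halo pencil-beam form the abstract refers to (Pedroni-style); the notation is generic rather than the paper's own.

% Schematic two-Gaussian pencil-beam form (Pedroni-style); notation is generic.
\[
  D(x,y,w) \;=\; T(w)\,\Bigl[\bigl(1-f(w)\bigr)\,G_{2\mathrm{D}}\!\bigl(r;\sigma_{\mathrm{core}}(w)\bigr)
  \;+\; f(w)\,G_{2\mathrm{D}}\!\bigl(r;\sigma_{\mathrm{halo}}(w)\bigr)\Bigr],
\]
\[
  G_{2\mathrm{D}}(r;\sigma) \;=\; \frac{1}{2\pi\sigma^{2}}\,
  \exp\!\Bigl(-\frac{r^{2}}{2\sigma^{2}}\Bigr),
  \qquad r^{2}=x^{2}+y^{2},
\]
% where w is the depth, T(w) the radial integral of the depth-dose, and f(w) the halo fraction.
% The abstract's objection: T(w) by definition already contains energy deposited by nuclear
% secondaries, so using it to weight the electromagnetic core term is inconsistent; the authors
% argue the core should instead be scaled by a purely electromagnetic mass stopping power.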

    Continuum modeling of active nematics via data-driven equation discovery

    Get PDF
    Data-driven modeling seeks to extract a parsimonious model for a physical system directly from measurement data. One of the most interpretable of these methods is Sparse Identification of Nonlinear Dynamics (SINDy), which selects a relatively sparse linear combination of model terms from a large set of (possibly nonlinear) candidates via optimization. This technique has shown promise for synthetic data generated by numerical simulations, but its application to real data is less developed. This dissertation applies SINDy to video data from a bio-inspired system of microtubule-motor protein assemblies, an example of nonequilibrium dynamics that has posed a significant modeling challenge for more than a decade. In particular, we constrain SINDy to discover a partial differential equation (PDE) model that approximates the time evolution of microtubule orientation. The discovered model is relatively simple but reproduces many of the characteristics of the experimental data. The properties of the discovered PDE model are explored through stability analysis and numerical simulation; it is then compared to previously proposed models in the literature. Chapter 1 provides an introduction and motivation for pursuing a data-driven modeling approach for active nematic systems by introducing the Sparse Identification of Nonlinear Dynamics (SINDy) modeling procedure and active nematic systems. Chapter 2 lays the foundation for modeling of active nematics to better understand the model space that is searched. Chapter 3 gives some preliminary considerations for using the SINDy algorithm and proposes several approaches to mitigate common errors. Chapter 4 treats the example problem of rediscovering a governing partial differential equation for active nematics from simulated data, including some of the specific challenges that arise for discovery even in the absence of noise. Chapter 5 details the procedure for extracting data from experimental observations for use with the SINDy procedure and details tests to validate the accuracy of the extracted data. Chapter 6 presents the active nematic model extracted from experimental data via SINDy, compares its properties with previously proposed models, and provides numerical results of its simulation. Finally, Chapter 7 presents conclusions from the work and provides future directions for both active nematic systems and data-driven modeling in related systems.
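    As a concrete illustration of the SINDy idea described above (a sparse linear combination selected from a library of candidate terms), here is a minimal, self-contained sketch on a toy ODE. The system, candidate library, and threshold are illustrative choices; the dissertation itself targets a PDE for microtubule orientation extracted from video data.

# Minimal SINDy sketch on a toy ODE: build a library of candidate terms, then
# select a sparse combination via sequentially thresholded least squares.
import numpy as np

# Simulate a damped cubic oscillator: x' = -0.1 x^3 + 2 y^3, y' = -2 x^3 - 0.1 y^3
dt, n = 0.001, 20000
X = np.empty((n, 2))
X[0] = [2.0, 0.0]
for k in range(n - 1):
    x, y = X[k]
    X[k + 1] = X[k] + dt * np.array([-0.1 * x**3 + 2 * y**3,
                                     -2 * x**3 - 0.1 * y**3])

# Time derivatives by finite differences
dXdt = np.gradient(X, dt, axis=0)

# Candidate library: polynomials up to degree 3 in (x, y)
x, y = X[:, 0], X[:, 1]
Theta = np.column_stack([np.ones(n), x, y, x**2, x*y, y**2,
                         x**3, x**2*y, x*y**2, y**3])
names = ["1", "x", "y", "x^2", "xy", "y^2", "x^3", "x^2 y", "x y^2", "y^3"]

# Sequentially thresholded least squares (the core of SINDy)
def stlsq(Theta, dXdt, threshold=0.05, iters=10):
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        Xi[np.abs(Xi) < threshold] = 0.0
        for j in range(dXdt.shape[1]):
            big = np.abs(Xi[:, j]) >= threshold
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], dXdt[:, j], rcond=None)[0]
    return Xi

Xi = stlsq(Theta, dXdt)
for j, var in enumerate(["x'", "y'"]):
    terms = [f"{Xi[i, j]:+.3f} {names[i]}" for i in range(len(names)) if Xi[i, j] != 0]
    print(var, "=", " ".join(terms))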

    Solving optimisation problems in metal forming using FEM: A metamodel based optimisation algorithm

    Get PDF
    During the last decades, Finite Element Method (FEM) simulations of metal forming processes have become important tools for designing feasible production processes. In more recent years, several authors recognised the potential of coupling FEM simulations to mathematical optimisation algorithms to design optimal metal forming processes instead of only feasible ones. This report describes the selection, development and implementation of an optimisation algorithm for solving optimisation problems for metal forming processes using time-consuming FEM simulations. A Sequential Approximate Optimisation algorithm is proposed, which incorporates metamodelling techniques and sequential improvement strategies to enhance the efficiency of the algorithm. The algorithm has been implemented in MATLAB and can be used in combination with any Finite Element code for simulating metal forming processes. The good applicability of the proposed optimisation algorithm within the field of metal forming has been demonstrated by applying it to optimise the internal pressure and axial feeding load paths for manufacturing a simple hydroformed product, resulting in a constant wall thickness distribution throughout the final product. Subsequently, the algorithm was compared to other optimisation algorithms for optimising metal forming by applying it to two more complicated forging examples. In both cases, the geometry of the preform was optimised. For one forging application, the algorithm managed to resolve a folding defect. For the other application, both the folding susceptibility and the energy consumption required for forging the part were reduced by 10% with respect to the forging process proposed by the forging company. The algorithm proposed in this report yielded better results than the optimisation algorithms it was compared to
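    The following is a minimal sketch of the kind of sequential approximate (metamodel-based) optimisation loop the abstract describes: evaluate an expensive simulation at a small design of experiments, fit a metamodel, minimise the metamodel to propose the next design, and repeat. The placeholder objective, Kriging metamodel, and infill rule are generic assumptions, not the report's exact algorithm.

# Minimal sequential approximate optimisation sketch. "expensive_simulation"
# stands in for a FEM run; the metamodel and infill rule are generic choices.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_simulation(x):
    # Placeholder objective, e.g. wall-thickness deviation as a function of a
    # two-parameter load path; a real run would call the FEM code instead.
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2 + 0.05 * np.sin(8 * x[0])

bounds = np.array([[0.0, 1.0], [0.0, 1.0]])
rng = np.random.default_rng(1)

# Initial design of experiments
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(8, 2))
y = np.array([expensive_simulation(x) for x in X])

for it in range(10):                                   # sequential improvement loop
    gp = GaussianProcessRegressor(
        kernel=ConstantKernel() * RBF(length_scale=0.3),
        normalize_y=True).fit(X, y)

    # Infill rule: minimise the metamodel prediction, starting from the best design so far
    x0 = X[np.argmin(y)]
    res = minimize(lambda x: gp.predict(x.reshape(1, -1))[0],
                   x0, bounds=bounds)

    X = np.vstack([X, res.x])                          # run the expensive simulation there
    y = np.append(y, expensive_simulation(res.x))

print("best design:", X[np.argmin(y)], "objective:", y.min())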

    Safety Issues Of Red-light Running And Unprotected Left-turn At Signalized Intersections

    Get PDF
    Crashes categorized as red-light running or left-turning are most likely to occur at signalized intersections and result in substantial severe injuries and property damage. This dissertation focused on these two types of vehicle crashes, and the research methodology involved several perspectives. To examine the overall characteristics of red-light running and left-turning crashes, this study first applied 1999-2001 Florida traffic crash data to investigate the accident propensity associated with three groups of risk factors related to traffic environments, driver characteristics, and vehicle types. A quasi-induced exposure concept and statistical techniques including a classification tree model and multiple logistic regression were used to perform this analysis. Secondly, the UCF driving simulator was used to test the effect of a proposed new pavement marking countermeasure whose purpose is to reduce the red-light running rate at signalized intersections. The simulation experiment showed that the total red-light running rate with the marking is significantly lower than without it. Moreover, deceleration rates of stopping drivers with the marking at the higher speed limit are significantly lower than those without it. These findings are encouraging and suggest that the pavement marking may enhance safety with respect to right-angle and rear-end crashes at signalized intersections. Thirdly, geometric models to compute sight distances for unprotected left turns were developed for different signalized intersection configurations: a straight approach leading to a straight one, a straight approach leading to a curved one, and a curved approach leading to a curved one. The models and related analyses can be used to lay out intersection designs or to evaluate the sight distance of an existing intersection configuration to ensure safe left-turn maneuvers by drivers.

    Data Mining

    Get PDF
    Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from large amounts of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment.

    Applied Mathematics and Computational Physics

    Get PDF
    As faster and more efficient numerical algorithms become available, the understanding of the physics and the mathematical foundation behind these new methods will play an increasingly important role. This Special Issue provides a platform for researchers from both academia and industry to present their novel computational methods that have engineering and physics applications.