
    Supporting data-driven software development life-cycles with bug bounty programmes

    A growing number of organisations are utilising the skills of a global base of white-hat hackers to identify pre- and post-deployment vulnerabilities. Despite the widespread adoption of bug bounty programmes, many uncertainties remain regarding the efficacy of this relatively novel security activity, especially when considering its adoption alongside existing software development lifecycles. This dissertation explores how bug bounty programmes can be used to support data-driven software development lifecycles. To achieve this outcome, the dissertation presents four distinct contributions. The first contribution concerns the usage of Crowdsourced Vulnerability Discovery (CVD), of which bug bounty programmes are a part, within organisations. This includes the presentation of expert opinion on the benefits and shortcomings of existing approaches, and identification of the extent to which CVD programmes are used in software development lifecycles. The second contribution explores the benefits and drawbacks of hosting a programme on a bug bounty platform (a centralised repository of programmes operated by a third party). Empirical analysis of operating characteristics helps address concerns about the long-term viability of programme operation, and allows a comparison to be made between the cost of expanding a security team and the cost of running a programme. The third contribution examines the extent to which participating in the search for vulnerabilities is a viable long-term strategy for hackers based on bug bounty platforms. The results demonstrate that participation is infeasible, even on a short-term basis, for significant numbers of hackers, highlighting the shortcomings of the current approach used by platforms. Building on the first three, the fourth contribution explores CVD programme policies and the extent to which pertinent information, particularly in reference to legal constraints, is communicated to hackers. A systematic review reveals the commonplace elements that form current policy documents, enabling organisations to identify gaps within their own programme policies and to form policies that are consistent with those of their peers.
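
    To make the kind of comparison described in the second contribution concrete, the sketch below works through a back-of-the-envelope calculation with entirely hypothetical figures; the dissertation's actual cost model, payout data and platform fees are not reproduced here.

```python
# Purely illustrative sketch with hypothetical figures: the kind of comparison
# the second contribution describes, between expanding an in-house security
# team and running a bug bounty programme for a year.
team_cost = 2 * 120_000                       # two additional security engineers (hypothetical salaries)

bounty_payouts = {"critical": (3, 10_000),    # (reports resolved, average payout) - all hypothetical
                  "high":     (10, 3_000),
                  "medium":   (25, 800),
                  "low":      (40, 200)}
platform_fee = 0.20                           # hypothetical platform commission on payouts

programme_cost = sum(n * avg for n, avg in bounty_payouts.values()) * (1 + platform_fee)
print(f"security team: ${team_cost:,}  bug bounty programme: ${programme_cost:,.0f}")
```

    Whether the programme is cheaper depends entirely on report volume, severity mix and payout levels, which is precisely what the empirical analysis of operating characteristics is meant to establish.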

    Accounting for variance and hyperparameter optimization in machine learning benchmarks

    The recent revolution in machine learning has relied heavily on the use of standardized benchmarks. Providing clear target metrics and undeniable measures of improvement for learning algorithms, they are at the center of the scientific methodology in machine learning. They do not, however, guarantee the validity of results, so some scientific conclusions about advances in artificial intelligence may prove to be wrong. This thesis addresses that question by first raising the issue (Chapter 5), then studying it to find solutions and recommendations (Chapter 6), and finally building a tool to help improve researchers' methodology (Chapter 7). In the first article, Chapter 5, we demonstrate the issue of reproducibility in stable and consensual benchmarks, implying that these issues are endemic to a large set of machine learning applications that are possibly less stable or less consensual. We highlight the important impact of stochasticity even in stable image classification tasks and contend that solutions for reproducible benchmarks must account for this stochasticity. In the second article, Chapter 6, we study the sources of variation that are typical of machine learning benchmarks, measure their effect on methods for comparing algorithms, and provide recommendations based on our results. One important contribution of this work is measuring the reliability of a cheaper but biased estimator of the average performance of algorithms. As explained in the article, an ideal estimator would involve multiple rounds of hyperparameter optimization, which makes it too computationally expensive; most researchers must therefore resort to the biased alternative, but until now the magnitude of the resulting degradation in estimation quality was unknown. Based on our results, we provide guidelines for comparing algorithms on benchmarks under practical computational budgets. First, as many sources of variation as possible should be randomized. Second, this randomization should include the partitioning of the data into training, validation and test sets, which turns out to be the most important source of variation. Third, statistical tests, such as the variant of the Mann-Whitney U test presented in our article, should be used instead of ad-hoc comparisons of averages, so that the uncertainty of performance estimates is accounted for when comparing machine learning algorithms. In Chapter 7, we present a hyperparameter optimization framework developed with the main goal of encouraging best practices for hyperparameter optimization. The framework is designed around a simple and intuitive interface adapted to the workflow of machine learning researchers. It includes a new version control system for experiments, helping researchers organize their rounds of experimentation and leverage prior results for more efficient hyperparameter optimization. Hyperparameter optimization plays an important role in benchmarking, hyperparameters being a significant confounding factor. Providing researchers with an instrument to properly control this confounding factor is complementary to the guidelines for accounting for sources of variation in Chapter 6. Our recommendations, together with our tool for hyperparameter optimization, provide a solid basis for a reliable methodology in machine learning benchmarks.
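
    The second article's recommendations lend themselves to a short illustration: randomize the data partitioning (the dominant source of variation) and compare score distributions with a statistical test such as the Mann-Whitney U test rather than bare averages. The sketch below is a minimal example under assumed choices (scikit-learn models, a toy dataset, 20 random splits); it is not the benchmarking protocol from the thesis.

```python
# Minimal sketch: randomize the train/test split on every run and compare two
# learners with a Mann-Whitney U test instead of a bare difference of means.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

def scores(make_model, n_splits=20):
    """Accuracy over n_splits randomized train/test partitions."""
    out = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed)   # randomized partitioning
        model = make_model().fit(X_tr, y_tr)
        out.append(model.score(X_te, y_te))
    return np.array(out)

a = scores(lambda: LogisticRegression(max_iter=5000))
b = scores(lambda: RandomForestClassifier(n_estimators=200, random_state=0))

# Statistical test on the two score samples instead of an ad-hoc mean comparison.
stat, p = mannwhitneyu(a, b, alternative="two-sided")
print(f"mean A={a.mean():.3f}  mean B={b.mean():.3f}  U={stat:.1f}  p={p:.3f}")
```

    Randomizing the split on every run folds the partitioning variance into the score samples, which is exactly the uncertainty the statistical test is meant to account for.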

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting the reusability of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

    Proceedings of the Paris Open Science European Conference

    For more than twenty years, the international research community has affirmed its support for open and collaborative practices that improve the quality, transparency, reproducibility and inclusiveness of science. In France, this orientation has been reflected in the adoption of two National Plans for Open Science, in 2018 and 2021. In this context, and on the occasion of the French Presidency of the Council of the European Union, France organised the Open Science European Conference (OSEC) on 4 and 5 February 2022. This conference on the transformation of the research and innovation ecosystem in Europe was an opportunity to address, in particular, transparency in health research, the future of scientific publishing and the opening of code and software produced in a scientific context, as well as the necessary transformations of research assessment, summarised in the Paris Call presented during the event, which calls for the creation of a coalition of actors committed to reforming the current system. This international event was organised by the French Académie des sciences, the Ministry of Higher Education and Research, the French National Center for Scientific Research (CNRS), the National Institute of Health and Medical Research (Inserm), the High Council for Evaluation of Research and Higher Education (Hcéres), the National Research Agency (ANR), the University of Lorraine and the University of Nantes.

    Reproducibility of Studies on Text Mining for Citation Screening in Systematic Reviews: Evaluation and Checklist

    CONTEXT: Independent validation of published scientific results through study replication is a precondition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent validation of results, so study reproduction has been justified as the minimum acceptable standard for evaluating the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the field.
    OBJECTIVE: In this paper, we investigate the reproducibility of studies in this area based on the information contained in published articles, and we propose reporting guidelines that could improve reproducibility.
    METHODS: The study was approached in two ways. Initially we attempted to reproduce the results of six studies that were based on the same raw dataset. Then, based on this experience, we identified the steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided about these steps. Thirty-three articles were systematically assessed for reproducibility using this approach.
    RESULTS: Our work revealed that it is currently difficult, if not impossible, to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits the reproducibility of about 80% of the studies assessed, and information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third-party software tools used are mostly free and available.
    CONCLUSIONS: The reproducibility potential of most of the studies can be significantly improved if more attention is paid to the information provided on the datasets used, how they were partitioned and utilized, and how any randomization was controlled. We introduce a checklist of the information that needs to be provided to ensure that a published study can be reproduced.
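
    As a small illustration of the kind of reporting the conclusions call for, the sketch below records the seed and the exact train/validation/test partition of a hypothetical citation-screening corpus so that both can be published with the study; the corpus size, split proportions and file name are assumptions, not details from any of the assessed papers.

```python
# Minimal sketch: make the dataset partitioning and randomization of a
# citation-screening experiment explicit and reportable, so the split itself
# can be published alongside the results.
import json
import numpy as np

SEED = 20240101                      # report the seed used for every random step
rng = np.random.default_rng(SEED)

n_citations = 5000                   # hypothetical corpus size
indices = rng.permutation(n_citations)
split = {
    "seed": SEED,
    "train": indices[:3500].tolist(),
    "validation": indices[3500:4250].tolist(),
    "test": indices[4250:].tolist(),
}

# Persist the exact partition so an independent team can rerun the experiment
# on identical folds instead of guessing how the data were divided.
with open("partition.json", "w") as f:
    json.dump(split, f)
```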

    Meta-analysis for families of experiments in software engineering: a systematic review and reproducibility and validity assessment

    Context: Previous studies have raised concerns about the analysis and meta-analysis of crossover experiments, and we were aware of several families of experiments that used crossover designs and meta-analysis.
    Objective: To identify families of experiments that used meta-analysis, to investigate their methods for effect size construction and aggregation, and to assess the reproducibility and validity of their results.
    Method: We performed a systematic review (SR) of papers reporting families of experiments in high-quality software engineering journals that attempted to apply meta-analysis. We attempted to reproduce the reported meta-analysis results using the descriptive statistics and also investigated the validity of the meta-analysis process.
    Results: Out of 13 identified primary studies, we reproduced only five; seven studies could not be reproduced, and one study that was correctly analyzed could not be reproduced because of rounding errors. Where we were unable to reproduce results, we provide revised meta-analysis results. To support reproducibility of the analyses presented in our paper, it is complemented by the reproducer R package.
    Conclusions: Meta-analysis is not well understood by software engineering researchers. To support novice researchers, we present recommendations for reporting and meta-analyzing families of experiments and a detailed example of how to analyze a family of 4-group crossover experiments.
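
    Because the review centers on how effect sizes are constructed and aggregated, a small worked sketch may help. The snippet below uses the standard Hedges' g formulas and fixed-effect inverse-variance pooling on made-up summary statistics; it is not the reproducer package and it ignores the crossover-specific adjustments the paper discusses.

```python
# Minimal sketch: build standardized effect sizes for each experiment in a
# family and aggregate them with fixed-effect inverse-variance weighting.
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g and its approximate variance for two independent groups."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / s_pooled
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

# Hypothetical per-experiment summary statistics (mean, sd, n per group).
experiments = [(14.2, 3.1, 12, 11.8, 2.9, 12),
               (13.5, 2.7, 15, 12.1, 3.0, 14),
               (15.0, 3.4, 10, 12.6, 3.2, 11)]

effects = [hedges_g(*e) for e in experiments]
weights = [1 / v for _, v in effects]            # inverse-variance weights
pooled = sum(w * g for (g, _), w in zip(effects, weights)) / sum(weights)
se = math.sqrt(1 / sum(weights))
print(f"pooled g = {pooled:.2f} (SE {se:.2f})")
```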