
    CS Circles: An In-Browser Python Course for Beginners

    Computer Science Circles is a free programming website for beginners that is designed to be fun, easy to use, and accessible to the broadest possible audience. We teach Python since it is simple yet powerful, and the course content is well-structured but written in plain language. The website has over one hundred exercises in thirty lesson pages, plus special features to help teachers support their students. It is available in both English and French. We discuss the philosophy behind the course and its design, we describe how it was implemented, and we give statistics on its use.
    Comment: To appear in SIGCSE 2013
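
    For flavour, here is a minimal beginner-level Python snippet of the kind of short, self-contained exercise such lessons build toward; it is an illustrative example only and is not taken from the course material.

        # Illustrative beginner exercise (not from the course itself): read a
        # whole number and report whether it is even or odd, using only
        # constructs an early lesson would cover (input, int, if/else, print).
        n = int(input("Enter a whole number: "))
        if n % 2 == 0:
            print(n, "is even")
        else:
            print(n, "is odd")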

    PrivacyScore: Improving Privacy and Security via Crowd-Sourced Benchmarks of Websites

    Website owners make conscious and unconscious decisions that affect their users, potentially exposing them to privacy and security risks in the process. In this paper we introduce PrivacyScore, an automated website scanning portal that allows anyone to benchmark security and privacy features of multiple websites. In contrast to existing projects, the checks implemented in PrivacyScore cover a wider range of potential privacy and security issues. Furthermore, users can control the ranking and analysis methodology. Therefore, PrivacyScore can also be used by data protection authorities to perform regularly scheduled compliance checks. In the long term we hope that the transparency resulting from the published benchmarks creates an incentive for website owners to improve their sites. The public availability of a first version of PrivacyScore was announced at the ENISA Annual Privacy Forum in June 2017.
    Comment: 14 pages, 4 figures. A German version of this paper discussing the legal aspects of this system is available at arXiv:1705.0888
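
    To make the idea of automated per-site checks with a user-controlled ranking concrete, here is a minimal Python sketch; it is not PrivacyScore's code, and the check names, weights, and URLs are illustrative assumptions.

        # Minimal sketch of crowd-sourced-style benchmarking: run a few
        # automated checks per site, then rank sites under user-supplied weights.
        import requests

        def scan(url):
            r = requests.get(url, timeout=10, allow_redirects=True)
            headers = {k.lower(): v for k, v in r.headers.items()}
            return {
                "https": r.url.startswith("https://"),
                "hsts": "strict-transport-security" in headers,
                "csp": "content-security-policy" in headers,
                "no_server_banner": "server" not in headers,
            }

        def rank(results, weights):
            # Users control the ranking methodology by supplying their own weights.
            def score(checks):
                return sum(weights.get(name, 0) for name, ok in checks.items() if ok)
            return sorted(results.items(), key=lambda kv: score(kv[1]), reverse=True)

        sites = ["https://example.org", "https://example.com"]
        results = {s: scan(s) for s in sites}
        print(rank(results, weights={"https": 3, "hsts": 2, "csp": 2, "no_server_banner": 1}))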

    Automating biomedical data science through tree-based pipeline optimization

    Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.
    Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceedings
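
    A brief usage sketch of the open-source TPOT tool on a standard scikit-learn dataset follows; the TPOTClassifier fit/score/export calls reflect the released tpot package's documented interface, but the dataset and parameter values here are arbitrary choices and not those from the paper's experiments.

        # Sketch: let TPOT evolve a classification pipeline with genetic programming.
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split
        from tpot import TPOTClassifier

        X, y = load_digits(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=42)

        tpot = TPOTClassifier(generations=5, population_size=20,
                              verbosity=2, random_state=42)
        tpot.fit(X_train, y_train)            # evolves tree-based pipelines
        print(tpot.score(X_test, y_test))     # held-out accuracy of the best pipeline
        tpot.export("best_pipeline.py")       # writes the winning pipeline as Python code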

    Shingle 2.0: generalising self-consistent and automated domain discretisation for multi-scale geophysical models

    The approaches taken to describe and develop spatial discretisations of the domains required for geophysical simulation models are commonly ad hoc, model or application specific, and under-documented. This is particularly acute for simulation models that are flexible in their use of multi-scale, anisotropic, fully unstructured meshes, where a relatively large number of heterogeneous parameters are required to constrain their full description. As a consequence, it can be difficult to reproduce simulations, to ensure provenance in model data handling and initialisation, and to conduct model intercomparisons rigorously. This paper takes a novel approach to spatial discretisation, treating it much like a numerical simulation problem in its own right. It introduces a generalised, extensible, self-documenting approach to describe carefully, and necessarily fully, the constraints over the heterogeneous parameter space that determine how a domain is spatially discretised. This additionally provides a method to accurately record these constraints, using high-level natural-language-based abstractions, which enables full accounts of provenance, sharing and distribution. Together with this description, a generalised, consistent approach to unstructured mesh generation for geophysical models is developed that is automated, robust, repeatable, quick to draft, rigorously verified and consistent with the source data throughout. This interprets the description above to execute a self-consistent spatial discretisation process, which is automatically validated against expected discrete characteristics and metrics.
    Comment: 18 pages, 10 figures, 1 table. Submitted for publication and under review
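
    The following is an entirely hypothetical Python sketch of the core idea: a complete, self-documenting description of the discretisation constraints recorded as data so that it can be shared, versioned and re-executed. The field names are invented for illustration and are not Shingle's actual schema.

        # Hypothetical self-documenting domain description: the full constraint
        # record, rather than an ad hoc script, is what gets archived and shared.
        import json

        domain_description = {
            "name": "north_atlantic_demo",
            "source_data": "GEBCO_2014 bathymetry",       # provenance of boundary data
            "boundary": {
                "contour_level_m": 0.0,                    # coastline extracted at 0 m
                "minimum_area_km2": 100.0,                 # drop islands below this size
            },
            "resolution": {
                "background_km": 50.0,
                "refine_near_coast_km": 5.0,               # coastal refinement zone
            },
            "verification": ["element_quality", "boundary_conformity"],
        }

        print(json.dumps(domain_description, indent=2))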

    BEAT: An Open-Source Web-Based Open-Science Platform

    With the increased interest in computational sciences, machine learning (ML), pattern recognition (PR) and big data, governmental agencies, academia and manufacturers are overwhelmed by the constant influx of new algorithms and techniques promising improved performance, generalization and robustness. Sadly, result reproducibility is often an overlooked feature accompanying original research publications, competitions and benchmark evaluations. The main reasons behind such a gap arise from natural complications in research and development in this area: the distribution of data may be a sensitive issue; software frameworks are difficult to install and maintain; test protocols may involve a potentially large set of intricate steps that are difficult to handle. Given the rising complexity of research challenges and the constant increase in data volume, the conditions for achieving reproducible research in the domain are also increasingly difficult to meet. To bridge this gap, we built an open platform for research in computational sciences related to pattern recognition and machine learning, to help with the development, reproducibility and certification of results obtained in the field. By making use of such a system, academic, governmental or industrial organizations enable users to easily and socially develop processing toolchains, re-use data, algorithms and workflows, and compare results from distinct algorithms and/or parameterizations with minimal effort. This article presents such a platform and discusses some of its key features, uses and limitations. We give an overview of a currently operational prototype and provide design insights.
    Comment: References to papers published on the platform incorporated
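
    As an illustration of the toolchain idea, here is a tiny self-contained Python sketch of chained processing blocks compared under two parameterizations; it is not the BEAT API, and every name in it is invented.

        # Illustrative sketch only: reusable blocks with fixed interfaces, chained
        # into a toolchain so that runs are repeatable and easy to compare.
        def preprocess(sample):
            # block 1: normalise a raw measurement
            return sample / 10.0

        def classify(feature, threshold):
            # block 2: trivial thresholding "classifier"
            return feature > threshold

        def run_toolchain(data, threshold):
            # chaining blocks with declared inputs/outputs keeps runs reproducible
            return [classify(preprocess(x), threshold) for x in data]

        data = [3.0, 7.5, 12.0, 18.0]
        # Compare two parameterizations of the same toolchain, as the platform
        # lets users do with minimal effort.
        for threshold in (0.5, 1.0):
            print(threshold, run_toolchain(data, threshold))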