
    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001--2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using t-tests or ANOVA), as well as relevant trends in the usage of specific statistical methods (e.g., nonparametric tests and effect size measures), and ii) develop a conceptual model for a statistical analysis workflow, with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard for reporting the practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context. Comment: journal submission, 34 pages, 8 figures.
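
    As a hedged illustration of the kind of analysis such a workflow recommends (a nonparametric test paired with an effect size so that practical significance can be discussed, not just a p-value), the sketch below compares two hypothetical measurement samples with a Mann-Whitney U test and Cliff's delta. The data, variable names, and scenario are made up for illustration and are not taken from the paper.

```python
# Toy sketch (not from the paper): nonparametric comparison of two
# hypothetical samples, reporting statistical and practical significance.
from scipy.stats import mannwhitneyu

def cliffs_delta(a, b):
    """Cliff's delta effect size: P(a > b) - P(a < b) over all pairs."""
    gt = sum(x > y for x in a for y in b)
    lt = sum(x < y for x in a for y in b)
    return (gt - lt) / (len(a) * len(b))

# Hypothetical defect-fix times (hours) under two processes.
baseline  = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 7.0, 5.1]
treatment = [3.2, 4.0, 3.5, 4.8, 3.9, 4.2, 5.0, 3.7]

stat, p = mannwhitneyu(baseline, treatment, alternative="two-sided")
delta = cliffs_delta(baseline, treatment)

print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")
print(f"Cliff's delta = {delta:.2f}")  # magnitude speaks to practical significance
```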

    Linear extractors for extracting randomness from noisy sources

    Linear transformations have many applications in information theory, such as data compression and the design of error-correcting codes. In this paper, we study the power of linear transformations in randomness extraction, namely linear extractors, as another important application. Compared to most existing methods for randomness extraction, linear extractors (especially those constructed with sparse matrices) are computationally fast and can be simply implemented in hardware such as FPGAs, which makes them very attractive in practical use. We mainly focus on simple, efficient and sparse constructions of linear extractors. Specifically, we demonstrate that random matrices can generate random bits very efficiently from a variety of noisy sources, including noisy coin sources, bit-fixing sources, noisy (hidden) Markov sources, as well as their mixtures. We show that low-density random matrices have almost the same efficiency as high-density random matrices when the input sequence is long, which provides a way to simplify hardware/software implementation. Note that although we construct the matrices with randomness, they are deterministic (seedless) extractors: once constructed, the same matrix can be used any number of times without any seed. Another way to construct linear extractors is based on generator matrices of primitive BCH codes. This method is more explicit, but less practical due to its computational complexity and dimensional constraints.
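
    A minimal sketch of the core idea, not the paper's actual construction: extract m output bits from n weakly-random input bits by multiplying with a fixed sparse random binary matrix over GF(2). The sizes, sparsity, and biased-coin source model below are illustrative assumptions.

```python
# Toy linear extractor sketch: output = M * input over GF(2),
# where M is a sparse random binary matrix chosen once (seedless thereafter).
import numpy as np

rng = np.random.default_rng(0)

n, m, density = 1024, 256, 0.05                       # illustrative sizes / sparsity
M = (rng.random((m, n)) < density).astype(np.uint8)  # fixed once, then reused

def extract(bits: np.ndarray) -> np.ndarray:
    """Multiply the 0/1 input vector by M modulo 2."""
    return (M.astype(int) @ bits.astype(int)) % 2     # int cast avoids overflow

# Hypothetical noisy coin source: biased i.i.d. bits (p = 0.7 for '1').
raw = (rng.random(n) < 0.7).astype(np.uint8)
out = extract(raw)
print(out[:16], f"output bias ~ {out.mean():.3f}")
```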

    Ontology-driven conceptual modeling: A systematic literature mapping and review

    Ontology-driven conceptual modeling (ODCM) is still a relatively new research domain in the field of information systems, and there is still much discussion on how research in ODCM should be performed and what its focus should be. Therefore, this article critically surveys the existing literature in order to assess the kind of research that has been performed over the years, analyze the nature of the research contributions, and establish the current state of the art by positioning, evaluating and interpreting relevant research to date related to ODCM. To understand and identify any gaps and research opportunities, our literature study is composed of both a systematic mapping study and a systematic review study. The mapping study aims at structuring and classifying the area under investigation in order to give a general overview of the research that has been performed in the field. A review study, on the other hand, is a more thorough and rigorous inquiry and provides recommendations based on the strength of the evidence found. Our results indicate several research gaps that should be addressed, and we further identify several research opportunities as possible areas for future research.

    The Topology ToolKit

    This system paper presents the Topology ToolKit (TTK), a software platform designed for topological data analysis in scientific visualization. TTK provides a unified, generic, efficient, and robust implementation of key algorithms for the topological analysis of scalar data, including: critical points, integral lines, persistence diagrams, persistence curves, merge trees, contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots, Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users thanks to a tight integration with ParaView. It is also easily accessible to developers through a variety of bindings (Python, VTK/C++) for fast prototyping, or through direct, dependency-free C++, to ease integration into pre-existing complex systems. While developing TTK, we faced several algorithmic and software engineering challenges, which we document in this paper. In particular, we present an algorithm for the construction of a discrete gradient that conforms to the critical points extracted in the piecewise-linear setting. This algorithm guarantees combinatorial consistency across the topological abstractions supported by TTK and, importantly, a unified implementation of topological data simplification for multi-scale exploration and analysis. We also present a cached triangulation data structure that supports time-efficient and generic traversals, self-adjusts its memory usage on demand for input simplicial meshes, and implicitly emulates a triangulation for regular grids with no memory overhead. Finally, we describe an original software architecture which guarantees memory-efficient and direct access to TTK features, while still providing researchers with powerful and easy-to-use bindings and extensions. TTK is open source (BSD license), and its code, online documentation and video tutorials are available on TTK's website.
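
    TTK itself is a C++/ParaView system; as a purely illustrative toy (not TTK code or its API), the sketch below computes 0-dimensional persistence pairs of a 1D scalar field from its sublevel-set filtration with union-find, which is the kind of critical-point pairing that persistence diagrams summarize.

```python
# Toy illustration (not TTK's implementation): 0-dimensional persistence pairs
# of a 1D scalar field, computed from its sublevel-set filtration.
def persistence_pairs(f):
    n = len(f)
    parent = list(range(n))
    birth = {}                    # component root -> birth (minimum) value
    added = [False] * n
    pairs = []

    def find(i):                  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: f[k]):   # sweep values upward
        added[i] = True
        birth[i] = f[i]
        for j in (i - 1, i + 1):                     # lower neighbours already added
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # the younger component (larger birth value) dies at f[i]
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < f[i]:              # skip zero-persistence pairs
                    pairs.append((birth[young], f[i]))
                parent[young] = old
    pairs.append((birth[find(0)], max(f)))           # pair for the global minimum
    return pairs

print(persistence_pairs([0.0, 3.0, 1.0, 4.0, 0.5, 2.0]))
# -> [(1.0, 3.0), (0.5, 4.0), (0.0, 4.0)]
```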

    Software Infrastructure for Natural Language Processing

    We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/ Comment: LaTeX, uses aclap.sty, 8 pages.
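
    GATE itself is a Java framework; purely as a hedged, language-agnostic illustration of the architectural idea it describes (independent processing modules composed into a pipeline that enriches a shared annotated document), here is a toy Python sketch with made-up component names, not GATE's actual API.

```python
# Toy sketch of the pipeline idea (not GATE's API): independent components
# read a shared document and add annotations to it in sequence.
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    annotations: dict = field(default_factory=dict)

def tokenizer(doc: Document) -> Document:
    doc.annotations["tokens"] = doc.text.split()
    return doc

def name_spotter(doc: Document) -> Document:
    # Placeholder component: marks capitalised tokens as candidate names.
    doc.annotations["names"] = [t for t in doc.annotations["tokens"] if t.istitle()]
    return doc

def run_pipeline(doc: Document, components) -> Document:
    for component in components:          # components are interchangeable modules
        doc = component(doc)
    return doc

doc = run_pipeline(Document("Alice met Bob in Sheffield"), [tokenizer, name_spotter])
print(doc.annotations)
```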

    Simulation in manufacturing and business: A review

    This paper reports the results of a review of simulation applications published in the peer-reviewed literature between 1997 and 2006, to provide an up-to-date picture of the role of simulation techniques within manufacturing and business. The review is characterised by three factors: wide coverage, broad scope of the simulation techniques, and a focus on real-world applications. A structured methodology was followed to narrow down the search from around 20,000 papers to 281. Results include interesting trends and patterns. For instance, although discrete event simulation is the most popular technique, it has lower stakeholder engagement than other techniques, such as system dynamics or gaming. This is highly correlated with modelling lead time and purpose. Considering application areas, modelling is mostly used in scheduling. Finally, this review shows an increasing interest in hybrid modelling as an approach to cope with complex enterprise-wide systems.
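
    As a hedged illustration of the most common technique the review identifies, discrete event simulation, the following toy sketch advances a single-server queue through a time-ordered event list; the arrival and service rates and the time horizon are made-up parameters, not values from the review.

```python
# Toy discrete event simulation (illustrative only): a single-server queue
# driven by a time-ordered event list, the core mechanism behind DES tools.
import heapq, random

random.seed(1)
events = [(0.0, "arrival")]       # (time, kind) min-heap
queue_len, server_busy, served = 0, False, 0
HORIZON = 100.0                   # simulated time units (assumption)

while events:
    now, kind = heapq.heappop(events)
    if now > HORIZON:
        break
    if kind == "arrival":
        heapq.heappush(events, (now + random.expovariate(1.0), "arrival"))
        if server_busy:
            queue_len += 1        # wait in queue
        else:
            server_busy = True
            heapq.heappush(events, (now + random.expovariate(1.2), "departure"))
    else:                         # departure: free the server or start the next job
        served += 1
        if queue_len > 0:
            queue_len -= 1
            heapq.heappush(events, (now + random.expovariate(1.2), "departure"))
        else:
            server_busy = False

print(f"jobs served by t={HORIZON}: {served}")
```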