73 research outputs found

    Validity concerns in software engineering research

    Empirical studies that use software repository artifacts have become popular in the last decade due to the ready availability of open source project archives. In this paper, we survey empirical studies in the last three years of ICSE and FSE proceedings, and categorize these studies in terms of open source projects vs. proprietary source projects and the diversity of subject programs used in these studies. Our survey has shown that almost half (49%) of recent empirical studies used solely open source projects. Existing studies either draw general conclusions from these results or explicitly disclaim any conclusions that can extend beyond specific subject software. We conclude that researchers in empirical software engineering must consider the external validity concerns that arise from using only several well-known open source software projects, and that discussion of data source selection is an important discussion topic in software engineering research. Furthermore, we propose a community research infrastructure for software repository benchmarks and sharing the empirical analysis results, in order to address external validity concerns and to raise the bar for empirical software engineering research that analyzes software artifacts.

    Are Code Examples on an Online Q&A Forum Reliable?

    Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design ExampleCheck, an API usage mining framework that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using ExampleCheck and find that 31% may have potential API usage violations that could produce unexpected behavior such as program crashes and resource leaks. Such API misuse is caused by three main reasons: missing control constructs, missing or incorrect order of API calls, and incorrect guard conditions. Even the posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. This study result calls for a new approach to augment Stack Overflow with alternative API usage details that are not typically shown in curated examples.
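One of the misuse causes named above, a missing or incorrectly ordered API call, can be illustrated with a small sketch. This is not ExampleCheck's mining or matching algorithm; the function name `follows_call_order` and the call names in the usage note are hypothetical, and the check only tests whether a required sequence of calls appears in order within an observed call trace.

```python
from typing import Iterable, Sequence


def follows_call_order(observed_calls: Iterable[str], required_order: Sequence[str]) -> bool:
    """Return True if required_order occurs as an ordered subsequence of observed_calls.

    Hypothetical helper for illustration only: other calls may be interleaved,
    but every required call must appear, and in the given order.
    """
    remaining = iter(observed_calls)
    # `step in remaining` advances the iterator until it finds `step`,
    # so each later step is only searched for after the earlier one.
    return all(step in remaining for step in required_order)
```

For example, with an assumed pattern `["open", "close"]`, the trace `["open", "read", "close"]` satisfies the pattern, while `["open", "read"]` (missing call) and `["close", "open"]` (wrong order) do not.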

    Detecting and Characterizing Semantic Inconsistencies in Ported Code

    Adding similar features and bug fixes often requires porting program patches from reference implementations and adapting them to target implementations. Porting errors may result from faulty adaptations or inconsistent updates. This paper investigates (1) the types of porting errors found in practice, and (2) how to detect and characterize potential porting errors. Analyzing version histories, we define five categories of porting errors, including incorrect control- and data-flow, code redundancy, inconsistent identifier renamings, etc. Leveraging this categorization, we design a static control- and data-dependence analysis technique, SPA, to detect and characterize porting inconsistencies. Our evaluation on code from four open-source projects shows that SPA can detect porting inconsistencies with 65% to 73% precision and 90% recall, and identify inconsistency types with 58% to 63% precision and 92% to 100% recall. In a comparison with two existing error detection tools, SPA improves precision by 14 to 17 percentage points

    RMT: Rule-based Metamorphic Testing for Autonomous Driving Models

    Deep neural network models are widely used for perception and control in autonomous driving. Recent work uses metamorphic testing but is limited to equality-based metamorphic relations and does not provide the expressiveness needed to define inequality-based metamorphic relations. To encode real-world traffic rules, domain experts must be able to express higher-order relations (e.g., a vehicle should decrease speed by a certain ratio when there is a vehicle x meters ahead) and compositionality (e.g., a vehicle must have a larger deceleration when there is a vehicle ahead and the weather is rainy), with a proportional compounding effect on the test outcome. We design RMT, a declarative rule-based metamorphic testing framework. It provides three components that work in concert: (1) a domain-specific language that enables an expert to express higher-order, compositional metamorphic relations, (2) pluggable transformation engines built on a variety of image and graphics processing techniques, and (3) automated test generation that translates a human-written rule to a corresponding executable metamorphic relation and synthesizes meaningful inputs. Our evaluation using three driving models shows that RMT can generate meaningful test cases on which 89% of erroneous predictions are found by enabling higher-order metamorphic relations. Compositionality provides further aid for generating meaningful, synthesized inputs: 3012 new images are generated by compositional rules. These detected erroneous predictions are manually examined and confirmed by six human judges as meaningful traffic rule violations. RMT is the first to expand automated testing capability for autonomous vehicles by enabling easy mapping of traffic regulations to executable metamorphic relations and to demonstrate the benefits of expressivity, customization, and pluggability.
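To make the idea of an inequality-based metamorphic relation concrete, here is a minimal sketch of the "decrease speed by a certain ratio" rule from the abstract. It does not use RMT's domain-specific language or transformation engines; the names `violates_min_decrease`, `find_violations`, and the `add_vehicle_ahead` transformation are hypothetical, and the model is assumed to be any callable that maps a scene to a predicted speed.

```python
from typing import Callable, List, Sequence, TypeVar

Scene = TypeVar("Scene")


def violates_min_decrease(speed_before: float, speed_after: float, min_ratio: float) -> bool:
    """Inequality relation: after the transformation, the predicted speed
    should be at most speed_before * (1 - min_ratio)."""
    return speed_after > speed_before * (1.0 - min_ratio)


def find_violations(
    model: Callable[[Scene], float],
    scenes: Sequence[Scene],
    add_vehicle_ahead: Callable[[Scene], Scene],  # hypothetical input transformation
    min_ratio: float,
) -> List[int]:
    """Return the indices of scenes on which the model breaks the relation."""
    violations = []
    for index, scene in enumerate(scenes):
        before = model(scene)
        after = model(add_vehicle_ahead(scene))
        if violates_min_decrease(before, after, min_ratio):
            violations.append(index)
    return violations
```

An equality-based relation could only assert that two predictions match; the comparison against a ratio-scaled bound is what lets a rule of this kind express "slow down by at least this much".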

    HALO: a multi-feature two-pass analysis to identify framework API evolution

    Software frameworks and libraries are indispensable to today’s software systems. Because of the fast development of open-source software in recent years, frameworks and libraries have become much more versatile, as any open-source system or part thereof can be used as a framework (or a library). Developers can reuse frameworks in innovative ways that are not expected by the providers of frameworks. Many frameworks are not well documented and very few owners provide specific documents to describe the changes between different releases of their frameworks. When they evolve, it is often time-consuming for developers to keep their dependent code up-to-date. Approaches have been proposed to lessen the impact of framework evolution on developers by identifying API evolution or change rules between two releases of a framework. However, the precision and recall of the change rules generated by these approaches depend on the features that they use, such as call-dependency relations or text similarity. If these features do not provide enough information, the approaches can miss correct change rules and compromise the precision and recall. For example, if a method in the old release of a framework is not called by other methods, we cannot find its change rule using call-dependency relations alone. Considering more features can overcome this limitation. Yet, because many features may also give contradictory information, integrating them is not straightforward. We thus introduce Halo, a novel hybrid approach that uses multiple features, including call-dependency relations, method documentation, inheritance relations, and text similarity. Halo implements a two-pass analysis inspired by the pattern classification problem. We implement Halo in Java and compare it with four state-of-the-art approaches. The comparison shows that, on average, the recall and the precision of Halo are 43% and 5% higher than those of other approaches.

    Association of handgrip strength with new-onset CKD in Korean adults according to gender

    Introduction: Handgrip strength (HGS) is an indicator of many diseases such as pneumonia, cardiovascular disease and cancer. HGS can also predict renal function in chronic kidney disease (CKD) patients, but the value of HGS as a predictor of new-onset CKD is unknown. Methods: 173,195 subjects were recruited from a nationwide cohort and were followed for 4.1 years. After exclusions, 35,757 participants remained in the final study, and CKD developed in 1063 individuals during the follow-up period. Lifestyle, anthropometric and laboratory data were evaluated in relation to the risk of CKD. Results: The participants were subdivided into quartiles according to relative handgrip strength (RGS). Multivariate Cox regression demonstrated that RGS was inversely associated with incident CKD. Compared with the lowest quartile, the hazard ratio (HR) [95% confidence interval (CI)] for incident CKD for the highest quartile (Q4) was 0.55 (0.34–0.88) after adjusting for covariates in men and 0.51 (0.31–0.85) in women. The incidence of CKD decreased as RGS increased. These negative associations were more significant in men than in women. The receiver operating characteristic (ROC) curve showed that baseline RGS had predictive power for new-onset CKD. Area under the curve (AUC) (95% CIs) was 0.739 (0.707–0.770) in men and 0.765 (0.729–0.801) in women. Conclusion: This is a novel study demonstrating that RGS is associated with incident CKD in both men and women. The relationship between RGS and incident CKD is more significant in women than in men. RGS can be used in clinical practice to evaluate renal prognosis. Regular measurement of handgrip strength is essential to CKD detection.

    Model refactoring by example: A multi‐objective search based software engineering approach

    Declarative rules are frequently used in model refactoring in order to detect refactoring opportunities and to apply the appropriate ones. However, a large number of rules is required to obtain a complete specification of refactoring opportunities. Companies usually have accumulated examples of refactorings from past maintenance experiences. Based on these observations, we consider the model refactoring problem as a multi-objective problem by suggesting refactoring sequences that aim to maximize both structural and textual similarity between a given model (the model to be refactored) and a set of poorly designed models in the base of examples (models that have undergone some refactorings) and minimize the structural similarity between a given model and a set of well‐designed models in the base of examples (models that do not need any refactoring). To this end, we use the Non‐dominated Sorting Genetic Algorithm (NSGA‐II) to find a set of representative Pareto optimal solutions that present the best trade‐off between structural and textual similarities of models. The validation results, based on 8 real world models taken from open‐source projects, confirm the effectiveness of our approach, yielding refactoring recommendations with an average correctness of over 80%. In addition, our approach outperforms 5 of the state‐of‐the‐art refactoring approaches. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/143783/1/smr1916.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/143783/2/smr1916_am.pd
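The notion of a Pareto optimal set in the abstract above can be sketched in a few lines. This is not NSGA-II and not the authors' implementation; it shows only the dominance test that non-dominated sorting builds on. The tuple layout (similarity to poorly designed examples, similarity to well-designed examples) and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed scoring of a candidate refactoring sequence:
# (similarity to poorly designed examples, similarity to well-designed examples).
# The first value is maximized, the second is minimized.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Score]) -> List[Score]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]
```

With the made-up scores `[(0.9, 0.5), (0.8, 0.2), (0.7, 0.3), (0.9, 0.6)]`, the front is `[(0.9, 0.5), (0.8, 0.2)]`: neither of the two beats the other on both objectives, so both remain as alternative trade-offs.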