73 research outputs found
Validity concerns in software engineering research
Empirical studies that use software repository artifacts have become popular in the last decade due to the ready availability of open source project archives. In this paper, we survey empirical studies in the last three years of ICSE and FSE proceedings, and categorize these studies in terms of open source projects vs. proprietary source projects and the diversity of subject programs used in these studies. Our survey has shown that almost half (49%) of recent empirical studies used solely open source projects. Existing studies either draw general conclusions from these results or explicitly disclaim any conclusions that can extend beyond specific subject software. We conclude that researchers in empirical software engineering must consider the external validity concerns that arise from using only several well-known open source software projects, and that discussion of data source selection is an important discussion topic in software engineering research. Furthermore, we propose a community research infrastructure for software repository benchmarks and sharing the empirical analysis results, in order to address external validity concerns and to raise the bar for empirical software engineering research that analyzes software artifacts
Detecting and Characterizing Semantic Inconsistencies in Ported Code
Adding similar features and bug fixes often requires porting program patches from reference implementations and adapting them to target implementations. Porting errors may result from faulty adaptations or inconsistent updates. This paper investigates (1) the types of porting errors found in practice, and (2) how to detect and characterize potential porting errors. Analyzing version histories, we define five categories of porting errors, including incorrect control- and data-flow, code redundancy, inconsistent identifier renamings, etc. Leveraging this categorization, we design a static control- and data-dependence analysis technique, SPA, to detect and characterize porting inconsistencies. Our evaluation on code from four open-source projects shows that SPA can detect porting inconsistencies with 65% to 73% precision and 90% recall, and identify inconsistency types with 58% to 63% precision and 92% to 100% recall. In a comparison with two existing error detection tools, SPA improves precision by 14 to 17 percentage points
Are Code Examples on an Online Q&A Forum Reliable?
Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design ExampleCheck, an API usage mining framework that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using ExampleCheck and find that 31% may have potential API usage violations that could produce unexpected behavior such as program crashes and resource leaks. Such API misuse is caused by three main reasons: missing control constructs, missing or incorrect order of API calls, and incorrect guard conditions. Even the posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. This study result calls for a new approach to augment Stack Overflow with alternative API usage details that are not typically shown in curated examples
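To make the idea of an API usage violation concrete, the following is a minimal sketch, not ExampleCheck itself. It assumes a snippet has already been reduced to an ordered list of API calls and control constructs, and that a usage pattern is simply an ordered list of required steps. The function name `first_missing_step` and the file-channel pattern are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: a usage pattern is an ordered list of required
# steps; a snippet violates it if the steps do not appear in that order.

def first_missing_step(calls, pattern):
    """Return the first pattern step that cannot be matched in order,
    or None if `calls` contains `pattern` as an ordered subsequence."""
    remaining = iter(calls)
    for step in pattern:
        # `step not in remaining` advances the iterator, so each match
        # must occur after the previous one.
        if step not in remaining:
            return step
    return None

# Assumed pattern: write inside try, close inside finally.
PATTERN = ["open", "try", "write", "finally", "close"]

ok_snippet = ["open", "try", "write", "finally", "close"]
no_try_snippet = ["open", "write", "close"]            # missing control construct
wrong_order_snippet = ["close", "open", "try", "write", "finally"]

print(first_missing_step(ok_snippet, PATTERN))
print(first_missing_step(no_try_snippet, PATTERN))
print(first_missing_step(wrong_order_snippet, PATTERN))
```

A real miner would also have to handle guard conditions and alternative valid orderings, which this ordered-subsequence check does not attempt.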
RMT: Rule-based Metamorphic Testing for Autonomous Driving Models
Deep neural network models are widely used for perception and control in
autonomous driving. Recent work uses metamorphic testing but is limited to
using equality-based metamorphic relations and does not provide expressiveness
for defining inequality-based metamorphic relations. To encode real-world
traffic rules, domain experts must be able to express higher-order relations
(e.g., a vehicle should decrease its speed by a certain ratio when there is a
vehicle x meters ahead) and compositionality (e.g., a vehicle must decelerate
more when there is a vehicle ahead and the weather is rainy, with a
proportional compounding effect on the test outcome). We design RMT, a
declarative rule-based metamorphic testing framework. It provides three
components that work in concert: (1) a domain-specific language that enables an
expert to express higher-order, compositional metamorphic relations, (2)
pluggable transformation engines built on a variety of image and graphics
processing techniques, and (3) automated test generation that translates a
human-written rule to a corresponding executable, metamorphic relation and
synthesizes meaningful inputs. Our evaluation using three driving models shows
that RMT can generate meaningful test cases on which 89% of erroneous
predictions are found by enabling higher-order metamorphic relations.
Compositionality further aids in generating meaningful, synthesized
inputs: 3012 new images are generated by compositional rules. These detected
erroneous predictions are manually examined and confirmed by six human judges
as meaningful traffic rule violations. RMT is the first to expand automated
testing capability for autonomous vehicles by enabling easy mapping of traffic
regulations to executable metamorphic relations and to demonstrate the benefits
of expressivity, customization, and pluggability
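
As an illustration of what an inequality-based metamorphic relation looks like, here is a minimal sketch. It does not use RMT's rule language or transformation engines. It assumes a driving model is any callable that maps a scene to a predicted speed, and that a transformation is a function that adds a vehicle ahead. All names (`violates_decrease_rule`, `add_vehicle_ahead`, the toy models, and the 30% ratio) are hypothetical.

```python
# Hypothetical sketch of an inequality-based metamorphic relation:
# after the transformation, predicted speed must drop by at least `min_ratio`.

def violates_decrease_rule(model, scene, transform, min_ratio):
    """Return True if the model fails to reduce speed by at least
    `min_ratio` once `transform` has been applied to `scene`."""
    before = model(scene)
    after = model(transform(scene))
    return after > before * (1 - min_ratio)

def add_vehicle_ahead(scene):
    # Toy transformation: a real engine would edit the input image.
    return {**scene, "vehicle_ahead": True}

def cautious_model(scene):
    # Toy model that halves its speed when a vehicle is ahead.
    return 30.0 if scene["vehicle_ahead"] else 60.0

def oblivious_model(scene):
    # Toy model that ignores the vehicle ahead.
    return 60.0

scene = {"vehicle_ahead": False}
print(violates_decrease_rule(cautious_model, scene, add_vehicle_ahead, 0.3))
print(violates_decrease_rule(oblivious_model, scene, add_vehicle_ahead, 0.3))
```

An equality-based relation could only state that the prediction stays the same; the inequality lets the rule state how much the prediction must change.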
HALO: a multi-feature two-pass analysis to identify framework API evolution
Software frameworks and libraries are indispensable to today's software systems. Because of the fast development of open-source software in recent years, frameworks and libraries have become much more versatile, as any open-source system or part thereof can be used as a framework (or a library). Developers can reuse frameworks in innovative ways that are not expected by the providers of frameworks. Many frameworks are not well documented and very few owners provide specific documents to describe the changes between different releases of their frameworks. When they evolve, it is often time-consuming for developers to keep their dependent code up-to-date. Approaches have been proposed to lessen the impact of framework evolution on developers by identifying API evolution or change rules between two releases of a framework. However, the precision and recall of the change rules generated by these approaches depend on the features that they use, such as call-dependency relations or text similarity. If these features do not provide enough information, the approaches can miss correct change rules and compromise the precision and recall. For example, if a method in the old release of a framework is not called by other methods, we cannot find its change rule using call-dependency relations alone. Considering more features can overcome this limitation. Yet, because many features may also give contradictory information, integrating them is not straightforward. We thus introduce Halo, a novel hybrid approach that uses multiple features, including call-dependency relations, method documentation, inheritance relations, and text similarity. Halo implements a two-pass analysis inspired by the pattern classification problem. We implement Halo in Java and compare it with four state-of-the-art approaches. The comparison shows that, on average, the recall and the precision of Halo are 43% and 5% higher than those of other approaches
Association of handgrip strength with new-onset CKD in Korean adults according to gender
Introduction: Handgrip strength (HGS) is an indicator of many diseases such as pneumonia, cardiovascular disease and cancer. HGS can also predict renal function in chronic kidney disease (CKD) patients, but the value of HGS as a predictor of new-onset CKD is unknown. Methods: 173,195 subjects were recruited from a nationwide cohort and were followed for 4.1 years. After exclusions, 35,757 participants remained in the final study, and CKD developed in 1063 individuals during the follow-up period. Lifestyle, anthropometric and laboratory data were evaluated in relation to the risk of CKD. Results: The participants were subdivided into quartiles according to relative handgrip strength (RGS). Multivariate Cox regression demonstrated that RGS was inversely associated with incident CKD. Compared with the lowest quartile, the hazard ratio (HR) [95% confidence interval (CI)] for incident CKD for the highest quartile (Q4) was 0.55 (0.34–0.88) after adjusting for covariates in men and 0.51 (0.31–0.85) in women. The incidence of CKD decreased as RGS increased. These negative associations were more significant in men than in women. The receiver operating characteristic (ROC) curve showed that baseline RGS had predictive power for new-onset CKD. Area under the curve (AUC) (95% CIs) was 0.739 (0.707–0.770) in men and 0.765 (0.729–0.801) in women. Conclusion: This is a novel study demonstrating that RGS is associated with incident CKD in both men and women. The relationship between RGS and incident CKD is more significant in women than in men. RGS can be used in clinical practice to evaluate renal prognosis. Regular measurement of handgrip strength is essential to CKD detection
Model refactoring by example: A multi-objective search based software engineering approach
Declarative rules are frequently used in model refactoring in order to detect refactoring opportunities and to apply the appropriate ones. However, a large number of rules is required to obtain a complete specification of refactoring opportunities. Companies usually have accumulated examples of refactorings from past maintenance experiences. Based on these observations, we consider the model refactoring problem as a multi-objective problem by suggesting refactoring sequences that aim to maximize both structural and textual similarity between a given model (the model to be refactored) and a set of poorly designed models in the base of examples (models that have undergone some refactorings) and minimize the structural similarity between a given model and a set of well-designed models in the base of examples (models that do not need any refactoring). To this end, we use the Non-dominated Sorting Genetic Algorithm (NSGA-II) to find a set of representative Pareto optimal solutions that present the best trade-off between structural and textual similarities of models. The validation results, based on 8 real world models taken from open-source projects, confirm the effectiveness of our approach, yielding refactoring recommendations with an average correctness of over 80%. In addition, our approach outperforms 5 of the state-of-the-art refactoring approaches. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/143783/1/smr1916.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/143783/2/smr1916_am.pd
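
The notion of a Pareto optimal trade-off mentioned above can be sketched with a plain non-dominated filter. This is only the dominance test that underlies NSGA-II, not the full algorithm (no ranking into successive fronts, crowding distance, or genetic operators), and it is not the authors' implementation. It assumes every objective is expressed so that larger is better; an objective to be minimized is negated. The function names and the scores are made up for illustration.

```python
# Hypothetical sketch: keep only candidates that no other candidate dominates.
# Each candidate is a tuple of objective scores where larger is better.

def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Return the candidates that are not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Made-up scores for three refactoring sequences:
# (similarity to poorly designed examples, negated similarity to well-designed ones)
candidates = [(0.9, -0.5), (0.6, -0.2), (0.5, -0.6)]
front = pareto_front(candidates)
print(front)
```

The first two candidates trade one objective against the other, so neither dominates; the third is worse than the first on both objectives and is filtered out.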