763 research outputs found
MuDelta: Delta-Oriented Mutation Testing at Commit Time
To effectively test program changes using mutation testing, one needs mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants: mutants that affect and are affected by the changed program behaviours. Our approach applies machine learning to a combined scheme of graph- and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate a strong prediction ability of our approach, yielding 0.80 (ROC) and 0.50 (PR curve) AUC values with 0.63 precision and 0.32 recall. These predictions are significantly higher than random guesses (0.20 PR-curve AUC, 0.21 precision and 0.21 recall) and subsequently lead to strong relevant tests that kill 45% more relevant mutants than randomly sampled mutants (sampled either from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault-revealing ability in fault-introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software.
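The abstract's core idea can be made concrete with a toy sketch. This is not MuDelta's actual learned model; it is a hypothetical heuristic (the feature names, weights, and data are all invented for illustration) showing how static features of a mutant relative to a commit might be combined into a "commit relevance" ranking.

```python
# Illustrative sketch, NOT MuDelta's real model: rank mutants by a toy
# relevance score built from two hypothetical static features -- distance
# to the nearest changed line, and whether the mutant shares a function
# with the commit's changes. A real approach would learn these weights.

def relevance_score(mutant, changed_lines, changed_funcs):
    """Higher score = more likely commit-relevant (toy heuristic)."""
    dist = min(abs(mutant["line"] - l) for l in changed_lines)
    same_func = 1.0 if mutant["func"] in changed_funcs else 0.0
    return same_func * 2.0 + 1.0 / (1.0 + dist)

changed_lines = {10, 11, 42}
changed_funcs = {"parse_opts"}
mutants = [
    {"id": "m1", "line": 11, "func": "parse_opts"},  # on a changed line
    {"id": "m2", "line": 40, "func": "format_out"},  # near a change
    {"id": "m3", "line": 300, "func": "cleanup"},    # far from all changes
]
ranked = sorted(
    mutants,
    key=lambda m: relevance_score(m, changed_lines, changed_funcs),
    reverse=True,
)
print([m["id"] for m in ranked])  # m1 first, m3 last
```

The point of the sketch is only the shape of the problem: static features per mutant in, a ranking of likely commit-relevant mutants out.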
Molecular-level simulations of turbulence and its decay
We provide the first demonstration that molecular-level methods based on gas kinetic theory and molecular chaos can simulate turbulence and its decay. The direct simulation Monte Carlo (DSMC) method, a molecular-level technique for simulating gas flows that resolves phenomena from molecular to hydrodynamic (continuum) length scales, is applied to simulate the Taylor-Green vortex flow. The DSMC simulations reproduce the Kolmogorov −5/3 law and agree well with the turbulent kinetic energy and energy dissipation rate obtained from direct numerical simulation of the Navier-Stokes equations using a spectral method. This agreement provides strong evidence that molecular-level methods for gases can be used to investigate turbulent flows quantitatively.
DSMC simulations of turbulent flows at moderate Reynolds numbers
The Direct Simulation Monte Carlo (DSMC) method has been used for more than 50 years to simulate rarefied gases. The advent of modern supercomputers has brought higher-density near-continuum flows within range. This in turn has revived the debate as to whether the Boltzmann equation, which assumes molecular chaos, can be used to simulate continuum flows when they become turbulent. In an effort to settle this debate, two canonical turbulent flows are examined, and the results are compared to available continuum theoretical and numerical results for the Navier-Stokes equations.
Your professionalism is not my professionalism: congruence and variance in the views of medical students and faculty about professionalism
Abstract Background Medical professionalism is an essential aspect of medical education and practice worldwide, and it must be adapted to different social and cultural contexts. We examined the current congruence and variance in the perception of professionalism among undergraduate medical students and faculty members in one medical school in Saudi Arabia. Methods The target population was first-year to final-year medical students of the College of Medicine, King Saud University. Out of a total of 1431 students at the College of Medicine, 750 (52%) participated in the study. Fifty faculty members from clinical and non-clinical departments of the College of Medicine were randomly selected for this study, and all participated. The respondents recorded their responses through the Bristol online survey system, using a bilingual (English and Arabic) version of the Dundee Polyprofessionalism Inventory I: Academic Integrity, which has 34 items. Results There are 17 lapses (50% of the total) in professional behaviour for which none of the faculty recommended the ignore sanction, while students recommended a variable ignore sanction in a range of 6–29% for different behaviours. Students and faculty recommended similar sanctions for 5 lapses (14.7% of the total) in professional behaviour. Furthermore, there is a statistically significant two-level difference between the sanctions approved by faculty and students for 12 lapses (35% of the total; p < 0.05). Conclusions These results raise concerns about the students' understanding of professionalism. It is therefore important to enhance their learning around the attributes of medical professionalism.
Efficient Testing of Deep Neural Networks via Decision Boundary Analysis
Deep learning plays a more and more important role in our daily life due to
its competitive performance in multiple industrial application domains. As the
core of DL-enabled systems, deep neural networks automatically learn knowledge
from carefully collected and organized training data to gain the ability to
predict the label of unseen data. Similar to the traditional software systems
that need to be comprehensively tested, DNNs also need to be carefully
evaluated to make sure the quality of the trained model meets the demand. In
practice, the de facto standard to assess the quality of DNNs in industry is to
check their performance (accuracy) on a collected set of labeled test data.
However, preparing such labeled data is often not easy partly because of the
huge labeling effort, i.e., data labeling is labor-intensive, especially with
the massive volume of new unlabeled data arriving every day. Recent studies show
that test selection for DNNs is a promising direction that tackles this issue by
selecting minimal representative data to label and using these data to assess
the model. However, it still requires human effort and cannot be automatic. In
this paper, we propose a novel technique, named Aries, that can estimate the
performance of DNNs on new unlabeled data using only the information obtained
from the original test data. The key insight behind our technique is that the
model should have similar prediction accuracy on the data which have similar
distances to the decision boundary. We performed a large-scale evaluation of
our technique on 13 types of data transformation methods. The results
demonstrate the usefulness of our technique: the accuracy estimated by
Aries is only 0.03%–2.60% (0.61% on average) off the true accuracy. Besides,
Aries also outperforms the state-of-the-art selection-labeling-based methods in
most (96 out of 128) cases.
Active Code Learning: Benchmarking Sample-Efficient Training of Code Models
The costly human effort required to prepare the training data of machine
learning (ML) models hinders their practical development and usage in software
engineering (ML4Code), especially for those with limited budgets. Therefore,
efficiently training models of code with less human effort has become an
emerging problem. Active learning is a technique that addresses this issue by
allowing developers to train a model with less data while still achieving the
desired performance; it has been well studied in the computer vision and
natural language processing domains. Unfortunately, no existing work explores
the effectiveness of active learning for code models. In this paper, we bridge
this gap by building the first benchmark for this critical problem: active
code learning. Specifically, we collect 11
critical problem - active code learning. Specifically, we collect 11
acquisition functions~(which are used for data selection in active learning)
from existing works and adapt them for code-related tasks. Then, we conduct an
empirical study to check whether these acquisition functions maintain
performance for code data. The results demonstrate that feature selection
highly affects active learning and using output vectors to select data is the
best choice. For the code summarization task, active code learning is
ineffective, producing models with over a 29.64% gap compared to the expected
performance. Furthermore, we explore future directions of active code learning
with an exploratory study. We propose to replace distance calculation methods
with evaluation metrics and find a correlation between these evaluation-based
distance methods and the performance of code models.
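To make "acquisition function" concrete, here is one of the simplest examples such benchmarks typically include (this is an illustrative sketch, not the benchmark's actual code): select for labeling the unlabeled examples whose model output vectors have the highest predictive entropy.

```python
# Illustrative entropy-based acquisition function: given the model's output
# probability vectors for an unlabeled pool, pick the `budget` examples the
# model is least certain about.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(output_vectors, budget):
    """Return indices of the `budget` highest-entropy examples."""
    order = sorted(range(len(output_vectors)),
                   key=lambda i: entropy(output_vectors[i]),
                   reverse=True)
    return sorted(order[:budget])

outputs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # highly uncertain
    [0.70, 0.20, 0.10],  # moderately uncertain
]
print(select_most_uncertain(outputs, budget=2))  # [1, 2]
```

The abstract's finding that "using output vectors to select data is the best choice" corresponds to feeding exactly these kinds of output distributions, rather than intermediate features, into the acquisition function.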
Evaluating the Robustness of Test Selection Methods for Deep Neural Networks
Testing deep learning-based systems is crucial but challenging due to the
required time and labor for labeling collected raw data. To alleviate the
labeling effort, multiple test selection methods have been proposed where only
a subset of test data needs to be labeled while satisfying testing
requirements. However, we observe that such methods with reported promising
results are only evaluated under simple scenarios, e.g., testing on original
test data. This raises a question: are they always reliable? In this
paper, we explore when and to what extent test selection methods fail for
testing. Specifically, first, we identify potential pitfalls of 11 selection
methods from top-tier venues based on their construction. Second, we conduct a
study on five datasets with two model architectures per dataset to empirically
confirm the existence of these pitfalls. Furthermore, we demonstrate how
pitfalls can break the reliability of these methods. Concretely, methods for
fault detection suffer from test data that are: 1) correctly classified but
uncertain, or 2) misclassified but confident. Remarkably, the test relative
coverage achieved by such methods drops by up to 86.85%. On the other hand,
methods for performance estimation are sensitive to the choice of
intermediate-layer output. The effectiveness of such methods can be even worse
than random selection when using an inappropriate layer.
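The "misclassified but confident" pitfall can be demonstrated with a toy example (our own illustration, not the paper's code): a selector that targets low-confidence inputs for labeling will systematically miss faults on inputs the model gets wrong with high confidence.

```python
# Toy illustration of the pitfall: confidence-based fault-detection
# selection misses inputs that are misclassified with high confidence.

def select_low_confidence(samples, budget):
    """Pick the `budget` samples with the lowest top-1 confidence."""
    order = sorted(range(len(samples)), key=lambda i: samples[i]["conf"])
    return set(order[:budget])

samples = [
    {"conf": 0.55, "correct": True},   # uncertain but actually right
    {"conf": 0.99, "correct": False},  # confidently wrong: a real fault
    {"conf": 0.60, "correct": True},
    {"conf": 0.97, "correct": True},
]
picked = select_low_confidence(samples, budget=2)
faults_found = sum(1 for i in picked if not samples[i]["correct"])
print(picked, faults_found)  # the fault at index 1 is never selected
```

Here the selector spends its whole labeling budget on correctly classified but uncertain inputs and finds zero faults, which is the failure mode behind the reported drop in test relative coverage.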
The professionalism disconnect: Do entering residents identify yet participate in unprofessional behaviors?
Background
Professionalism has been an important tenet of medical education, yet defining it is a challenge. Perceptions of professional behavior may vary by individual, medical specialty, demographic group and institution. Understanding these differences should help institutions better clarify professionalism expectations and provide standards with which to evaluate resident behavior.
Methods
Duke University Hospital and Vidant Medical Center/East Carolina University surveyed entering PGY1 residents. Residents were queried on two issues: their perception of the professionalism of 46 specific behaviors related to training and patient care; and their own participation in those specified behaviors. The study reports data analyses for gender and institution based upon survey results in 2009 and 2010. The study received approval by the Institutional Review Boards of both institutions.
Results
76% (375) of 495 PGY1 residents surveyed in 2009 and 2010 responded. A majority of responders rated all 46 specified behaviors as unprofessional, and a majority had either observed or participated in each behavior. For all 46 behaviors, a greater percentage of women rated the behaviors as unprofessional. Men were more likely than women to have participated in behaviors. There were several significant differences in both the perceptions of specified behaviors and in self-reported observation of and/or involvement in those behaviors between institutions.
Respondents indicated the most important professionalism issues relevant to medical practice include: respect for colleagues/patients, relationships with pharmaceutical companies, balancing home/work life, and admitting mistakes. They reported that professionalism can best be assessed by peers, patients, observation of non-medical work and timeliness/detail of paperwork.
Conclusion
Defining professionalism in measurable terms is a challenge, yet it is critical if professionalism is to be taught and assessed. Recognition of the differences by gender and institution should allow for tailored teaching and assessment of professionalism so that it is most meaningful. A shared understanding of what constitutes professional behavior is an important first step.