763 research outputs found
MuDelta: Delta-Oriented Mutation Testing at Commit Time
To effectively test program changes using mutation testing, one needs mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants: mutants that affect and are affected by the changed program behaviours. Our approach applies machine learning to a combined scheme of graph- and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate a strong prediction ability of our approach, yielding 0.80 (ROC) and 0.50 (PR curve) AUC values with 0.63 precision and 0.32 recall. These predictions are significantly higher than random guesses (0.20 PR-curve AUC, 0.21 precision and 0.21 recall) and subsequently lead to strong relevant tests that kill 45% more relevant mutants than randomly sampled mutants (sampled either from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault-revealing ability in fault-introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software.
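The abstract's core idea can be made concrete with a toy sketch. This is not MuDelta's actual learned model; it is a hypothetical heuristic (the feature names, weights, and data are all invented for illustration) showing how static features of a mutant relative to a commit might be combined into a "commit relevance" ranking.

```python
# Illustrative sketch, NOT MuDelta's real model: rank mutants by a toy
# relevance score built from two hypothetical static features -- distance
# to the nearest changed line, and whether the mutant shares a function
# with the commit's changes. A real approach would learn these weights.

def relevance_score(mutant, changed_lines, changed_funcs):
    """Higher score = more likely commit-relevant (toy heuristic)."""
    dist = min(abs(mutant["line"] - l) for l in changed_lines)
    same_func = 1.0 if mutant["func"] in changed_funcs else 0.0
    return same_func * 2.0 + 1.0 / (1.0 + dist)

changed_lines = {10, 11, 42}
changed_funcs = {"parse_opts"}
mutants = [
    {"id": "m1", "line": 11, "func": "parse_opts"},  # on a changed line
    {"id": "m2", "line": 40, "func": "format_out"},  # near a change
    {"id": "m3", "line": 300, "func": "cleanup"},    # far from all changes
]
ranked = sorted(
    mutants,
    key=lambda m: relevance_score(m, changed_lines, changed_funcs),
    reverse=True,
)
print([m["id"] for m in ranked])  # m1 first, m3 last
```

The point of the sketch is only the shape of the problem: static features per mutant in, a ranking of likely commit-relevant mutants out.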
Molecular-level simulations of turbulence and its decay
We provide the first demonstration that molecular-level methods based on gas kinetic theory and molecular chaos can simulate turbulence and its decay. The direct simulation Monte Carlo (DSMC) method, a molecular-level technique for simulating gas flows that resolves phenomena from molecular to hydrodynamic (continuum) length scales, is applied to simulate the Taylor-Green vortex flow. The DSMC simulations reproduce the Kolmogorov −5/3 law and agree well with the turbulent kinetic energy and energy dissipation rate obtained from direct numerical simulation of the Navier-Stokes equations using a spectral method. This agreement provides strong evidence that molecular-level methods for gases can be used to investigate turbulent flows quantitatively.
DSMC simulations of turbulent flows at moderate Reynolds numbers
The Direct Simulation Monte Carlo (DSMC) method has been used for more than 50 years to simulate rarefied gases. The advent of modern supercomputers has brought higher-density near-continuum flows within range. This in turn has revived the debate as to whether the Boltzmann equation, which assumes molecular chaos, can be used to simulate continuum flows when they become turbulent. In an effort to settle this debate, two canonical turbulent flows are examined, and the results are compared to available continuum theoretical and numerical results for the Navier-Stokes equations.
Your professionalism is not my professionalism: congruence and variance in the views of medical students and faculty about professionalism
Abstract Background Medical professionalism is an essential aspect of medical education and practice worldwide, and it must be adapted to different social and cultural contexts. We examined the current congruence and variance in the perception of professionalism among undergraduate medical students and faculty members in one medical school in Saudi Arabia. Methods The target population was first-year to final-year medical students of the College of Medicine, King Saud University. Out of a total of 1431 students at the College of Medicine, 750 (52%) participated in the study. Fifty faculty members from clinical and non-clinical departments of the College of Medicine were randomly selected for this study, and all participated. The respondents recorded their responses through the Bristol online survey system, using a bilingual (English and Arabic) version of the Dundee Polyprofessionalism Inventory I: Academic Integrity, which has 34 items. Results There are 17 lapses (50% of the total) in professional behaviour for which none of the faculty recommended the ignore sanction, while students recommended a variable ignore sanction in a range of 6–29% for different behaviours. Students and faculty recommended similar sanctions for 5 lapses (14.7% of the total) in professional behaviour. Furthermore, there is a statistically significant two-level difference between the sanctions approved by faculty and students for 12 lapses (35% of the total; p < 0.05). Conclusions These results raise concerns about the students' understanding of professionalism. It is therefore important to enhance their learning around the attributes of medical professionalism.
Efficient Testing of Deep Neural Networks via Decision Boundary Analysis
Deep learning plays a more and more important role in our daily life due to
its competitive performance in multiple industrial application domains. As the
core of DL-enabled systems, deep neural networks automatically learn knowledge
from carefully collected and organized training data to gain the ability to
predict the label of unseen data. Similar to the traditional software systems
that need to be comprehensively tested, DNNs also need to be carefully
evaluated to make sure the quality of the trained model meets the demand. In
practice, the de facto standard to assess the quality of DNNs in industry is to
check their performance (accuracy) on a collected set of labeled test data.
However, preparing such labeled data is often not easy partly because of the
huge labeling effort, i.e., data labeling is labor-intensive, especially with
the massive volume of new unlabeled data arriving every day. Recent studies show
that test selection for DNNs is a promising direction that tackles this issue by
selecting minimal representative data to label and using these data to assess
the model. However, it still requires human effort and cannot be automatic. In
this paper, we propose a novel technique, named Aries, that can estimate the
performance of DNNs on new unlabeled data using only the information obtained
from the original test data. The key insight behind our technique is that the
model should have similar prediction accuracy on the data which have similar
distances to the decision boundary. We performed a large-scale evaluation of
our technique on 13 types of data transformation methods. The results
demonstrate the usefulness of our technique: the accuracy estimated by
Aries is only 0.03%–2.60% (0.61% on average) off the true accuracy. Besides,
Aries also outperforms the state-of-the-art selection-labeling-based methods in
most (96 out of 128) cases.
Active Code Learning: Benchmarking Sample-Efficient Training of Code Models
The costly human effort required to prepare the training data of machine
learning (ML) models hinders their practical development and usage in software
engineering (ML4Code), especially for those with limited budgets. Therefore,
efficiently training models of code with less human effort has become an
emerging problem. Active learning is a technique that addresses this issue by
allowing developers to train a model with less data while still achieving the
desired performance; it has been well studied in the computer vision and
natural language processing domains. Unfortunately, no existing work explores
the effectiveness of active learning for code models. In this paper, we bridge
this gap by building the first benchmark for this critical problem: active
code learning. Specifically, we collect 11
critical problem - active code learning. Specifically, we collect 11
acquisition functions~(which are used for data selection in active learning)
from existing works and adapt them for code-related tasks. Then, we conduct an
empirical study to check whether these acquisition functions maintain
performance for code data. The results demonstrate that feature selection
highly affects active learning and using output vectors to select data is the
best choice. For the code summarization task, active code learning is
ineffective, producing models with over a 29.64% gap compared to the expected
performance. Furthermore, we explore future directions of active code learning
with an exploratory study. We propose to replace distance calculation methods
with evaluation metrics and find a correlation between these evaluation-based
distance methods and the performance of code models.
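To make "acquisition function" concrete, here is one of the simplest examples such benchmarks typically include (this is an illustrative sketch, not the benchmark's actual code): select for labeling the unlabeled examples whose model output vectors have the highest predictive entropy.

```python
# Illustrative entropy-based acquisition function: given the model's output
# probability vectors for an unlabeled pool, pick the `budget` examples the
# model is least certain about.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(output_vectors, budget):
    """Return indices of the `budget` highest-entropy examples."""
    order = sorted(range(len(output_vectors)),
                   key=lambda i: entropy(output_vectors[i]),
                   reverse=True)
    return sorted(order[:budget])

outputs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # highly uncertain
    [0.70, 0.20, 0.10],  # moderately uncertain
]
print(select_most_uncertain(outputs, budget=2))  # [1, 2]
```

The abstract's finding that "using output vectors to select data is the best choice" corresponds to feeding exactly these kinds of output distributions, rather than intermediate features, into the acquisition function.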
Evaluating the Robustness of Test Selection Methods for Deep Neural Networks
Testing deep learning-based systems is crucial but challenging due to the
required time and labor for labeling collected raw data. To alleviate the
labeling effort, multiple test selection methods have been proposed where only
a subset of test data needs to be labeled while satisfying testing
requirements. However, we observe that such methods with reported promising
results are only evaluated under simple scenarios, e.g., testing on original
test data. This raises a question: are they always reliable? In this
paper, we explore when and to what extent test selection methods fail for
testing. Specifically, first, we identify potential pitfalls of 11 selection
methods from top-tier venues based on their construction. Second, we conduct a
study on five datasets with two model architectures per dataset to empirically
confirm the existence of these pitfalls. Furthermore, we demonstrate how
pitfalls can break the reliability of these methods. Concretely, methods for
fault detection suffer from test data that are: 1) correctly classified but
uncertain, or 2) misclassified but confident. Remarkably, the test relative
coverage achieved by such methods drops by up to 86.85%. On the other hand,
methods for performance estimation are sensitive to the choice of
intermediate-layer output. The effectiveness of such methods can be even worse
than random selection when using an inappropriate layer.
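The "misclassified but confident" pitfall can be demonstrated with a toy example (our own illustration, not the paper's code): a selector that targets low-confidence inputs for labeling will systematically miss faults on inputs the model gets wrong with high confidence.

```python
# Toy illustration of the pitfall: confidence-based fault-detection
# selection misses inputs that are misclassified with high confidence.

def select_low_confidence(samples, budget):
    """Pick the `budget` samples with the lowest top-1 confidence."""
    order = sorted(range(len(samples)), key=lambda i: samples[i]["conf"])
    return set(order[:budget])

samples = [
    {"conf": 0.55, "correct": True},   # uncertain but actually right
    {"conf": 0.99, "correct": False},  # confidently wrong: a real fault
    {"conf": 0.60, "correct": True},
    {"conf": 0.97, "correct": True},
]
picked = select_low_confidence(samples, budget=2)
faults_found = sum(1 for i in picked if not samples[i]["correct"])
print(picked, faults_found)  # the fault at index 1 is never selected
```

Here the selector spends its whole labeling budget on correctly classified but uncertain inputs and finds zero faults, which is the failure mode behind the reported drop in test relative coverage.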
The professionalism disconnect: Do entering residents identify yet participate in unprofessional behaviors?
Background
Professionalism has been an important tenet of medical education, yet defining it is a challenge. Perceptions of professional behavior may vary by individual, medical specialty, demographic group and institution. Understanding these differences should help institutions better clarify professionalism expectations and provide standards with which to evaluate resident behavior.
Methods
Duke University Hospital and Vidant Medical Center/East Carolina University surveyed entering PGY1 residents. Residents were queried on two issues: their perception of the professionalism of 46 specific behaviors related to training and patient care; and their own participation in those specified behaviors. The study reports data analyses for gender and institution based upon survey results in 2009 and 2010. The study received approval by the Institutional Review Boards of both institutions.
Results
76% (375) of 495 PGY1 residents surveyed in 2009 and 2010 responded. A majority of responders rated all 46 specified behaviors as unprofessional, and a majority had either observed or participated in each behavior. For all 46 behaviors, a greater percentage of women rated the behaviors as unprofessional. Men were more likely than women to have participated in behaviors. There were several significant differences in both the perceptions of specified behaviors and in self-reported observation of and/or involvement in those behaviors between institutions.
Respondents indicated the most important professionalism issues relevant to medical practice include: respect for colleagues/patients, relationships with pharmaceutical companies, balancing home/work life, and admitting mistakes. They reported that professionalism can best be assessed by peers, patients, observation of non-medical work and timeliness/detail of paperwork.
Conclusion
Defining professionalism in measurable terms is a challenge, yet it is critical if professionalism is to be taught and assessed. Recognition of the differences by gender and institution should allow for tailored teaching and assessment of professionalism so that it is most meaningful. A shared understanding of what constitutes professional behavior is an important first step.