A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark
Automatic program repair papers tend to reuse the same benchmarks repeatedly. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic repair on a little-studied benchmark of bugs called QuixBugs. Specifically, 1) we report on the characteristics of QuixBugs; 2) we study the effectiveness of 10 program repair tools on it; and 3) we apply three patch correctness assessment techniques to comprehensively study the presence of overfitting patches in QuixBugs. Our key results are: 1) 16/40 buggy programs in QuixBugs can be repaired with at least one test-suite-adequate patch; 2) a total of 338 plausible patches are generated on QuixBugs by the considered tools, and 53.3% of them are overfitting patches according to our manual assessment; 3) the three automated patch correctness assessment techniques, RGT_Evosuite, RGT_InputSampling and GT_Invariants, achieve an accuracy of 98.2%, 80.8% and 58.3% in overfitting detection, respectively. To our knowledge, this is the largest empirical study of automatic repair on QuixBugs, combining both quantitative and qualitative insights. All our empirical results are publicly available on GitHub in order to facilitate future research on automatic program repair.
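To give a rough sense of the generated-test idea behind such automated correctness assessment (a minimal sketch, not the paper's tooling): test inputs are sampled, the reference human-written fix serves as the oracle, and a plausible patch that disagrees with the oracle on any sampled input is flagged as overfitting. The gcd programs, input range and the is_overfitting helper below are invented for illustration.

```python
# Illustrative sketch only: flag a plausible patch as overfitting if it
# disagrees with the reference (ground-truth) program on sampled inputs.
import random

def reference_gcd(a, b):          # ground-truth implementation used as oracle
    while b:
        a, b = b, a % b
    return a

def patched_gcd(a, b):            # hypothetical tool-generated patch under test
    if b == 0:
        return a
    return patched_gcd(b, a % b)

def is_overfitting(patch, oracle, n_inputs=1000, seed=0):
    """Return True if the patch disagrees with the oracle on any sampled input."""
    rng = random.Random(seed)
    for _ in range(n_inputs):
        a, b = rng.randint(0, 10**6), rng.randint(0, 10**6)
        if patch(a, b) != oracle(a, b):
            return True           # behaviour differs beyond the original test suite
    return False

print(is_overfitting(patched_gcd, reference_gcd))   # False: this patch generalises
```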
Automatic assessment of sequence diagrams
In previous work we showed how student-produced entity-relationship diagrams (ERDs) could be automatically marked with good accuracy when compared with human markers. In this paper we report how effective the same techniques are when applied to syntactically similar UML sequence diagrams, and discuss some issues that arise which did not occur with ERDs. We have found that, on a corpus of 100 student-drawn sequence diagrams, the automatic marking technique is more reliable than human markers. In addition, an analysis of this corpus revealed significant syntax errors in student-drawn sequence diagrams. We used the information obtained from the analysis to build a tool that not only detects syntax errors but also provides feedback in diagrammatic form. The tool has been extended to incorporate the automatic marker to provide a revision tool for learning how to model with sequence diagrams.
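As a deliberately simplified illustration of similarity-based diagram marking (not the authors' technique): if a sequence diagram is reduced to (sender, message, receiver) triples, a student diagram can be marked against a model answer by rewarding matched interactions and penalising spurious ones. All names, weights and marks below are invented.

```python
# Toy marker: compare student interaction triples against a model answer.
ModelAnswer = {("User", "login", "WebUI"),
               ("WebUI", "validate", "AuthService"),
               ("AuthService", "query", "Database")}

def mark(student: set, model: set, max_mark: float = 10.0) -> float:
    """Award marks for matched interactions, penalising spurious extras."""
    matched = len(student & model)
    spurious = len(student - model)
    score = max_mark * matched / len(model) - 0.5 * spurious
    return max(0.0, round(score, 1))

student_diagram = {("User", "login", "WebUI"),
                   ("WebUI", "validate", "AuthService"),
                   ("WebUI", "log", "Logger")}          # one missing, one spurious

print(mark(student_diagram, ModelAnswer))               # 6.2
```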
Automatic coding of short text responses via clustering in educational assessment
Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free-text responses to 10 items, with Formula responses in total, were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. The system reached fair-to-good, and in some cases excellent, agreement with human codings Formula. In particular, items that are solved by naming specific semantic concepts appeared to be coded properly. The system performed equally well with Formula, and somewhat poorer but still acceptably down to Formula. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)
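A minimal sketch of such a baseline pipeline, assuming TF-IDF features and k-means clustering via scikit-learn; the paper's actual components, items and codes are not reproduced here, and the example responses are invented.

```python
# Baseline sketch: vectorise short free-text responses and cluster them;
# a human then assigns a code to each cluster rather than to each response.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "the price went down because supply increased",
    "more supply so cheaper",
    "people bought less",
    "demand dropped so fewer sales",
    "cannot tell from the graph",
    "no idea",
]

X = TfidfVectorizer().fit_transform(responses)          # bag-of-words TF-IDF features
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for code, text in zip(labels, responses):
    print(code, text)
```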
Speech systems research at Texas Instruments
An assessment of automatic speech processing technology is presented. Fundamental problems in the development and deployment of automatic speech processing systems are defined, and a technology forecast for speech systems is given.
Evaluation of Automatic Video Captioning Using Direct Assessment
We present Direct Assessment, a method for manually assessing the quality of automatically generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics for comparing automatic video captions against a manual caption, such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, were used in the TRECVid video captioning task in 2016, but these are shown to have weaknesses. The work presented here brings human assessment into the evaluation by crowdsourcing how well a caption describes a video. We automatically degrade the quality of some sample captions, which are assessed manually, and from this we are able to rate the quality of the human assessors, a factor we take into account in the evaluation. Using data from the TRECVid video-to-text task in 2016, we show that our direct assessment method is replicable and robust and should scale to settings where there are many caption-generation techniques to be evaluated.
Comment: 26 pages, 8 figures
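A toy sketch of the assessor quality-control step described above, under the assumption that a reliable assessor scores deliberately degraded captions clearly lower than the originals; the threshold, worker names and scores are invented, and this is not TRECVid code.

```python
# Keep only assessors who score degraded captions well below the originals;
# per-system caption quality would then be averaged over the kept assessors.
from statistics import mean

# per assessor: list of (score on 0-100 scale, caption_was_degraded) pairs (toy data)
assessor_scores = {
    "worker_a": [(80, False), (35, True), (75, False), (40, True)],
    "worker_b": [(60, False), (65, True), (55, False), (70, True)],  # unreliable
}

def passes_quality_control(scores, margin=10):
    """Keep an assessor whose degraded captions score clearly lower on average."""
    good = mean(s for s, degraded in scores if not degraded)
    bad = mean(s for s, degraded in scores if degraded)
    return good - bad >= margin

kept = [w for w, s in assessor_scores.items() if passes_quality_control(s)]
print(kept)   # ['worker_a']
```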
Accessibility assessment of MOOC platforms in Spanish: UNED COMA, COLMENIA and Miriada X
This article develops a methodology for the assessment of MOOC courses, focusing on the degree of accessibility of three Spanish MOOC platforms: UNED COMA, COLMENIA and Miriada X. Four different criteria have been used in this context: automatic tools, disability simulators, testing tools and educational content.
