
    The Effect of Text Summarization in Essay Scoring (Case Study: Teach on E-Learning)

    The development of automated essay scoring (AES) with neural network (NN) approaches has largely eliminated the need for feature engineering. Feature engineering is still needed, however, because data labeled with rubric scores, which complement AES holistic scores, remain rare; unlabeled data are far more common, yet unsupervised AES research has lagged behind work that relies on publicly available labeled datasets. In the case study adopted in this research, automatic text summarization (ATS) was used as a feature engineering model for AES, and a readability index was used to define rubric values for unlabeled data. The research focuses on developing AES by applying ATS results to SOM and HDBSCAN. The data are 403 Teach on E-Learning essay documents, each represented as a combination of word vectors and a readability index. Based on the tests and measurements carried out, it was concluded that implementing ATS did not improve the silhouette score for the Teach on E-Learning essays: the model produced its best silhouette score, 0.727286113, with the original essay data.
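
    The pipeline described above (word vectors combined with a readability index, clustered with HDBSCAN and evaluated with the silhouette score) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: TF-IDF vectors stand in for the word vectors, a crude Flesch-style formula stands in for the readability index, and the hdbscan and scikit-learn packages are assumed.

```python
# Sketch of the clustering-based AES setup: essay vectors plus a readability
# feature, clustered with HDBSCAN, with cluster quality measured by the
# silhouette score. TF-IDF and the rough readability formula are stand-ins.
import numpy as np
import hdbscan
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

def rough_readability(text: str) -> float:
    """Crude Flesch-style reading-ease proxy; real work would use a proper index."""
    words = text.split()
    sentences = max(text.count(".") + text.count("!") + text.count("?"), 1)
    syllables = sum(max(1, sum(ch in "aeiouy" for ch in w.lower())) for w in words)
    return 206.835 - 1.015 * len(words) / sentences - 84.6 * syllables / max(len(words), 1)

def cluster_silhouette(essays: list[str]) -> float:
    vectors = TfidfVectorizer(max_features=300).fit_transform(essays).toarray()
    readability = np.array([[rough_readability(e)] for e in essays])
    features = np.hstack([vectors, readability / 100.0])   # word vectors + readability
    labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(features)
    mask = labels != -1                                     # drop HDBSCAN noise points
    return silhouette_score(features[mask], labels[mask])  # needs >= 2 clusters
```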

    An instrument for assessing the public communication of scientists

    An instrument for valid, quantitative assessment of scientists’ public communication promises to promote improved science communication by giving formative feedback to scientists developing their communication skills and providing a mechanism for summative assessment of communication training programs for scientists. A quantitative instrument also fits with the scientific ethos, increasing the likelihood that the assessment will gain individual and institutional adoption. Unfortunately, past instruments have fallen short of providing a methodologically sound, theory-based assessment for public science communication. This dissertation uses the Evidence Centered Design (ECD) method for language testing to develop and test the APPS (the Assessment for Public Presentations by Scientists), a filled-cell rubric and accompanying code book, based on communication theory and practice, that can be used to provide formative and summative assessments of scientists giving informative presentations to public, non-scientist audiences. The APPS rubric was developed through an extensive domain analysis to establish the knowledge, skills, and abilities most desired for scientists who speak to public audiences, based on a methodical review of scientific organizations and a systematic review of science communication scholarship. This analysis found that scientists addressing public audiences should speak in language that is understandable, concrete, and free from scientific jargon, translating important scientific information into language that public audiences can understand; should convey the relevance and importance of science to the everyday lives of audience members; should employ visuals that enhance the presentations; should explain scientific processes, techniques, and purposes; should engage in behaviors that increase the audience’s perceptions of scientists as trustworthy, human, and approachable; and should engage in interactive exchanges about science with public audiences. The APPS operationalizes these skills and abilities, using communication theory, in a detailed, user-friendly rubric and code book for assessing public communication by scientists. The rubric delineates theory-based techniques for demonstrating the desired skills, such as using explanatory metaphors, engaging in behaviors that increase immediacy, using first-person pronouns, telling personal stories, and engaging in back-and-forth conversation with the audience. Four rounds of testing provided evidence that the final version of the APPS is a reliable and valid assessment, with constructs that measure what they are intended to measure and that are interpreted similarly by different raters when used in conjunction with rater training. Early rounds of testing showed the need to adjust the wording and explanation of some constructs so that raters understood them similarly, and later testing showed marked improvement in those areas. Although the stringent inter-rater agreement measure Cohen’s kappa did not show strong agreement on most measures, adjacent agreement (raters choosing scores within one point of each other) was high for every category in the final round of testing. This shows that although raters did not often give exactly the same score for a speaker on each construct, they nearly always understood the construct similarly. The agreement ratings also accentuate the study’s finding that raters’ backgrounds may affect their ability to score science speakers objectively. Testing showed that science raters had difficulty setting aside their own science knowledge and objectively rating communication skills. This study therefore finds that scientists can act as communication raters if they are trained by rating science presentations as a group to norm their scoring and by studying the communication skills discussed in the code book. However, because of the difficulty of separating themselves from their intrinsic science knowledge and their lack of experience in identifying excellent communication practices, the assessment of science speakers will nearly always be more accurate, and the communication performance of scientists more improved, when communication experts help train and assess scientists in their public science communication. The APPS can therefore be a valuable tool for improving the knowledge, skills, and abilities of scientists communicating with public audiences when used by communication training programs to provide prompt, specific feedback. Given the reliability limitations, the rubric should not be used for high-stakes purposes or for “proving” a speaker’s competence. However, when used in a science communication training program with consistent raters, the APPS can provide valuable summative and formative assessment for science communicators.
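
    The two agreement statistics reported above, Cohen's kappa and adjacent agreement, are straightforward to compute once two raters' rubric scores are paired. The sketch below uses hypothetical scores and scikit-learn's cohen_kappa_score; it illustrates the statistics, not the APPS itself.

```python
# Exact, chance-corrected agreement (Cohen's kappa) versus adjacent agreement
# (scores within one point of each other) for two raters' rubric scores.
from sklearn.metrics import cohen_kappa_score

def adjacent_agreement(rater_a, rater_b, tolerance=1):
    """Proportion of items on which the raters differ by at most `tolerance` points."""
    hits = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return hits / len(rater_a)

rater_a = [3, 4, 2, 5, 3, 4, 1, 3]   # hypothetical scores on a 1-5 rubric construct
rater_b = [3, 3, 2, 4, 4, 4, 2, 3]

print("Cohen's kappa:     ", round(cohen_kappa_score(rater_a, rater_b), 2))
print("Adjacent agreement:", round(adjacent_agreement(rater_a, rater_b), 2))
```

    On data like this, kappa can be modest even when nearly every pair of scores is within one point, which is exactly the pattern the study describes.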

    Empirical Standards for Software Engineering Research

    Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair. For the complete standards, supplements and other resources, see https://github.com/acmsigsoft/EmpiricalStandard.

    Corrective Feedback in the EFL Classroom: Grammar Checker vs. Teacher’s Feedback.

    The aim of this doctoral thesis is to compare the feedback provided by the teacher with that obtained from the software Grammar Checker on grammatical errors in the written production of students of English as a foreign language. Feedback has traditionally been considered one of the three theoretical conditions for language learning (along with input and output), and for this reason extensive research has been carried out on who should provide it, when, and at what level of explicitness. However, there are far fewer studies that analyse the use of e-feedback programs as a complement or alternative to feedback offered by the teacher. Participants in our study were divided into two experimental groups and one control group, and three grammatical aspects that typically cause errors for English learners at B2 level were examined: prepositions, articles, and the simple past-present/past perfect dichotomy. All participants wrote four essays. The first experimental group received feedback from the teacher and the second received it through the Grammar Checker program. The control group did not receive feedback on the grammatical aspects under analysis but on other linguistic forms not studied. The results show, first, that the software failed to mark grammatical errors in some cases, which meant that students were unable to improve the linguistic accuracy of their written output after receiving feedback from the program. In contrast, students who received feedback from the teacher did improve, although the difference was not significant. Second, the two experimental groups outperformed the control group in the use of the grammatical forms under analysis. Third, regardless of the feedback offered, both groups showed long-term improvement in the use of these grammatical aspects, and finally, no differences in attitude towards the feedback received, or in its impact on the results, were found in either of the experimental groups. Our results open up new lines for investigating corrective feedback in the English as a foreign language classroom: more studies are needed that, on the one hand, help improve electronic feedback programs by making them more accurate and effective at detecting errors; on the other hand, software such as Grammar Checker can complement the daily practice of the foreign language teacher, helping in the first instance to correct common and recurring mistakes, especially since our research has shown that attitudes towards this type of electronic feedback are positive and that it is not perceived as an intrusion into the classroom, thus supporting the acquisition of English. Programa de Doctorat en Llengües Aplicades, Literatura i Traducció.
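
    One way to make the group comparison above concrete is to score each essay as errors per 100 words in the three target categories and then average within each feedback condition. The sketch below is hypothetical: the error annotations would come from the teacher or from the Grammar Checker output, and the data here are invented.

```python
# Hypothetical accuracy metric: errors per 100 words in each target category
# (prepositions, articles, past tenses), averaged per feedback condition.
from collections import Counter
from statistics import mean

CATEGORIES = ("preposition", "article", "past_tense")

def error_rate(word_count: int, errors: list[str]) -> dict[str, float]:
    counts = Counter(e for e in errors if e in CATEGORIES)
    return {cat: 100 * counts[cat] / word_count for cat in CATEGORIES}

# (word count, error tags) per essay; annotations and numbers are invented.
teacher_group = [(250, ["article", "preposition", "article"]), (230, ["past_tense"])]
checker_group = [(240, ["article"]), (260, ["preposition", "past_tense", "past_tense"])]

for name, group in (("teacher feedback", teacher_group), ("Grammar Checker", checker_group)):
    rates = [error_rate(wc, errs) for wc, errs in group]
    print(name, {cat: round(mean(r[cat] for r in rates), 2) for cat in CATEGORIES})
```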

    Veracity Roadmap: Is Big Data Objective, Truthful and Credible?

    This paper argues that big data can possess different characteristics, which affect its quality. Depending on its origin, the data processing technologies, and the methodologies used for data collection and scientific discovery, big data can have biases, ambiguities, and inaccuracies that need to be identified and accounted for to reduce inference errors and improve the accuracy of generated insights. Big data veracity is now recognized as a necessary property for its utilization, complementing the three previously established quality dimensions (volume, variety, and velocity), but there has been little discussion of the concept of veracity thus far. This paper provides a roadmap for theoretical and empirical definitions of veracity along with its practical implications. We explore veracity across three main dimensions: 1) objectivity/subjectivity, 2) truthfulness/deception, and 3) credibility/implausibility, and propose to operationalize each of these dimensions with either existing computational tools or potential ones, relevant particularly to textual data analytics. We combine the measures of the veracity dimensions into one composite index: the big data veracity index. This newly developed index provides a useful way of assessing systematic variations in big data quality across datasets with textual information. The paper contributes to big data research by categorizing the range of existing tools for measuring the suggested dimensions, and to Library and Information Science (LIS) by proposing to account for the heterogeneity of diverse big data and by identifying the information quality dimensions important for each big data type.
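
    A composite index of the kind proposed above could, under simple assumptions, be a weighted mean of the three dimension scores once each is normalized to [0, 1]. The per-dimension scorers are placeholders here; in practice each would be an existing text-analytics tool (a subjectivity classifier, a deception detector, a credibility model), and the equal weights are an assumption.

```python
# Hypothetical big data veracity index: a weighted mean of normalized
# objectivity, truthfulness, and credibility scores for a text collection.
from dataclasses import dataclass

@dataclass
class VeracityScores:
    objectivity: float    # 0 = fully subjective,  1 = fully objective
    truthfulness: float   # 0 = deceptive,         1 = truthful
    credibility: float    # 0 = implausible,       1 = credible

def veracity_index(scores: VeracityScores, weights=(1/3, 1/3, 1/3)) -> float:
    """Weighted mean of the three dimension scores, each assumed to lie in [0, 1]."""
    dims = (scores.objectivity, scores.truthfulness, scores.credibility)
    return sum(w * d for w, d in zip(weights, dims))

print(veracity_index(VeracityScores(objectivity=0.8, truthfulness=0.6, credibility=0.9)))
```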

    The relationship between test takers' performance on the TEM4 and their knowledge of the released test specifications

    The role that released test specifications can play during test preparation is often neglected by test takers, and even by researchers. Focusing on the Test for English Majors-Band 4 (TEM4) in a Chinese EFL setting, this paper investigates the preparation effects associated with the use of the TEM4 Syllabus, i.e. its released specifications. Data were collected from 48 TEM4 test takers recruited from a large university in central China; the experimental group received a tutorial session on the TEM4 Syllabus as the treatment. Specifically, the study measured the effects associated with the TEM4 Syllabus using a quantitative metric of score improvement and a qualitative metric informed by a framework adapted from the work of Messick (1982) and Xie (2013). Along with its exploration of possible preparation effects, the paper also discusses the ethicality of different test preparation methods and touches on the issue of specification releasability (Davidson, 2012).
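
    The quantitative metric of score improvement mentioned above can be illustrated as pre/post gain scores compared across the tutorial and control groups. The sketch below uses invented scores and SciPy's independent-samples t-test; it is not the study's actual analysis.

```python
# Illustrative gain-score comparison between a tutorial (experimental) group
# and a control group, using Welch's independent-samples t-test. Data invented.
import numpy as np
from scipy import stats

exp_pre,  exp_post  = np.array([61, 58, 65, 70]), np.array([68, 64, 71, 74])
ctrl_pre, ctrl_post = np.array([60, 63, 57, 69]), np.array([62, 64, 58, 70])

exp_gain, ctrl_gain = exp_post - exp_pre, ctrl_post - ctrl_pre
t, p = stats.ttest_ind(exp_gain, ctrl_gain, equal_var=False)

print(f"mean gain: tutorial {exp_gain.mean():.1f} vs control {ctrl_gain.mean():.1f}, "
      f"t = {t:.2f}, p = {p:.3f}")
```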

    Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS

    This book describes the extensive contributions made toward the advancement of human assessment by scientists from one of the world’s leading research institutions, Educational Testing Service. The book’s four major sections detail research and development in measurement and statistics, education policy analysis and evaluation, scientific psychology, and validity. Many of the developments presented have become de facto standards in educational and psychological measurement, including item response theory (IRT), linking and equating, differential item functioning (DIF), and educational surveys such as the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), the Progress in International Reading Literacy Study (PIRLS), and the Trends in International Mathematics and Science Study (TIMSS). In addition to its comprehensive coverage of contributions to the theory and methodology of educational and psychological measurement and statistics, the book gives significant attention to ETS work in cognitive, personality, developmental, and social psychology, and to education policy analysis and program evaluation. The chapter authors are long-standing experts who provide broad coverage and thoughtful insights that build upon decades of experience in research and best practices for measurement, evaluation, scientific psychology, and education policy analysis. Opening with a chapter on the genesis of ETS and closing with a synthesis of the enormously diverse set of contributions made over its 70-year history, the book is a useful resource for all interested in the improvement of human assessment.
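
    As a small illustration of one of the measurement methods named above, the two-parameter logistic (2PL) IRT model gives the probability that a test taker with ability theta answers an item correctly, given the item's discrimination a and difficulty b. The parameter values below are made up.

```python
# 2PL item response model: P(correct) = 1 / (1 + exp(-a * (theta - b))).
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f}  P(correct) = {p_correct_2pl(theta, a=1.2, b=0.5):.3f}")
```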
