The Effect of Text Summarization in Essay Scoring (Case Study: Teach on E-Learning)
The development of automated essay scoring (AES) using neural network (NN) approaches has largely eliminated manual feature engineering. However, feature engineering is still needed, and data labelled with rubric scores, which complement AES holistic scores, remain rare; unlabelled data are far more common. Even so, unsupervised AES research has progressed little compared with work on publicly available labelled data. Based on the case study adopted in this research, automatic text summarization (ATS) was used as a feature-engineering model for AES, with a readability index serving as the rubric value for unlabelled data. This research focuses on developing AES by applying ATS results to SOM and HDBSCAN. The data used in this research are 403 documents of TEACH ON E-learning essays, represented as a combination of word vectors and a readability index. Based on the tests and measurements carried out, it was concluded that AES with ATS showed no clear potential for assessing TEACH ON essays in terms of increasing the silhouette score: the model produced its best silhouette score, 0.727286113, with the original (unsummarized) essay data.
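As a rough illustration of the clustering-and-evaluation step described above, here is a minimal sketch that clusters essay features (word vectors plus a readability index) with HDBSCAN and computes a silhouette score. The data, parameters, and library choices are assumptions for illustration, not the authors' exact pipeline; the SOM step is omitted.

```python
# Minimal sketch of the clustering-and-evaluation step described above,
# assuming essays are already embedded as vectors with an appended
# readability index. All names and parameters are illustrative.
import numpy as np
import hdbscan                      # pip install hdbscan
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
essay_vectors = rng.normal(size=(403, 100))        # e.g. averaged word vectors
readability = rng.uniform(0, 100, size=(403, 1))   # e.g. a readability index
features = np.hstack([essay_vectors, readability])

clusterer = hdbscan.HDBSCAN(min_cluster_size=5)
labels = clusterer.fit_predict(features)

# Silhouette is only defined for >= 2 clusters; HDBSCAN labels noise as -1.
mask = labels != -1
if len(set(labels[mask])) > 1:
    print("silhouette:", silhouette_score(features[mask], labels[mask]))
```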
Learning Analytics for Academic Writing through Automatic Identification of Meta-discourse
Effective written communication is an essential skill which promotes educational success for undergraduates. Argumentation is a key requirement of successful essay writing, the most common genre undergraduates have to produce, particularly in the social sciences. Therefore, when assessing student writing, academic tutors look for students' ability to present and pursue well-reasoned and strong arguments through scholarly argumentation, which is articulated by meta-discourse.
Today, there are natural language processing systems which automatically detect authors' rhetorical moves in scholarly texts. Hence, when assessing their students' essays, educators could benefit from the available automated textual analysis that can detect meta-discourse. However, previous work has not shown whether these technologies can be used to analyse student writing reliably. The aim of this thesis, therefore, has been to understand how automated analysis of meta-discourse in student writing can be used to support tutors' essay assessment practices. This thesis evaluates a particular language analysis tool, the Xerox Incremental Parser (XIP), as an exemplar of this type of automated technology.
The studies presented in this thesis investigate how tutors define the quality of undergraduate writing and suggest key elements that make for good-quality student writing in the social sciences, where the XIP seems to work best. This thesis also sets out the changes that need to be made to the XIP and proposes ways its output can be delivered to tutors so that they can use it to give feedback on student essays.
The findings reported also show problems that academic tutors experience in essay assessment, which could potentially be solved by automated support. However, tutors have preconceptions about the use of such support: the study revealed that, to overcome these preconceptions, tutors want to be assured that they themselves retain the 'power' in any decision to use automated support.
Developing Learning Analytics for Epistemic Commitments in a Collaborative Information Seeking Environment
Learning analytics sits at the confluence of the learning, information, and computer sciences. Using a distinctive account of learning analytics as a form of assessment, I first argue for its potential in pedagogically motivated learning design, suggesting a particular construct, epistemic cognition in literacy contexts, to probe using learning analytics. I argue for a recasting of epistemic cognition as 'epistemic commitments' in collaborative information tasks, drawing a novel alignment between information seeking and multiple document processing (MDP) models, with empirical and theoretical grounding given for a focus on collaboration and dialogue in such activities. Thus, epistemic commitments are seen in the ways students seek, select, and integrate claims from multiple sources, and in the ways their collaborative dialogue is brought to bear in this activity. Accordingly, the empirical element of the thesis develops two pedagogically grounded, literacy-based tasks: an MDP task, in which pre-selected documents were provided to students, and a collaborative information seeking (CIS) task, in which students could search the web. These tasks were deployed at scale (n > 500) and involved writing an evaluative review, followed by a pedagogically supported peer assessment task. Assessment outcomes were analysed in the context of a new epistemic-commitments-oriented set of trace data and psychometric data regarding the participants' epistemic cognition. Demonstrating the value of the methodological and conceptual approach taken, qualitative analyses indicate clear epistemic activity and stark differences in behaviour between groups, the complexity of which is challenging to model computationally. Despite this complexity, quantitative analyses indicate that up to 30% of variance in output scores can be modelled using behavioural indicators. The explanatory potential of behaviourally oriented models of epistemic commitments grounded in tool interaction and collaborative dialogue is demonstrated. The thesis provides an exemplification of theoretically positioned analytic development, drawing on interdisciplinary literatures in addressing complex learning contexts.
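As a hedged sketch of the kind of variance-explained modelling the thesis reports (behavioural indicators accounting for up to 30% of variance in output scores), the following regresses synthetic scores on hypothetical trace features and reads off R-squared; every feature name here is a made-up stand-in, not the thesis's actual indicators.

```python
# Sketch of a variance-explained analysis: regress assessment scores on
# behavioural trace indicators and report R^2. All data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.poisson(20, n),   # hypothetical: number of sources visited
    rng.poisson(8, n),    # hypothetical: claims selected/integrated
    rng.poisson(15, n),   # hypothetical: dialogue turns with partner
])
scores = 0.1 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 2, n)  # synthetic outcome

model = LinearRegression().fit(X, scores)
print("R^2:", model.score(X, scores))   # thesis reports up to ~0.30
```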
An instrument for assessing the public communication of scientists
An instrument for valid, quantitative assessment of scientists' public communication promises to promote improved science communication by giving formative feedback to scientists developing their communication skills and by providing a mechanism for summative assessment of communication training programs for scientists. A quantitative instrument also fits the scientific ethos, increasing the likelihood that the assessment will gain individual and institutional adoption. Unfortunately, past assessment instruments have fallen short of providing a methodologically sound, theory-based instrument for assessing public science communication. This dissertation uses the Evidence Centered Design (ECD) method for language testing to develop and test the APPS (the Assessment for Public Presentations by Scientists), a filled-cell rubric and accompanying code book based on communication theory and practice that can be used to provide formative and summative assessments of scientists giving informative presentations to public, non-scientist audiences.
The APPS rubric was developed by employing an extensive domain analysis to establish the knowledge, skills, and abilities most desired for scientists who speak to public audiences, based on a methodical review of scientific organizations and a systematic review of science communication scholarship. This analysis found that scientists addressing public audiences should speak in language that is understandable, concrete, and free from scientific jargon, translating important scientific information into language that public audiences can understand; should convey the relevance and importance of science to the everyday lives of audience members; should employ visuals that enhance the presentations; should explain scientific processes, techniques, and purposes; should engage in behaviors that increase the audience's perceptions of scientists as trustworthy, human, and approachable; and should engage in interactive exchanges about science with public audiences. The APPS operationalizes these skills and abilities, using communication theory, in a detailed, user-friendly rubric and code book for assessing public communication by scientists. The rubric delineates theory-based techniques for demonstrating the desired skills, such as using explanatory metaphors, engaging in behaviors that increase immediacy, using first-person pronouns, telling personal stories, and engaging in back-and-forth conversation with the audience.
Four rounds of testing provided evidence that the final version of the APPS is a reliable and valid assessment, with constructs that measure what they are intended to measure and that are interpreted similarly by different raters when used in conjunction with rater training. Early rounds of testing showed the need to adjust the wording and explanation of some constructs so that raters understood them similarly, and later testing showed marked improvement in those areas. Although the stringent inter-rater agreement measure Cohen's kappa did not show strong agreement on most measures, adjacent agreement (where raters choose scores that are within one point of each other) was high for every category in the final testing. This shows that although raters did not often give exactly the same score for a speaker on each construct, they nearly always understood the construct similarly.
The agreement ratings also accentuate the study's finding that the raters' backgrounds may affect their abilities to objectively score science speakers. Testing showed that science raters had difficulty separating themselves from their inherent science knowledge and had difficulty objectively rating communication skills. Therefore, this study finds that scientists can act as communication raters if they are trained by practicing rating science presentations as a group to norm scoring and by studying communication skills discussed in the code book. However, because of the possible difficulty separating themselves from their intrinsic science knowledge and their lack of experience in identifying excellent communication practices, the assessment of science speakers will nearly always be more accurate, and the communication performance of scientists more enhanced, when communication experts are utilized to help train and assess scientists in their science communication with public audiences.
Therefore, the APPS can be a valuable tool for improving the knowledge, skills, and abilities of scientists communicating with public audiences when used by communication training programs to provide prompt, specific feedback. Given the reliability limitations, the rubric should not be used for high-stakes purposes or for 'proving' a speaker's competence. However, when used in a science communication training program with consistent raters, the APPS can provide valuable summative and formative assessment for science communicators.
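The two agreement statistics discussed above, Cohen's kappa (exact, chance-corrected agreement) and adjacent agreement (scores within one point of each other), can be computed as in this minimal sketch; the rating vectors are invented for illustration and do not come from the dissertation.

```python
# Sketch of the two agreement statistics mentioned above: Cohen's kappa
# (chance-corrected exact agreement) and adjacent agreement (within one
# point). The rating vectors below are made up for illustration.
from sklearn.metrics import cohen_kappa_score

rater_a = [3, 4, 2, 5, 3, 4, 1, 2, 4, 3]
rater_b = [3, 3, 2, 4, 3, 4, 2, 2, 5, 3]

kappa = cohen_kappa_score(rater_a, rater_b)
adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"Cohen's kappa:      {kappa:.2f}")     # exact agreement, chance-corrected
print(f"Adjacent agreement: {adjacent:.2f}")  # proportion within one point
```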
Empirical Standards for Software Engineering Research
Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.
Comment: For the complete standards, supplements and other resources, see https://github.com/acmsigsoft/EmpiricalStandard
Corrective Feedback in the EFL Classroom: Grammar Checker vs. Teacher's Feedback.
The aim of this doctoral thesis is to compare the feedback provided by the teacher with that obtained from the software Grammar Checker on grammatical errors in the written production of students of English as a foreign language. Traditionally, feedback has been considered one of the three theoretical conditions for language learning (along with input and output), and for this reason extensive research has been carried out on who should provide it, when, and at what level of explicitness. However, there are far fewer studies that analyse the use of e-feedback programs as a complement or alternative to the feedback offered by the teacher. Participants in our study were divided into two experimental groups and one control group, and three grammatical aspects that are typically error-prone for English students at B2 level were examined: prepositions, articles, and the simple past versus present perfect/past perfect dichotomy. All participants wrote four essays. The first experimental group received feedback from the teacher and the second received it through the Grammar Checker program. The control group received no feedback on the grammatical aspects under analysis, only on other linguistic forms not studied. The results obtained point, first of all, to the fact that the software failed to mark grammatical errors in some cases, meaning that students were unable to improve their written output in terms of linguistic accuracy after receiving feedback from the program. In contrast, students who received feedback from the teacher did improve, although the difference was not significant. Second, the two experimental groups outperformed the control group in the use of the grammatical forms under analysis. Third, regardless of the feedback offered, both groups showed long-term improvement in the use of the grammatical aspects. Finally, no differences in attitude towards the feedback received, or in its impact on the results, were found in either of the experimental groups. Our results open up new lines for investigating corrective feedback in the English as a foreign language classroom: more studies are needed that, on the one hand, drive improvements in electronic feedback programs, making them more accurate and effective in detecting errors; on the other hand, software such as Grammar Checker can complement the daily practice of the foreign language teacher, helping in the first instance to correct common and recurring mistakes, all the more so given that our research has shown that attitudes towards this type of electronic feedback are positive and that it is not perceived as an intrusion into the classroom, thus helping in the acquisition of English.
Programa de Doctorat en Llengües Aplicades, Literatura i Traducció
Veracity Roadmap: Is Big Data Objective, Truthful and Credible?
This paper argues that big data can possess different characteristics, which affect its quality. Depending on its origin, the data processing technologies, and the methodologies used for data collection and scientific discovery, big data can have biases, ambiguities, and inaccuracies that need to be identified and accounted for to reduce inference errors and improve the accuracy of generated insights. Big data veracity is now being recognized as a necessary property for its utilization, complementing the three previously established quality dimensions (volume, variety, and velocity), but there has been little discussion of the concept of veracity thus far. This paper provides a roadmap for theoretical and empirical definitions of veracity along with its practical implications. We explore veracity across three main dimensions: 1) objectivity/subjectivity, 2) truthfulness/deception, and 3) credibility/implausibility, and propose to operationalize each of these dimensions with either existing or potential computational tools, relevant particularly to textual data analytics. We combine the measures of the veracity dimensions into one composite index, the big data veracity index. This newly developed veracity index provides a useful way of assessing systematic variations in big data quality across datasets with textual information. The paper contributes to big data research by categorizing the range of existing tools for measuring the suggested dimensions, and to Library and Information Science (LIS) by proposing to account for the heterogeneity of diverse big data and to identify the information quality dimensions important for each big data type.
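A minimal sketch of how the proposed composite veracity index might be operationalized, assuming each of the three dimensions has already been scored in [0, 1] by some text-analytic tool; the equal-weight average is an assumption for illustration, not the paper's prescribed aggregation.

```python
# Sketch of a composite veracity index combining the three dimensions
# named above. The [0, 1] scale and equal weights are assumptions.
def veracity_index(objectivity: float,
                   truthfulness: float,
                   credibility: float,
                   weights=(1/3, 1/3, 1/3)) -> float:
    """Combine the three veracity dimension scores into one composite."""
    dims = (objectivity, truthfulness, credibility)
    assert all(0.0 <= d <= 1.0 for d in dims), "dimension scores must be in [0, 1]"
    return sum(w * d for w, d in zip(weights, dims))

# e.g. a dataset rated fairly objective, truthful, and credible:
print(veracity_index(0.8, 0.9, 0.7))  # -> 0.8
```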
The relationship between test takers' performance on the TEM4 and their knowledge of the released test specifications
The role that released test specifications can play during test preparation is often neglected by test takers, and even researchers. Focusing on the Test for English Majors-Band 4 (TEM4) in a Chinese EFL setting, this paper investigates the preparation effects associated with the use of TEM4 Syllabus, or its released specifications. Data collection involved 48 test takers of the TEM4 recruited from a large university in central China, where the experimental group was given a tutorial session on the TEM4 Syllabus as the treatment. Specifically, the study measured the effects associated with the TEM4 Syllabus by using a quantitative metric of score improvement and a qualitative metric informed by a framework adapted from the work of Messick (1982) and Xie (2013). Along with its exploration of possible preparation effects, this paper also discusses the ethicality of different test preparation methods and touches on the issue of specification releasability (Davidson, 2012)
Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS
This book describes the extensive contributions made toward the advancement of human assessment by scientists from one of the world's leading research institutions, Educational Testing Service. The book's four major sections detail research and development in measurement and statistics, education policy analysis and evaluation, scientific psychology, and validity. Many of the developments presented have become de facto standards in educational and psychological measurement, including in item response theory (IRT), linking and equating, differential item functioning (DIF), and educational surveys like the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), the Progress in International Reading Literacy Study (PIRLS), and the Trends in International Mathematics and Science Study (TIMSS). In addition to its comprehensive coverage of contributions to the theory and methodology of educational and psychological measurement and statistics, the book gives significant attention to ETS work in cognitive, personality, developmental, and social psychology, and to education policy analysis and program evaluation. The chapter authors are long-standing experts who provide broad coverage and thoughtful insights that build upon decades of experience in research and best practices for measurement, evaluation, scientific psychology, and education policy analysis. Opening with a chapter on the genesis of ETS and closing with a synthesis of the enormously diverse set of contributions made over its 70-year history, the book is a useful resource for all interested in the improvement of human assessment.