776 research outputs found

    Identifying evidences of computer programming skills through automatic source code evaluation

    Get PDF
    Orientador: Roberto PereiraCoorientador: Eleandro MaschioTese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 27/03/2020Inclui referências: p. 98-106Área de concentração: Ciência da ComputaçãoResumo: Esta tese e contextualizada no ensino de programacao de computadores em cursos de Computacao e investiga aspectos e estrategias para avaliacao automatica e continua de codigos fonte desenvolvidos pelos alunos. O estado da arte foi identificado por meio de revisao sistematica de literatura e revelou que as pesquisas anteriores tendem a realizar avaliacoes baseadas em aspectos tecnicos de codigos fonte, como a avaliacao de corretude funcional e a deteccao de erros. Avaliacoes baseadas em habilidades, por outro lado, sao pouco exploradas e possuem potencial para fornecer detalhes a respeito de habilidades representadas por conceitos de alto nivel, como desvios condicionais e estruturas de repeticao. Um metodo de identificacao automatica de evidencias de aprendizado e entao proposto como uma abordagem baseada em habilidades para a avaliacao automatica de codigos fonte de programacao. O metodo e caracterizado pela implementacao de diferentes estrategias para avaliacao de codigos fonte, identificacao de evidencias de habilidades de programacao, e representacao destas habilidades em um modelo do aluno. Experimentos realizados em ambientes controlados (bases de dados artificiais) mostraram que estrategias automaticas de avaliacao de codigo fonte sao viaveis. Experimentos conduzidos em ambientes reais (codigos fonte produzidos por alunos) produziram resultados semelhantes aos ambientes controlados, entretanto revelaram limitacoes relacionadas a implementacao das estrategias, como vulnerabilidades a sintaxes inesperadas e falhas em expressoes regulares. Um conjunto de habilidades foi selecionado para compor o modelo do aluno, representado por uma rede bayesiana dinamica. Por meio de experimentos foi demonstrado que a alimentacao do modelo com evidencias resultantes da avaliacao automatica de codigos fonte permite o acompanhamento do progresso das habilidades dos alunos. Finalmente, as estrategias automaticas em conjunto com os recursos do modelo do aluno permitiram a demonstracao da avaliacao baseada em habilidades, que se mostrou um recurso valioso para identificacao de solucoes funcionalmente corretas, porem conceitualmente incorretas; quando o programa e funcionalmente correto, retornando resultados esperados a determinadas entradas, porem foi construido com recursos e conceitos incorretos. Palavras-chave: Programacao de Computadores, Avaliacao Automatica, Avaliacao Baseada em HabilidadesAbstract: This thesis is contextualized in the teaching of computer programming in Computing courses and investigates aspects and strategies for automatic and continuous evaluation of student developed source codes. The state of the art was identified through systematic literature review and revealed previous research tends to perform evaluations based on source codes technical aspects, such as functional correctness assessment and error detection. Skills-based assessments, in turn, are less explored although having potential to provide details of skills represented by high-level concepts, such as conditionals and repetition structures. A method for automatic identification of learning evidences is then proposed as a skills-based approach to automatic evaluation of programming source codes. The method is characterized by implementing different strategies for source code evaluation, identifying evidences of programming skills, and representing these skills in a student model. Experiments conducted in controlled scenarios (testing datasets) have shown automatic source code evaluation strategies are viable. Experiments conducted in real scenarios (student-made source codes) produced results similar to controlled scenarios, however, implementation-related limitations were revealed for some strategies, such as vulnerabilities to unexpected syntax and flaws in regular expressions. A skill set was selected to compose our student model, represented by a Dynamic Bayesian Network. Experiments have shown feeding the model with evidences resulting from source codes automatic evaluation allows monitoring students' skills progress. Finally, automatic strategies coupled with student model capabilities enabled demonstrating skills-based assessment, which showed a valuable resource for identifying functionally correct source codes, but conceptually incorrect; when a program is correct functionally, returning expected results to specific inputs, but it was built with erroneous concepts and resources. Keywords: Computer Programming, Automatic Evaluation, Skills-Based Assessmen

    Southeast Asia Primary Learning Metrics (SEA-PLM) Assessment Framework

    Get PDF
    This assessment framework for the South-East Asia Primary Learning Metric (SEA-PLM) assessment program outlines an approach to assessing mathematical literacy (Chapter 2), reading literacy (Chapter 3) and writing literacy (Chapter 4). It also puts forward a conceptual framework for the context questionnaires (Chapter 5). The orientation implied by these labels is intended to emphasise that the curriculum arrangements in participating countries, which are necessarily at the centre of a regional assessment program, have as a major purpose the preparation of young people to participate effectively as members of society in such a way that they can use what they have learned at school – their reading, writing and mathematics skills, and their citizenship – to deal with the many challenges they will meet in their life beyond school. The purpose of this assessment framework is to articulate the basic structure of the SEA-PLM. It provides a description of the constructs to be measured. It also outlines the design and content of the measurement instruments and describes how measures generated by those instruments relate to the constructs

    Evaluating the impact of a Presessional English for Academic Purposes Programme: a corpus based study

    Get PDF
    This thesis investigates the impact of an intensive programme of English for academic purposes upon the second language writing development of postgraduate students at the University of Birmingham. The study uses a 300,000 word corpus (EAPCORP) of essays from the beginning and end of the programme covering two separate years, in order to identify and measure written linguistic feature development. A multidimensional investigative approach underpins both of the two main analytical tools applied to the EAPCORP, with the basic premise that it is possible to identify register differences between different types of language by the assemblage and analysis of a large number of textual features. Firstly, Coh-Metrix is a programme employing a range of algorithms applied to a series of data bases to analyse the linguistic structure of texts. Secondly, MAT (Multidimensional Analysis Tagger) employs algorithms developed by Douglas Biber and uses an automated text tagger. The analyses suggest strongly that there has been progression from the initial production of a high frequency of features characteristic of speech to that more typical of academic writing. The results emphasise the importance of well-designed EAP programmes especially in uncertain economic contexts

    An Automatic Modern Standard Arabic Text Simplification System: A Corpus-Based Approach

    Get PDF
    This thesis brings together an overview of Text Readability (TR) about Text Simplification (TS) with an application of both to Modern Standard Arabic (MSA). It will present our findings on using automatic TR and TS tools to teach MSA, along with challenges, limitations, and recommendations about enhancing the TR and TS models. Reading is one of the most vital tasks that provide language input for communication and comprehension skills. It is proved that the use of long sentences, connected sentences, embedded phrases, passive voices, non- standard word orders, and infrequent words can increase the text difficulty for people with low literacy levels, as well as second language learners. The thesis compares the use of sentence embeddings of different types (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. The accuracy of the 3-way CEFR (The Common European Framework of Reference for Languages Proficiency Levels) classification is F-1 of 0.80 and 0.75 for Arabic-Bert and XLM-R classification, respectively and 0.71 Spearman correlation for the regression task. At the same time, the binary difficulty classifier reaches F-1 0.94 and F-1 0.98 for the sentence-pair semantic similarity classifier. TS is an NLP task aiming to reduce the linguistic complexity of the text while maintaining its meaning and original information (Siddharthan, 2002; Camacho Collados, 2013; Saggion, 2017). The simplification study experimented using two approaches: (i) a classification approach and (ii) a generative approach. It then evaluated the effectiveness of these methods using the BERTScore (Zhang et al., 2020) evaluation metric. The simple sentences produced by the mT5 model achieved P 0.72, R 0.68 and F-1 0.70 via BERTScore while combining Arabic- BERT and fastText achieved P 0.97, R 0.97 and F-1 0.97. To reiterate, this research demonstrated the effectiveness of the implementation of a corpus-based method combined with extracting extensive linguistic features via the latest NLP techniques. It provided insights which can be of use in various Arabic corpus studies and NLP tasks such as translation for educational purposes

    Low-Resource Unsupervised NMT:Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Get PDF
    Unsupervised Machine Translation hasbeen advancing our ability to translatewithout parallel data, but state-of-the-artmethods assume an abundance of mono-lingual data. This paper investigates thescenario where monolingual data is lim-ited as well, finding that current unsuper-vised methods suffer in performance un-der this stricter setting. We find that theperformance loss originates from the poorquality of the pretrained monolingual em-beddings, and we propose using linguis-tic information in the embedding train-ing scheme. To support this, we look attwo linguistic features that may help im-prove alignment quality: dependency in-formation and sub-word information. Us-ing dependency-based embeddings resultsin a complementary word representationwhich offers a boost in performance ofaround 1.5 BLEU points compared to stan-dardWORD2VECwhen monolingual datais limited to 1 million sentences per lan-guage. We also find that the inclusion ofsub-word information is crucial to improv-ing the quality of the embedding

    Workshop Proceedings of the 12th edition of the KONVENS conference

    Get PDF
    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years
    corecore