9 research outputs found

    AutoScor: An Automated System for Essay Questions Scoring

    Get PDF
    The automated scoring or evaluation for written student responses have been, and are still a highly interesting topic for both education and natural language processing, NLP, researchers alike. With the obvious motivation of the difficulties teachers face when marking or correcting open essay questions; the development of automatic scoring methods have recently received much attention. In this paper, we developed and compared number of NLP techniques that accomplish this task. The baseline for this study is based on a vector space model, VSM. Where after normalisation, the baseline-system represents each essay by a vector, and subsequently calculates its score using the cosine similarity between it and the vector of the model answer. This baseline is then compared with the improved model, which takes the document structure into account. To evaluate our system, we used real essays that submitted for computer science course. Each essay was independently scored by two teachers, which we used as our gold standard. The systems’ scoring was then compared to both teachers. A high emphasis was added to the evaluation when the two human assessors are in agreement. The systems’ results show a high and promising performance

    English speaking proficiency assessment using speech and electroencephalography signals

    Get PDF
    In this paper, the English speaking proficiency level of non-native English speakerswas automatically estimated as high, medium, or low performance. For this purpose, the speech of 142 non-native English speakers was recorded and electroencephalography (EEG) signals of 58 of them were recorded while speaking in English. Two systems were proposed for estimating the English proficiency level of the speaker; one used 72 audio features, extracted from speech signals, and the other used 112 features extracted from EEG signals. Multi-class support vector machines (SVM) was used for training and testing both systems using a cross-validation strategy. The speech-based system outperformed the EEG system with 68% accuracy on 60 testing audio recordings, compared with 56% accuracy on 30 testing EEG recordings

    Shallow Analysis Based Assessment of Syntactic Complexity for Automated Speech Scoring

    Get PDF
    Abstract Designing measures that capture various aspects of language ability is a central task in the design of systems for automatic scoring of spontaneous speech. In this study, we address a key aspect of language proficiency assessment -syntactic complexity. We propose a novel measure of syntactic complexity for spontaneous speech that shows optimum empirical performance on real world data in multiple ways. First, it is both robust and reliable, producing automatic scores that agree well with human rating compared to the stateof-the-art. Second, the measure makes sense theoretically, both from algorithmic and native language acquisition points of view

    J Acquir Immune Defic Syndr

    Get PDF
    African Americans and Hispanics in the United States have much higher rates of HIV than non-minorities. There is now strong evidence that a range of behavioral interventions are efficacious in reducing sexual risk behavior in these populations. Although a handful of these programs are just beginning to be disseminated widely, we still have not implemented effective programs to a level that would reduce the population incidence of HIV for minorities. We proposed that innovative approaches involving computational technologies be explored for their use in both developing new interventions and in supporting wide-scale implementation of effective behavioral interventions. Mobile technologies have a place in both of these activities. First, mobile technologies can be used in sensing contexts and interacting to the unique preferences and needs of individuals at times where intervention to reduce risk would be most impactful. Second, mobile technologies can be used to improve the delivery of interventions by facilitators and their agencies. Systems science methods including social network analysis, agent-based models, computational linguistics, intelligent data analysis, and systems and software engineering all have strategic roles that can bring about advances in HIV prevention in minority communities. Using an existing mobile technology for depression and 3 effective HIV prevention programs, we illustrated how 8 areas in the intervention/implementation process can use innovative computational approaches to advance intervention adoption, fidelity, and sustainability.P20 MH090318/MH/NIMH NIH HHS/United StatesP20MH090318/MH/NIMH NIH HHS/United StatesP30 AI050409/AI/NIAID NIH HHS/United StatesP30 AI073961/AI/NIAID NIH HHS/United StatesP30 DA027828/DA/NIDA NIH HHS/United StatesP30 MH074678/MH/NIMH NIH HHS/United StatesP30AI050409/AI/NIAID NIH HHS/United StatesP30AI073961/AI/NIAID NIH HHS/United StatesP30DA027828/DA/NIDA NIH HHS/United StatesP30MH074678/MH/NIMH NIH HHS/United StatesR01 DA025192/DA/NIDA NIH HHS/United StatesR01 DA030452/DA/NIDA NIH HHS/United StatesR01 MH066302/MH/NIMH NIH HHS/United StatesR01DA025192/DA/NIDA NIH HHS/United StatesR01DA030452/DA/NIDA NIH HHS/United StatesR01MH066302/MH/NIMH NIH HHS/United StatesR13 HD074468/HD/NICHD NIH HHS/United StatesR13 MH-081733-01A1/MH/NIMH NIH HHS/United StatesR13 MH081733/MH/NIMH NIH HHS/United StatesU01-PS000671/PS/NCHHSTP CDC HHS/United StatesUL1 TR000150/TR/NCATS NIH HHS/United StatesUL1 TR000460/TR/NCATS NIH HHS/United StatesUL1TR000460/TR/NCATS NIH HHS/United States2014-06-01T00:00:00Z23673892PMC374676

    A computational future for preventing HIV in minority communities: How advanced technology can improve implementation of effective programs

    Get PDF
    Abstract African Americans and Hispanics in the U.S. have much higher rates of HIV than non-minorities. There is now strong evidence that a range of behavioral interventions are efficacious in reducing sexual risk behavior in these populations. While a handful of these programs are just beginning to be disseminated widely, we still have not implemented effective programs to a level that would reduce the population incidence of HIV for minorities. We propose that innovative approaches involving computational technologies be explored for their use in both developing new interventions as well as in supporting wide-scale implementation of effective behavioral interventions. Mobile technologies have a place in both of these activities. First, mobile technologies can be used in sensing contexts and interacting to the unique preferences and needs of individuals at times where intervention to reduce risk would be most impactful. Secondly, mobile technologies can be used to improve the delivery of interventions by facilitators and their agencies. Systems science methods, including social network analysis, agent based models, computational linguistics, intelligent data analysis, and systems and software engineering all have strategic roles that can bring about advances in HIV prevention in minority communities. Using an existing mobile technology for depression and three effective HIV prevention programs, we illustrate how eight areas in the intervention/implementation process can use innovative computational approaches to advance intervention adoption, fidelity, and sustainability

    Using Ontology-Based Approaches to Representing Speech Transcripts for Automated Speech Scoring

    Get PDF
    Text representation is a process of transforming text into some formats that computer systems can use for subsequent information-related tasks such as text classification. Representing text faces two main challenges: meaningfulness of representation and unknown terms. Research has shown evidence that these challenges can be resolved by using the rich semantics in ontologies. This study aims to address these challenges by using ontology-based representation and unknown term reasoning approaches in the context of content scoring of speech, which is a less explored area compared to some common ones such as categorizing text corpus (e.g. 20 newsgroups and Reuters). From the perspective of language assessment, the increasing amount of language learners taking second language tests makes automatic scoring an attractive alternative to human scoring for delivering rapid and objective scores of written and spoken test responses. This study focuses on the speaking section of second language tests and investigates ontology-based approaches to speech scoring. Most previous automated speech scoring systems for spontaneous responses of test takers assess speech by primarily using acoustic features such as fluency and pronunciation, while text features are less involved and exploited. As content is an integral part of speech, the study is motivated by the lack of rich text features in speech scoring and is designed to examine the effects of different text features on scoring performance. A central question to the study is how speech transcript content can be represented in an appropriate means for speech scoring. Previously used approaches from essay and speech scoring systems include bag-of-words and latent semantic analysis representations, which are adopted as baselines in this study; the experimental approaches are ontology-based, which can help improving meaningfulness of representation units and estimating importance of unknown terms. Two general domain ontologies, WordNet and Wikipedia, are used respectively for ontology-based representations. In addition to comparison between representation approaches, the author analyzes which parameter option leads to the best performance within a particular representation. The experimental results show that on average, ontology-based representations slightly enhances speech scoring performance on all measurements when combined with the bag-of-words representation; reasoning of unknown terms can increase performance on one measurement (cos.w4) but decrease others. Due to the small data size, the significance test (t-test) shows that the enhancement of ontology-based representations is inconclusive. The contributions of the study include: 1) it examines the effects of different representation approaches on speech scoring tasks; 2) it enhances the understanding of the mechanisms of representation approaches and their parameter options via in-depth analysis; 3) the representation methodology and framework can be applied to other tasks such as automatic essay scoring

    Modeling statistics ITAs’ speaking performances in a certification test

    Get PDF
    In light of the ever-increasing capability of computer technology and advancement in speech and natural language processing techniques, automated speech scoring of constructed responses is gaining popularity in many high-stakes assessment and low-stakes educational settings. Automated scoring is a highly interdisciplinary and complex subject, and there is much unknown about the strengths and weaknesses of automated speech scoring systems (Evanini & Zechner, 2020). Research in automated speech scoring has been centralized around a few proprietary systems owned by large testing companies. Consequently, existing systems only serve large-scale standardized assessment purposes. Application of automated scoring technologies in local assessment contexts is much desired but rarely realized because the system’s inner workings have remained unfamiliar to many language assessment professionals. Moreover, assumptions about the reliability of human scores, on which automated scoring systems are trained, are untenable in many local assessment situations, where a myriad of factors would work together to co-determine the human scores. These factors may include the rating design, the test takers’ abilities, and the raters’ specific rating behaviors (e.g., severity/leniency, internal consistency, and application of the rating scale). In an attempt to apply automated scoring procedures to a local context, the primary purpose of this study is to develop and evaluate an appropriate automated speech scoring model for a local certification test of international teaching assistants (ITAs). To meet this goal, this study first implemented feature extraction and selection based on existing automated speech scoring technologies and the scoring rubric of the local speaking test. Then, the reliability of the human ratings was investigated based on both Classical Test Theory (CTT) and Item Response Theory (IRT) frameworks, focusing on detecting potential rater effects that could negatively impact the quality of the human scores. Finally, by experimenting and comparing a series of statistical modeling options, this study investigated the extent to which the association between the automatically extracted features and the human scores could be statistically modeled to offer a mechanism that reflects the multifaceted nature of the performance assessment in a unified statistical framework. The extensive search for the speech or linguistic features, covering the sub-domains of fluency, pronunciation, rhythm, vocabulary, grammar, content, and discourse cohesion, revealed that a small set of useful variables could be identified. A large number of features could be effectively summarized as single latent factors that showed reasonably high associations with the human scores. Reliability analysis of human scoring indicated that both inter-rater reliability and intra-rater reliability were acceptable, and through a fine-grained IRT analysis, several raters who were prone to the central tendency or randomness effects were identified. Model fit indices, model performance in prediction, and model diagnostics results in the statistical modeling indicated that the most appropriate approach to model the relationship between the features and the final human scores was a cumulative link model (CLM). In contrast, the most appropriate approach to model the relationship between the features and the ratings from the multiple raters was a cumulative link mixed model (CLMM). These models suggested that higher ability levels were significantly related to the lapse of time, faster speech with fewer disfluencies, more varied and sophisticated vocabulary, more complex syntactic structures, and fewer rater effects. Based on the model’s prediction on unseen data, the rating-level CLMM achieved an accuracy of 0.64, a Pearson correlation of 0.58, and a quadratically-weighted kappa of 0.57, as compared to the human ratings on the 3-point scale. Results from this study could be used to inform the development, design, and implementation for a prototypical automated scoring system for prospective ITAs, as well as providing empirical evidence for future scale development, rater training, and support for assessment-related instruction for the testing program and diagnostic feedback for the ITA test takers

    Exploring content features for automated speech scoring

    No full text
    Most previous research on automated speech scoring has focused on restricted, predictable speech. For automated scoring of unrestricted spontaneous speech, speech proficiency has been evaluated primarily on aspects of pronunciation, fluency, vocabulary and language usage but not on aspects of content and topicality. In this paper, we explore features representing the accuracy of the content of a spoken response. Content features are generated using three similarity measures, including a lexical matching method (Vector Space Model) and two semantic similarity measures (Latent Semantic Analysis and Pointwise Mutual Information). All of the features exhibit moderately high correlations with human proficiency scores on human speech transcriptions. The correlations decrease somewhat due to recognition errors when evaluated on the output of an automatic speech recognition system; however, the additional use of word confidence scores can achieve correlations at a similar level as for human transcriptions.
    corecore