
    RACE: Large-scale ReAding Comprehension Dataset From Examinations

    We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics carefully designed to evaluate the students' ability in understanding and reasoning. In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of the state-of-the-art models (43%) and the ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines. Comment: EMNLP 201

    A College Entrance Essay Exam Intervention for Students with Disabilities and Struggling Writers: A Randomized Control Trial

    High school students with high-incidence disabilities and struggling writers face considerable challenges when taking high-stakes writing assessments designed to examine their suitability for entrance to college. I examined the effectiveness of a writing intervention for improving these students’ performance on a popular college entrance exam, the writing assessment for the ACT. Students were taught a planning and composing strategy for successfully taking this test using the Self-Regulated Strategy Development (SRSD) model. A randomized control trial was conducted in which 20 high school students were randomly assigned to a treatment (N = 10) or control (N = 10) condition. Control students received ACT math preparation. SRSD instruction statistically enhanced students’ planning, the quality of their written text (including ideas and analysis, development and support, organization, and language use), the inclusion of argumentative elements in their compositions, and the use of transition words in written text. Limitations of the study, future research, and implications for practice are discussed. Dissertation/Thesis: Doctoral Dissertation, Learning, Literacies and Technologies 201

    Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks

    Artificial agents today can answer factual questions. But they fall short on questions that require common sense reasoning. Perhaps this is because most existing common sense databases rely on text to learn and represent knowledge. But much of common sense knowledge is unwritten - partly because it tends not to be interesting enough to talk about, and partly because some common sense is unnatural to articulate in text. While unwritten, it is not unseen. In this paper we leverage semantic common sense knowledge learned from images - i.e. visual common sense - in two textual tasks: fill-in-the-blank and visual paraphrasing. We propose to "imagine" the scene behind the text, and leverage visual cues from the "imagined" scenes in addition to textual cues while answering these questions. We imagine the scenes as a visual abstraction. Our approach outperforms a strong text-only baseline on these tasks. Our proposed tasks can serve as benchmarks to quantitatively evaluate progress in solving tasks that go "beyond recognition". Our code and datasets are publicly available

    Less Subjectivity in Setting Cut Scores: A Novel Approach

    Standard-setting cut scores and assessment techniques have recently become a major concern for many organizations and institutions worldwide. A cut score separates one performance level from another; it differentiates between those who pass and those who fail. Cut scores may vary according to the recommendations of policy makers and stakeholders. Many methods have been proposed for setting passing scores on numerous types of tests, such as certification tests and educational tests. Most of these standard-setting methods rely on panelists’ subjectivity in ordering items by level of difficulty. This paper presents a simple approach to assessment that considerably minimizes panelists’ subjectivity. Items are classified into levels of difficulty rather than ranked in increasing order of difficulty, as in most standard methods. This novel approach responds to three main criteria: practicality, a wide range of applicability, and maximum agreement with the empirical data. Provisional and operational cut scores were derived and discussed.

    Essays in Behavioral Economics

    In chapter one, I propose a model consolidating the norm- and preferences-based approaches to explain laboratory bargaining outcomes. Social norms are identified by the axioms of cooperative bargaining theory, and other-regarding preferences are captured using Fehr and Schmidt's inequity aversion utility function. The model applies to bargaining situations where other-regarding agents abide by social norms in their decision-making. Preferences and norms interact to determine bargaining outcomes, and their interaction undermines the recoverability of the other-regarding preference parameters based on observations from the lab. In chapter two, I employ a lab experiment to study whether men receive lucrative tasks more often than equally capable women so that a gender pay gap arises due to the difference in the earnings potential. Subjects allocate a standard task and a lucrative task between two workers, knowing their past performance, task preference, and sex. I find that men receive the lucrative task more often than women, but past performance and a gender difference in task preference account for the difference. Many workers shy away from the challenging yet lucrative task, suggesting that a psychic cost may arise when the tasks are challenging. Managers choose the efficient task allocation less often when the workers' preferences go against rather than with their money-incentive. The result suggests that managers show concern for the subjective utilities of the workers.

    English in the Catalan baccalaureate system: a study of a possible washback effect from the university-entrance exam

    Testing and the resulting grades are currently of great importance since they can determine the future life of a student. The term 'washback', studied since the 1980s, refers to the effect testing has on several aspects of education. However, more research is needed in order to better understand this phenomenon and increase familiarity with it in the educational environment. The aim of this TFG study is to provide a description of washback and to consider how testing influences the Catalan educational system, particularly in the second year of the baccalaureate, when students are about to take the Spanish University-Entrance Examination (SUEE) that will determine their future academic life. This dissertation will focus on the English Test (ET) set by the SUEE and how it affects curriculum, materials and teaching methodology in the final year of secondary-school education. In order to carry out this study, a questionnaire answered by English teachers will provide data to determine whether there is a washback effect on the above-mentioned aspects, given that curriculum, materials and teaching methodology appear to be constantly influenced by the ET.

    Combining independent modules to solve multiple-choice synonym and analogy problems

    Existing statistical approaches to natural language problems are very coarse approximations to the true complexity of language processing. As such, no single technique will be best for all problem instances. Many researchers are examining ensemble methods that combine the output of successful, separately developed modules to create more accurate solutions. This paper examines three merging rules for combining probability distributions: the well-known mixture rule, the logarithmic rule, and a novel product rule. These rules were applied with state-of-the-art results to two problems commonly used to assess human mastery of lexical semantics: synonym questions and analogy questions. All three merging rules result in ensembles that are more accurate than any of their component modules. The differences among the three rules are not statistically significant, but it is suggestive that the popular mixture rule is not the best rule for either of the two problems.
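
    As a rough illustration of what merging module outputs can look like, the sketch below implements the two named rules in their textbook forms: the mixture rule as a weighted arithmetic mean of the modules' distributions, and the logarithmic rule as a normalized weighted geometric mean. The function names, the example distributions, and the equal weights are assumptions made for illustration only; the abstract does not give the paper's exact formulation or how weights are chosen, and the novel product rule is not defined in the abstract, so it is left out.

```python
import math

def mixture_rule(dists, weights):
    """Weighted arithmetic mean of the modules' distributions (textbook form)."""
    total = sum(weights)
    k = len(dists[0])
    return [sum(w * d[j] for d, w in zip(dists, weights)) / total
            for j in range(k)]

def logarithmic_rule(dists, weights, eps=1e-12):
    """Normalized weighted geometric mean (textbook form), computed in log space."""
    k = len(dists[0])
    # eps guards against log(0); it is an implementation convenience, not part of the rule.
    log_scores = [sum(w * math.log(max(d[j], eps)) for d, w in zip(dists, weights))
                  for j in range(k)]
    m = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - m) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical outputs of two modules on one four-choice question.
module_a = [0.5, 0.3, 0.1, 0.1]
module_b = [0.2, 0.6, 0.1, 0.1]
weights = [1.0, 1.0]  # assumed equal weights

mixed = mixture_rule([module_a, module_b], weights)
logged = logarithmic_rule([module_a, module_b], weights)

# The ensemble answer is the choice with the highest merged probability.
answer_mixture = max(range(len(mixed)), key=mixed.__getitem__)
answer_log = max(range(len(logged)), key=logged.__getitem__)
print(mixed, answer_mixture)
print(logged, answer_log)
```

    In this toy case both rules select the second choice, but the logarithmic rule concentrates more probability on it, because a choice that any module rates as unlikely is penalized multiplicatively rather than averaged away.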