176,052 research outputs found

    Assessment and Active Learning Strategies for Introductory Geology Courses

    Get PDF
    Educational research findings suggest that instructors can foster the growth of thinking skills and promote science literacy by incorporating active learning strategies into the classroom. This paper describes a variety of such strategies that may be adopted in introductory geology courses to encourage the development of higher-order thinking skills, and provides directions for implementing these techniques in the classroom. It discusses six hierarchical levels of student learning and links them to examples of appropriate assessment tools that were used successfully in several sections of a general-education Earth Science course taught by two instructors at the University of Akron. These teaching strategies have been evaluated qualitatively, using peer reviews, written student evaluations, and semi-structured student interviews, and quantitatively, by measuring improvements in student retention, exam scores, and scores on a logical-thinking assessment instrument. Educational levels: Graduate or professional

    Personality Testing in the Church of Scientology: Implications for Outcome Research

    Get PDF
    Many fields of modern society require scientific proof of effectiveness before new methods can be widely accepted, as in clinical trials for new drugs, educational evaluation of teaching approaches, and outcome studies of psychological interventions. Previous outcome studies on the results of Scientology services are reviewed and found to be inconclusive. The paper addresses the question of whether the existing database of several thousand case histories could be used for outcome studies. The existing data contain personality test scores on the Oxford Capacity Analysis (OCA) administered before and after Scientology services. A detailed analysis of the OCA demonstrates that it was derived from the Johnson Temperament Analysis (JTA), a psychological test of poorly documented validity, by paraphrasing its items, copying its scoring weights, and transforming its test norms, with some alterations. It is concluded that the OCA is presently unsuitable for outcome studies, but that this situation could change if additional research demonstrated that the OCA has validities comparable to those of other personality tests. For future use, it is recommended that an entirely new version of the OCA be constructed with completely original items, simplified scoring weights, and empirically derived norms, and that its validity and reliability be demonstrated prior to implementation. Keywords: Scientology, outcomes, OCA, Oxford Capacity Analysis, validation
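    As background for the scoring analysis described above, the Python sketch below illustrates how a generic weighted-item personality scale is scored and how raw scores are linearly transformed onto a norm scale. The item weights and norm constants are invented for illustration; they are not the actual OCA or JTA scoring keys.

        # Illustrative sketch only: generic weighted-item scoring and a
        # linear norm transformation, of the kind the paper says the OCA
        # copied from the JTA. Weights and norm constants are invented;
        # they are NOT the actual OCA or JTA scoring keys.
        from statistics import mean, stdev

        # Hypothetical weights for one trait scale: item index -> weight,
        # applied to a response coded -1 (no), 0 (maybe), +1 (yes).
        WEIGHTS = {0: 2, 1: 1, 2: 2, 3: 1, 4: 2}

        def raw_score(responses):
            """Sum of weight * coded response over the scale's items."""
            return sum(WEIGHTS[i] * r for i, r in responses.items())

        def to_norm(raw, ref_scores, new_mean=0.0, new_sd=40.0):
            """Linearly map a raw score onto a norm scale using the mean
            and SD of a reference sample."""
            z = (raw - mean(ref_scores)) / stdev(ref_scores)
            return new_mean + new_sd * z

        reference = [raw_score({0: 1, 1: 0, 2: -1, 3: 1, 4: 0}),
                     raw_score({0: -1, 1: 1, 2: 1, 3: 0, 4: 1}),
                     raw_score({0: 0, 1: -1, 2: 0, 3: 1, 4: -1})]
        person = {0: 1, 1: 1, 2: 0, 3: -1, 4: 1}
        print(to_norm(raw_score(person), reference))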

    Briefing paper: findings from an evaluation of initial assessment materials

    Get PDF

    A comparison of integrated testlet and constructed-response question formats

    Full text link
    Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring-reliability constraints associated with this format, CR questions are increasingly being replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed question structure designed to provide a proxy for the pedagogical advantages of CR questions while procedurally functioning as a set of MC questions. ITs use an answer-until-correct response format that provides immediate confirmatory or corrective feedback, and thus allow not only the granting of partial credit in cases of initially incorrect reasoning but also the building of cumulative question structures. Here, we report on a study that directly compares the functionality of ITs and CR questions in introductory physics exams. To do this, CR questions were converted to concept-equivalent ITs, and both sets of questions were deployed in midterm and final exams. We find that both question types provide adequate discrimination between stronger and weaker students, with CR questions discriminating slightly better than ITs. Meanwhile, an analysis of inter-rater scoring of the CR questions raises serious concerns about the reliability of granting partial credit when this traditional assessment technique is used in a realistic (but non-optimized) setting. Furthermore, we show evidence that partial credit is granted in a valid manner in the ITs. Thus, together with the vastly reduced cost of administering IT-based examinations compared with CR-based examinations, our findings indicate that ITs are viable replacements for CR questions in formal examinations where it is desirable both to assess concept integration and to reward partial knowledge, while scoring examinations efficiently. Comment: 14 pages, 3 figures, with appendix. Accepted for publication in PRST-PER (August 2014).
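    For readers unfamiliar with answer-until-correct scoring, the Python sketch below shows one common partial-credit schedule for such items: full credit on the first attempt, linearly less for each further attempt. The abstract does not specify the paper's exact schedule, so treat this as an assumption.

        # Minimal sketch of answer-until-correct partial-credit scoring.
        # The linear credit schedule is a common convention, not
        # necessarily the exact scheme used in the paper.

        def item_credit(attempts_to_correct, n_options=4):
            """Credit for an answer-until-correct MC item:
            1 attempt -> 1.0; 2 attempts -> 2/3 on a 4-option item;
            credit reaches 0 when every option had to be tried."""
            if not 1 <= attempts_to_correct <= n_options:
                raise ValueError("attempts must be between 1 and n_options")
            return (n_options - attempts_to_correct) / (n_options - 1)

        def testlet_score(attempts_per_item, n_options=4):
            """Total credit over the items of one integrated testlet."""
            return sum(item_credit(a, n_options) for a in attempts_per_item)

        print(testlet_score([1, 2, 4]))  # 1.0 + 2/3 + 0.0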

    Looking Under the Hood: Tools for Diagnosing your Question Answering Engine

    Full text link
    In this paper we analyze two question answering tasks: the TREC-8 question answering task and a set of reading comprehension exams. First, we show that Q/A systems perform better when there are multiple answer opportunities per question. Next, we analyze common approaches to two subproblems: term overlap for answer sentence identification, and answer typing for short answer extraction. We present general tools for analyzing the strengths and limitations of techniques for these subproblems. Our results quantify the limitations of both term overlap and answer typing in distinguishing between competing answer candidates. Comment: Revision of a paper appearing in the Proceedings of the Workshop on Open-Domain Question Answering.
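    The term-overlap baseline analyzed here is simple enough to sketch directly. The Python below ranks candidate sentences by overlap with the question's content terms; the toy tokenizer and stopword list are stand-ins for whatever the original systems used.

        # Sketch of the basic term-overlap heuristic: rank candidate
        # sentences by how many lowercased, stopword-filtered question
        # terms they contain. Tokenizer and stopwords are simplified.
        import re

        STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "was",
                     "what", "who", "when", "where", "which", "did",
                     "do", "to"}

        def terms(text):
            return {t for t in re.findall(r"[a-z0-9]+", text.lower())
                    if t not in STOPWORDS}

        def rank_sentences(question, sentences):
            """Return sentences sorted by term overlap with the question."""
            q = terms(question)
            return sorted(sentences,
                          key=lambda s: len(q & terms(s)),
                          reverse=True)

        question = "Who wrote the play Hamlet?"
        candidates = ["Hamlet is a tragedy.",
                      "William Shakespeare wrote Hamlet around 1600.",
                      "The play was first performed in London."]
        print(rank_sentences(question, candidates)[0])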

    How to Evaluate your Question Answering System Every Day and Still Get Real Work Done

    Full text link
    In this paper, we report on Qaviar, an experimental automated evaluation system for question answering applications. The goal of our research was to find an automatically calculated measure that correlates well with human judges' assessment of answer correctness in the context of question answering tasks. Qaviar judges a response by computing recall against the stemmed content words in the human-generated answer key, counting the answer correct if recall exceeds a given threshold. We determined that the answer correctness predicted by Qaviar agreed with the human judgments 93% to 95% of the time. Forty-one question-answering systems were ranked by both Qaviar and human assessors, and these rankings correlated with a Kendall's tau of 0.920, compared with a correlation of 0.956 between human assessors on the same data. Comment: 6 pages, 3 figures, to appear in Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000).
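    A minimal Python sketch of a Qaviar-style judgment follows: compute recall of the answer key's stemmed content words within the system response and compare it to a threshold. The toy suffix stemmer and the 0.3 threshold are assumptions for illustration, not the paper's actual settings.

        # Sketch of a Qaviar-style automatic judgment: recall of stemmed
        # content words from the answer key found in the system response,
        # compared against a threshold. Stemmer and threshold are
        # illustrative assumptions, not the paper's exact settings.

        STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "was", "is"}

        def stem(word):
            # Toy stemmer: strip a few common suffixes.
            for suf in ("ing", "ed", "es", "s"):
                if word.endswith(suf) and len(word) > len(suf) + 2:
                    return word[: -len(suf)]
            return word

        def content_stems(text):
            return {stem(w) for w in text.lower().split()
                    if w not in STOPWORDS}

        def key_recall(response, answer_key):
            key = content_stems(answer_key)
            return len(key & content_stems(response)) / len(key) if key else 0.0

        def judge(response, answer_key, threshold=0.3):
            """Count the response correct if recall exceeds the threshold."""
            return key_recall(response, answer_key) > threshold

        print(judge("Armstrong walked on the moon in 1969",
                    "Neil Armstrong walked on the moon"))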

    Variability in modified rankin scoring across a large cohort of observers

    Get PDF
    Background and Purpose: The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke trials. However, substantial interobserver variability in mRS scoring has been reported. These studies likely underestimate the variability present in multicenter clinical trials, because exploratory work has been undertaken only in single centers by a few observers, all of similar training. We examined mRS variability across a large cohort of international observers using data from a video training resource.
    Methods: The mRS training package includes a series of “real-life” patient interviews for grading. Training data were collected centrally and analyzed for variability using kappa statistics. We examined variability against a standard of “correct” mRS grades; examined variability by country; and, for UK assessors, examined variability by center and by professional background of the observer.
    Results: To date, 2942 assessments from 30 countries have been submitted. Overall reliability for mRS grading has been moderate to good, with substantial heterogeneity across countries. Native English language has had little effect on reliability. Within the United Kingdom, there was no significant variation by profession.
    Conclusion: Our results confirm interobserver variability in mRS assessment. The heterogeneity across countries is intriguing because it appears not to be related solely to language. These data highlight the need for novel strategies to improve reliability.
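    The kappa statistic underlying this analysis is straightforward to compute. Below is a minimal Python sketch of Cohen's kappa for two raters grading the same interviews on the mRS; the study pools many raters, for which a multi-rater statistic such as Fleiss' kappa would apply, but the two-rater case shows the idea.

        # Minimal sketch of the kappa statistic used to quantify
        # interobserver agreement on an ordinal scale such as the mRS
        # (grades 0-5, plus 6 for death). Cohen's kappa, two raters.
        from collections import Counter

        def cohens_kappa(rater_a, rater_b):
            n = len(rater_a)
            observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
            ca, cb = Counter(rater_a), Counter(rater_b)
            expected = sum(ca[g] * cb[g] for g in ca) / (n * n)
            return (observed - expected) / (1 - expected)

        # Two raters grading the same ten patient interviews (made up).
        a = [0, 1, 2, 2, 3, 4, 4, 5, 6, 3]
        b = [0, 1, 2, 3, 3, 4, 5, 5, 6, 2]
        print(round(cohens_kappa(a, b), 3))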

    Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs

    No full text
    Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. The problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that answers complex questions directly from textual sources on the fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and coping with rapidly evolving, ad hoc topics and formulation styles in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions and show that it substantially outperforms state-of-the-art baselines.
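    To make the quasi-KG idea concrete, the toy Python sketch below builds a small weighted graph from invented (subject, phrase, object) triples and connects a question's anchor entities with networkx's Steiner-tree approximation, treating non-anchor tree nodes as answer candidates. QUEST's actual graph construction, weighting, and Group Steiner algorithm are considerably richer than this.

        # Toy sketch of the QUEST idea: a weighted "quasi KG" built from
        # extracted triples, then an approximate Steiner tree connecting
        # the question's anchor entities. Triples and weights are made up.
        import networkx as nx
        from networkx.algorithms.approximation import steiner_tree

        triples = [
            ("Nolan", "directed", "Interstellar", 0.9),
            ("Interstellar", "stars", "McConaughey", 0.8),
            ("Nolan", "directed", "Inception", 0.9),
            ("Inception", "stars", "DiCaprio", 0.8),
        ]

        G = nx.Graph()
        for subj, phrase, obj, confidence in triples:
            # Lower weight = stronger evidence, so the tree prefers it.
            G.add_edge(subj, obj, weight=1.0 - confidence, phrase=phrase)

        # Anchors matched from a question like
        # "Which Nolan film stars McConaughey?"
        anchors = ["Nolan", "McConaughey"]
        tree = steiner_tree(G, anchors, weight="weight")
        candidates = [n for n in tree.nodes if n not in anchors]
        print(candidates)  # ['Interstellar']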