Assessment and Active Learning Strategies for Introductory Geology Courses
Educational research findings suggest that instructors can foster the growth of thinking skills and promote science literacy by incorporating active learning strategies into the classroom. This paper describes a variety of such strategies that may be adopted in introductory geology courses to encourage the development of higher-order thinking skills, and provides directions for implementing these techniques in the classroom. It discusses six hierarchical levels of student learning and links them to examples of appropriate assessment tools that were used successfully in several sections of a general education Earth Science course taught by two instructors at the University of Akron. These teaching strategies have been evaluated qualitatively using peer reviews, student written evaluations, and semi-structured student interviews; and quantitatively by measuring improvements in student retention, exam scores, and scores on a logical thinking assessment instrument.
Educational levels: Graduate or professional
Personality Testing in the Church of Scientology: Implications for Outcome Research
Many fields of modern society require scientific proof of effectiveness before new methods can be widely accepted, as in clinical trials for new drugs, educational evaluation for teaching approaches, and outcome studies for psychological interventions. Previous outcome studies on the results of Scientology services are reviewed and found to be inconclusive. The paper is devoted to the question of whether the existing database of several thousand case histories could be used for outcome studies. The existing data contain personality test scores on the Oxford Capacity Analysis (OCA) administered before and after Scientology services. A detailed analysis of the OCA demonstrates that it was derived from the Johnson Temperament Analysis (JTA), a psychological test of poorly documented validity, by paraphrasing its items, copying its scoring weights, and transforming its test norms, with some alterations. It was concluded that the OCA is presently unsuitable for outcome studies, but that this situation could change if additional research could demonstrate that the OCA had validities comparable to other personality tests. For future use, it is recommended that an entirely new version of the OCA be constructed with completely original items, simplified scoring weights, and empirically derived norms, and that its validity and reliability be demonstrated prior to implementation.
Keywords: Scientology, outcomes, OCA, Oxford Capacity Analysis, validation
A comparison of integrated testlet and constructed-response question formats
Constructed-response (CR) questions are a mainstay of introductory physics
textbooks and exams. However, because of time, cost, and scoring reliability
constraints associated with this format, CR questions are being increasingly
replaced by multiple-choice (MC) questions in formal exams. The integrated
testlet (IT) is a recently-developed question structure designed to provide a
proxy of the pedagogical advantages of CR questions while procedurally
functioning as a set of MC questions. ITs utilize an answer-until-correct
response format that provides immediate confirmatory or corrective feedback,
and they thus allow not only for the granting of partial credit in cases of
initially incorrect reasoning, but also for the building of cumulative
question structures. Here, we report on a study that directly compares the
functionality of ITs and CR questions in introductory physics exams. To do
this, CR questions were converted to concept-equivalent ITs, and both sets of
questions were deployed in midterm and final exams. We find that both question
types provide adequate discrimination between stronger and weaker students,
with CR questions discriminating slightly better than the ITs. Meanwhile, an
analysis of inter-rater scoring of the CR questions raises serious concerns
about the reliability of the granting of partial credit when this traditional
assessment technique is used in a realistic (but non-optimized) setting.
Furthermore, we show evidence that partial credit is granted in a valid manner
in the ITs. Thus, together with consideration of the vastly reduced costs of
administering IT-based examinations compared to CR-based examinations, our
findings indicate that ITs are viable replacements for CR questions in formal
examinations where it is desirable both to assess concept integration and to
reward partial knowledge while scoring examinations efficiently.
Comment: 14 pages, 3 figures, with appendix. Accepted for publication in
PRST-PER (August 2014)
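The answer-until-correct format described above implies a credit schedule over successive attempts. The halving scheme below is one common convention and an assumption for illustration, not the rubric actually used in the study:

```python
# Sketch of an answer-until-correct partial-credit scheme for a
# multiple-choice item in an integrated testlet. The credit schedule
# (1.0, 0.5, 0.25, 0.0 for a four-option item) is an assumed
# convention, not necessarily the one used in the paper.

def partial_credit(num_options: int, attempts_used: int) -> float:
    """Credit for finding the key on attempt `attempts_used` (1-based).

    Credit halves with each incorrect attempt; the last remaining
    option is forced, so reaching it earns nothing.
    """
    if not 1 <= attempts_used <= num_options:
        raise ValueError("attempts_used must be between 1 and num_options")
    if attempts_used == num_options:
        return 0.0
    return 0.5 ** (attempts_used - 1)

# A student who answers a four-option item correctly on the second try:
print(partial_credit(4, 2))  # 0.5
```

Because every student eventually reaches the key, the same schedule also lets later items in the testlet build on the (now confirmed) earlier answer.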
Looking Under the Hood: Tools for Diagnosing your Question Answering Engine
In this paper we analyze two question answering tasks: the TREC-8 question
answering task and a set of reading comprehension exams. First, we show that
Q/A systems perform better when there are multiple answer opportunities per
question. Next, we analyze common approaches to two subproblems: term overlap
for answer sentence identification, and answer typing for short answer
extraction. We present general tools for analyzing the strengths and
limitations of techniques for these subproblems. Our results quantify the
limitations of both term overlap and answer typing to distinguish between
competing answer candidates.
Comment: Revision of paper appearing in the Proceedings of the Workshop on
Open-Domain Question Answering
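The term-overlap approach to answer sentence identification can be sketched in a few lines: rank candidate sentences by how many stemmed, stopword-filtered question terms they share. The crude stemmer and stopword list below are stand-ins for whatever the analyzed systems actually used:

```python
# Minimal term-overlap scorer for answer sentence identification.
# The tiny stemmer and stopword list are illustrative assumptions.

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and",
             "what", "who", "when", "how"}

def stem(word: str) -> str:
    # Crude suffix stripping; a real system would use e.g. a Porter stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def terms(text: str) -> set:
    words = (w.strip(".,?!").lower() for w in text.split())
    return {stem(w) for w in words if w.isalpha() and w not in STOPWORDS}

def rank_sentences(question: str, sentences: list) -> list:
    q = terms(question)
    scored = [(len(q & terms(s)), s) for s in sentences]
    return [s for score, s in sorted(scored, key=lambda p: -p[0])]

candidates = [
    "The capital of France is Paris.",
    "France exports wine and cheese.",
]
print(rank_sentences("What is the capital of France?", candidates)[0])
```

The paper's point is precisely the limitation of such scorers: when several candidates share the same overlap count, term overlap alone cannot distinguish between them.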
How to Evaluate your Question Answering System Every Day and Still Get Real Work Done
In this paper, we report on Qaviar, an experimental automated evaluation
system for question answering applications. The goal of our research was to
find an automatically calculated measure that correlates well with human
judges' assessment of answer correctness in the context of question answering
tasks. Qaviar judges the response by computing recall against the stemmed
content words in the human-generated answer key. It counts the answer correct
if it exceeds a given recall threshold. We determined that the answer
correctness predicted by Qaviar agreed with the human judges 93% to 95% of the time.
Forty-one question-answering systems were ranked by both Qaviar and human assessors,
and these rankings correlated with a Kendall's Tau measure of 0.920, compared
to a correlation of 0.956 between human assessors on the same data.
Comment: 6 pages, 3 figures, to appear in Proceedings of the Second
International Conference on Language Resources and Evaluation (LREC 2000)
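The recall-against-the-answer-key judgement that Qaviar performs can be illustrated as follows; the crude stemmer, stopword list, and 0.5 threshold are assumptions for the sketch, not Qaviar's actual values:

```python
# Sketch of Qaviar-style automated judging: compute recall of the
# answer key's stemmed content words within the system response, and
# count the response correct when recall meets a threshold.

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and"}

def stem(word: str) -> str:
    # Crude suffix stripping as a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def content_terms(text: str) -> set:
    words = (w.strip(".,?!").lower() for w in text.split())
    return {stem(w) for w in words if w.isalpha() and w not in STOPWORDS}

def judge(response: str, answer_key: str, threshold: float = 0.5) -> bool:
    key = content_terms(answer_key)
    if not key:
        return False
    recall = len(key & content_terms(response)) / len(key)
    return recall >= threshold

print(judge("Abraham Lincoln was the 16th president",
            "President Abraham Lincoln"))  # True
```

Because the measure is pure recall, a verbose response that happens to contain the key words is never penalized, which is exactly the kind of disagreement with human judges that accounts for the residual 5-7%.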
Variability in modified Rankin scoring across a large cohort of observers
Background and Purpose: The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke trials. However, substantial interobserver variability in mRS scoring has been reported. These studies likely underestimate the variability present in multicenter clinical trials, because exploratory work has been undertaken only in single centers by a few observers, all of similar training. We examined mRS variability across a large cohort of international observers using data from a video training resource.
Methods: The mRS training package includes a series of “real-life” patient interviews for grading. Training data were collected centrally and analyzed for variability using kappa statistics. We examined variability against a standard of “correct” mRS grades; examined variability by country; and, for UK assessors, examined variability by center and by professional background of the observer.
Results: To date, 2942 assessments from 30 countries have been submitted. Overall reliability for mRS grading has been moderate to good, with substantial heterogeneity across countries. Native English language has had little effect on reliability. Within the United Kingdom, there was no significant variation by profession.
Conclusion: Our results confirm interobserver variability in mRS assessment. The heterogeneity across countries is intriguing because it appears not to be related solely to language. These data highlight the need for novel strategies to improve reliability.
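The kappa statistics used in the Methods can be illustrated with a two-rater Cohen's kappa; the study itself pooled many observers (for which a multi-rater generalization such as Fleiss' kappa is typically used), and the ratings below are invented:

```python
# Unweighted Cohen's kappa for two raters: chance-corrected agreement.
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Kappa = (observed - expected) / (1 - expected).

    Undefined when expected agreement is 1 (all ratings identical).
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical observers grading ten patients on the mRS (0-5):
a = [0, 1, 2, 2, 3, 3, 4, 4, 5, 5]
b = [0, 1, 2, 3, 3, 3, 4, 5, 5, 5]
print(round(cohens_kappa(a, b), 3))  # 0.756
```

On the usual rules of thumb, values around 0.4-0.6 are "moderate" and 0.6-0.8 "substantial", which is the scale behind "moderate to good" in the Results.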
Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs
Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
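The Group Steiner Tree problem at the heart of QUEST's answer computation asks for a minimum-weight connected subgraph touching at least one node from every terminal group. The brute-force solver below is a didactic stand-in, not QUEST's actual algorithm, and the toy graph and node names are invented:

```python
# Brute-force Group Steiner Tree on a tiny graph: enumerate node
# subsets that cover every group, and keep the one whose induced
# minimum spanning tree is cheapest. Exponential in |nodes|, so
# purely illustrative.
from itertools import combinations

def group_steiner(nodes: list, edges: dict, groups: list):
    """`edges` maps node pairs to weights; `groups` is a list of
    terminal groups, each of which must be touched by the tree."""
    def spanning_weight(chosen):
        # Prim's algorithm restricted to `chosen`; None if disconnected.
        chosen = set(chosen)
        tree, total = {next(iter(chosen))}, 0.0
        while tree != chosen:
            cheapest = None
            for (u, v), w in edges.items():
                if u in chosen and v in chosen and (u in tree) != (v in tree):
                    if cheapest is None or w < cheapest[0]:
                        cheapest = (w, u, v)
            if cheapest is None:
                return None  # subset is not connected
            total += cheapest[0]
            tree.update(cheapest[1:])
        return total

    best = None
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if all(any(t in subset for t in group) for group in groups):
                w = spanning_weight(subset)
                if w is not None and (best is None or w < best[0]):
                    best = (w, subset)
    return best

# Three terminal groups joined most cheaply through an intermediate hub:
nodes = ["a", "b", "c", "hub"]
edges = {("a", "hub"): 1, ("b", "hub"): 1, ("c", "hub"): 1, ("a", "b"): 3}
groups = [["a"], ["b"], ["c"]]
print(group_steiner(nodes, edges, groups))  # (3.0, ('a', 'b', 'c', 'hub'))
```

Note how the optimal tree routes through the non-terminal hub node, which is exactly why Steiner trees rather than plain spanning trees over the terminals are needed.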