Assessment and Active Learning Strategies for Introductory Geology Courses
Educational research findings suggest that instructors can foster the growth of thinking skills and promote science literacy by incorporating active learning strategies into the classroom. This paper describes a variety of such strategies that may be adopted in introductory geology courses to encourage the development of higher-order thinking skills, and provides directions for implementing these techniques in the classroom. It discusses six hierarchical levels of student learning and links them to examples of appropriate assessment tools that were used successfully in several sections of a general education Earth Science course taught by two instructors at the University of Akron. These teaching strategies have been evaluated qualitatively using peer reviews, student written evaluations, and semi-structured student interviews; and quantitatively by measuring improvements in student retention, exam scores, and scores on a logical thinking assessment instrument.
Educational levels: Graduate or professional
Personality Testing in the Church of Scientology: Implications for Outcome Research
Many fields of modern society require scientific proof of effectiveness before new methods can be widely accepted, as in clinical trials for new drugs, educational evaluation for teaching approaches, and outcome studies for psychological interventions. Previous outcome studies on the results of Scientology services are reviewed and found to be inconclusive. The paper is devoted to the question of whether the existing database of several thousand case histories could be used for outcome studies. The existing data contain personality test scores on the Oxford Capacity Analysis (OCA) administered before and after Scientology services. A detailed analysis of the OCA demonstrates that it was derived from the Johnson Temperament Analysis (JTA), a psychological test of poorly documented validity, by paraphrasing its items, copying its scoring weights, and transforming its test norms, with some alterations. It was concluded that the OCA is presently unsuitable for outcome studies, but that this situation could change if additional research could demonstrate that the OCA had validities comparable to other personality tests. For future use, it is recommended that an entirely new version of the OCA be constructed with completely original items, simplified scoring weights, and empirically derived norms, and that its validity and reliability be demonstrated prior to implementation.
Keywords: Scientology, outcomes, OCA, Oxford Capacity Analysis, validation
A comparison of integrated testlet and constructed-response question formats
Constructed-response (CR) questions are a mainstay of introductory physics
textbooks and exams. However, because of time, cost, and scoring reliability
constraints associated with this format, CR questions are being increasingly
replaced by multiple-choice (MC) questions in formal exams. The integrated
testlet (IT) is a recently-developed question structure designed to provide a
proxy of the pedagogical advantages of CR questions while procedurally
functioning as a set of MC questions. ITs utilize an answer-until-correct
response format that provides immediate confirmatory or corrective feedback,
and they thus allow not only for the granting of partial credit in cases of
initially incorrect reasoning, but also for the building of cumulative
question structures. Here, we report on a study that directly compares the
functionality of ITs and CR questions in introductory physics exams. To do
this, CR questions were converted to concept-equivalent ITs, and both sets of
questions were deployed in midterm and final exams. We find that both question
types provide adequate discrimination between stronger and weaker students,
with CR questions discriminating slightly better than the ITs. Meanwhile, an
analysis of inter-rater scoring of the CR questions raises serious concerns
about the reliability of the granting of partial credit when this traditional
assessment technique is used in a realistic (but non-optimized) setting.
Furthermore, we show evidence that partial credit is granted in a valid manner
in the ITs. Thus, together with consideration of the vastly reduced costs of
administering IT-based examinations compared to CR-based examinations, our
findings indicate that ITs are viable replacements for CR questions in formal
examinations where it is desirable both to assess concept integration and to
reward partial knowledge while scoring examinations efficiently.
Comment: 14 pages, 3 figures, with appendix. Accepted for publication in
PRST-PER (August 2014)
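The answer-until-correct format described above implies a credit schedule over successive attempts. The halving scheme below is one common convention and an assumption for illustration, not the rubric actually used in the study:

```python
# Sketch of an answer-until-correct partial-credit scheme for a
# multiple-choice item in an integrated testlet. The credit schedule
# (1.0, 0.5, 0.25, 0.0 for a four-option item) is an assumed
# convention, not necessarily the one used in the paper.

def partial_credit(num_options: int, attempts_used: int) -> float:
    """Credit for finding the key on attempt `attempts_used` (1-based).

    Credit halves with each incorrect attempt; the last remaining
    option is forced, so reaching it earns nothing.
    """
    if not 1 <= attempts_used <= num_options:
        raise ValueError("attempts_used must be between 1 and num_options")
    if attempts_used == num_options:
        return 0.0
    return 0.5 ** (attempts_used - 1)

# A student who answers a four-option item correctly on the second try:
print(partial_credit(4, 2))  # 0.5
```

Because every student eventually reaches the key, the same schedule also lets later items in the testlet build on the (now confirmed) earlier answer.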
Looking Under the Hood: Tools for Diagnosing your Question Answering Engine
In this paper we analyze two question answering tasks: the TREC-8 question
answering task and a set of reading comprehension exams. First, we show that
Q/A systems perform better when there are multiple answer opportunities per
question. Next, we analyze common approaches to two subproblems: term overlap
for answer sentence identification, and answer typing for short answer
extraction. We present general tools for analyzing the strengths and
limitations of techniques for these subproblems. Our results quantify the
limitations of both term overlap and answer typing to distinguish between
competing answer candidates.
Comment: Revision of paper appearing in the Proceedings of the Workshop on
Open-Domain Question Answering
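The term-overlap approach to answer sentence identification can be sketched in a few lines: rank candidate sentences by how many stemmed, stopword-filtered question terms they share. The crude stemmer and stopword list below are stand-ins for whatever the analyzed systems actually used:

```python
# Minimal term-overlap scorer for answer sentence identification.
# The tiny stemmer and stopword list are illustrative assumptions.

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and",
             "what", "who", "when", "how"}

def stem(word: str) -> str:
    # Crude suffix stripping; a real system would use e.g. a Porter stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def terms(text: str) -> set:
    words = (w.strip(".,?!").lower() for w in text.split())
    return {stem(w) for w in words if w.isalpha() and w not in STOPWORDS}

def rank_sentences(question: str, sentences: list) -> list:
    q = terms(question)
    scored = [(len(q & terms(s)), s) for s in sentences]
    return [s for score, s in sorted(scored, key=lambda p: -p[0])]

candidates = [
    "The capital of France is Paris.",
    "France exports wine and cheese.",
]
print(rank_sentences("What is the capital of France?", candidates)[0])
```

The paper's point is precisely the limitation of such scorers: when several candidates share the same overlap count, term overlap alone cannot distinguish between them.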
How to Evaluate your Question Answering System Every Day and Still Get Real Work Done
In this paper, we report on Qaviar, an experimental automated evaluation
system for question answering applications. The goal of our research was to
find an automatically calculated measure that correlates well with human
judges' assessment of answer correctness in the context of question answering
tasks. Qaviar judges the response by computing recall against the stemmed
content words in the human-generated answer key. It counts the answer correct
if it exceeds a given recall threshold. We determined that the answer
correctness predicted by Qaviar agreed with the human judges 93% to 95% of the time.
Forty-one question-answering systems were ranked by both Qaviar and human assessors,
and these rankings correlated with a Kendall's Tau measure of 0.920, compared
to a correlation of 0.956 between human assessors on the same data.
Comment: 6 pages, 3 figures, to appear in Proceedings of the Second
International Conference on Language Resources and Evaluation (LREC 2000)
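The recall-against-the-answer-key judgement that Qaviar performs can be illustrated as follows; the crude stemmer, stopword list, and 0.5 threshold are assumptions for the sketch, not Qaviar's actual values:

```python
# Sketch of Qaviar-style automated judging: compute recall of the
# answer key's stemmed content words within the system response, and
# count the response correct when recall meets a threshold.

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and"}

def stem(word: str) -> str:
    # Crude suffix stripping as a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def content_terms(text: str) -> set:
    words = (w.strip(".,?!").lower() for w in text.split())
    return {stem(w) for w in words if w.isalpha() and w not in STOPWORDS}

def judge(response: str, answer_key: str, threshold: float = 0.5) -> bool:
    key = content_terms(answer_key)
    if not key:
        return False
    recall = len(key & content_terms(response)) / len(key)
    return recall >= threshold

print(judge("Abraham Lincoln was the 16th president",
            "President Abraham Lincoln"))  # True
```

Because the measure is pure recall, a verbose response that happens to contain the key words is never penalized, which is exactly the kind of disagreement with human judges that accounts for the residual 5-7%.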
Variability in modified Rankin scoring across a large cohort of observers
Background and Purpose: The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke trials. However, substantial interobserver variability in mRS scoring has been reported. These studies likely underestimate the variability present in multicenter clinical trials, because exploratory work has been undertaken only in single centers by a few observers, all of similar training. We examined mRS variability across a large cohort of international observers using data from a video training resource.
Methods: The mRS training package includes a series of “real-life” patient interviews for grading. Training data were collected centrally and analyzed for variability using kappa statistics. We examined variability against a standard of “correct” mRS grades; examined variability by country; and, for UK assessors, examined variability by center and by professional background of the observer.
Results: To date, 2942 assessments from 30 countries have been submitted. Overall reliability for mRS grading has been moderate to good, with substantial heterogeneity across countries. Native English language has had little effect on reliability. Within the United Kingdom, there was no significant variation by profession.
Conclusion: Our results confirm interobserver variability in mRS assessment. The heterogeneity across countries is intriguing because it appears not to be related solely to language. These data highlight the need for novel strategies to improve reliability.
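The kappa statistics used in the Methods can be illustrated with a two-rater Cohen's kappa; the study itself pooled many observers (for which a multi-rater generalization such as Fleiss' kappa is typically used), and the ratings below are invented:

```python
# Unweighted Cohen's kappa for two raters: chance-corrected agreement.
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Kappa = (observed - expected) / (1 - expected).

    Undefined when expected agreement is 1 (all ratings identical).
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical observers grading ten patients on the mRS (0-5):
a = [0, 1, 2, 2, 3, 3, 4, 4, 5, 5]
b = [0, 1, 2, 3, 3, 3, 4, 5, 5, 5]
print(round(cohens_kappa(a, b), 3))  # 0.756
```

On the usual rules of thumb, values around 0.4-0.6 are "moderate" and 0.6-0.8 "substantial", which is the scale behind "moderate to good" in the Results.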
Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs
Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
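The Group Steiner Tree problem at the heart of QUEST's answer computation asks for a minimum-weight connected subgraph touching at least one node from every terminal group. The brute-force solver below is a didactic stand-in, not QUEST's actual algorithm, and the toy graph and node names are invented:

```python
# Brute-force Group Steiner Tree on a tiny graph: enumerate node
# subsets that cover every group, and keep the one whose induced
# minimum spanning tree is cheapest. Exponential in |nodes|, so
# purely illustrative.
from itertools import combinations

def group_steiner(nodes: list, edges: dict, groups: list):
    """`edges` maps node pairs to weights; `groups` is a list of
    terminal groups, each of which must be touched by the tree."""
    def spanning_weight(chosen):
        # Prim's algorithm restricted to `chosen`; None if disconnected.
        chosen = set(chosen)
        tree, total = {next(iter(chosen))}, 0.0
        while tree != chosen:
            cheapest = None
            for (u, v), w in edges.items():
                if u in chosen and v in chosen and (u in tree) != (v in tree):
                    if cheapest is None or w < cheapest[0]:
                        cheapest = (w, u, v)
            if cheapest is None:
                return None  # subset is not connected
            total += cheapest[0]
            tree.update(cheapest[1:])
        return total

    best = None
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if all(any(t in subset for t in group) for group in groups):
                w = spanning_weight(subset)
                if w is not None and (best is None or w < best[0]):
                    best = (w, subset)
    return best

# Three terminal groups joined most cheaply through an intermediate hub:
nodes = ["a", "b", "c", "hub"]
edges = {("a", "hub"): 1, ("b", "hub"): 1, ("c", "hub"): 1, ("a", "b"): 3}
groups = [["a"], ["b"], ["c"]]
print(group_steiner(nodes, edges, groups))  # (3.0, ('a', 'b', 'c', 'hub'))
```

Note how the optimal tree routes through the non-terminal hub node, which is exactly why Steiner trees rather than plain spanning trees over the terminals are needed.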