4,209 research outputs found

    Intra-assessor consistency in question answering

    In this paper we investigate the consistency of answer assessment in a complex question answering task, examining features of assessor consistency, types of answers, and question type

    Estimating intra-rater reliability on an oral english proficiency test from a Bilingual Education Program

    This study presents the results of an investigation that sought to estimate the level of intra-rater reliability on an oral English proficiency test and to determine the internal and external factors that affect rater consistency. The participants were two teachers responsible for assessing the speaking section of a proficiency test administered in the Licenciatura en Bilingüismo con énfasis en inglés (a bachelor's program in bilingualism with an emphasis on English). A correlation coefficient was calculated to establish rater consistency, and a retrospective verbal protocol was conducted to gather information about the factors that influence rater reliability. The results suggest a high level of intra-rater reliability on the proficiency test, as the correlation coefficient yielded values above .80. Nevertheless, lack of adherence to the rubric criteria, the rater-student relationship, physical conditions, and the rater's pressure and responsibility to give an accurate grade were identified as factors that affect rater consistency. Finally, some implications arising from this research are provided
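
    The abstract above does not say which correlation coefficient was used. As a minimal sketch, assuming a Pearson correlation between the scores one rater gives to the same performances on two occasions, intra-rater consistency could be estimated as follows. The function name and the score lists are hypothetical and only illustrate the calculation.

```python
from math import sqrt

def pearson_r(first_pass, second_pass):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first_pass)
    if n != len(second_pass) or n < 2:
        raise ValueError("need two score lists of equal length (>= 2)")
    mean_a = sum(first_pass) / n
    mean_b = sum(second_pass) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first_pass, second_pass))
    var_a = sum((a - mean_a) ** 2 for a in first_pass)
    var_b = sum((b - mean_b) ** 2 for b in second_pass)
    return cov / sqrt(var_a * var_b)

# Hypothetical scores from one rater for the same six recordings, rated twice.
round_1 = [3.0, 4.5, 2.5, 4.0, 3.5, 5.0]
round_2 = [3.5, 4.5, 2.0, 4.0, 3.0, 5.0]
r = pearson_r(round_1, round_2)
print(f"intra-rater r = {r:.2f}")
```

    A value above .80, as reported in the abstract, would usually be read as high consistency. The sketch does not handle a rater who gives every performance the same score, because the variance is then zero.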

    Analysis of change in users' assessment of search results over time

    We present the first systematic study of the influence of time on user judgements for rankings and relevance grades of web search engine results. The goal of this study is to evaluate the change in user assessment of search results and explore how users' judgements change. To this end, we conducted a large-scale user study with 86 participants who evaluated two different queries and four diverse result sets twice with an interval of two months. To analyse the results we investigate whether two types of patterns of user behaviour from the theory of categorical thinking hold for the case of evaluation of search results: (1) coarseness and (2) locality. To quantify these patterns we devised two new measures of change in user judgements and distinguish between local (when users swap between close ranks and relevance values) and non-local changes. Two types of judgements were considered in this study: 1) relevance on a 4-point scale, and 2) ranking on a 10-point scale without ties. We found that users tend to change their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking. However, the majority of these changes were local
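
    The abstract above does not define its two measures of change, so the following is only an illustrative sketch of the local versus non-local distinction. It assumes that a change counts as local when the two judgements of the same result differ by at most one step on the scale. The function name, the threshold, and the sample grades are hypothetical.

```python
def change_summary(first, second, local_threshold=1):
    """Share of judgements that changed, and share of those changes that are local.

    `first` and `second` hold one user's grades (or ranks) for the same
    results at two points in time. `local_threshold` is an assumed cut-off.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty judgement lists of equal length")
    diffs = [abs(a - b) for a, b in zip(first, second)]
    changed = [d for d in diffs if d > 0]
    local = [d for d in changed if d <= local_threshold]
    return {
        "changed_share": len(changed) / len(diffs),
        "local_share_of_changes": len(local) / len(changed) if changed else 0.0,
    }

# Hypothetical relevance grades on a 4-point scale (0-3) for four results.
summary = change_summary([3, 2, 0, 1], [3, 1, 3, 1])
print(summary)
```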

    Dietary energy density and adiposity: employing bias adjustments in a meta-analysis of prospective studies.

    BACKGROUND: Dietary studies differ in design and quality, making it difficult to compare results. This study quantifies the prospective association between dietary energy density (DED) and adiposity in children using a meta-analysis method that adjusts for differences in design and quality through eliciting and incorporating expert opinion on the biases and their uncertainty. METHOD: Six prospective studies identified by a previous systematic literature search were included. Differences in study quality and design were considered respectively as internal and external biases and captured in bias checklists. Study results were converted to correlation coefficients; biases were considered either additive or proportional on this scale. The extent and uncertainty of the internal and external biases in each study were elicited in a formal process by five quantitatively-trained assessors and five subject-matter specialists. Biases for each study were combined across assessors using median pooling and results combined across studies by random-effects meta-analysis. RESULTS: The unadjusted combined correlation between DED and adiposity change was 0.06 (95% CI 0.01, 0.11; p = 0.013), but with considerable heterogeneity (I² = 52%). After bias-adjustment the pooled correlation was 0.17 (95% CI -0.11, 0.45; p = 0.24), and the studies were apparently compatible (I² = 0%). CONCLUSIONS: This method allowed quantitative synthesis of the prospective association between DED and adiposity change in children, which is important for the development of evidence-informed policy. Bias adjustment increased the magnitude of the positive association but the widening confidence interval reflects the uncertainty of the assessed biases and implies that higher quality studies are required.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are
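
    The dietary energy density abstract above names random-effects meta-analysis and the I² heterogeneity statistic but not the estimator. As a rough sketch, assuming the common DerSimonian-Laird estimator, per-study effects could be pooled as follows. It does not reproduce the study's bias-adjustment step, and the effect sizes and variances are hypothetical.

```python
from math import sqrt

def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled, 95% CI, I2 in percent)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical per-study effects and variances for six studies.
effects = [0.02, 0.10, 0.05, 0.12, 0.01, 0.08]
variances = [0.001, 0.002, 0.0015, 0.003, 0.001, 0.0025]
pooled, ci, i2 = random_effects_pool(effects, variances)
print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), I2 = {i2:.0f}%")
```

    In a bias-adjusted analysis of the kind the abstract describes, each study's estimate and variance would be modified by the elicited biases before pooling, which is why the adjusted interval can be wider than the unadjusted one.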

    EVALUATION OF A CUMULATIVE EXIT-FROM-DEGREE OBJECTIVE STRUCTURED CLINICAL EXAMINATION (OSCE) IN A GULF CONTEXT

    This study aimed to evaluate the psychometric properties of the 2nd iteration of an Objective Structured Clinical Examination (OSCE) for graduating pharmacy students in Qatar. A secondary objective of this study was to identify quality improvement opportunities for design, implementation, and evaluation of the OSCE. The psychometric analyses were as follows: cut score determination using the borderline regression method; predictive validity using regression and correlation of select course grades and assessments with OSCE scores; concurrent validity using correlation between other cumulative assessments and OSCE scores; risk of bias using correlation between assessors’ analytical and global scoring; content validity using student-feedback forms; interrater reliability using intra-class correlation coefficients (ICCs); and internal consistency using Cronbach’s alpha. Pearson and Spearman correlation statistics were conducted at α level < 0.05. A series of two focus groups and subsequent qualitative content analysis were conducted with key stakeholders to identify strengths, weaknesses, opportunities, and challenges regarding OSCE implementation. Total cut score for the exam was 55.3%. Overall pass rate was 79.2%. OSCE scores correlated moderately to strongly with course grades of Professional Skills and Integrated Case-based Learning, and with formative OSCE assessments. Course grades for medicinal chemistry were not correlated with OSCE scores. OSCE scores were moderately predicted by Professional Skills course grades (52.3%) and its formative OSCE assessment (61.2%). Average correlation between analytical and global grades for all assessors was 0.52. A total of 90% of the stations were deemed to reflect practice, according to student perceptions. The average intraclass correlation coefficients for analytical checklist scores, global scores, and total scores were 0.88 (0.71 – 0.95), 0.61 (0.19 – 0.82), and 0.75 (0.45 – 0.88) respectively. Cronbach’s alpha of students’ performance in global scores across stations was 0.87, and 0.93 in terms of total scores. Focus groups confirmed content validity as a weakness yet spoke to training and assessment techniques as both strengths and areas for improvement. In sum, the 2nd iteration of a cumulative OSCE for graduating pharmacy students in Qatar was deemed valid and reliable; however, refinements can be implemented in future iterations to further improve the exam as a high stakes assessment
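
    The OSCE abstract above reports Cronbach's alpha for internal consistency across stations. A minimal sketch of the standard formula, treating each station as an item, is given below. The function name and the score matrix are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for rows of students and columns of stations (items)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two students and two stations")
    # Sample variance of each station's scores across students.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical global scores (1-5) for five students across four stations.
station_scores = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(station_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

    Alpha approaches 1 when students who score well at one station tend to score well at the others, which is the pattern the reported values of 0.87 and 0.93 suggest.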

    The Feasibility, Reliability, and Validity of Using the Self-report Version of interRAI Check-Up Among Community Dwelling Older Adults

    As the result of population aging around the world, the prevalence of chronic conditions is increasing. Early detection through constant monitoring is an effective strategy of minimizing the impact of chronic conditions on morbidity and mortality. However, clinician administered assessments are often not routinely completed nor done for the entire population because they require resources that may not be available. A self-report tool that can be administered by older adults and their caregivers could help achieve broader surveillance at minimal cost and contribute to enhancement of chronic disease management globally. In the meantime, as the population of cultural minorities in Canada is increasing, it will be important to examine the feasibility and acceptability of using self-report interRAI Check-Up (CU) assessment tool among older adults from different backgrounds. The study compared the experiences of older adults who electronically completed the assessment tool entirely by themselves with approaches involving the help of a lay interviewer or their informal caregiver. Also, this study evaluated the reliability and validity of data collected with self-report CU. This study concluded that CU was optimally accepted by older adults in this study. Also, the internal consistency and validity of data collected with CU is comparable to data collected by trained health professionals in Ontario using the RAI-HC among home care population

    Categorical relevance judgment

    In this study we aim to explore users' behaviour when assessing search results relevance based on the hypothesis of categorical thinking. In order to investigate how users categorise search engine results, we perform several experiments where users are asked to group a list of 20 search results into a number of categories, while attaching a relevance judgment to each formed category. Moreover, to determine how users change their minds over time, each experiment was repeated three times under the same conditions, with a gap of one month between rounds. The results show that on average users form 4-5 categories. Within each round the size of a category decreases with the relevance of a category. To measure the agreement between the search engine’s ranking and the users’ relevance judgments, we defined two novel similarity measures, the average concordance and the MinMax swap ratio. Similarity is shown to be the highest for the third round as the users' opinion stabilises. Qualitative analysis uncovered some interesting points, in particular, that users tended to categorise results by type and reliability of their source, and particularly, found commercial sites less trustworthy, and attached high relevance to Wikipedia when their prior domain knowledge was limited

    Measuring facilitator competent adherence and examining its role in the outcomes of parenting programme beneficiaries: an investigation of the broader literature and the delivery of parenting for lifelong health for parents and adolescents (PLH-teens) at scale in Tanzania

    Background: Implementation fidelity is a critical component of intervention science research, which aims to understand how interventions unfold in practice to improve their outcomes. A key element of fidelity is facilitator competent adherence - the extent to which a programme is delivered as prescribed with the specified level of quality. The dissertation endeavoured to better understand how to measure facilitator competent adherence and the role facilitator competent adherence plays in achieving intended parent/caregiver (parent) and child outcomes in the parenting programme literature and, specifically, within Parenting for Lifelong Health for Parents and Adolescents (PLH-Teens). PLH-Teens is a parenting programme designed to reduce violence against children and child behavioural and emotional problems in low- and middle-income countries (LMICs). The dissertation is composed of three studies – one which synthesised data from the parenting programme literature and two which analysed data from the 2020-2021 scale-up of PLH-Teens in Tanzania to 75,061 participants by community facilitators (school teachers and community health workers; N=444). Objectives: The dissertation had three objectives with each corresponding to an individual paper. The first objective was to synthesise the evidence on the relationship between observational measures of facilitator competent adherence and parent and child outcomes in the parenting programme literature. The second objective was to examine whether the observational measure of facilitator competent adherence used in the large-scale implementation of PLH-Teens in Tanzania is reliable and valid for use in research and practice and to determine the level of competent adherence with which community facilitators delivered PLH-Teens in Tanzania. 
The third objective was to determine the predictive validity of the observational measure of competent adherence used in PLH-Teens by examining whether competent adherence is associated with parent and adolescent outcomes. Methods: Paper 1 synthesised the results of a systematic review of studies on parenting programmes aiming to reduce violence against children and child behavioural and emotional problems to examine the associations between observational measures of facilitator competent adherence and parent and child outcomes. Due to study heterogeneity and poor reporting, Synthesis Without Meta-Analysis (SWiM) guidelines were followed. Paper 2 used 95 facilitator assessments collected by implementing partners during the 2020-2021 delivery of PLH-Teens in Tanzania. The paper evaluated the reliability and validity of the measure used to assess facilitator competent adherence in PLH-Teens - the Facilitator Assessment Tool (PLH-FAT-T). Reliability was assessed by conducting intra-rater reliability, inter-rater reliability, and internal consistency analyses using percentage agreements, intra-class correlations, Cronbach’s alphas, and omegas. Validity was assessed via consultations with stakeholders (content validity) and exploratory factor analyses (construct validity). This paper also estimated the level of competent adherence with which community facilitators delivered PLH-Teens by calculating the average PLH-FAT-T score achieved by facilitators. Paper 3 investigated the relationship between facilitator competent adherence and the pre-post outcomes of PLH-Teens participants. Analyses used 24 PLH-FAT-T assessments that could be linked to the pre-post surveys of 3,057 families. This analysis was conducted using multi-level Poisson regressions with fixed and random effects. Results: Paper 1 found 18 studies reporting on the relationship between observational measures of facilitator competent adherence and parent and child outcomes. 
The review found that most studies (n=13) reported a statistically significant positive relationship with at least one of the parent or child outcomes reported. However, eight studies reported inconsistent findings across outcomes. Four studies found no significant association with outcomes. Paper 2 found that the PLH-FAT-T showed strong content validity, poor to moderate intra- and inter-rater reliability, strong internal consistency, and moderate construct validity. Iterative exploratory factor analyses produced a shortened PLH-FAT-T, the PLH-FAT-T Short Form, comprised of 19 fewer items which had stronger psychometric properties. Analyses of the PLH-FAT-T Short Form found that community facilitators delivered PLH-Teens at scale in Tanzania to a high-level of competent adherence (82.3% average). Using the PLH-FAT-T Short Form, Paper 3 found that the relationship between facilitator competent adherence and outcomes was mixed with some positive, some insignificant, and some negative associations. A positive association was found between competent adherence and the primary outcome of interest, child maltreatment, as reported by adolescents. The analysis found that increased competent adherence had a positive association with two of the 12 parent-reported outcomes and seven of the 10 adolescent-reported outcomes (including child maltreatment). Yet, increased competent adherence also had a negative association with five parent-reported outcomes, as well as insignificant associations with five parent-reported outcomes and three adolescent-reported outcomes. Discussion: Paper 1 suggests that better facilitator competent adherence is generally associated with positive parent and child outcomes. However, this finding is weakened by the methodological heterogeneity of included studies and due to the wide variety of ways in which studies conceptualised competent adherence-outcome relationships. 
As a result, the paper reveals that there is substantial methodological work to be done in the broader parenting programme community to improve the rigour of and reporting on investigations regarding this relationship. As the amount of literature on the measurement and role of facilitator competent adherence grows in the behavioural intervention literature, the recommendations made in Paper 1 have relevance for other implementation scientists conducting and sharing studies on competent adherence. Paper 2 reports on the first psychometric evaluation of the PLH-FAT-T and is the first study of its kind to report on the fidelity achieved by facilitators during routine parenting programme delivery at scale in a low-income country. Findings suggest that the PLH-FAT-T had poor to moderate reliability and sufficient validity and that the PLH-FAT-T Short Form had stronger psychometric properties. Although the tool was stronger following iterative exploratory factor analyses, the findings indicate that further work is needed to strengthen the reliability and validity of the PLH-FAT-T Short Form. Findings also suggest that community facilitators with minimal background in and training on parenting programmes delivered PLH-Teens to a high level of quality at scale in a low-income community setting despite significant barriers. Thus, the findings of Paper 2 suggest that it may be possible for community facilitators to deliver behavioural interventions to a high level of competent adherence in low-income routine delivery settings at scale. The findings of Paper 3 are similar to the findings of Paper 1 in that Paper 3 does not provide a clear answer as to whether, and to what extent, facilitator competent adherence impacts participant outcomes. 
Potential explanations of the findings include the PLH-FAT-T Short Form has poor predictive validity; the PLH-FAT-T Short Form assessments were not reliable; a variety of methodological challenges may have prevented an examination of the true relationship between competent adherence and outcomes; competent adherence does not relate to outcomes in the manner theorised; competent adherence plays a less important role in the achievement of outcomes than anticipated or, at some point, plays a negative role; and only certain programme components are achieving outcomes so the PLH-FAT-T Short Form is not capturing the important aspects of programme delivery. The alignment of the findings of Papers 1 and 3 with some other systematic reviews and meta-analyses in the broader implementation science literature suggests that the role facilitator competent adherence plays in participant outcomes is not fully understood. Thus, there is reason to further investigate the theorised relationship between facilitator competent adherence and outcomes outlined in seminal implementation science theories and models to fully illuminate the inner workings of the ‘black box’ of interventions. A fuller understanding of the role that facilitator competent adherence plays in participant outcomes is essential to maximise the benefits to be reaped from evidence-based behavioural interventions. Conclusion: The dissertation provides important evidence regarding the measurement and role of facilitator competent adherence in the parenting programme literature and in Parenting for Lifelong Health. As a result, the dissertation provides a series of recommendations for the future of competent adherence monitoring in research and practice that are relevant to both the parenting programme literature and the broader implementation science literature. 
As parenting programmes continue to be delivered and scaled worldwide, it is intended that the findings and recommendations herein will be used to benefit both Parenting for Lifelong Health and the broader parenting programme community in the quest to maximise opportunity for vulnerable children and families globally to benefit from evidence-based parenting programmes