6,371 research outputs found

    Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)

    In the context of depth-k pooling for constructing web search test collections, we compare two approaches to ordering pooled documents for relevance assessors: the prioritisation strategy (PRI) used widely at NTCIR, and the simple randomisation strategy (RND). In order to address research questions regarding PRI and RND, we have constructed and released the WWW3E8 data set, which contains eight independent relevance labels for 32,375 topic-document pairs, i.e., a total of 259,000 labels. Four of the eight relevance labels were obtained from PRI-based pools; the other four were obtained from RND-based pools. Using WWW3E8, we compare PRI and RND in terms of inter-assessor agreement, system ranking agreement, and robustness to new systems that did not contribute to the pools. We also utilise an assessor activity log we obtained as a byproduct of WWW3E8 to compare the two strategies in terms of assessment efficiency. Comment: 30 pages. This is a corrected version of an open-access TOIS paper ( https://dl.acm.org/doi/pdf/10.1145/3494833

    Extended scope of nursing practice: a multicentre randomised controlled trial of appropriately trained nurses and pre-registration house officers in pre-operative assessment in elective general surgery

    Aim/Principal Research Question: 1) To determine whether pre-operative assessment carried out by an appropriately trained nurse (ATN) is equivalent in quality to that carried out by a pre-registration house officer (PRHO). 2) To assess whether pre-assessments carried out by ATNs and PRHOs are equivalent in terms of cost. 3) To determine whether assessments carried out by ATNs are acceptable to patients. 4) To investigate the quality of communication between senior medical staff and ATNs. Factors of Interest: The extended role of appropriately trained nurses and pre-registration house officers in pre-operative assessment in elective general surgery. Methods: The study design was principally a prospective randomised equivalence trial but was accompanied by additional qualitative assessment of patient and staff perceptions, and an economic evaluation. The intervention consisted of a pre-operative assessment carried out by either an ATN or a PRHO. Of the patients who completed the study with a full evaluation, 926 patients were randomised to the PRHO arm of the trial and 948 to the ATN arm. Three ATNs took part in the study, one from each centre, together with a total of 87 PRHOs. Immediately following the initial assessment of a patient by a PRHO or an ATN, one of a number of clinical research fellows, all specialist registrars in anaesthetics, repeated the assessment and recorded it on a study form, together with a list of investigations required. The clinical research fellow then evaluated the competency of the initial assessor by comparing the quality of their assessment with their own. Any deficiencies in ordering of investigations and referral to other specialities were met in order to maximise patient care. Sample groups: All patients attending at one site for assessment prior to general anaesthetic for elective general, vascular, urological or breast surgery were potentially included in the study. 
Of 1907 patients who were randomised, 1874 completed the study with a full evaluation. The study was carried out at four NHS hospitals, three of which were teaching hospitals, in three NHS Trusts in Southampton, Sheffield and Doncaster. Outcome measures: Three areas of ATN and PRHO performance were judged separately, history taking, examination and ordering of tests, and each was graded into one of four categories, the most important of which was under-assessment, which would possibly have affected peri-operative management. In the case of ordering of tests, it was possible to have both over- and under-assessed a patient on different tests. Findings: The pre-operative assessments carried out by the ATNs were essentially equivalent to those performed by the PRHOs in terms of under-assessment that might possibly have affected peri-operative management, although there was variation between the ATNs in terms of the quality of history taking. This may be related to the low number of patients seen at one study site. PRHOs ordered significantly more unnecessary tests than the ATNs. The substitution of ATNs for PRHOs was calculated to be cost neutral. The results of the qualitative assessment showed that the use of ATNs for pre-operative assessment was acceptable to patients; however, there was no evidence that communication between senior medical staff and those carrying out pre-operative assessments was improved by their introduction. Conclusions: This study demonstrated no reason to inhibit the development of fully nurse-led pre-operative assessment, provided that the nurses are appropriately trained and maintain sufficient workload to retain skills. Implications for Further Research: Further research is needed in the following areas: 1) the extent and type of training needed for nurses undertaking the pre-operative assessment role; 2) the use, costs and benefits of routine pre-operative testing.

    Beyond Triage: The Diagnostic Accuracy of Emergency Department Nursing Staff Risk Assessment in Patients with Suspected Acute Coronary Syndromes.

    Objectives To establish the accuracy of emergency department (ED) nursing staff risk assessment using an established chest pain risk score alone and when incorporated with presentation high-sensitivity troponin testing as part of an accelerated diagnostic protocol (ADP). Design Prospective observational study comparing nursing and physician risk assessment using the modified Goldman (m-Goldman) score and a predefined ADP, incorporating presentation high-sensitivity troponin. Setting A UK District ED. Patients Consecutive patients, aged ≥18, with suspected cardiac chest pain and non-ischaemic ECG, for whom the treating physician determined serial troponin testing was required. Outcome measures 30-day major adverse cardiac events (MACE). Results 960 participants were recruited. 912/960 (95.0%) had m-Goldman scores recorded by physicians and 745/960 (77.6%) by nursing staff. The area under the curve of the m-Goldman score in predicting 30-day MACE was 0.647 (95% CI 0.594 to 0.700) for physicians and 0.572 (95% CI 0.510 to 0.634) for nursing staff (p=0.09). When incorporated into an ADP, sensitivity for the rule-out of MACE was 99.2% (95% CI 94.8% to 100%) and 96.7% (90.3% to 99.2%) for physicians and nurses, respectively. One patient in the physician group (0.3%) and three patients (1.1%) in the nursing group were classified as low risk yet had MACE. There was fair agreement in the identification of low-risk patients (kappa 0.31, 95% CI 0.24 to 0.38). Conclusions The diagnostic accuracy of ED nursing staff risk assessment is similar to that of ED physicians and interobserver reliability between assessor groups is fair. When incorporating high-sensitivity troponin testing, a nurse-led ADP has a miss rate of 1.1% for MACE at 30 days. Trial registration number Controlled Trials Database (ISRCTN no. 21109279)

    Guidance on the key skills units : communication, application of number and information technology


    The Effects of Training Strategies on Assessor Behavior and the Accuracy of Assessment Center Consensus Ratings

    The purpose of this research was to examine the effects of four training strategies (i.e., part, whole, individual, and team) on the accuracy of performance ratings and the occurrence of interactive behaviors in consensus meetings. The results were analyzed using a 2 x 2 factorial ANOVA design. Part and whole training strategies were directly compared with one another. Team and individual training strategies made up the other direct comparison. Undergraduates (N = 108) were randomly assigned to four training conditions. The subjects were grouped into teams of three assessors. In these teams the assessors needed to exchange information about assessee performance across three assessment center exercises and form dimension and overall ratings for four experimental assessees. The rating accuracy results indicated that (a) no differences in rating accuracy existed between part and whole training, (b) team training led to more accurate final ratings than individual training, and (c) the Whole-Team training condition led to more accurate overall assessment ratings than the remaining three conditions. Reasons for the superiority of team training stem from the higher frequency of interactive behaviors observed in the team training condition. Further explanations for the findings and suggestions for future research are discussed.

    Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments

    The creation of relevance assessments by human assessors (often nowadays crowdworkers) is a vital step when building IR test collections. Prior works have investigated assessor quality and behaviour, but not the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform where participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them either in the text or voice modality. We found that: (i) participants are equally accurate in their judgements across both the text and voice modality; (ii) with increased document length it takes participants significantly longer (for documents of length > 120 words it takes almost twice as much time) to make relevance judgements in the voice condition; and (iii) the ability of assessors to ignore stimuli that are not relevant (i.e., inhibition) impacts the assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that we can reliably leverage the voice modality as a means to effectively collect relevance labels from crowdworkers. Comment: Accepted at SIGIR 202

    The role of Criticism in the Dynamics of Performance Evaluation Systems

    Drawing on the concept of « trial », developed by French sociologists, this article analyzes the dynamics of employees’ performance evaluation systems, particularly those involving accounting performance measures. A case study is presented as an illustration of our proposal to consider these systems as one of the major trials in the business world, that is, social arrangements organizing the testing of people and resulting in ordering them, and further in consistent social goods allocation. This analysis emphasizes the role of criticism in the dynamics and evolution of performance evaluation systems and enables us to revisit concepts like controllability or objectivity which have been presented for decades as cornerstones of performance evaluation systems either in management control or in human resource management fields. Keywords: criticism; performance evaluation systems; fairness; objectivity; controllability; legitimacy; bonus

    The role of professional and managerial experience in interpreting and using managerial assessment center data

    The present study examined the role of professional and managerial experience in interpreting and using managerial assessment center data. Given the rising use of the assessment center method to select employees, it is essential to understand the degree to which professional and managerial experience impacts the accuracy of interpreting assessment center results. Past selection decision studies have failed to provide a clear answer to this question. The purpose of this study was to ascertain which type of experience, managerial or psychological, is essential in making accurate selection decisions. Several hypotheses based on past selection and performance appraisal research, as well as the expert decision making literature, were examined. A secondary purpose of the current study was to examine the type of information being used across varying groups making selection decisions, in hopes of shedding light on differences in the decision making process. The decision processes of three different subject groups representing managers, trained assessment center assessors and undergraduate students were examined in this study. Forty-two managers from various companies, 34 Industrial and Organizational Psychology doctoral students trained as assessment center raters, and 50 undergraduate students served as subjects for this study. Subjects read three managerial job descriptions, examined assessment center results in the form of written summary reports, and rated 16 applicants on overall assessment center performance, predicted job performance, and four skill sets. Subject matter experts’ ratings (i.e., managers with a graduate degree in I/O Psychology and assessment center experience) served as the criterion. Calculations of Cronbach’s (1955) evaluative accuracy components revealed that undergraduate students’ ratings were the least accurate. 
The findings also revealed that managers were more capable than trained assessors at distinguishing dimensional differences in performance across individuals (i.e., stereotype accuracy). However, trained assessors were slightly more accurate in their rank ordering of applicants, and they were also slightly better than managers in making accurate dimension x applicant distinctions. A dominance analysis was performed to ascertain the relative importance of dimensional skill ratings in making predictions concerning the applicants’ on-the-job performance. The results of the dominance analysis indicated that the subject groups used the information provided in different ways when making their selection decisions. These findings were further supported by the content analysis of subjects’ written rationale for making their hiring decisions. Managers were more likely to focus on future possibilities and the skills applicants needed to acquire in order to fulfill the job requirements. In contrast, trained assessors tended to focus on the dichotomous decision to hire or not hire applicants. Limitations of the present study are discussed, in addition to the practical implications of the findings. Suggestions for future research are also outlined.