A rejoinder to the comments of Benedetto et al. on the paper “Critical remarks on the Italian research assessment exercise VQR 2011–2014” (Journal of Informetrics, 11(2): 337–357)
The paper “Critical remarks on the Italian research assessment exercise VQR 2011–2014” (Franceschini & Maisano, 2017) analyzed some vulnerabilities of the recently concluded Italian assessment exercise. Some senior (former and current) members of ANVUR promptly commented on our criticisms in a letter to the editor (Benedetto, Checchi, Graziosi, & Malgarini, 2017). We believe that this letter is not very convincing. In the following, we provide a rejoinder to the comments directed at our paper.
Critical remarks on the Italian research assessment exercise VQR 2011–2014
For nearly a decade, several national exercises have been implemented to assess
the research performance of Italian universities and other research institutions.
The penultimate one – i.e., the VQR 2004–2010, which adopted a hybrid evaluation
approach based on bibliometric analysis and peer review – suffered heavy criticism at a
national and international level.
The architecture of the subsequent exercise – i.e., the VQR 2011–2014, still in progress
– is partly similar to that of the previous one, except for a few presumed improvements.
Nevertheless, this exercise too is attracting heavy criticism.
This paper presents a structured discussion of the VQR 2011–2014, collecting and
organizing some of the critical arguments that have emerged so far, and developing them in detail.
Some of the major vulnerabilities of the VQR 2011–2014 are: (1) the evaluations
cover a relatively small fraction of the scientific publications produced by the researchers
involved in the evaluation, (2) the incorrect and anachronistic use of journal metrics (i.e.,
the ISI Impact Factor and similar indicators) for assessing individual papers, and (3) conceptually
misleading criteria for normalizing and aggregating the bibliometric indicators in use.
On tit for tat: Franceschini and Maisano versus ANVUR regarding the Italian research assessment exercise VQR 2011-2014
The response by Benedetto, Checchi, Graziosi & Malgarini (2017) (hereafter
"BCG&M"), past and current members of the Italian Agency for Evaluation of
University and Research Systems (ANVUR), to Franceschini and Maisano's ("F&M")
article (2017), inevitably draws us into the debate. BCG&M in fact complain
"that almost all criticisms to the evaluation procedures adopted in the two
Italian research assessments VQR 2004-2010 and 2011-2014 limit themselves to
criticize the procedures without proposing anything new and more apt to the
scope". Since we raised most of these criticisms in the literature, we welcome
this opportunity to retrace our so far unheeded "constructive" recommendations,
made with the hope of contributing to assessments of the Italian research system
more in line with the state of the art in scientometrics. We see it as equally
interesting to confront the problem of the failure of knowledge transfer from
R&D (scholars) to engineering and production (ANVUR's practitioners) in the
Italian VQRs. We will provide a few notes to help the reader understand the
context for this failure. We hope that these, together with our more specific
comments, will also assist in communicating the reasons for the level of
scientometric competence expressed in BCG&M's heated response to F&M's
criticism.
On the Shapley value and its application to the Italian VQR research assessment exercise
Research assessment exercises have now become common evaluation tools in a number of countries. These exercises have the goal of guiding merit-based public funds allocation, stimulating improvement of research productivity through competition and assessing the impact of adopted research support policies. One case in point is Italy's most recent research assessment effort, VQR 2011–2014 (Research Quality Evaluation), which, in addition to research institutions, also evaluated university departments and, in some cases, individuals (i.e., recently hired research staff and members of PhD committees). However, the way an institution's score was divided, according to VQR rules, between its constituent departments or its staff members does not satisfy several desirable properties well known from coalitional game theory (e.g., budget balance, fairness, marginality). We propose, instead, an alternative score division rule based on the notion of Shapley value, a well-known solution concept in coalitional game theory, which enjoys the desirable properties mentioned above. For a significant test case (namely, Sapienza University of Rome, the largest university in Italy), we present a detailed comparison of the scores obtained, for substructures and individuals, by applying the official VQR rules, with those resulting from Shapley value computations. We show that there are significant differences in the resulting scores, making room for improvements in the allocation rules used in research assessment exercises.
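The Shapley value underlying the proposed division rule can be illustrated with a small, self-contained sketch. Everything below is hypothetical: the toy characteristic function (a coalition's score is the number of distinct products its members can jointly submit) and the author names are illustrative assumptions, not the actual VQR scoring rules.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: v maps a frozenset of players to a coalition score."""
    players = list(players)
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                S = frozenset(S)
                # weight of coalition S when averaging marginal contributions
                # over all orderings of the players
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Toy institution: a coalition's score is the number of distinct products
# its members can jointly submit (each product counted once).
products = {"a": {"p1", "p2"}, "b": {"p1"}, "c": {"p3"}}
v = lambda S: len(set().union(*(products[i] for i in S))) if S else 0

print(shapley_values(products, v))  # values sum to v({a, b, c}) = 3 (budget balance)
```

Note how budget balance falls out for free: author "b" contributed only a product also held by "a", so the two split its credit, while "c" keeps full credit for a product nobody else holds.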
Are Italian research assessment exercises size-biased?
Research assessment exercises have enjoyed ever-increasing popularity in many countries in recent years, both as a method to guide public funds allocation and as a validation tool for adopted research support policies. Italy’s most recently completed evaluation effort (VQR 2011–14) required each university to submit to the Ministry for Education, University, and Research (MIUR) 2 research products per author (3 in the case of other research institutions), chosen in such a way that the same product is not assigned to two authors belonging to the same institution. This constraint suggests that larger institutions, where collaborations among colleagues may be more frequent, could suffer a size-related bias in their evaluation scores. To validate our claim, we investigate the outcome of artificially splitting Sapienza University of Rome, one of the largest universities in Europe, into a number of separate partitions according to several criteria, noting significant score increases for several partitioning scenarios.
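The submission constraint described in this abstract (no product submitted twice within one institution) can be made concrete with a toy sketch. This is a hypothetical maximum-bipartite-matching model, not the official VQR procedure: each author fills up to two submission slots from their own product list, and we count how many products an institution can submit before and after an artificial split.

```python
def max_submissions(author_products, per_author=2):
    """Greatest number of products an institution can submit when each author
    submits up to `per_author` of their products and no product is reused
    within the institution (maximum bipartite matching, Kuhn's algorithm)."""
    slots = [a for a in author_products for _ in range(per_author)]
    match = {}  # product -> index of the slot it is currently assigned to

    def try_assign(i, seen):
        # try to fill slot i, recursively re-routing previous assignments
        for p in author_products[slots[i]]:
            if p in seen:
                continue
            seen.add(p)
            if p not in match or try_assign(match[p], seen):
                match[p] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(slots)))

# Three authors who only ever co-author with each other: two distinct products.
authors = {"a": ["p1", "p2"], "b": ["p1", "p2"], "c": ["p1", "p2"]}
whole = max_submissions(authors)  # the united institution submits only 2
split = max_submissions({"a": authors["a"]}) + \
        max_submissions({"b": authors["b"], "c": authors["c"]})  # 2 + 2 = 4
print(whole, split)
```

In this toy case, splitting the institution doubles the number of submittable products, which is exactly the size-related bias the abstract investigates at real scale.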
Errors and secret data in the Italian research assessment exercise. A comment to a reply
Italy adopted a performance-based system for funding universities that is centered on the results of a national research assessment exercise carried out by a governmental agency (ANVUR). ANVUR evaluated papers by using “a dual system of evaluation”, that is, by informed peer review or by bibliometrics. To validate that system, ANVUR performed an experiment for estimating the agreement between informed peer review and bibliometrics. Ancaiani et al. (2015) present the main results of the experiment. Baccini and De Nicolao (2017) documented in a letter, among other critical issues, that the statistical analysis was not carried out on a random sample of articles. A reply to the letter has been published in Research Evaluation (Benedetto et al., 2017). This note highlights that the reply contains (1) errors in the data, (2) problems with the “representativeness” of the sample, (3) unverifiable claims about the weights used for calculating kappas, (4) undisclosed averaging procedures, and (5) a statement about the “same protocol in all areas” that is contradicted by official reports. Last but not least, the data used by the authors remain undisclosed. A general warning concludes: many recently published papers use data originating from the Italian research assessment exercise. These data are not accessible to the scientific community, and consequently these papers are not reproducible. They can hardly be considered as containing sound evidence, at least until the authors or ANVUR disclose the data necessary for replication.
Evaluating Research and Scholarly Impact in Criminology and Criminal Justice in the United Kingdom and Italy: A Comparative Perspective
What scholarly impact is, and how it is evaluated, vary across different countries. In the United Kingdom, for instance, scholarly impact is mainly assessed through the Research Excellence Framework (REF) in the context of providing—among other things—accountability for public investment in research, demonstrating the public benefits of research, and informing the selective allocation of research funding. In the REF system, impact needs to show a demonstrable effect on change, or evidence of benefits outside academia, and is formally assessed through case studies. In Italy, there is a comparable system for evaluating research, known as Evaluation of Research Quality, but in this latter case the focus is on the quality of selected research outputs as indicators of research performance. Impact is here considered with reference to the so-called third mission (which includes activities aimed at the valorization of research, and activities that have positive spillovers into society at large) and is evaluated separately. Our contribution aims at critically analyzing the commonalities and differences of these two systems when it comes to evaluating research in Criminology and Criminal Justice, considering some of the benefits and potential pitfalls of research evaluation in both regions, and discussing how these disciplines are framed and delimited differently in the two countries considered.
Metrics and peer review agreement at the institutional level
In the past decades, many countries have started to fund academic
institutions based on the evaluation of their scientific performance. In this
context, peer review is often used to assess scientific performance.
Bibliometric indicators have been suggested as an alternative. A recurrent
question in this context is whether peer review and metrics tend to yield
similar outcomes. In this paper, we study the agreement between bibliometric
indicators and peer review at the institutional level. In addition, we
quantify the internal agreement of peer review at the institutional level. We
find that the level of agreement is generally higher at the institutional level
than at the publication level. Overall, the agreement between metrics and peer
review is on a par with the internal agreement between two reviewers for certain
fields of science. This suggests that for some fields, bibliometric indicators
may possibly be considered as an alternative to peer review for national
research assessment exercises.
Do they agree? Bibliometric evaluation vs informed peer review in the Italian research assessment exercise
During the Italian research assessment exercise, the national agency ANVUR
performed an experiment to assess agreement between grades attributed to
journal articles by informed peer review (IR) and by bibliometrics. A sample of
articles was evaluated by using both methods and agreement was analyzed by
weighted Cohen's kappas. ANVUR presented results as indicating an overall
'good' or 'more than adequate' agreement. This paper re-examines the experiment
results according to the available statistical guidelines for interpreting
kappa values, showing that the degree of agreement, always in the range
0.09–0.42, has to be interpreted, for all research fields, as unacceptable, poor
or, in a few cases, as, at most, fair. The only notable exception, confirmed
also by a statistical meta-analysis, was a moderate agreement for economics and
statistics (Area 13) and its sub-fields. We show that the experiment protocol
adopted in Area 13 was substantially modified with respect to all the other
research fields, to the point that results for economics and statistics have to
be considered as fatally flawed. The evidence of a poor agreement supports the
conclusion that IR and bibliometrics do not produce similar results, and that
the adoption of both methods in the Italian research assessment possibly
introduced systematic and unknown biases in its final results. The conclusion
reached by ANVUR must be reversed: the available evidence does not justify at
all the joint use of IR and bibliometrics within the same research assessment
exercise.
Comment: in Scientometrics, 201
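The weighted Cohen's kappa at the centre of this re-analysis can be computed from a confusion matrix of grades with a short stdlib-only sketch. The 4-grade matrix below is invented for illustration, and the verbal labels follow the Landis and Koch convention; the abstract notes that published guidelines draw these thresholds differently, so the labeling function is one choice among several.

```python
def weighted_kappa(o, weights="linear"):
    """Weighted Cohen's kappa from a square matrix o of observed grade counts
    (rows: rater 1, columns: rater 2)."""
    k = len(o)
    n = sum(map(sum, o))
    row = [sum(r) for r in o]
    col = [sum(o[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):  # disagreement weight between grades i and j
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * o[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    return 1.0 - observed / expected

def label(kappa):
    # Landis & Koch style verbal scale (one convention among several in use)
    for cutoff, name in [(0.0, "poor"), (0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")]:
        if kappa <= cutoff:
            return name
    return "almost perfect"

# Invented 4-grade confusion matrix (bibliometrics vs informed peer review).
o = [[10, 6, 2, 1],
     [5, 12, 7, 3],
     [2, 8, 11, 6],
     [1, 3, 6, 9]]
kappa = weighted_kappa(o)
print(round(kappa, 3), label(kappa))
```

With linear weights, near-diagonal disagreements (adjacent grades) are penalized less than extreme ones; quadratic weights (`weights="quadratic"`) penalize extreme disagreements even more heavily, which is why the choice of weights matters when interpreting reported kappas.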