Implications of Inter-Rater Agreement on a Student Information Retrieval Evaluation
This paper is about an information retrieval evaluation of three different retrieval-supporting services. All three services were designed to compensate for typical problems that arise in metadata-driven Digital Libraries and are not adequately handled by simple tf-idf based retrieval. The services are: (1) a co-word-analysis-based query expansion mechanism, and re-ranking via (2) Bradfordizing and (3) author centrality. The services are evaluated with relevance assessments conducted by 73 information science students. Since the students are neither information professionals nor domain experts, the question of inter-rater agreement is taken into consideration. Two important implications emerge: (1) the inter-rater agreement rates were mainly fair to moderate, and (2) after a data-cleaning step that removed the assessments with poor agreement rates, the evaluation data show that the three retrieval services returned disjoint but still relevant result sets.
Comment: 7 pages, 3 figures, LWA 2010, Workshop I
Human assessments of document similarity
Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for the future development of document visualization systems.
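The n-gram text analysis described above can be approximated by comparing character n-gram frequency profiles with a cosine measure. A minimal sketch, with invented document snippets (a real study would tune n per the technical/nontechnical distinction noted above):

```python
from collections import Counter
from math import sqrt

def ngram_profile(text, n=3):
    """Character n-gram frequency profile of a document."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[g] * q[g] for g in p if g in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Invented snippets: the first two share vocabulary, the third does not.
doc_a = "automatic text analysis of documents"
doc_b = "automatic analysis of text documents"
doc_c = "residential treatment of emotionally disturbed boys"
sim_ab = cosine(ngram_profile(doc_a), ngram_profile(doc_b))
sim_ac = cosine(ngram_profile(doc_a), ngram_profile(doc_c))
```

As expected, `sim_ab` is far higher than `sim_ac`; varying `n` in `ngram_profile` reproduces the string-length sensitivity the abstract reports.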
Applying Science Models for Search
The paper proposes three different kinds of science models as value-added services that are integrated into the retrieval process to enhance retrieval quality. The paper discusses the approaches Search Term Recommendation, Bradfordizing, and Author Centrality on a general level and addresses implementation issues of the models within a real-life retrieval environment.
Comment: 14 pages, 3 figures, ISI 201
Applying latent semantic analysis to computer assisted assessment in the Computer Science domain: a framework, a tool, and an evaluation
This dissertation argues that automated assessment systems can be useful for both students and educators provided that the results correspond well with human markers. Thus, evaluating such a system is crucial. I present an evaluation framework and show how and why it can be useful for both producers and consumers of automated assessment systems. The framework is a refinement of a research taxonomy that came out of the effort to analyse the literature on systems based on Latent Semantic Analysis (LSA), a statistical natural language processing technique that has been used for automated assessment of essays. The evaluation framework can help developers publish their results in a format that is comprehensive, relatively compact, and useful to other researchers.
The thesis claims that, in order to see a complete picture of an automated assessment system, certain pieces must be emphasised. It presents the framework as a jigsaw puzzle whose pieces join together to form the whole picture.
The dissertation uses the framework to compare the accuracy of human markers and EMMA, the LSA-based assessment system I wrote as part of this dissertation. EMMA marks short, free text answers in the domain of computer science. I conducted a study of five human markers and then used the results as a benchmark against which to evaluate EMMA. An integral part of the evaluation was the success metric. The standard inter-rater reliability statistic was not useful; I located a new statistic and applied it to the domain of computer assisted assessment for the first time, as far as I know.
Although EMMA exceeds human markers on a few questions, overall it does not achieve the same level of agreement with humans as humans do with each other. The last chapter maps out a plan for further research to improve EMMA.
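The core of an LSA-based marker like the one described above is a truncated SVD of a term-document matrix: student answers are folded into the latent space and compared to reference answers by cosine similarity. A toy sketch, with an invented six-term, two-answer matrix (a real system trains on a large corpus and keeps far fewer dimensions than the matrix rank):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = two reference answers
# (a "stack" answer and a "queue" answer). All counts are invented.
terms = ["stack", "queue", "lifo", "fifo", "push", "pop"]
docs = np.array([
    [2, 0],  # stack
    [0, 2],  # queue
    [1, 0],  # lifo
    [0, 1],  # fifo
    [1, 1],  # push
    [1, 1],  # pop
], dtype=float)

# Truncated SVD; a real system keeps k much smaller than the corpus rank.
U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]

def fold_in(term_vector):
    """Project a (pseudo-)document's term vector into the latent space."""
    return term_vector @ Uk / sk

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A hypothetical student answer mentioning "stack", "lifo", "push".
answer = np.array([1, 0, 1, 0, 1, 0], dtype=float)
ref_stack, ref_queue = fold_in(docs[:, 0]), fold_in(docs[:, 1])
ans = fold_in(answer)
# The answer lands closer to the "stack" reference than to the "queue" one.
```

The latent comparison is what lets LSA credit answers that use related vocabulary rather than exact keyword matches; the inter-rater statistics discussed above then judge how well such scores track human markers.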
The Impact of Residential Treatment on Emotionally Disturbed Boys
Within the past four decades, social work has witnessed the development of increasingly specialized services to children, among these a sort of “total impact therapy” generally defined as residential treatment. In conjunction with the basic social work values of the bio-psycho-social nature of human maladjustment, residential centres have attempted to help the child effect a happier adjustment to his life situation by meeting some ungratified basic need. Institutions for dependent children complemented those for custodial care or even isolation; contemporary residential treatment centres are designed to meet a broader range of needs of the child than those of forty years ago through a variety of approaches, often referred to as milieu therapy. Consideration of the common needs of children is basic to questions concerning the place of institutional treatment and the particular type of child for which this social work service is the most appropriate one.
The residential treatment centre addresses the whole gamut of a child’s needs from physical care to rehabilitation. Exposure to, and participation in, a group life experience simulating as closely as possible the family or community life experience is the element differentiating residential care from other treatment modes. By involvement in the realities of his daily situation and the working through or resolution of these, the child is helped to cope with his own growth and development—physical, emotional, and social.
Problems and questions examined in this paper revolve around the residential treatment centre, defined broadly by the Child Welfare League of America as “A building....maintained and operated by a chartered agency, organization or institution, whose main purpose is to provide shelter and care to a group of unrelated children and youths up to eighteen years of age.” More specifically, the concern for research, the proposal, and the plans for implementation are focused on Mount St. Joseph, an autonomous, non-profit institution providing care for boys with moderate to severe emotional disturbances.
A comparison of homonym meaning frequency estimates derived from movie and television subtitles, free association, and explicit ratings
First Online: 10 September 2018
Most words are ambiguous, with interpretation dependent on context. Advancing theories of ambiguity resolution is important for any general theory of language processing, and for resolving inconsistencies in observed ambiguity effects across experimental tasks. Focusing on homonyms (words such as bank with unrelated meanings EDGE OF A RIVER vs. FINANCIAL INSTITUTION), the present work advances theories and methods for estimating the relative frequency of their meanings, a factor that shapes observed ambiguity effects. We develop a new method for estimating meaning frequency based on the meaning of a homonym evoked in lines of movie and television subtitles according to human raters. We also replicate and extend a measure of meaning frequency derived from the classification of free associates. We evaluate the internal consistency of these measures, compare them to published estimates based on explicit ratings of each meaning’s frequency, and compare each set of norms in predicting performance in lexical and semantic decision mega-studies. All measures have high internal consistency and show agreement, but each is also associated with unique variance, which may be explained by integrating cognitive theories of memory with the demands of different experimental methodologies. To derive frequency estimates, we collected manual classifications of 533 homonyms over 50,000 lines of subtitles, and of 357 homonyms across over 5000 homonym–associate pairs. This database—publicly available at: www.blairarmstrong.net/homonymnorms/—constitutes a novel resource for computational cognitive modeling and computational linguistics, and we offer suggestions around good practices for its use in training and testing models on labeled data.
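The classification-based estimate described above reduces, at its core, to counting which meaning raters assign to each context and normalizing. A minimal sketch with invented labels for the homonym "bank":

```python
from collections import Counter

def meaning_frequencies(labels):
    """Relative frequency of each meaning across labeled contexts."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {meaning: n / total for meaning, n in counts.items()}

# Hypothetical rater classifications of ten subtitle lines containing "bank";
# each line is labeled with the meaning it evokes.
labels = ["FINANCIAL", "FINANCIAL", "RIVER", "FINANCIAL", "FINANCIAL",
          "RIVER", "FINANCIAL", "FINANCIAL", "OTHER", "FINANCIAL"]
freqs = meaning_frequencies(labels)  # e.g. FINANCIAL 0.7, RIVER 0.2, OTHER 0.1
```

The published norms aggregate many raters per line and many lines per homonym, but the resulting relative-frequency estimate has exactly this shape.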
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further dimension that should be considered in the science modeling debate. The IR use case is seen as a validation model of the adequacy of science models in representing and predicting structure and dynamics in science. Particular conceptualizations of scholarly activity and structures in science are used as value-added search services to improve retrieval quality: a co-word model depicting the cognitive structure of a field (used for query expansion), the Bradford law of information concentration, and a model of co-authorship networks (both used for re-ranking search results). An evaluation of retrieval quality when the science-model-driven services are used showed that the proposed models indeed benefit retrieval quality. From an IR perspective, the models studied are therefore verified as expressive conceptualizations of central phenomena in science. Thus, it could be shown that the IR perspective can significantly contribute to a better understanding of scholarly structures and activities.
Comment: 26 pages, to appear in Scientometric
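Of the re-ranking services named above, Bradfordizing is the most direct to sketch: results are reordered so that documents from the most productive journals (Bradford's core zone) surface first. A minimal illustration, with an invented result list (real implementations compute journal productivity over the whole result set or corpus):

```python
from collections import Counter

def bradfordize(results):
    """Re-rank (doc_id, journal) pairs so documents from the most
    productive journals in this result set come first. Python's sort is
    stable, so ties keep the original (e.g. tf-idf) order."""
    journal_counts = Counter(journal for _, journal in results)
    return sorted(results, key=lambda r: -journal_counts[r[1]])

# Hypothetical tf-idf-ranked hits as (doc_id, journal) pairs.
hits = [("d1", "J-Rare"), ("d2", "J-Core"), ("d3", "J-Core"),
        ("d4", "J-Mid"), ("d5", "J-Core"), ("d6", "J-Mid")]
reranked = bradfordize(hits)
# Core-journal documents d2, d3, d5 move ahead of d4, d6, and finally d1.
```

The author-centrality service works analogously, substituting a centrality score from the co-authorship network for the journal productivity count in the sort key.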
Heuristic Principles and Differential Judgments in the Assessment of Information Quality
Information quality (IQ) is a multidimensional construct and includes dimensions such as accuracy, completeness, objectivity, and representation that are difficult to measure. Recently, research has shown that independent assessors who rated IQ yielded high inter-rater agreement for some information quality dimensions as opposed to others. In this paper, we explore the reasons that underlie the differences in the “measurability” of IQ. Employing Gigerenzer’s “building blocks” framework, we conjecture that the feasibility of using a set of heuristic principles consistently when assessing different dimensions of IQ is a key factor driving inter-rater agreement in IQ judgments. We report on two studies. In the first study, we qualitatively explored the manner in which participants applied the heuristic principles of search rules, stopping rules, and decision rules in assessing the IQ dimensions of accuracy, completeness, objectivity, and representation. In the second study, we investigated the extent to which participants could reach an agreement in rating the quality of Wikipedia articles along these dimensions. Our findings show an alignment between the consistent application of heuristic principles and the inter-rater agreement levels found on particular dimensions of IQ judgments. Specifically, on the dimensions of completeness and representation, assessors applied the heuristic principles consistently and tended to agree in their ratings, whereas, on the dimensions of accuracy and objectivity, they did not apply the heuristic principles in a uniform manner and inter-rater agreement was relatively low. We discuss the implications of our findings for research and practice.