158 research outputs found
Assessing relevance using automatically translated documents for cross-language information retrieval
This thesis focuses on the Relevance Feedback (RF) process, and the scenario considered is that of a Portuguese-English Cross-Language Information Retrieval (CLIR) system. CLIR deals with the retrieval of documents in one natural language in response to a query expressed in another language. RF is an automatic process for query reformulation. The idea behind it is that users are unlikely to produce perfect queries, especially if given just one attempt. The process aims at improving the query specification, which will lead to more relevant documents being retrieved. The method consists of asking the user to analyse an initial sample of documents retrieved in response to a query and judge them for relevance.
In that context, two main questions were posed. The first one relates to the user's ability in assessing the relevance of texts in a foreign language, texts hand translated into their language and texts automatically translated into their language. The second question concerns the relationship between the accuracy of the participant's judgements and the improvement achieved through the RF process.
In order to answer those questions, this work performed an experiment in which Portuguese speakers were asked to judge the relevance of English documents, documents hand-translated to Portuguese, and documents automatically translated to Portuguese. The results show that machine translation is as effective as hand translation in aiding users to assess relevance. In addition, the impact of misjudged documents on the performance of RF is only moderate overall, and varies greatly across query topics.
This work advances the existing research on RF by considering a CLIR scenario and carrying out user experiments, which analyse aspects of RF and CLIR that remained unexplored until now. The contributions of this work also include: the investigation of CLIR using a new language pair; the design and implementation of a stemming algorithm for Portuguese; and the carrying out of several experiments using Latent Semantic Indexing which contribute data points to the CLIR theory.
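The abstract above does not say which reformulation method the thesis uses. As a minimal sketch of how relevance judgements can feed back into a query, the block below applies the classic Rocchio update, which is swapped in here purely for illustration. The function name `rocchio`, the term-weight dictionaries and the coefficient values are all assumptions, not details taken from the thesis.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query (term -> weight) using a Rocchio-style update.

    query        -- dict mapping term to weight
    relevant     -- list of document dicts the user judged relevant
    non_relevant -- list of document dicts the user judged non-relevant
    The coefficients are illustrative defaults, not values from the thesis.
    """
    new_query = Counter()
    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Add the centroid of relevant documents, subtract that of non-relevant ones.
    for docs, signed_coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += signed_coeff * weight / len(docs)
    # Terms pushed to zero or below are dropped from the reformulated query.
    return {term: w for term, w in new_query.items() if w > 0}


reformulated = rocchio(
    {"stemming": 1.0},
    relevant=[{"stemming": 1.0, "portuguese": 1.0}],
    non_relevant=[{"english": 1.0}],
)
```

A misjudged document in this sketch simply lands in the wrong list, so its terms are added when they should be subtracted (or the reverse). That is the kind of effect the second research question examines.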
Formal concept matching and reinforcement learning in adaptive information retrieval
The superiority of the human brain in information retrieval (IR) tasks seems to come firstly
from its ability to read and understand the concepts, ideas or meanings central to documents, in
order to reason out the usefulness of documents to information needs, and secondly from its
ability to learn from experience and be adaptive to the environment. In this work we attempt to
incorporate these properties into the development of an IR model to improve document
retrieval. We investigate the applicability of concept lattices, which are based on the theory of
Formal Concept Analysis (FCA), to the representation of documents. This allows the use of
more elegant representation units, as opposed to keywords, in order to better capture
concepts/ideas expressed in natural language text. We also investigate the use of a
reinforcement learning strategy to learn and improve document representations, based on the
information present in query statements and user relevance feedback. Features or concepts of
each document/query, formulated using FCA, are weighted separately with respect to the
documents they are in, and organised into separate concept lattices according to a subsumption
relation. Furthermore, each concept lattice is encoded in a two-layer neural network structure
known as a Bidirectional Associative Memory (BAM), for efficient manipulation of the
concepts in the lattice representation. This avoids implementation drawbacks faced by other
FCA-based approaches. Retrieval of a document for an information need is based on concept
matching between concept lattice representations of a document and a query. The learning
strategy works by making the similarity of relevant documents stronger and non-relevant
documents weaker for each query, depending on the relevance judgements of the users on
retrieved documents. Our approach is radically different to existing FCA-based approaches in
the following respects: concept formulation; weight assignment to object-attribute pairs; the
representation of each document in a separate concept lattice; and encoding concept lattices in
BAM structures. Furthermore, in contrast to the traditional relevance feedback mechanism, our
learning strategy makes use of relevance feedback information to enhance document
representations, thus making the document representations dynamic and adaptive to the user
interactions. The results obtained on the CISI, CACM and ASLIB Cranfield collections are
presented and compared with published results. In particular, the performance of the system is
shown to improve significantly as the system learns from experience.
The School of Computing, University of Plymouth, UK
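To make the notion of a formal concept more concrete, the sketch below enumerates the concepts of a tiny document-term context by brute force. It illustrates textbook Formal Concept Analysis only. It does not reproduce the thesis's weighting scheme, its per-document lattices or the BAM encoding, and the function name `formal_concepts` and the toy data are assumptions.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a small context.

    context -- dict mapping each object (document id) to its set of attributes (terms)
    Returns a set of (extent, intent) pairs of frozensets.
    Brute force over object subsets, so only suitable for toy examples.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values()) if context else set()
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: the attributes shared by every object in the subset.
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that carries all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


toy_context = {
    "d1": {"retrieval", "lattice"},
    "d2": {"retrieval", "feedback"},
    "d3": {"retrieval"},
}
concepts = formal_concepts(toy_context)
```

Ordering these pairs by inclusion of their extents gives the subsumption relation that organises them into a lattice. Here the concept with intent {"retrieval"} sits above the two more specific concepts for d1 and d2.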
Interactive query expansion and relevance feedback for document retrieval systems
This thesis is aimed at investigating interactive query expansion within the context of a relevance feedback system that uses term weighting and ranking in searching online databases that are available through online vendors. Previous evaluations of relevance feedback systems have been made in laboratory conditions and not in a real operational environment. The research presented in this thesis followed the idea of testing probabilistic retrieval techniques in an operational environment. The overall aim of this research was to investigate the process of interactive query expansion (IQE) from various points of view including effectiveness. The INSPEC database, on both Data-Star and ESA-IRS, was searched online using CIRT, a front-end system that allows probabilistic term weighting, ranking and relevance feedback. The thesis is divided into three parts. Part I of the thesis covers background information and appropriate literature reviews with special emphasis on the relevance weighting theory (Binary Independence Model), the approaches to automatic and semi-automatic query expansion, the ZOOM facility of ESA/IRS and the CIRT front-end. Part II is comprised of three Pilot case studies. It introduces the idea of interactive query expansion and places it within the context of the weighted environment of CIRT. Each Pilot study looked at different aspects of the query expansion process by using a front-end. The Pilot studies were used to answer methodological questions and also research questions about the query expansion terms. The knowledge and experience that was gained from the Pilots was then applied to the methodology of the study proper (Part III). Part III discusses the Experiment and the evaluation of the six ranking algorithms. The Experiment was conducted under real operational conditions using a real system, real requests, and real interaction. 
Emphasis was placed on the characteristics of the interaction, especially on the selection of terms for query expansion. Data were collected from 25 searches. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results of the Experiment are presented according to their treatment of query expansion as main results and other findings in Chapter 10. The main results discuss issues that relate directly to query expansion, retrieval effectiveness, the correspondence of the online-to-offline relevance judgements, and the performance of the w(p - q) ranking algorithm. Finally, a comparative evaluation of six ranking algorithms was performed. The yardstick for the evaluation was provided by the user relevance judgements on the lists of the candidate terms for query expansion. The evaluation focused on whether there are any similarities in the performance of the algorithms and how those algorithms with similar performance treat terms. This abstract refers only to the main conclusions drawn from the results of the Experiment: (1) One third of the terms presented in the list of candidate terms was on average identified by the users as potentially useful for query expansion; (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationship of the 5 best terms chosen by the users for query expansion to the initial query terms was: (a) 34% have no relationship or other type of correspondence with a query term; (b) 66% of the query expansion terms have a relationship which makes the term: (b1) narrower term (70%), (b2) broader term (5%), (b3) related term (25%). (4) The results provide some evidence for the effectiveness of interactive query expansion.
The initial search produced on average 3 highly relevant documents at a precision of 34%; the query expansion search produced on average 9 further highly relevant documents at slightly higher precision. (5) The results demonstrated the effectiveness of the w(p - q) algorithm, for the ranking of terms for query expansion, within the context of the Experiment. (6) The main results of the comparative evaluation of the six ranking algorithms, i.e. w(p - q), EMIM, F4, F4modified, Porter and ZOOM, are that: (a) w(p - q) and EMIM performed best; and (b) the performance between w(p - q) and EMIM and between F4 and F4modified is very similar; (7) A new ranking algorithm is proposed as the result of the evaluation of the six algorithms. Finally, an investigation is by definition an exploratory study which generates hypotheses for future research. Recommendations and proposals for future research are given. The conclusions highlight the need for more research on weighted systems in operational environments, for a comparative evaluation of automatic vs interactive query expansion, and for user studies in searching weighted systems.
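The abstract names the w(p - q) ranking algorithm without defining it. The sketch below follows the formulation usually given in the probabilistic retrieval literature, where w is the F4-style relevance weight with 0.5 smoothing, p is the proportion of relevant documents containing the term, and q is the proportion of non-relevant documents containing it. Whether CIRT computed it in exactly this way is an assumption, and the function names and sample counts are hypothetical.

```python
import math


def relevance_weight(r, R, n, N):
    """F4-style relevance weight with 0.5 smoothing.

    r -- relevant documents containing the term
    R -- relevant documents known so far
    n -- documents in the collection containing the term
    N -- documents in the collection
    """
    return math.log(
        ((r + 0.5) * (N - n - R + r + 0.5)) / ((n - r + 0.5) * (R - r + 0.5))
    )


def offer_weight(r, R, n, N):
    """w(p - q): the relevance weight scaled by how much more often the term
    appears in relevant than in non-relevant documents. Assumes 0 < R < N."""
    p = r / R              # estimated P(term present | relevant)
    q = (n - r) / (N - R)  # estimated P(term present | non-relevant)
    return relevance_weight(r, R, n, N) * (p - q)


def rank_terms(stats, R, N):
    """Order candidate expansion terms by descending w(p - q).

    stats -- dict mapping term to an (r, n) pair
    """
    return sorted(
        stats,
        key=lambda term: offer_weight(stats[term][0], R, stats[term][1], N),
        reverse=True,
    )


# Hypothetical counts: 5 relevant documents judged in a 1000-document collection.
candidates = {"feedback": (4, 10), "system": (4, 500), "the": (5, 1000)}
ranking = rank_terms(candidates, R=5, N=1000)
```

In this toy example a term that is concentrated in the relevant documents ranks first. A term that occurs in every document gets p - q = 0 and falls to the bottom of the candidate list.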
A study of the influences of computer interfaces and training approaches on end user training outcomes
Effective and efficient training is a key factor in determining the success of end user computing (EUC) in organisations. This study examines the influences of two application interfaces, namely icons and menus, on training outcomes. The training outcomes are measured in terms of effectiveness, efficiency and perceived ease of use. Effectiveness includes the keystrokes used to accomplish tasks, the accuracy of correct keystrokes, backtracks and errors committed. Efficiency includes the time taken to accomplish the given tasks. Perceived ease of use rates the ease of the training environment including training materials, operating system, application software and associated resources provided to users. In order to facilitate measurement, users were asked to nominate one of two approaches to training, instruction training and exploration training that focussed on two categories of users, basic and advanced. User category was determined based on two questionnaires that tested participants' level of knowledge and experience. Learning style preference was also included in the study. For example, to overcome the criticisms of prior studies, this study allowed users to nominate their preferred interfaces and training approaches soon after the training and prior to the experiment. To measure training outcomes, an experiment was conducted with 159 users. Training materials were produced and five questionnaires developed to meet the requirements of the training design. All the materials were peer reviewed and pilot tested in order to eliminate any subjective bias. All questionnaires were tested for statistical validity to ensure the applicability of instruments. Further, for measurement purposes, all keystrokes and time information such as start time and end time of tasks were extracted using automated tools. Prior to data analysis, any 'outliers' were eliminated to ensure that the data were of good quality.
This study found that icon interfaces were effective for end user training for trivial tasks. This study also found that menu interfaces were easy to use in the given training environment. In terms of training approaches, exploration training was found to be effective. The user categorisation alone did not have any significant influence on training outcomes in this study. However, the combination of basic users and instruction training approach was found to be efficient and the combination of basic users and exploration training approach was found to be effective. This study also found that learning style preference was significant in terms of effectiveness but not efficiency. The results of the study indicate that interfaces play a significant role in determining training outcomes and hence the need for training designers to treat application interfaces differently when addressing training accuracy and time constraints. Similarly, this study supports previous studies in that learning style preferences influence training outcomes. Therefore, training designers should consider users' learning style preferences in order to provide effective training. While categories of user did not show any significant influence on the outcomes of this study, the interaction between training approaches and categories of users was significant, indicating that different categories of users respond to different training approaches. Therefore, training designers should consider the possibility of treating differently those with and without experience in EUC applications. For example, one possible approach to training design would be to hold separate training sessions. In summary, this study has found that interfaces, learning styles and the combination of training approaches and categories of users have varying significant impact on training outcomes.
Thus the results reported in this study should help training designers to design training programs that would be effective, efficient and easy to use.
Data bases and data base systems related to NASA's aerospace program. A bibliography with indexes
This bibliography lists 1778 reports, articles, and other documents introduced into the NASA scientific and technical information system, 1975 through 1980.