Template Mining for Information Extraction from Digital Documents
published or submitted for publication
LCSH and PRECIS in Library and Information Science: A Comparative Study
This study aims to compare the performance of LCSH and PRECIS for books published in 1987 in the field of library and information science (LIS) in order to investigate the strengths and weaknesses of each system. The subject headings and PRECIS strings assigned to 82 titles have been analyzed, and the two major subject access systems have been compared with respect to the number of entries, the exhaustivity and specificity of the entries provided, the variety of subdivisions, and other qualitative features.
The Generation of Compound Nominals to Represent the Essence of Text: The COMMIX System
This thesis concerns the COMMIX system, which automatically extracts information on what a text is about and generates that information in the highly compacted form of compound nominal expressions. The expressions generated are complex and may include novel terms that do not themselves appear in the input text.
From a practical point of view, the work is driven by the need for better representations of content: representations that are shorter and more concise than an abstract, yet more informative and more representative of a text's actual aboutness than typical indexing expressions and key terms. This additional layer of representation is referred to in this work as pertaining to the essence of a particular text.
From a theoretical standpoint, the thesis shows how the compound nominal as a construct can be successfully employed in these highly informative representations. It explores the claim that the standard dictionary glosses for individual words contain sufficient semantic information to enable the construction of useful and highly representative novel compound nominal expressions, without recourse to standard syntactic and statistical methods. It shows how a shallow semantic approach to content identification based on lexical overlap can produce very encouraging results.
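To make "lexical overlap" between dictionary glosses concrete, here is a minimal Lesk-style sketch. It is not the COMMIX algorithm: the miniature dictionary `GLOSSES`, the stopword list, and the functions `content_words`, `gloss_overlap` and `rank_candidates` are all hypothetical, and the sketch covers only the overlap scoring, not the generation of compound nominals. Each dictionary headword is scored by how many content words its gloss shares with the text, so a term can rank highly even when it never occurs in the text itself.

```python
import re

# Hypothetical toy dictionary (headword -> gloss). COMMIX drew on standard
# dictionary glosses; these three entries are invented for illustration.
GLOSSES = {
    "library": "a collection of books and other materials kept for reading or reference",
    "catalogue": "a list of books or other items arranged in order for reference",
    "engine": "a machine that converts fuel into motion",
}

# Small, hypothetical stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "the", "of", "or", "for", "in", "into", "that", "to", "kept", "other"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def gloss_overlap(word: str, text_words: set[str]) -> int:
    """Count distinct content words shared by the gloss of `word` and the text."""
    return len(content_words(GLOSSES.get(word, "")) & text_words)


def rank_candidates(text: str) -> list[tuple[str, int]]:
    """Rank every headword by gloss overlap with the text, highest score first."""
    words = content_words(text)
    scores = [(w, gloss_overlap(w, words)) for w in GLOSSES]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(rank_candidates("Readers consult the books and the reference list arranged by subject."))
```

With the sample sentence, "catalogue" ranks first although the word never appears in the input, which loosely mirrors the abstract's point that generated terms may be novel with respect to the text.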
The methodology employed, and described herein, is domain-independent and does not require the specification of templates with which the input text must comply. In these two respects, it avoids two of the most common problems associated with information extraction.
As regards the evaluation of this type of work, the thesis introduces and utilises the notion of percentage attainment value, which is used in conjunction with subjects' opinions about the degree to which the aboutness terms succeed in indicating the subject matter of the texts for which they were generated.
Special Libraries, September 1969
Volume 60, Issue 7
Relevance, Rhetoric, and Argumentation: A Cross-Disciplinary Inquiry into Patterns of Thinking and Information Structuring
This dissertation research is a multidisciplinary inquiry into topicality, involving an in-depth examination of literatures and empirical data and an inductive development of a faceted typology (containing 227 fine-grained topical relevance relationships and 33 types of presentation relationship). This inquiry investigates a large variety of topical connections beyond topic matching, takes a closer look at the structure of a topic, achieves an enriched understanding of topicality and relevance, and induces a cohesive, topic-oriented information architecture that is meaningful across topics and domains. The findings from the analysis contribute to the foundations of information organization, intellectual access/information retrieval, and knowledge discovery.
Using qualitative content analysis, the inquiry focuses on meaning and deep structure:
Phase 1: developing a unified, theory-grounded typology of topical relevance relationships through close reading of the literature and synthesis of thinking from communication, rhetoric, cognitive psychology, education, information science, argumentation, logic, law, medicine, and art history;
Phase 2: an in-depth qualitative analysis of empirical relevance datasets in oral history, clinical question answering, and art image tagging, to examine how the theory-grounded typology manifests in various contexts and to refine it further; the three relevance datasets were used to achieve variation in form, domain, and context.
The typology of topical relevance relationships is structured with three major facets:
Functional role: The role a piece of information plays in the overall structure of a topic or an argument;
Mode of reasoning: How information contributes to the user's reasoning about a topic;
Semantic relationship: How information connects to a topic semantically.
This inquiry demonstrated that topical relevance, with its close linkage to thinking and reasoning, is central to many disciplines. The multidisciplinary approach allows synthesis and examination from new angles, leading to an integrated scheme of relevance relationships, or a system of thinking, that informs each individual discipline. The scheme resulting from the synthesis can be used to improve text and image understanding, knowledge organization and retrieval, reasoning, argumentation, and thinking in general, by people and machines.
An Evaluation of Structured Navigation for Subject Searching in Online Catalogues
Understanding and improving subject searching in online library catalogues is the focus of this study. Against the backdrop of current research and developments in online catalogues, an analysis of the problems and prospects for subject access in the expanding online catalogue is presented. Developments in recent information retrieval theory and practice are reviewed, and a case is made for a new model of information seeking and retrieval that more closely describes much of the subject searching and browsing activity actually conducted by library users. The centerpiece of this study is an experiment conducted using an experimental online catalogue developed to investigate and evaluate the effect of alternative browse and navigate search methods on overall retrieval effectiveness and subject searching performance. The objectives, methodology, and findings of this online catalogue search experiment are discussed. The primary aim of the experimental study was to evaluate the usability and retrieval performance of a pre-structured "navigation" approach to subject searching and browsing in library catalogues. The main hypothesis tested was that the provision and use of a navigation search and browse function would significantly improve overall OPAC retrieval effectiveness and the subject searching performance of OPAC users. The OPAC used in the study was designed and implemented by this author using the database management and retrieval software known as "TINMAN", provided by Information Management & Engineering, Ltd. TINMAN employs an entity-relational database structure that permits the linking of any field in the stored bibliographic record to any other field. These linkages establish browse and navigation pathways among data fields ("entities") and citations to support guided but flexible searching and browsing through the collection. Thus, a rudimentary form of hypertext is provided for the users of the OPAC. 
The test database consisted of 30,000 Library of Congress MARC bibliographic records selected at random from all LC catalog records for English-language publications through 1988 in the LC classes HB-HJ (Economics, Business, etc.). For each record, the verbal description of the assigned LC class number found in the printed schedules was added as a subject descriptor to augment the subject cataloging provided by the Library of Congress. Three different OPACs were tested for comparison purposes. The control OPAC lacked the navigation feature. The other two OPACs supported related-record navigation, one on title words only, the other on subject headings only. Searchers were encouraged to use the OPAC's features and search options in whatever manner they wished. Subjects in Group 1 were permitted to navigate only on the subject headings from the controlled subject vocabulary assigned to the work cited (augmented by the verbal meanings of the Library of Congress class number). Subjects in Group 2 were permitted to navigate, but only from title words of the work cited and displayed. Navigating from one of these title words would result in the retrieval of all works whose titles had at least one occurrence of the selected word. Subjects in the control group were not permitted to navigate; that is, it was not possible for them to point to a selected data element in a displayed citation to move on to related terms or citations associated with that data element. The positive value of related-record navigation in improving subject searching in OPACs was not clearly determined. The navigation groups performed significantly better than the control group on the first search task, but all three groups performed nearly equally well on the second search task. Navigation on subject headings or title keywords resulted in higher recall scores, especially among first-time, novice users of the system, but precision suffered significantly in title-word navigation. 
In fact, the control group achieved higher precision scores on both search tasks. Navigation did not seem to aid subject searching performance after greater familiarity with the system was achieved, except perhaps to increase recall in persistent searches without much decrease in precision. Online bookshelf browsing seems to improve recall without a significant decrease in precision, and may be a more positive factor than navigation on either subject headings or title words.
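The title-word navigation rule described above (retrieve every work whose title contains the selected word at least once) and the two measures the study reports, recall and precision, can be sketched as follows. This is a minimal illustration, not the TINMAN implementation: the four-record `RECORDS` catalogue, the relevance judgments, and the function names are hypothetical, and precision and recall use their standard set-based definitions.

```python
import re

# Hypothetical miniature catalogue (record ID -> title). The study used
# 30,000 LC MARC records; these four titles are invented for illustration.
RECORDS = {
    1: "Monetary policy and inflation",
    2: "Inflation accounting for small business",
    3: "The economics of central banking",
    4: "Policy options for rural banking",
}


def title_words(title: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def navigate_on_title_word(word: str, records: dict[int, str]) -> set[int]:
    """IDs of every record whose title contains `word` at least once."""
    target = word.lower()
    return {rid for rid, title in records.items() if target in title_words(title)}


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Set-based precision and recall; 0.0 when a denominator would be empty."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = navigate_on_title_word("inflation", RECORDS)
    relevant = {1, 3}  # hypothetical judgments for a monetary-policy query
    print(sorted(retrieved), precision_recall(retrieved, relevant))
```

In this toy case, record 2 matches on the word "inflation" but is about accounting. That shows one plausible way exact title-word matching can pull in off-topic records and lower precision. The study itself does not attribute its precision results to this specific mechanism.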
User-developer cooperation in software development: building common ground and usable systems
PhD
The topic of this research is direct user participation in the task-based development of interactive software systems. Building usable software demands understanding and supporting users and their tasks. Users are a primary source of usability requirements and knowledge, since they can be expected to have intimate and extensive knowledge of themselves, their tasks, and their working environment. Task analysis approaches to software development encourage a focus on supporting users and their tasks, while participatory design approaches encourage users' direct, active contributions to software development work. However, participatory design approaches often concentrate their efforts on design activities rather than on wider system development activities, while task analysis approaches generally lack active user participation beyond initial data gathering. This research attempts to integrate the strengths of task analysis and user participation within an overall software development process.
This thesis also presents detailed empirical and theoretical analyses of what it means for users and developers to cooperate, and of the nature of user-developer interaction in participatory settings. Furthermore, it operationalises and assesses the effectiveness of user participation in development and the impact of user-developer cooperation on the resulting software product. The research addressed these issues by developing and applying an approach to task-based participatory development in two real-world development projects. In this integrated approach, the respective strengths of task analysis and participatory design methods complemented each other's weaker aspects. The participatory design features encouraged active user participation in the development work, while the task analysis features extended this participation upstream from software design activities to include analysis of the users' current work situation and design of an envisioned work situation.
An inductive analysis of user-developer interaction in the software development projects was combined with a theoretical analysis drawing upon work on common ground in communication. This research generated an account of user-developer interaction in terms of the joint construction of two distinct forms of common ground between user and developer: common ground about their present joint development activities, and common ground about the objects of those joint activities, namely work situations and software systems.
The thesis further extended the concept of common ground, assessing user participation in terms of contributions to the common ground developed through user-developer discourse. It then operationalised and assessed the effectiveness of user participation in terms of the assimilation of users' contributions into the artefacts of the development work. Finally, the thesis assessed the value of user participation in terms of the impact of user contributions to the development activities on the usability of the software produced.
Engineering and Physical Sciences Research Council
Harlequin Software Group