144 research outputs found

    Text Retrieval Software for Microcomputers and Beyond: An Overview and a Review of Four Packages.

    Software for textual files has seen active development in the last few years as the need for such packages grows. Four such packages are reviewed in depth: 1. Concordance, from Dataflight Software; 2. Concept Finder, from MUMPS Medical Information Management Systems Inc.; 3. Personal Librarian, from Personal Library Software; and 4. Topic, from Verity Inc. Concordance is an easy-to-use package with a well-designed user interface. It offers the standard search features needed for text retrieval and includes integrated editing features and a report writer. Concept Finder's strength is in its multifile capabilities and control over data. Complex relationships can be set up among data fields in multiple files. The search methodologies in Personal Librarian are well suited to ad hoc searching by end users. Meaningful results can be obtained without knowledge of Boolean logic. Topic is marketed primarily to organizations with large networked or time-sharing systems in which large amounts of text are added at various places on the system.

    Image indexing and retrieval: some problems and proposed solutions.

    Image processing technologies offer considerable potential for library and information units to extend their databases by including images such as photographs, paintings, monograph title pages and maps. The article discusses problems and potential solutions in a structured fashion, based on categories of thesauri (text and visual), hybrids, description language and automatic content analysis, with state-of-the-art examples.

    The necessity for adaptation in modified boolean document retrieval systems

    A document retrieval system may be described by three formal characteristics: the syntax employed to describe documents (keywords or vectors of weights, for instance), the form of machine-processable queries it accepts as valid (unordered sets of keywords, keywords with Boolean connectives or weighted vectors, for example), and the retrieval rules used to rank or retrieve documents. This article argues that the interdependence among document descriptions, queries, and retrieval rules requires adaptation for the system to perform effectively when one of its components changes. Recently, suggestions have been made to modify traditional Boolean document retrieval systems to allow more flexible queries and ranked document output. However, these new forms of queries and retrieval rules likely require that documents be described differently than they are in existing, commercial Boolean retrieval systems. A "genetic algorithm" is discussed as a means for redescribing documents. This probabilistic algorithm uses feedback along with alternative descriptions of a single document and takes account of the dependency structure of subject terms. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/27541/1/0000585.pd
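
    The contrast between strict Boolean retrieval and ranked output can be illustrated with a minimal sketch. It is not the article's method: the function names, the toy documents, and the choice of ranking by the number of matched query terms are all assumptions made for illustration.

```python
def boolean_and(query_terms, docs):
    """Strict Boolean AND: return ids of documents containing every query term."""
    return [doc_id for doc_id, terms in docs.items() if query_terms <= terms]


def ranked_by_matches(query_terms, docs):
    """Rank documents by how many query terms they contain (hypothetical rule).

    Documents with no matching term are dropped; ties are broken by id.
    """
    scored = [(len(query_terms & terms), doc_id) for doc_id, terms in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


# Toy collection: each document is described by an unordered set of keywords.
docs = {
    "d1": {"boolean", "retrieval", "ranking"},
    "d2": {"boolean", "retrieval"},
    "d3": {"genetic", "algorithm"},
}
query = {"boolean", "ranking"}

print(boolean_and(query, docs))        # only documents matching all terms
print(ranked_by_matches(query, docs))  # partial matches are kept, in rank order
```

    With the same keyword descriptions, the ranked rule returns a partially matching document that the Boolean rule excludes, which is one way to see why a change of retrieval rule may call for a change in how documents are described.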

    A Hybrid Model for Document Retrieval Systems.

    A methodology for the design of document retrieval systems is presented. First, a composite index term weighting model is developed based on term frequency statistics, including document frequency, relative frequency within a document and relative frequency within the collection. The model can be adjusted to fit different indexing environments by selecting various coefficients. Then, a composite retrieval model is proposed to process a user's information request, expressed as a weighted Phrase-Oriented Fixed-Level Expression (POFLE) that may apply more than Boolean operators, in two phases. The first phase searches for documents that are topically relevant to the information request by means of a descriptor matching mechanism. This mechanism incorporates a partial matching facility based on a structurally restricted relationship imposed by the indexing model, and is more general than the matching functions of the traditional Boolean model and the vector space model. The second phase ranks these topically relevant documents, by means of two types of heuristic-based selection rules and a knowledge-based evaluation function, in descending order of a preference score that predicts the combined effect of user preference for quality, recency, fitness and reachability of documents.
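
    A composite term weight of the kind described can be sketched as follows. The abstract does not give the formula, so the linear combination, the logarithmic form of the document-frequency component, and the names `composite_weight`, `a`, `b` and `c` are assumptions made for illustration only.

```python
import math


def composite_weight(term, doc_tokens, collection, a=1.0, b=1.0, c=1.0):
    """Hypothetical linear blend of three term statistics.

    a scales the relative frequency of the term within the document,
    b scales a document-frequency component (log of N / df),
    c scales the relative frequency of the term within the collection.
    """
    # Relative frequency within the document.
    rel_in_doc = doc_tokens.count(term) / len(doc_tokens) if doc_tokens else 0.0

    # Document frequency: number of documents that contain the term.
    df = sum(1 for tokens in collection if term in tokens)
    df_component = math.log(len(collection) / df) if df else 0.0

    # Relative frequency within the whole collection.
    total_tokens = sum(len(tokens) for tokens in collection)
    total_count = sum(tokens.count(term) for tokens in collection)
    rel_in_coll = total_count / total_tokens if total_tokens else 0.0

    return a * rel_in_doc + b * df_component + c * rel_in_coll


# Toy collection of three tokenized documents.
collection = [
    ["boolean", "query", "query"],
    ["vector", "query"],
    ["boolean", "model"],
]
doc = collection[1]

print(composite_weight("vector", doc, collection))
print(composite_weight("vector", doc, collection, a=1.0, b=0.0, c=0.0))
```

    Setting a coefficient to zero switches its component off, which mimics the idea of tuning the model to a particular indexing environment.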

    New information retrieval systems

    This article offers an overview of the research conducted on this new generation of information retrieval systems, describing their main components. It includes examples of systems currently in use that are based on these new principles.

    Autotag: A tool for creating structured document collections from printed materials

    Today's optical character recognition (OCR) devices ordinarily are not capable of delimiting or marking up specific structural information about a document, such as its title, its authors, and the titles of its sections. Such information appears in the OCR device output, but a human would have to go through the output to locate it. This type of information is highly useful for information retrieval (IR), allowing users much more flexibility in making queries of a retrieval system. This thesis describes the design, implementation, and evaluation of a software system called Autotag. The system automatically marks up structural information in OCR-generated text. It also establishes a mapping between objects in page images and their corresponding ASCII representation. This mapping can then be used to design flexible image-based interfaces for applications related to information retrieval.

    A heuristic information retrieval study: an investigation of methods for enhanced searching of distributed data objects exploiting bidirectional relevance feedback

    A thesis submitted for the degree of Doctor of Philosophy of the University of Luton. The primary aim of this research is to investigate methods of improving the effectiveness of current information retrieval systems. This aim can be achieved by accomplishing numerous supporting objectives. A foundational objective is to introduce a novel bidirectional, symmetrical fuzzy logic theory which may prove valuable to information retrieval, including internet searches of distributed data objects. A further objective is to design and implement an experimental information retrieval system called ANACALYPSE and apply the novel theory to it. The system automatically computes the relevance of a large number of unseen documents from expert relevance feedback on a small number of documents read. A further objective is to define the methodology used in this work as an experimental information retrieval framework consisting of multiple tables, including various formulae, which allow a plethora of syntheses of similarity functions, term weights, relative term frequencies, document weights, bidirectional relevance feedback and history-adjusted term weights. The evaluation of bidirectional relevance feedback reveals a better correspondence between system ranking of documents and users' preferences than feedback-free system ranking. The assessment of similarity functions reveals that the Cosine and Jaccard functions perform significantly better than the DotProduct and Overlap functions. The evaluation of history tracking of the documents visited from a root page reveals better system ranking of documents than tracking-free information retrieval. The assessment of stemming reveals that system information retrieval performance remains unaffected, while stop word removal does not appear to be beneficial and can sometimes be harmful. The overall evaluation of the experimental information retrieval system, in comparison to a leading-edge commercial information retrieval system and also in comparison to the expert's gold standard of judged relevance according to established statistical correlation methods, reveals enhanced system information retrieval effectiveness.
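
    The four similarity functions named in the abstract can be sketched in their common set-based textbook forms. The thesis's exact definitions are not given here, and it works with term weights rather than plain term sets, so the binary formulation and the sample data below are assumptions made for illustration.

```python
import math


def dot_product(a, b):
    """Number of shared terms (the dot product of two binary term vectors)."""
    return float(len(a & b))


def cosine(a, b):
    """Shared terms normalized by the geometric mean of the set sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def jaccard(a, b):
    """Shared terms divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def overlap(a, b):
    """Shared terms divided by the size of the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical query and document, each reduced to a set of terms.
query = {"fuzzy", "relevance", "feedback"}
doc = {"relevance", "feedback", "ranking", "documents"}

for fn in (dot_product, cosine, jaccard, overlap):
    print(fn.__name__, round(fn(query, doc), 4))
```

    Cosine and Jaccard normalize by the sizes of both sets, whereas the dot product is unnormalized and the overlap coefficient normalizes only by the smaller set.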