
    Evaluation of a prototype interface for structured document retrieval

    Document collections often display either internal structure, in the form of the logical arrangement of document components, or external structure, in the form of links between documents. Structured document retrieval systems aim to exploit this structural information to provide users with more effective access to structured documents. To do this, the associated interface must both represent this information explicitly and support users in their browsing behaviour. This paper describes the implementation and user-centred evaluation of a prototype interface, the RelevanceLinkBar interface. The results of the evaluation show that the RelevanceLinkBar interface supported users in their browsing behaviour, allowing them to find more relevant documents, and was strongly preferred over a standard results interface

    A Method for the Construction and Application of the Term Hierarchy Relationship Residing in Relevance Feedback

    In the field of information retrieval, the term-frequency information contained in relevance feedback has been widely used. However, analysing and applying term frequency alone does not capture the semantic meaning of the terms, which can make the retrieval results deviate from the user's search goal. To account for the semantic meaning of terms, Wille (1992) proposed a structured view of the relationships among the terms in the retrieved documents. To improve retrieval effectiveness using this term hierarchy relationship, this study developed a query expansion method that first extracts and applies this information from relevance feedback, and then conducted formal tests to verify how well the method re-ranks the retrieved documents. The results of the formal tests show that the proposed query expansion method is more effective than Rocchio's query expansion algorithm. The contribution of this study is to show the applicability of the term hierarchy relationship information contained in relevance feedback and to demonstrate how this information can be applied.
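    The study uses Rocchio's query expansion algorithm as its baseline. For reference, here is a minimal sketch of the classic Rocchio update, assuming sparse term-weight vectors stored as Python dicts and the commonly cited defaults alpha = 1, beta = 0.75, gamma = 0.15. The function name `rocchio` and these weights are illustrative assumptions, not values from the paper, and this is the baseline only, not the term hierarchy method the study proposes.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio query update (illustrative sketch, assumed defaults).

    `query` and every document in `relevant` / `nonrelevant` are dicts
    mapping term -> weight (e.g. tf-idf). Returns the modified query with
    non-positive weights dropped.
    """
    new_q = defaultdict(float)
    # Keep the original query, scaled by alpha.
    for term, w in query.items():
        new_q[term] += alpha * w
    # Move toward the centroid of the relevant documents.
    if relevant:
        for doc in relevant:
            for term, w in doc.items():
                new_q[term] += beta * w / len(relevant)
    # Move away from the centroid of the non-relevant documents.
    if nonrelevant:
        for doc in nonrelevant:
            for term, w in doc.items():
                new_q[term] -= gamma * w / len(nonrelevant)
    # Terms whose weight ends up <= 0 are removed from the expanded query.
    return {t: w for t, w in new_q.items() if w > 0}


if __name__ == "__main__":
    q = {"retrieval": 1.0}
    rel = [{"retrieval": 0.5, "structured": 1.0}, {"structured": 0.5, "xml": 1.0}]
    nonrel = [{"xml": 2.0, "database": 1.0}]
    print(rocchio(q, rel, nonrel))
```

    With these toy vectors, "retrieval" is reinforced, "structured" and "xml" enter the query as expansion terms, and "database" is dropped because its weight becomes negative.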

    A Compressive Survey on New Technique Towards Successful Document Research Using Key Phrase Annotations Together with Querying Benefit

    It is generally challenging to find specific relevant information inside unstructured text documents, because this information remains buried in unstructured text. Annotations in the form of attribute name-value pairs are more useful for retrieving such documents. This system proposes a novel alternative approach to efficient document retrieval that includes annotation identification: it identifies the values of structured attributes by reading, analysing, and parsing the uploaded documents. The main benefit is that when users or authors perform a query-based search, they obtain a small set of distinct, accurate results, making it easy to retrieve data from the database. Using these two techniques can greatly reduce the system's workload, and searching annotated documents becomes faster because of the query-based (content-value) searching technique.

    Nomenclature and Benchmarking Models of Text Classification Models: Contemporary Affirmation of the Recent Literature

    In this paper we present automated text classification in text mining, which is gaining relevance in various fields every day. Text mining primarily focuses on developing text classification systems able to automatically classify huge volumes of documents comprising unstructured and semi-structured data. Retrieval, classification, and summarization simplify the extraction of information by the user. Finding the ideal text classifier, feature generator, and a dominant feature-selection technique that outperforms previous research has received attention from researchers in diverse areas such as information retrieval, machine learning, and the theory of algorithms. To automatically classify and discover patterns in different types of documents [1], techniques such as machine learning, natural language processing (NLP), and data mining are applied together. In this paper we review some effective feature selection research and show the results in a table for

    Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness

    In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation. The analysis is based on numerous experiments evaluated on TREC and CLEF collections, using manually generated unstructured and structured queries. Unstructured queries range from short title queries to long title + description + narrative queries. To generate structured queries, we exploit knowledge of the document structure and of the content used to semantically describe or classify documents. We show that such structured information can be used in retrieval engines to give more precise answers to user queries than unstructured queries do.
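    Okapi is one of the retrieval models compared. As a rough, flat (document-level) illustration of that scoring function, here is a minimal Okapi BM25 sketch over tokenized documents. The parameters k1 = 1.2 and b = 0.75, the non-negative IDF variant ln((N - n + 0.5)/(n + 0.5) + 1), and the function name `bm25_scores` are common choices assumed here, not details from the paper; the sketch does not reproduce the score region algebra implementation in TIJAH or its element-level score propagation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for a bag of query terms.

    Illustrative sketch: k1, b and the IDF variant are assumed defaults.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # term absent: contributes nothing
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # Length normalisation: longer-than-average docs are penalised.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["structured", "document", "retrieval"],
        ["xml", "element", "retrieval", "retrieval"],
        ["image", "classification"],
    ]
    print(bm25_scores(["structured", "retrieval"], docs))
```

    The first document matches both query terms, including the rarer "structured", so it outranks the second; the third matches neither and scores 0.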

    CHIC: Corporate Document for Visual question Answering

    The massive use of digital documents, driven by the strong trend toward paperless initiatives, has forced some companies to find ways to process thousands of documents per day automatically. To achieve this, they use automatic information retrieval (IR), which allows them to extract useful information from large datasets quickly. Effective IR methods first require an adequate dataset. Although companies have enough data for their own needs, a public dataset is also needed to compare contributions from state-of-the-art methods. Public document datasets exist, such as DocVQA [2] and XFUND [10], but they do not fully satisfy companies' needs. XFUND contains only form documents, whereas companies use several types of documents (i.e. structured documents such as forms, but also semi-structured documents such as invoices and unstructured documents such as emails). Compared to XFUND, DocVQA contains several types of documents, but only 4.5% of them are corporate documents (e.g. invoices, purchase orders), and this 4.5% does not provide the document diversity that companies require. We propose CHIC, a public visual question-answering dataset. It contains different types of corporate documents, and the information extracted from these documents meets companies' expectations.

    A model for structured document retrieval : empirical investigations

    Documents often display a structure, e.g., several sections, each with several subsections and so on. Taking into account the structure of a document allows the retrieval process to focus on those parts of the document that are most relevant to an information need. In previous work, we developed a model for the representation and the retrieval of structured documents. This paper reports the first experimental study of the effectiveness and applicability of the model

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.