
    Concept-Based Retrieval from Critical Incident Reports

    Background: Critical incident reporting systems (CIRS) are used to collect anonymously entered information about incidents that occurred, for example, in a hospital. Analyzing this information helps to identify, among other things, problems in the workflow, in the infrastructure, or in processes. Objectives: The full potential of these sources of experiential knowledge often remains untapped, since retrieving relevant reports and analyzing them is difficult and time-consuming, and the reporting systems often do not provide support for these tasks. The objective of this work is to develop a method for retrieving reports from the CIRS that are related to a specific user query. Methods: Natural language processing (NLP) and information retrieval (IR) methods are used to realize the retrieval. We compare standard retrieval methods that rely on word frequency with an approach that includes a semantic mapping of natural language to concepts of a medical ontology. Results: In an evaluation, we demonstrate the feasibility of semantic document enrichment for improving recall in incident report retrieval. We show that combining standard keyword-based retrieval with semantic search yields highly satisfactory recall values. Conclusion: In future work, the evaluation should be repeated on a larger data set, and real-time user evaluations need to be performed to assess user satisfaction with the system and its results. Keywords. Information Retrieval, Data Mining, Natural Language Processing, Critical Incident Reporting
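    To make the combined keyword-plus-concept retrieval idea concrete, the following minimal sketch blends a word-overlap score with an overlap of ontology concepts. The term-to-concept table, the scoring functions, and the 0.5/0.5 weighting are illustrative assumptions made for this sketch, not the authors' implementation.

```python
# Minimal sketch of combining keyword-based retrieval with concept-based
# matching, in the spirit of the approach described above. The concept
# mapping table and the 0.5/0.5 weighting are illustrative assumptions.
from collections import Counter
import math

# Hypothetical mapping from surface terms to ontology concept IDs.
TERM_TO_CONCEPT = {
    "medication": "C_DRUG",   # illustrative concept identifiers
    "drug": "C_DRUG",
    "fall": "C_FALL",
}

def keyword_score(query_terms, doc_terms):
    """Simple term-overlap score (stand-in for TF-IDF/BM25)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    overlap = sum(min(q[t], d[t]) for t in q)
    return overlap / math.sqrt(len(query_terms) * len(doc_terms) or 1)

def concept_score(query_terms, doc_terms):
    """Overlap of the ontology concepts the terms map to (semantic enrichment)."""
    q = {TERM_TO_CONCEPT[t] for t in query_terms if t in TERM_TO_CONCEPT}
    d = {TERM_TO_CONCEPT[t] for t in doc_terms if t in TERM_TO_CONCEPT}
    return len(q & d) / len(q | d) if q | d else 0.0

def combined_score(query_terms, doc_terms, alpha=0.5):
    """Blend keyword and concept evidence; alpha is an assumed weight."""
    return (alpha * keyword_score(query_terms, doc_terms)
            + (1 - alpha) * concept_score(query_terms, doc_terms))

query = ["drug", "error"]
report = ["medication", "administered", "wrong", "dose", "error"]
print(combined_score(query, report))
```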

    A COMPARATIVE STUDY ON ONTOLOGY GENERATION AND TEXT CLUSTERING USING VSM, LSI, AND DOCUMENT ONTOLOGY MODELS

    Although using ontologies to assist information retrieval and text document processing has recently attracted more and more attention, existing ontology-based approaches have not shown advantages over the traditional keyword-based Latent Semantic Indexing (LSI) method. This paper proposes an algorithm to extract a concept forest (CF) from a document with the assistance of a natural language ontology, the WordNet lexical database. Using concept forests to represent the semantics of text documents, the semantic similarities of these documents are then measured as the commonalities of their concept forests. Performance studies of text document clustering based on different document similarity measurement methods show that the CF-based similarity measurement is an effective alternative to the existing keyword-based methods. In particular, the CF-based approach has clear advantages over the existing keyword-based methods, including LSI, in dealing with text abstract databases, such as MEDLINE, or in P2P environments where it is impractical to collect the entire document corpus for analysis
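    As a rough illustration of comparing documents by shared WordNet concepts rather than by raw keywords, the sketch below flattens the paper's concept forest into a plain set of synsets and uses Jaccard overlap as the commonality measure; both simplifications are assumptions made for the example, not the paper's algorithm.

```python
# Rough sketch: represent each document by the WordNet synsets of its
# words and compare documents by concept overlap. This flattens the
# paper's "concept forest" into a simple synset set for illustration.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def concept_set(words):
    """Collect the first (most frequent) noun synset of each word."""
    concepts = set()
    for w in words:
        synsets = wn.synsets(w, pos=wn.NOUN)
        if synsets:
            concepts.add(synsets[0].name())
    return concepts

def concept_similarity(doc_a, doc_b):
    """Jaccard overlap of the two documents' concept sets."""
    a, b = concept_set(doc_a), concept_set(doc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

d1 = ["heart", "attack", "treatment"]
d2 = ["cardiac", "arrest", "therapy"]
print(concept_similarity(d1, d2))
```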

    Logical-Linguistic Model and Experiments in Document Retrieval

    Conventional document retrieval systems have relied on the extensive use of the keyword approach with statistical parameters in their implementations. It now seems that such an approach has reached its upper limit of retrieval effectiveness, and therefore new approaches should be investigated for the development of future systems. With current advances in hardware, programming languages and techniques, natural language processing and understanding, and, generally, the field of artificial intelligence, attempts are now being made to include linguistic processing in document retrieval systems. A few attempts have been made to include parsing or syntactic analysis in document retrieval systems, and the results reported show some improvement in the level of retrieval effectiveness.

    The first part of this thesis sets out to investigate further the use of linguistic processing by including translation, instead of only parsing, in a document retrieval system. The translation process implemented is based on unification categorial grammar and uses C-Prolog as the building tool. It forms the main part of the process of indexing documents and queries into a knowledge base predicate representation. Instead of using the vector space model to represent documents and queries, we use a kind of knowledge base model which we call the logical-linguistic model. The development of a robust parser-translator to perform the translation is discussed in detail in the thesis. A method of dealing with ambiguity is also incorporated in the parser-translator implementation. The retrieval process of this model is based on a logical implication process implemented in C-Prolog. In order to handle uncertainty in evaluating similarity values between documents and queries, meta-level constructs are built upon the C-Prolog system. A logical meta-language, called UNIL (UNcertain Implication Language), is proposed for controlling the implication process. Using UNIL, one can write a set of implication rules and a thesaurus to define the matching function of a particular retrieval strategy. Thus, we have demonstrated and implemented the matching operation between a document and a query as an inference using unification. An inference from a document to a query is done in the context of global information represented by the implication rules and the thesaurus. A set of well-structured experiments is performed with various retrieval strategies on a test collection of documents and queries in order to evaluate the performance of the system. The results obtained are analysed and discussed.

    The second part of the thesis sets out to implement and evaluate the imaging retrieval strategy as originally defined by van Rijsbergen. Imaging retrieval is implemented as relevance feedback retrieval with nearest-neighbour information, defined as follows. One of the best retrieval strategies from the earlier experiments is chosen to perform the initial ranking of the documents, and a few top-ranked documents are retrieved and identified as relevant or not by the user. From this set of retrieved and relevant documents, we can obtain all other unretrieved documents which have any of the retrieved and relevant documents as their nearest neighbour. These unretrieved documents have the potential of also being relevant, since they are 'close' to the retrieved and relevant ones, and thus their initial similarity values to the query are updated according to their distances from their nearest neighbours. From the updated similarity values, a new ranking of documents can be obtained and evaluated. A few sets of experiments using the imaging retrieval strategy are performed with the following objectives: to search for an appropriate updating function for producing a new ranking of documents, to determine an appropriate nearest-neighbour set, to find the relationship of retrieval effectiveness to the size of the document set shown to the user for relevance judgement, and lastly, to find the effectiveness of multi-stage imaging retrieval. The results obtained are analysed and discussed.

    Overall, the thesis sets out to define the logical-linguistic model in document retrieval and demonstrates it by building an experimental system, referred to as SILOL (a Simple Logical-linguistic document retrieval system). A set of retrieval strategies is experimented with, and the results obtained are analysed and discussed
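    The imaging strategy described above amounts to: after the user judges a few top-ranked documents, boost each unretrieved document whose nearest neighbour is a judged-relevant document, in proportion to how close it is to that neighbour. The following sketch shows one plausible form of this update; the cosine similarity and the linear updating function are assumptions for illustration, not the thesis's exact formulation.

```python
# Sketch of the nearest-neighbour relevance-feedback ("imaging") update:
# documents whose nearest neighbour is a judged-relevant retrieved document
# get their query-similarity boosted according to how close they are.
# The linear update rule below is an assumption for illustration.
import numpy as np

def imaging_update(doc_vecs, query_scores, relevant_ids, beta=0.5):
    """Return updated query scores after nearest-neighbour feedback.

    doc_vecs      : (n_docs, dim) array of document vectors
    query_scores  : (n_docs,) initial similarity of each document to the query
    relevant_ids  : indices of retrieved documents judged relevant
    beta          : assumed strength of the feedback update
    """
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T                      # cosine similarity between documents
    np.fill_diagonal(sims, -np.inf)           # a document is not its own neighbour

    updated = query_scores.astype(float)
    for d in range(len(doc_vecs)):
        if d in relevant_ids:
            continue
        nn = int(np.argmax(sims[d]))          # nearest neighbour of document d
        if nn in relevant_ids:                # neighbour was judged relevant
            updated[d] += beta * sims[d, nn]  # boost by closeness to that neighbour
    return updated

vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
scores = np.array([0.8, 0.2, 0.1])
print(imaging_update(vecs, scores, relevant_ids={0}))
```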

    Taxonomic Reasoning and Lexical Semantics

    Taxonomic reasoning is used in many applications, including many-sorted logic, knowledge bases, document retrieval, and natural language processing. These various applications have been dealt with independently. Because they have so much in common, a general approach to taxonomic reasoning would seem to be justified. This paper presents a theory of lexical semantics as an example of such a general approach. The theory defines a representation and an algebra for that representation. The operations of the algebra are inherently parallel, making them well matched to the capabilities of modern computer systems

    A semantic-based approach to information processing

    The research reported in this thesis is centred around the development of a semantic-based approach to information processing. Traditional word-based pattern matching approaches to information processing suffer from both the richness and the ambiguity of natural language. Although the retrieval performance of traditional systems can be satisfactory in many situations, it is commonly held that the traditional approach has reached the peak of its potential and any substantial improvements will be very difficult to achieve [Smea91]. Word-based pattern matching retrieval systems are devoid of the semantic power necessary to either distinguish between different senses of homonyms or identify the similar meanings of related terms. Our proposed semantic information processing system was designed to tackle these problems, among others (we also wanted to allow phrasal as well as single-word terms to describe concepts). Our prototype system is comprised of a WordNet-derived, domain-independent knowledge base (KB) and a concept-level semantic similarity estimator. The KB, which is rich in noun phrases, is used as a controlled vocabulary which effectively addresses many of the problems posed by ambiguities in natural language. Similarly, both proposals for the semantic similarity estimator tackle issues regarding the richness of natural language and, in particular, the multitude of ways of expressing the same concept. A semantic-based document retrieval system is developed as a means of evaluating our approach. However, many other information processing applications are discussed, with particular attention directed towards the application of our approach to locating and relating information in a large-scale Federated Database System (FDBS). The document retrieval evaluation application operates by obtaining KB representations of both the documents and the queries and using the semantic similarity estimators as the comparison mechanism in the procedure that determines the degree of relevance of a document to a query. The construction of KB representations for documents and queries is a completely automatic procedure, and among other steps includes a sense disambiguation phase. The sense disambiguator developed for this research also represents a departure from existing approaches to sense disambiguation. In our approach, four individual disambiguation mechanisms are used to individually weight different senses of ambiguous terms. This allows the possibility of there being more than one correct sense. Our evaluation mechanism employs the Wall Street Journal text corpus and a set of TREC queries along with their relevance assessments in an overall document retrieval application. A traditional pattern-matching tf-idf system is used as a baseline system in our evaluation experiments. The results indicate, firstly, that our WordNet-derived KB is capable of being used as a controlled vocabulary and, secondly, that our approaches to estimating semantic similarity operate well at their intended concept level. However, it is more difficult to arrive at conclusive interpretations of the results with regard to the application of our semantic-based systems to the complex task of document retrieval. A more complete evaluation is left as a topic for future research
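    One element of the approach above that lends itself to a small illustration is keeping a weighted distribution over senses rather than committing to a single sense. The sketch below uses a uniform weighting over WordNet noun senses and WordNet path similarity purely for illustration; the thesis uses its own four disambiguation mechanisms and its own concept-level similarity estimators.

```python
# Sketch of the idea of keeping a *weighted distribution over senses*
# instead of committing to a single sense, and estimating concept-level
# similarity as a weight-averaged similarity over sense pairs.
# The uniform-prior weighting and WordNet path similarity are assumptions.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def sense_weights(word):
    """Assumed disambiguator: a uniform distribution over noun senses."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    return {s: 1.0 / len(synsets) for s in synsets} if synsets else {}

def weighted_similarity(word_a, word_b):
    """Expected path similarity under the two words' sense distributions."""
    wa, wb = sense_weights(word_a), sense_weights(word_b)
    total = 0.0
    for sa, pa in wa.items():
        for sb, pb in wb.items():
            sim = sa.path_similarity(sb) or 0.0
            total += pa * pb * sim
    return total

print(weighted_similarity("bank", "money"))
```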

    Malay documents clustering algorithm based on singular value decomposition.

    Document categorization is a widely researched area of information retrieval. Research on Malay natural language processing has been done up to the level of retrieving documents, but not to the extent of automatic semantic categorization. Thus, an approach for clustering Malay documents based on semantic relations between words is proposed in this paper. The method described in this paper uses the Singular Value Decomposition (SVD) technique for the vector representation of each document, giving a space in which familiar clustering techniques can be applied. The experimental results we obtained show that taking into account the semantics of the documents yields good document clustering, with relevant subjects appearing together in a cluster
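    A minimal sketch of this kind of pipeline (term-document matrix, truncated SVD, then clustering in the reduced space) is shown below using scikit-learn; the toy snippets, component count, and cluster count are placeholders rather than the paper's data or parameters.

```python
# Minimal sketch of SVD-based document clustering (essentially LSA
# followed by k-means). The toy documents, number of SVD components,
# and number of clusters are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = [
    "ekonomi negara berkembang pesat",      # placeholder snippets
    "pertumbuhan ekonomi dan pelaburan",
    "pasukan bola sepak menang perlawanan",
    "perlawanan sukan dan atlet negara",
]

tfidf = TfidfVectorizer().fit_transform(docs)       # term-document matrix
svd = TruncatedSVD(n_components=2, random_state=0)  # low-rank semantic space
reduced = svd.fit_transform(tfidf)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print(labels)
```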

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems