
    Introspective knowledge acquisition for case retrieval networks in textual case base reasoning.

    Textual Case-Based Reasoning (TCBR) aims at effective reuse of information contained in unstructured documents. The key advantage of TCBR over traditional Information Retrieval systems is its ability to incorporate domain-specific knowledge to facilitate case comparison beyond simple keyword matching. However, substantial human intervention is needed to acquire this knowledge and transform it into a form suitable for a TCBR system. In this research, we present automated approaches that exploit statistical properties of document collections to alleviate this knowledge acquisition bottleneck. We focus on two important knowledge containers: relevance knowledge, which captures the relatedness of features to cases, and similarity knowledge, which captures the relatedness of features to each other. This terminology is derived from the Case Retrieval Network (CRN) retrieval architecture in TCBR, which serves as the underlying formalism of this thesis and is applied here to text classification. Concepts generated by Latent Semantic Indexing (LSI) are a useful resource for acquiring relevance knowledge for CRNs. This thesis introduces a supervised LSI technique called sprinkling, which exploits class knowledge to bias LSI's concept generation. An extension of this idea, called Adaptive Sprinkling (AS), is proposed to handle inter-class relationships in complex domains such as hierarchical (e.g. Yahoo directory) and ordinal (e.g. product ranking) classification tasks. Experimental results show the superiority of CRNs created with sprinkling and AS, not only over LSI on its own, but also over state-of-the-art classifiers such as Support Vector Machines (SVM). Existing statistical approaches based on feature co-occurrence can be used to mine similarity knowledge for CRNs. However, related words often do not co-occur in the same document, though they do co-occur with similar words. We introduce an algorithm to efficiently mine such indirect associations, which we call higher-order associations.
Empirical results show that CRNs created with the acquired similarity knowledge outperform both LSI and SVM. Incorporating the acquired knowledge into the CRN turns it into a densely connected network. While this improves retrieval effectiveness, it has the unintended effect of slowing down retrieval. We propose a novel retrieval formalism called the Fast Case Retrieval Network (FCRN), which eliminates redundant run-time computations to improve retrieval speed. Experimental results show that FCRN scales up to high-dimensional textual casebases. Finally, we investigate novel ways of visualizing and estimating the complexity of textual casebases, which can help explain performance differences across casebases. Visualization provides qualitative insight into a casebase, while complexity is a quantitative measure that characterizes the classification or retrieval hardness intrinsic to a dataset. We study correlations between the experimental results of the proposed approaches and complexity measures over diverse casebases.
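Higher-order associations capture a simple intuition: two words can be related because they co-occur with the same words, even when they never appear together. The sketch below illustrates that intuition. It is not the thesis's algorithm, which the abstract does not describe. Instead, it computes a generic second-order co-occurrence count over a binary document-term matrix. The toy vocabulary and documents are invented for illustration, and the sketch assumes NumPy is available.

```python
import numpy as np

def cooccurrence(doc_term: np.ndarray) -> np.ndarray:
    """First-order co-occurrence: number of documents containing both terms."""
    binary = (doc_term > 0).astype(int)
    c = binary.T @ binary
    np.fill_diagonal(c, 0)  # ignore a term's co-occurrence with itself
    return c

def second_order(c: np.ndarray) -> np.ndarray:
    """Second-order association: number of co-occurring neighbours two terms share."""
    neighbours = (c > 0).astype(int)
    s = neighbours @ neighbours.T
    np.fill_diagonal(s, 0)
    return s

# Toy vocabulary (columns) and documents (rows), invented for illustration:
# "car" and "automobile" never appear together, but both appear with "engine".
terms = ["car", "automobile", "engine", "flower"]
docs = np.array([
    [1, 0, 1, 0],  # car engine
    [0, 1, 1, 0],  # automobile engine
    [0, 0, 0, 1],  # flower
])

c = cooccurrence(docs)
s = second_order(c)
print(c[0, 1], s[0, 1])  # car vs automobile: first-order 0, second-order 1
```

Here `car` and `automobile` have a first-order count of 0 but a second-order score of 1, because both co-occur with `engine`. A practical system would also weight and prune these counts, since the matrix products grow quadratically with vocabulary size. That growth is why an efficient mining algorithm matters.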

    Case-based approach to automated natural language generation for obituaries.

    Automated generation of human-readable text from structured information is challenging because grammatical rules are complex, making good-quality output difficult to achieve. Textual Case-Based Reasoning provides one approach, in which the text from previously solved examples with similar inputs is reused as a template solution to generate text for the current problem. Natural Language Generation also poses a challenge when evaluating the quality of the generated text, due to the high cost of human labelling and the variety of potential good-quality solutions. In this paper, we propose two case-based approaches for reusing text to automatically generate an obituary from a set of input attribute-value pairs. The case base is acquired by crawling existing solutions published on the web and then tagging them to create cases as problem-solution pairs. We evaluate the quality of the text generation system with a novel unsupervised case alignment metric based on normalised discounted cumulative gain, which is compared to a supervised approach and to human evaluation. Initial results show that our proposed evaluation measure is effective and correlates well with average attribute error, which is a crude surrogate for human feedback. The system is being deployed in a real-world application with a startup company in Aberdeen to produce automated obituaries.
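Normalised discounted cumulative gain (NDCG) scores a ranked list in two steps. First, it discounts each item's graded relevance by the item's rank. Then it divides the result by the score of the ideal ordering. The abstract does not say how relevance grades are derived from case alignment. The sketch below therefore shows only the textbook NDCG, with linear gains and a log2(rank + 1) discount. The relevance values are made up.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gains; ranks start at 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the given ordering divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([2, 1, 0]))  # already ideally ordered → 1.0
print(ndcg([0, 1, 2]))  # most relevant item ranked last, so the score drops below 1
```

Some formulations use the exponential gain 2^rel - 1 instead of the raw relevance. Both variants give 1.0 for a perfectly ordered list.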

    Knowledge Extraction and Summarization for Textual Case-Based Reasoning: A Probabilistic Task Content Modeling Approach

    Case-Based Reasoning (CBR) is an Artificial Intelligence (AI) technique that has been successfully used for building knowledge systems for tasks/domains where different knowledge sources are easily available, particularly in the form of problem-solving situations, known as cases. Cases generally display a clear distinction between different components of problem solving, for instance, components of the problem description and of the problem solution. Thus, an existing and explicit case structure is presumed. However, when problem-solving experiences are stored as textual narratives (in natural language), there is no explicit case structure, so CBR cannot be applied directly. This thesis presents a novel approach for authoring cases from episodic textual narratives and organizing these cases in a case base structure that better supports user goals. The approach is based on the following fundamental ideas:
    - CBR as a problem-solving technique is goal-oriented, and goals are realized by means of task strategies.
    - Tasks have an internal structure that can be represented in terms of participating events and event components.
    - Episodic textual narratives are not random containers of domain concept terms. Rather, the text can be considered as generated by the underlying task structure whose content it describes.
    The presented case base authoring process combines task knowledge with Natural Language Processing (NLP) techniques to perform the needed knowledge extraction and summarization.

    TCBR-HMM: An HMM-based text classifier with a CBR system

    This paper presents an innovative solution to model distributed adaptive systems in biomedical environments. We present an original TCBR-HMM (Text Case Based Reasoning-Hidden Markov Model) for biomedical text classification based on document content. The main goal is to propose a classifier that is more effective than current methods in this environment, where the model needs to be adapted to new documents in an iterative learning framework. To demonstrate this, we include a set of experiments performed on the OHSUMED corpus. Our classifier is compared with Naive Bayes and SVM techniques, which are commonly used in text classification tasks. The results suggest that the TCBR-HMM model is indeed more suitable for document classification: it is empirically and statistically comparable to the SVM classifier and outperforms it in terms of time efficiency.
    Ministerio de Ciencia e Innovación | Ref. TIN2009-14057-C03-0

    Query Expansion: Is It Necessary In Textual Case-Based Reasoning?

    Query expansion (QE) is the process of transforming a seed query to improve retrieval performance in information retrieval operations. It is often intended to overcome a vocabulary mismatch between the query and the document collection. Query expansion is known to improve the retrieval effectiveness of some information retrieval systems. However, its effect in Textual Case-Based Reasoning (TCBR), which is closely related to the field of Information Retrieval, has not been well studied. In this research, a TCBR system named FAQCase was developed for the storage and retrieval of Frequently Asked Questions (FAQs). Experiments were conducted to examine the effect of synonym-based query expansion on the system. The results show a significant retrieval improvement in FAQCase with query expansion over FAQCase without query expansion when the vocabulary mismatch between new questions and the stored FAQs is high.
    Keywords: Query expansion, Textual case-based reasoning, Word sense disambiguation, WordNet
    Nigerian Journal of Basic and Applied Science (2011), 19 (2): 269-27
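Synonym-based query expansion adds the synonyms of each query term before the query is matched against the stored FAQs. The sketch below illustrates this step. FAQCase used WordNet; here a small hand-written synonym table (the hypothetical `SYNONYMS`) stands in for it. A plain term-overlap score (the hypothetical `overlap`) also stands in for the system's actual similarity measure, which the abstract does not specify.

```python
# Hypothetical synonym table standing in for WordNet synsets.
SYNONYMS = {
    "reset": {"restore"},
    "password": {"passcode"},
}

def expand(query_terms):
    """Return the query terms plus any synonyms listed for them."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def overlap(query_terms, faq_terms):
    """Hypothetical similarity: number of distinct terms shared with a stored FAQ."""
    return len(set(query_terms) & set(faq_terms))

# Toy FAQ base: each stored question is pre-tokenised into index terms.
FAQS = {
    "How do I restore my passcode?": ["restore", "passcode"],
    "How do I change my email address?": ["change", "email", "address"],
}

query = ["reset", "password"]
for question, terms in FAQS.items():
    # Score without expansion, then with expansion.
    print(question, overlap(query, terms), overlap(expand(query), terms))
```

The query shares no vocabulary with the relevant FAQ, so the unexpanded query scores 0 against it; after expansion the score is 2. Expanding every sense of a polysemous word can also pull in wrong matches. Word sense disambiguation, which is listed among the keywords, is designed to reduce that risk.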

    Case acquisition from text: ontology-based information extraction with SCOOBIE for myCBR

    myCBR is a freely available tool for rapid prototyping of similarity-based retrieval applications such as case-based product recommender systems. It provides easy-to-use model generation, data import, similarity modelling, explanation, and testing functionality, together with comfortable graphical user interfaces. SCOOBIE is an ontology-based information extraction system that uses symbolic background knowledge to extract information from text. Extraction results depend on existing knowledge fragments. In this paper we show how to use SCOOBIE to generate cases from texts. More concretely, we use ontologies of the Web of Data, published as so-called Linked Data, interlinked with myCBR's case model. We present a way of formalising a case model as a Linked Data-ready ontology and connecting it with other ontologies of the Web of Data in order to obtain richer cases.

    Master of Science

    Thesis. The purpose of this study was threefold: to examine the speech/language skills of children with cleft palate and their noncleft peers at 39 months, to profile the speech/language outcomes of children with cleft palate at 39 months, and to extend previous studies examining pre- and postsurgery speech/language skills that predict later speech/language outcomes of children with cleft palate at 39 months. Participants included 66 children: 43 children with cleft palate and 23 noncleft children. Spontaneous speech/language samples were collected in the child's home during an interaction with the caregiver at 9 months, postsurgery (approximately 13 months), 21 months, and 39 months of age. Speech and language measures were calculated using computer software programs and hand calculations. Children were classified into one of four speech/language outcome profiles using descriptive statistics. The between-group comparisons revealed that the children with cleft palate had fewer consonant sounds, produced less accurate consonants for the majority of place and manner categories, and had a lower mean length of utterance than their noncleft peers. The within-group comparisons revealed that the risk factors gender, maternal education, and resonance were associated with poorer speech outcomes for children with cleft palate at 39 months. The profile "normal velopharyngeal mechanism and delayed speech and/or language" had the highest membership (41%). Correlations between pre- and postsurgery measures and later speech/language outcomes at 39 months revealed negative correlations between the 9-month predictors and all outcome measures. All other predictors were positively correlated with the speech outcome measures at 39 months. True consonant inventory and stop production measures at 21 months were the best predictors of the profile "normal velopharyngeal mechanism and normal speech/language".
These results suggest that children with cleft palate have poorer speech/language outcomes than their noncleft peers at 39 months of age. Children with cleft palate need to receive earlier speech/language intervention to help them catch up with their noncleft peers. Finally, the strongest correlations were found for true consonant inventory and stop production at 21 months, suggesting that 21 months is the best predictive age for speech and language outcomes at 39 months.

    EGAL: Exploration Guided Active Learning for TCBR

    The task of building labelled case bases can be approached using active learning (AL), a process that facilitates the labelling of large collections of examples with minimal manual labelling effort. The main challenge in designing AL systems is developing a selection strategy that chooses the most informative examples to label manually. Typical selection strategies use exploitation techniques, which attempt to refine uncertain areas of the decision space based on the output of a classifier. Other approaches balance exploitation with exploration, selecting examples from dense and interesting regions of the domain space. In this paper we present a simple but effective exploration-only selection strategy for AL in the textual domain. Our approach is inherently case-based, using only nearest-neighbour-based density and diversity measures. We show that its performance is comparable to that of the more computationally expensive exploitation-based approaches, and that it offers the opportunity to be classifier-independent.
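An exploration-only selection step in the spirit of EGAL can be sketched using nearest-neighbour measures alone. In this sketch, density is the summed similarity of an example to its neighbours within a similarity threshold. Diversity is the example's distance from the closest already-labelled example. The next example to label is the densest one among those that are diverse enough. The exact definitions, the thresholds `alpha` and `beta`, the use of cosine similarity, and the NumPy dependency are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def cosine_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

def select_next(x, labelled, alpha=0.5, beta=0.5):
    """Pick the next example to label from density and diversity only (no classifier)."""
    sim = cosine_matrix(x)
    n = len(x)
    unlabelled = [i for i in range(n) if i not in labelled]

    def density(i):
        # Summed similarity to neighbours whose similarity is at least alpha.
        return sum(sim[i, j] for j in range(n) if j != i and sim[i, j] >= alpha)

    def diversity(i):
        # Distance (1 - similarity) to the most similar labelled example.
        if not labelled:
            return 1.0
        return 1.0 - max(sim[i, j] for j in labelled)

    # Keep only sufficiently diverse examples; fall back to all unlabelled ones.
    candidates = [i for i in unlabelled if diversity(i) >= beta] or unlabelled
    return max(candidates, key=density)

x = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [1.0, 1.0]])
print(select_next(x, labelled={0}))  # → 3
```

Examples 1 and 4 are too similar to the labelled example 0, so they are filtered out as not diverse. Among the remaining candidates, example 3 has the highest neighbourhood density. No classifier is consulted at any point, which is what makes this kind of strategy classifier-independent.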

    A Textual Case-Based Mobile Phone Diagnosis Support System

    Java Cases and Ontology Libraries Integration for Building Reasoning Infrastructures (jCOLIBRI) is a framework that makes the development of Textual Case-Based Reasoning (TCBR) applications easier. It provides text preprocessing methods, textual similarity methods, and an appropriate representation for textual cases, which are the major techniques needed in any TCBR system. In this paper, a Mobile Phone Diagnosis Support System is presented as an extension to jCOLIBRI. The system accepts a problem and reasons with cases to provide a solution to the new problem. Experimental evaluation on a set of problems shows that the developed system predicts solutions that are relatively close to the user's mobile phone problem. The solutions also give the user valuable advice on how to go about solving the new problem.

    Case reuse in textual case-based reasoning.

    Text reuse involves reasoning with the textual solutions of previous problems to solve new, similar problems. It is an integral part of textual case-based reasoning (TCBR), which applies the CBR problem-solving methodology to situations where experiences are predominantly captured in text form. Here, we explore two key research questions in the context of textual reuse: firstly, which parts of a solution are reusable given a problem, and secondly, how these relevant parts might be reused to generate a textual solution. Reasoning with text is naturally challenging, and this is particularly so with text reuse. However, knowledge of problem-solution alignment made significant inroads towards addressing this challenge possible. This knowledge allows us to identify specific parts of a textual solution that are linked to particular problem attributes or attribute values. Accordingly, a text reuse strategy based on implicit alignment is presented to determine which textual solution constructs (words or phrases) need to be adapted. This addresses the question of what to reuse in solution texts and forms the first contribution of this thesis. A generic architecture, the Case Retrieval Reuse Net (CR2N), is used to formalise the reuse strategy. Functionally, this architecture annotates textual constructs in a solution as reusable with or without adaptation. Key to this annotation is the discovery of reuse evidence mined from neighbourhood characteristics. Experimental results show significant improvements over a retrieve-only system and a baseline reuse technique. We also extended CR2N so that retrieval of similar cases is informed by the solutions that are easiest to adapt. This is done by retrieving the top k cases based on their problem similarity and then determining the reusability of their solutions with respect to the target problem. Experimental results show that reuse-guided retrieval outperforms retrieval without this guidance.
Although CR2N exploits implicit alignment to aid text reuse, performance can be greatly improved if there is explicit alignment. Our second contribution is a method to form an explicit alignment of structured problem attributes and values to sentences in a textual solution. Thereafter, compositional and transformational approaches to text reuse are introduced to address the question of how to reuse textual solutions. The main idea in the compositional approach is to generate a textual solution by using prototypical sentences across similar authors. The transformational approach, by contrast, adapts the retrieved solution text by replacing sentences aligned to mismatched problem attributes with sentences from the neighbourhood. Experiments confirm the usefulness of these approaches through strong similarity between generated text and human references. The third and final contribution of this research is the use of Machine Translation (MT) evaluation metrics for TCBR. These metrics have been shown to correlate highly with human expert evaluation. In MT research, multiple human references are typically used, as opposed to a single reference or solution per test case. An introspective approach to creating multiple references for evaluation is presented. This is particularly useful for CBR domains where single-reference cases (cases with a single solution per problem) typically form the casebase. For such domains, we show how multiple references can be generated by exploiting the CBR similarity assumption. Results indicate that evaluations of TCBR systems with these MT metrics are closer to human judgements.
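Many MT evaluation metrics build on n-gram overlap with one or more references. One example is the clipped (modified) n-gram precision used inside BLEU. It credits each candidate n-gram at most as many times as that n-gram appears in any single reference, so adding references can only add credit. The thesis does not name its metrics here, so this sketch is illustrative only. It also omits BLEU's brevity penalty and its geometric averaging over n.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

candidate = "a cat sat on the mat".split()
one_ref = ["the cat sat on the mat".split()]
two_refs = one_ref + ["a cat sat on a mat".split()]
print(modified_precision(candidate, one_ref))   # 5/6: "a" earns no credit
print(modified_precision(candidate, two_refs))  # 1.0
```

With a single reference, the candidate's "a" earns no credit; adding a second, similar reference raises the score. In a CBR setting, such extra references might hypothetically be drawn from the solutions of a test case's nearest neighbours, in line with the similarity assumption mentioned in the abstract.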