6 research outputs found

    Feature Selection and Generalisation for Retrieval of Textual Cases

    Textual CBR systems solve problems by reusing experiences that are in textual form. Knowledge-rich comparison of textual cases remains an important challenge for these systems. However, mapping text data into a structured case representation requires a significant knowledge engineering effort. In this paper we look at automated acquisition of the case indexing vocabulary as a two-step process involving feature selection followed by feature generalisation. Boosted decision stumps are employed as a means to select features that are predictive and relatively orthogonal. Association rule induction is employed to capture feature co-occurrence patterns. Generalised features are constructed by applying these rules. Essentially, rules preserve implicit semantic relationships between features, and applying them has the desired effect of bringing together cases that would otherwise have been overlooked during case retrieval. Experiments with four textual data sets show significant improvement in retrieval accuracy whenever generalised features are used. The results further suggest that boosted decision stumps with generalised features are a promising combination.
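
    The generalisation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single-antecedent rule form, the thresholds `min_support` and `min_confidence`, and the function names `mine_rules` and `generalise` are all assumptions made for illustration, and the boosted-decision-stump selection step is omitted.

```python
from collections import Counter
from itertools import permutations


def mine_rules(cases, min_support=2, min_confidence=0.8):
    """Return hypothetical single-antecedent rules (a, b), read as 'a implies b'.

    cases is a list of sets of terms. A rule is kept when the pair co-occurs in
    at least min_support cases and its confidence, count(a and b) / count(a),
    reaches min_confidence.
    """
    term_count = Counter(term for case in cases for term in case)
    pair_count = Counter(
        pair for case in cases for pair in permutations(sorted(case), 2)
    )
    return {
        (a, b)
        for (a, b), n in pair_count.items()
        if n >= min_support and n / term_count[a] >= min_confidence
    }


def generalise(case, rules):
    """Add the consequent of every rule whose antecedent occurs in the case."""
    return set(case) | {b for a, b in rules if a in case}


# Toy case base (invented data, for illustration only).
cases = [
    {"printer", "toner"},
    {"printer", "toner", "jam"},
    {"toner"},
    {"jam", "paper"},
]
rules = mine_rules(cases)
query = generalise({"printer", "cable"}, rules)
```

    In this toy case base, "printer" always co-occurs with "toner", so a query mentioning only "printer" is extended with "toner" and can now match cases indexed under either term.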

    Automatic case acquisition from texts for process-oriented case-based reasoning

    This paper introduces a method for the automatic acquisition of a rich case representation from free text for process-oriented case-based reasoning. Case engineering is among the most complicated and costly tasks in implementing a case-based reasoning system. This is especially so for process-oriented case-based reasoning, where more expressive case representations are generally used and, in our opinion, actually required for satisfactory case adaptation. In this context, the ability to acquire cases automatically from procedural texts is a major step towards reasoning about processes. We therefore detail a methodology that makes case acquisition from processes described as free text possible, with special attention given to assembly instruction texts. This methodology extends the techniques we used to extract actions from cooking recipes. We argue that techniques taken from natural language processing are required for this task, and that they give satisfactory results. An evaluation based on our implemented prototype, which extracts workflows from recipe texts, is provided. Comment: In press, publication planned for 201

    Textual case-based reasoning

    The Knowledge Engineering Review, 20(3): pp. 255-260. This commentary provides a definition of textual case-based reasoning (TCBR) and surveys research contributions according to four research questions. We also describe how TCBR can be distinguished from text mining and information retrieval. We conclude with potential directions for TCBR research.

    Integrating selection-based aspect sentiment and preference knowledge for social recommender systems.

    Purpose: Recommender system approaches such as collaborative and content-based filtering rely on user ratings and product descriptions to recommend products. More recently, recommender system research has focussed on exploiting knowledge from user-generated content such as product reviews to enhance recommendation performance. The purpose of this paper is to show that the performance of a recommender system can be enhanced by integrating explicit knowledge extracted from product reviews with implicit knowledge extracted from analysis of consumers' purchase behaviour. Design/methodology/approach: The authors introduce a sentiment and preference-guided strategy for product recommendation by integrating not only explicit, user-generated and sentiment-rich content but also implicit knowledge gleaned from users' product purchase preferences. Integration of both of these knowledge sources helps to model sentiment over a set of product aspects. The authors show how established dimensionality reduction and feature weighting approaches from text classification can be adopted to weight and select an optimal subset of aspects for recommendation tasks. The authors compare the proposed approach against several baseline methods as well as the state-of-the-art better method, which recommends products that are superior to a query product. Findings: Evaluation results from seven different product categories show that aspect weighting and selection significantly improve on state-of-the-art recommendation approaches. Research limitations/implications: The proposed approach recommends products by analysing user sentiment on product aspects. Therefore, the proposed approach can be used to develop recommender systems that can explain to users why a product is recommended. This is achieved by presenting an analysis of sentiment distribution over individual aspects that describe a given product. Originality/value: This paper describes a novel approach to integrate consumer purchase behaviour analysis and aspect-level sentiment analysis to enhance recommendation. In particular, the authors introduce the idea of aspect weighting and selection to help users identify better products. Furthermore, the authors demonstrate the practical benefits of this approach on a variety of product categories and compare the approach with the current state-of-the-art approaches.

    Identifying facts for TCBR

    Paper presented at The Sixth International Conference on Case-Based Reasoning, Chicago, IL. This paper explores a method to algorithmically distinguish case-specific facts from potentially reusable or adaptable elements of cases in a textual case-based reasoning (TCBR) system. In the legal domain, documents often contain case-specific facts mixed with case-neutral details of law, precedent, conclusions the attorneys reach by applying their interpretation of the law to the case facts, and other aspects of argumentation that attorneys could potentially apply to similar situations. The automated distinction of these two categories, namely facts and other elements, has the potential to improve the quality of automated textual case acquisition. The goal is ultimately to distinguish case problem from solution. To separate fact from other elements, we use an information gain (IG) algorithm to identify words that serve as efficient markers of one or the other. We demonstrate that this technique can successfully distinguish case-specific fact paragraphs from others, and propose future work to overcome some of the limitations of this pilot project.
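
    As a rough illustration of how information gain can rank marker words, the sketch below scores a word by how much knowing its presence reduces the entropy of the paragraph label. It assumes paragraphs are given as sets of tokens with one label each; the function names and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(paragraphs, labels, word):
    """Entropy reduction from splitting paragraphs on the presence of word."""
    with_word = [y for p, y in zip(paragraphs, labels) if word in p]
    without_word = [y for p, y in zip(paragraphs, labels) if word not in p]
    total = len(labels)
    remainder = sum(
        len(part) / total * entropy(part)
        for part in (with_word, without_word)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - remainder


# Toy paragraphs (invented data): two fact paragraphs, two other paragraphs.
paragraphs = [
    {"defendant", "drove"},
    {"plaintiff", "drove"},
    {"court", "holds"},
    {"statute", "holds"},
]
labels = ["fact", "fact", "other", "other"]
ig_drove = information_gain(paragraphs, labels, "drove")
ig_court = information_gain(paragraphs, labels, "court")
```

    Here "drove" separates the two classes perfectly and so receives a higher score than "court", which appears in only one of the non-fact paragraphs.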

    Réutilisation d'entités nommées pour la réponse au courriel

    An automatic e-mail response system is one way to ease the work of business services such as customer service or investor relations, which deal with a large volume of e-mail requests, many of them repetitive. We decided to adapt a case-based reasoning (CBR) approach to this problem. The approach reuses antecedent messages to respond to new incoming e-mails: an adequate answer is selected among the archived messages and then adapted to make it relevant to the context of the new request. The objective of our work is to define a procedure that helps the user of an e-mail response system reuse the named entities of antecedent e-mails, and more specifically to identify named entities and their specialised roles. These entities are portions of text that depend strongly on the context of the antecedent message, and hence need some adaptation before they can be reused. We divide the reuse process into two tasks: (a) the identification of the modifiable portions of an antecedent message; and (b) the selection of the portions to be adapted to build the answer to the request. Both tasks require knowledge. Our research question is whether adaptive approaches based on machine learning techniques can acquire the knowledge needed to reuse named entities effectively. The first task resembles information extraction, although we restrict ourselves to named entities and their specialisations. The second task corresponds to text categorisation: we use the request, that is, the words that compose it, to assign a class to the response we must build, and the class indicates which entities must be adapted. We studied and compared different approaches for the two tasks. For extraction, we tested manual and automatic, top-down and bottom-up approaches on an e-mail corpus. The manual approach gives excellent results, but performance degrades for the automatic approaches. For categorisation, we evaluated different representations of texts and words, the use of weights for the latter, and the impact of the compression obtained by using association rules. Association rules compress the data without degrading the results much. Overall, the results are satisfactory and indicate that our approach, composed of the two tasks described above, could be applied to our problem of automatic e-mail response.