6 research outputs found
Feature Selection and Generalisation for Retrieval of Textual Cases
Textual CBR systems solve problems by reusing experiences that are in textual form. Knowledge-rich comparison of textual cases remains an important challenge for these systems. However, mapping text data into a structured case representation requires a significant knowledge engineering effort. In this paper we look at automated acquisition of the case indexing vocabulary as a two-step process involving feature selection followed by feature generalisation. Boosted decision stumps are employed as a means to select features that are predictive and relatively orthogonal. Association rule induction is employed to capture feature co-occurrence patterns. Generalised features are constructed by applying these rules. Essentially, rules preserve implicit semantic relationships between features, and applying them has the desired effect of bringing together cases that would otherwise have been overlooked during case retrieval. Experiments with four textual data sets show significant improvement in retrieval accuracy whenever generalised features are used. The results further suggest that boosted decision stumps with generalised features are a promising combination.
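The generalisation step described above can be illustrated with a minimal sketch. It assumes cases are already reduced to sets of selected features and that rules are simple pairwise implications filtered by support and confidence; the function names, thresholds, and toy cases are hypothetical and are not taken from the paper, and the boosted-stump selection step is not shown.

```python
from collections import Counter
from itertools import permutations


def induce_rules(cases, min_support=0.3, min_confidence=0.8):
    """Induce pairwise rules (a, b), read as 'a implies b', from co-occurrence."""
    n = len(cases)
    single = Counter()
    pair = Counter()
    for features in cases:
        single.update(features)
        # Count every ordered pair of distinct features once per case.
        pair.update(permutations(sorted(features), 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        confidence = count / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b))
    return sorted(rules)


def generalise(features, rules):
    """Add the consequent of each rule whose antecedent is present (one pass)."""
    expanded = set(features)
    for a, b in rules:
        if a in features:
            expanded.add(b)
    return expanded


# Hypothetical toy cases, each a set of already-selected features.
cases = [
    {"printer", "toner"},
    {"printer", "toner", "jam"},
    {"printer", "paper"},
    {"toner", "printer"},
]
rules = induce_rules(cases)
query = generalise({"toner"}, rules)
```

With these toy cases, a query mentioning only "toner" also acquires "printer", so it can match cases that share no literal feature with the original query.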
Automatic case acquisition from texts for process-oriented case-based reasoning
This paper introduces a method for the automatic acquisition of a rich case
representation from free text for process-oriented case-based reasoning. Case
engineering is among the most complicated and costly tasks in implementing a
case-based reasoning system. This is especially so for process-oriented
case-based reasoning, where more expressive case representations are generally
used and, in our opinion, actually required for satisfactory case adaptation.
In this context, the ability to acquire cases automatically from procedural
texts is a major step forward in order to reason on processes. We therefore
detail a methodology that makes case acquisition from processes described as
free text possible, with special attention given to assembly instruction texts.
This methodology extends the techniques we used to extract actions from cooking
recipes. We argue that techniques taken from natural language processing are
required for this task, and that they give satisfactory results. An evaluation
based on our implemented prototype extracting workflows from recipe texts is
provided. Comment: In press, publication expected in 201
Textual case-based reasoning
The Knowledge Engineering Review, 20(3): pp. 255-260. This commentary provides a definition of textual case-based reasoning (TCBR) and surveys
research contributions according to four research questions. We also describe how TCBR can be
distinguished from text mining and information retrieval. We conclude with potential directions for
TCBR research.
Integrating selection-based aspect sentiment and preference knowledge for social recommender systems.
Purpose: Recommender system approaches such as collaborative and content-based filtering rely on user ratings and product descriptions to recommend products. More recently, recommender system research has focussed on exploiting knowledge from user-generated content such as product reviews to enhance recommendation performance. The purpose of this paper is to show that the performance of a recommender system can be enhanced by integrating explicit knowledge extracted from product reviews with implicit knowledge extracted from analysis of consumer's purchase behaviour. Design/methodology/approach: The authors introduce a sentiment and preference-guided strategy for product recommendation by integrating not only explicit, user-generated and sentiment-rich content but also implicit knowledge gleaned from users' product purchase preferences. Integration of both of these knowledge sources helps to model sentiment over a set of product aspects. The authors show how established dimensionality reduction and feature weighting approaches from text classification can be adopted to weight and select an optimal subset of aspects for recommendation tasks. The authors compare the proposed approach against several baseline methods as well as the state-of-the-art better method, which recommends products that are superior to a query product. Findings: Evaluation results from seven different product categories show that aspect weighting and selection significantly improve state-of-the-art recommendation approaches. Research limitations/implications: The proposed approach recommends products by analysing user sentiment on product aspects. Therefore, the proposed approach can be used to develop recommender systems that can explain to users why a product is recommended. This is achieved by presenting an analysis of sentiment distribution over individual aspects that describe a given product.
Originality/value: This paper describes a novel approach to integrate consumer purchase behaviour analysis and aspect-level sentiment analysis to enhance recommendation. In particular, the authors introduce the idea of aspect weighting and selection to help users identify better products. Furthermore, the authors demonstrate the practical benefits of this approach on a variety of product categories and compare the approach with the current state-of-the-art approaches.
Identifying facts for TCBR
Paper presented at The Sixth International Conference on Case-Based Reasoning, Chicago, IL. This paper explores a method to algorithmically distinguish case-specific
facts from potentially reusable or adaptable elements of cases in a textual case-based
reasoning (TCBR) system. In the legal domain, documents often contain case-specific
facts mixed with case-neutral details of law, precedent, conclusions the
attorneys reach by applying their interpretation of the law to the case facts, and other
aspects of argumentation that attorneys could potentially apply to similar situations.
The automated distinction of these two categories, namely facts and other elements,
has the potential to improve the quality of automated textual case acquisition. The goal
is ultimately to distinguish case problem from solution. To separate fact from other
elements, we use an information gain (IG) algorithm to identify words that serve as
efficient markers of one or the other. We demonstrate that this technique can
successfully distinguish case-specific fact paragraphs from others, and propose
future work to overcome some of the limitations of this pilot project.
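As a rough illustration of how information gain can rank words as markers of fact versus non-fact paragraphs, the sketch below scores one word against labelled paragraphs. It assumes paragraphs are represented as sets of tokens and that IG is computed on word presence or absence; the function names, labels, and toy paragraphs are hypothetical and do not reproduce the paper's data or exact algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(paragraphs, labels, word):
    """Reduction in label entropy from splitting on presence of `word`."""
    with_word = [l for p, l in zip(paragraphs, labels) if word in p]
    without_word = [l for p, l in zip(paragraphs, labels) if word not in p]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(
        len(part) / n * entropy(part) for part in (with_word, without_word) if part
    )
    return entropy(labels) - remainder


# Hypothetical toy paragraphs (token sets) with hand-assigned labels.
paragraphs = [
    {"plaintiff", "drove"},
    {"defendant", "drove"},
    {"court", "held"},
    {"statute", "held"},
]
labels = ["fact", "fact", "other", "other"]

ig_drove = information_gain(paragraphs, labels, "drove")
ig_court = information_gain(paragraphs, labels, "court")
```

In this toy data "drove" separates the two classes perfectly, so it scores higher than "court", which appears in only one of the two non-fact paragraphs.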
Réutilisation d'entités nommées pour la réponse au courriel
Automatic e-mail response is a solution envisaged to ease the work of certain business services, such as customer service or investor relations, which face a large number of often repetitive e-mails. We decided to adapt a case-based reasoning (CBR) approach to this problem. The approach reuses earlier messages to answer new e-mails by selecting an adequate response among the archived messages and adapting it so that it is relevant to the context of the new request. The objective of our work is to define a procedure that helps the user of an e-mail response system reuse the named entities of antecedent e-mails. These entities depend strongly on the context of the antecedent message, however, and need adaptation before they can be reused. To do so, we perform two tasks: first, the identification of the modifiable portions of the antecedent message; second, the selection of the portions that will be adapted to build the response to the request. Both tasks require knowledge. Our research question is whether adaptive approaches based on machine learning techniques make it possible to acquire the knowledge needed to reuse named entities effectively. The first task, identifying modifiable portions, resembles information extraction, although we are interested only in named entities and their specialisations. The second task, selecting the portions to adapt, corresponds to a text categorisation in which we use the request, that is, the words that compose it, to assign a class to the response we must build. The class indicates which entities must be adapted.
We studied and compared different approaches for both tasks. For extraction, we tested manual and automatic, top-down and bottom-up approaches on an e-mail corpus. The results obtained with the manual approach are excellent, but we observe a degradation for the automatic approaches. For categorisation, we evaluated different representations of texts and words, the use of weights for the latter, and the impact of a compression obtained through association rules. Association rules compress the data without degrading the results much. Overall, the results are satisfactory and indicate that our approach, composed of the two tasks described above, could be applied to our problem of automatic e-mail response.