    A hybrid strategy for privacy-preserving recommendations for mobile shopping

    To calculate recommendations, recommender systems collect and store huge amounts of users' personal data such as preferences, interaction behavior, or demographic information. If these data are used for other purposes or get into the wrong hands, the privacy of the users can be compromised. Thus, service providers are confronted with the challenge of offering accurate recommendations without the risk of dissemination of sensitive information. This paper presents a hybrid strategy combining collaborative filtering and content-based techniques for mobile shopping with the primary aim of preserving the customer's privacy. Detailed information about the customer, such as the shopping history, is securely stored on the customer's smartphone and locally processed by a content-based recommender. Data of individual shopping sessions, which are sent to the store backend for product association and comparison with similar customers, are unlinkable and anonymous. No uniquely identifying information of the customer is revealed, making it impossible to associate successive shopping sessions at the store backend. Optionally, the customer can disclose demographic data and a rudimentary explicit profile for further personalization.
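The split described in this abstract, with an anonymous session payload for the backend and private re-ranking on the device, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the payload layout and the tag-based cosine similarity are assumptions made for illustration, and a fresh random token alone does not rule out linking through other channels such as network identifiers.

```python
import secrets
from collections import Counter
from math import sqrt


def new_session_id() -> str:
    """Fresh random token per shopping session; never derived from the user."""
    return secrets.token_hex(16)


def build_session_payload(basket: list[str]) -> dict:
    """Data sent to the store backend: a one-off token plus the basket only."""
    return {"session": new_session_id(), "items": sorted(basket)}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_locally(candidates: dict[str, list[str]], history_tags: list[str]) -> list[str]:
    """Re-rank backend candidates on the device against the private history.

    `candidates` maps a product name to its (hypothetical) content tags;
    `history_tags` is the tag list of past purchases and never leaves the phone.
    """
    profile = Counter(history_tags)
    return sorted(
        candidates,
        key=lambda item: cosine(Counter(candidates[item]), profile),
        reverse=True,
    )
```

In this sketch the backend only ever sees the output of `build_session_payload`, while `rerank_locally` runs on the smartphone with the stored shopping history.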

    Annotation en rôles sémantiques du français en domaine spécifique (Semantic role labeling of French in specific domains)

    In this PhD thesis in Natural Language Processing, we aim to perform automatic semantic role labeling on French domain-specific texts. This task disambiguates the sense of the predicates in a given text and annotates the related phrases with semantic roles such as Agent, Patient or Destination. It helps many applications in domains where annotated corpora exist, but is difficult to use otherwise. We first evaluate on the FrameNet corpus an existing annotation method that relies solely on VerbNet and is therefore domain-independent. We show that substantial improvements can be obtained, both syntactically, by handling the passive voice, and semantically, by taking advantage of the selectional restrictions present in VerbNet. To apply this method to French, we translate two English lexical resources. We start with the WordNet lexical database. Next, we translate the VerbNet lexicon, in which verbs are grouped semantically on the basis of their syntactic properties. We obtain its translation, VerbeNet, by reusing two French verb lexicons (the Lexique-Grammaire and Les Verbes Français) and by manually modifying and reorganizing the resulting lexicon. Finally, once those building blocks are in place, we evaluate the feasibility of semantic role labeling of French and English in three specific domains. We study the pros and cons of using VerbNet and VerbeNet to annotate those domains before outlining our future work.
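The two improvements named in this abstract, passive-voice handling and selectional restrictions, can be sketched on a toy frame. This is a simplified illustration and not the thesis's method: the `Slot` class, the single `FRAME`, the position names and the `animate` feature are assumptions, and VerbNet's real frames and restrictions are considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    role: str                               # semantic role, e.g. "Agent"
    restrictions: frozenset = frozenset()   # required features, e.g. {"animate"}


# Hypothetical transitive frame: subject = Agent [+animate], object = Patient.
FRAME = {
    "subject": Slot("Agent", frozenset({"animate"})),
    "object": Slot("Patient"),
}


def normalize_passive(args: dict, passive: bool) -> dict:
    """Map surface positions back to active-voice positions."""
    if not passive:
        return dict(args)
    deep = {}
    if "subject" in args:
        deep["object"] = args["subject"]      # surface subject is the deep object
    if "by_phrase" in args:
        deep["subject"] = args["by_phrase"]   # "by X" is the deep subject
    return deep


def label_roles(args: dict, passive: bool, features: dict) -> dict:
    """Assign a role to each phrase whose features satisfy the slot's restrictions."""
    roles = {}
    for position, phrase in normalize_passive(args, passive).items():
        slot = FRAME.get(position)
        if slot and slot.restrictions <= features.get(phrase, frozenset()):
            roles[phrase] = slot.role
    return roles
```

For "The window was broken by the boy", the arguments would be `{"subject": "the window", "by_phrase": "the boy"}` with `passive=True`. The normalization step lets the frame match, and the restriction check keeps an inanimate phrase from being labeled Agent.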

    Tipping the scales: exploring the added value of deep semantic processing on readability prediction and sentiment analysis

    Applications which make use of natural language processing (NLP) are said to benefit more from incorporating a rich model of text meaning than from a basic representation in the form of bag-of-words. This thesis set out to explore the added value of incorporating deep semantic information in two end-user applications that normally rely mostly on superficial and lexical information, viz. readability prediction and aspect-based sentiment analysis. For both applications we apply supervised machine learning techniques and focus on the incorporation of coreference and semantic role information. To this end, we adapted a Dutch coreference resolution system and developed a semantic role labeler for Dutch. We tested the cross-genre robustness of both systems and in a next phase retrained them on a large corpus comprising a variety of text genres. For the readability prediction task, we first built a general-purpose corpus consisting of a large variety of text genres which was then assessed on readability. Moreover, we proposed an assessment technique which has not previously been used in readability assessment, namely crowdsourcing, and revealed that crowdsourcing is a viable alternative to the more traditional assessment technique of having experts assign labels. We built the first state-of-the-art classification-based readability prediction system relying on a rich feature space of traditional, lexical, syntactic and shallow semantic features. Furthermore, we enriched this tool by introducing new features based on coreference resolution and semantic role labeling. We then explored the added value of incorporating this deep semantic information by performing two different rounds of experiments. In the first round these features were manually included or excluded, and in the second round joint optimization experiments were performed using a wrapper-based feature selection system based on genetic algorithms.
In both setups, we investigated whether there was a difference in performance when these features were derived from gold standard information compared to when they were automatically generated, which allowed us to assess the true upper bound of incorporating this type of information. Our results revealed that readability classification definitely benefits from the incorporation of semantic information in the form of coreference and semantic role features. More precisely, we found that the best results for both tasks were achieved after jointly optimizing the hyperparameters and semantic features using genetic algorithms. Contrary to our expectations, we observed that our system achieved its best performance when relying on the automatically predicted deep semantic features. This is an interesting result, as our ultimate goal is to predict readability based exclusively on automatically-derived information sources. For the aspect-based sentiment analysis task, we developed the first Dutch end-to-end system. We therefore collected a corpus of Dutch restaurant reviews and annotated each review with aspect term expressions and polarity. For the creation of our system, we distinguished three individual subtasks: aspect term extraction, aspect category classification and aspect polarity classification. We then investigated the added value of our two semantic information layers in the second subtask of aspect category classification. In a first setup, we focussed on investigating the added value of performing coreference resolution prior to classification in order to derive which implicit aspect terms (anaphors) could be linked to which explicit aspect terms (antecedents). In these experiments, we explored how the performance of a baseline classifier relying on lexical information alone would benefit from additional semantic information in the form of lexical-semantic and semantic role features. 
We hypothesized that if coreference resolution was performed prior to classification, more of this semantic information could be derived, i.e. for the implicit aspect terms, which would result in a better performance. In this respect, we optimized our classifier using a wrapper-based approach for feature selection and we compared a setting where we relied on gold-standard anaphor-antecedent pairs to a setting where these had been predicted. Our results revealed a very moderate performance gain and underlined that incorporating coreference information only proves useful when integrating gold-standard coreference annotations. When coreference relations were derived automatically, this led to an overall decrease in performance because of semantic mismatches. When comparing the semantic role to the lexical-semantic features, it seemed that especially the latter features allow for a better performance. In a second setup, we investigated how to resolve implicit aspect terms. We compared a setting where gold-standard coreference resolution was used for this purpose to a setting where the implicit aspects were derived from a simple subjectivity heuristic. Our results revealed that using this heuristic results in a better coverage and performance, which means that, overall, it was difficult to find an added value in resolving coreference first. Does deep semantic information help tip the scales on performance? For Dutch readability prediction, we found that it does, when integrated in a state-of-the-art classifier. By using such information for Dutch aspect-based sentiment analysis, we found that this approach adds weight to the scales, but cannot make them tip.
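Wrapper-based feature selection with a genetic algorithm, as mentioned in this abstract, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the thesis's system: the function `ga_select`, its parameters and the choice of truncation selection, one-point crossover and bit-flip mutation are all hypothetical. In a real wrapper setup, `fitness` would train and evaluate the classifier (for example by cross-validation) on the features selected by the mask.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[bool, ...]  # True at index i means feature i is selected


def ga_select(n_features: int, fitness: Callable[[Mask], float],
              pop_size: int = 20, generations: int = 30,
              mutation_rate: float = 0.05, seed: int = 0) -> Mask:
    """Search for a feature mask that maximizes `fitness` (needs n_features >= 2)."""
    rng = random.Random(seed)
    pop: List[Mask] = [tuple(rng.random() < 0.5 for _ in range(n_features))
                       for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0]]                   # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit is inverted with probability mutation_rate.
            child = tuple(bit != (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

A toy fitness that rewards a known set of useful features and penalizes mask size is enough to try it out. Jointly optimizing hyperparameters, as the abstract describes, would require extending the individual beyond a plain feature mask.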

    Exploiting FrameNet for content-based book recommendation

    Adding semantic knowledge to a content-based recommender helps to better understand the items and user representations. Most recent research has focused on examining the added value of adding semantic features based on structured web data, in particular Linked Open Data (LOD). In this paper, we focus in contrast on semantic feature construction from text, by incorporating features based on semantic frames into a book recommendation classifier. To this end, we leverage the semantic frames obtained by parsing the plots of the items under consideration with a state-of-the-art semantic parser. By investigating this type of semantic information, we show that these frames are also able to represent information about a particular book, without requiring explicitly structured data describing the books. We reveal that exploiting frame information outperforms a basic bag-of-words approach and that especially the words relating to those frames are beneficial for classification. In a final step, we compare and combine our system with the LOD features from a system leveraging DBpedia as knowledge resource. We show that both approaches yield similar results and reveal that combining semantic information from these two different sources might even be beneficial.
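The idea of adding frame-based features, and the words relating to those frames, to a bag-of-words representation can be sketched as follows. This is an assumed feature layout for illustration, not the paper's actual feature set: the function name, the feature prefixes and the `(frame, trigger word)` input format are hypothetical, and the frame annotations are taken as given from an external semantic parser.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def featurize(plot_tokens: Iterable[str],
              frame_annotations: List[Tuple[str, str]]) -> Counter:
    """Build a sparse feature vector for one book plot.

    plot_tokens: tokens of the plot text (bag-of-words baseline).
    frame_annotations: (frame_name, trigger_word) pairs, assumed to come
        from a semantic parser run over the same plot.
    """
    feats = Counter(f"bow={tok.lower()}" for tok in plot_tokens)
    for frame, trigger in frame_annotations:
        feats[f"frame={frame}"] += 1                              # the frame itself
        feats[f"frame_word={frame}:{trigger.lower()}"] += 1       # word evoking it
    return feats
```

The resulting counters can be passed to any classifier that accepts sparse dictionaries. Features from another source, such as LOD properties, could be merged in under their own prefix.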