    An Adaptive Contextual Recommender System: a Slow Intelligence Perspective

    This paper introduces an adaptive context-aware recommender system based on the Slow Intelligence approach. The system is delivered to the user as an adaptive mobile application that allows a high degree of customization in recommending services and resources according to the user's current position and global profile. A case study set in the city of Pittsburgh was analyzed for several kinds of users (with different profiles, such as visitors, students and professors), and an experimental campaign was conducted with interesting results.

    Coping with Data Scarcity: First Steps towards Word Expansion for a Chatbot in the Urban Transportation Domain

    In several areas of Natural Language Processing (NLP), such as Information Retrieval (IR) and Question-Answering (QA) systems, words have traditionally been used in developing expansion techniques. This Master's thesis presents two approaches to developing expansion techniques for Dialogue Systems (DS), specifically aimed at the understanding module of a chatbot for urban transportation in Donostia (Gipuzkoa). The first approach uses word vectors to extract semantically similar terms, in this case a set of pre-trained FastText embeddings for Spanish; the second uses word sense disambiguation to extract synonyms from a lexical database, in this case the Spanish WordNet. To this end, a collaborative task was designed in which the corpus is built by collecting inputs from a hypothetical real-world scenario. In addition, two sets of experiments were carried out to identify out-of-domain inputs. In the first phase, a scoring system was developed in which the corpus terms are ranked using Term Frequency-Inverse Document Frequency (TF-IDF), and the scoring system is then completed with cosine similarity. The second phase formalizes this scoring system by preparing and stratifying three datasets. These datasets were used to train a linear regressor and a support vector machine with a linear kernel. According to the results, the pre-trained vectors are more faithful to real input. However, lexical databases add wider linguistic coverage to the hypothetical expanded corpus. Finally, regarding domain discrimination, the results favor the dataset containing the most terms extracted from TF-IDF.

    Text expansion techniques have been used in some subfields of Natural Language Processing (NLP) such as Information Retrieval or Question-Answering Systems. This Master's Thesis presents two approaches for expansion within the context of Dialogue Systems (DS), more precisely for the Natural Language Understanding (NLU) module of a chatbot for the urban transportation domain in San Sebastian (Gipuzkoa). The first approach uses word vectors to obtain semantically similar terms, while the second one involves synonym extraction from a lexical database. For this purpose, a corpus composed of real case scenario inputs has been exploited. Furthermore, the qualitative analysis of the implemented expansion techniques revealed a need to filter out-of-domain inputs. In relation to this problem, two different sets of experiments have been carried out. First, the feasibility of using Term Frequency-Inverse Document Frequency (TF-IDF) and cosine similarity as discrimination features was explored. Then, linear regression and Support Vector Machine (SVM) classifiers were trained and tested. Results show that pre-trained word embedding expansion constitutes a more faithful representation of real case scenario inputs, whereas lexical database expansion adds a wider linguistic coverage to a hypothetically expanded version of the corpus. For out-of-domain detection, increasing the number of features improves both linear regression and SVM classification results.
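
    As an illustration of the TF-IDF and cosine similarity features mentioned above, the following is a minimal sketch and not the thesis's implementation. It assumes whitespace tokenization, a smoothed IDF formula, and a tiny hypothetical English in-domain corpus; the function names `tf_idf_vectors` and `domain_score` are invented for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document, plus the IDF table."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption) so terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def domain_score(utterance, vectors, idf):
    """Highest cosine similarity between the utterance and any in-domain document."""
    tokens = utterance.lower().split()
    counts = Counter(t for t in tokens if t in idf)  # ignore terms unseen in the corpus
    if not counts:
        return 0.0
    query = {t: c / len(tokens) * idf[t] for t, c in counts.items()}
    return max(cosine(query, v) for v in vectors)


# Hypothetical in-domain utterances (the thesis corpus is not reproduced here).
in_domain = [
    "which bus goes to the old town",
    "when does the next bus leave",
    "where is the nearest bus stop",
]
vectors, idf = tf_idf_vectors([s.split() for s in in_domain])
in_score = domain_score("when is the next bus", vectors, idf)
out_score = domain_score("tell me a joke", vectors, idf)
print(in_score, out_score)
```

    A score of this kind could be thresholded directly or passed as a feature to a classifier such as the linear regression and SVM models the abstract mentions.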

    A Semantic Index for Linked Open Data and Big Data Applications

    This work proposes a new approach to indexing multidimensional data based on kd-trees, together with a novel approach to query processing. The indexing data structure is distributed across a network of "peers", each of which hosts a part of the tree and communicates with the other nodes by message passing. This approach has two main advantages: i) it can handle a larger number of nodes and points than a single-peer architecture, and ii) it can process multiple queries efficiently. In particular, we propose a novel version of the k-nearest neighbor algorithm that can start a query in a randomly chosen peer. Furthermore, it returns the results without traversing the peer containing the root. Preliminary experiments demonstrated that, on average, in about 65% of cases a query starting in a random node does not involve the peer containing the root of the tree. Also, on average, in about 98% of cases it returns the results without involving the root peer. This work also proposes an approach to cope with textual data and provides a way to perform semantic queries over the text.
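
    For readers unfamiliar with the underlying data structure, the sketch below shows a standard single-process kd-tree with k-nearest-neighbor search. It is not the distributed, peer-based algorithm proposed in the paper: there is no message passing and the search always starts at the root. The names `Node`, `build` and `knn` are chosen for this example only.

```python
import heapq


class Node:
    """One kd-tree node: a point, its splitting axis, and two subtrees."""

    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points, depth=0):
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root, query, k):
    """Return the k points closest to query, nearest first."""
    heap = []  # max-heap emulated with negated squared distances

    def visit(node):
        if node is None:
            return
        d = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Explore the far side only if the splitting plane is closer than the
        # current k-th best distance (or fewer than k points were found so far).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [point for _, point in sorted(heap, reverse=True)]


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
nearest = knn(tree, (9, 2), k=2)
print(nearest)
```

    In a distributed setting, each subtree could live on a different peer and the recursive calls would become messages; that part is left out here.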

    Contribution to Improving Information Retrieval Using Semantic Methods: Application to the Arabic Language

    An information retrieval system is a set of programs and modules that interfaces with the user to take and interpret a query, search the index and return a ranking of the selected documents to that user. The greatest challenge for such a system, however, is coping with the large volume of multimodal and multilingual information available through document collections or the web in order to find the items that best match users' needs. In this work we present two contributions. In the first, we propose a new approach to query reformulation in the context of Arabic information retrieval. The principle is to represent the query as a weighted semantic tree, whose nodes represent concepts (synsets) linked by semantic relations, so as to better identify the user's information need. The tree is built using Pseudo-Relevance Feedback combined with the Arabic WordNet semantic resource. Experimental results show a good improvement in the performance of the information retrieval system. In the second contribution, we also propose a new approach to building a test collection for Arabic information retrieval. The approach combines the Pooling strategy, using search engines, with the Naïve Bayes machine learning classification algorithm. For the experiments we created a new test collection made up of a document base of 632 documents and 165 queries with their relevance judgments across several topics. The experiments also showed the effectiveness of the Bayesian classifier for recovering document relevance judgments; moreover, it performed well after the document collection was semantically enriched with the word2vec model.

    BIM and Knowledge Based Risk Management System

    The use of Building Information Modelling (BIM) for construction project risk management has become a growing research trend. However, BIM-based risk management has not been widely used in practice, and two important gaps contribute to this problem: 1) very few theories exist that can explain how BIM can be aligned with traditional techniques and integrated into existing processes for project risk management; and 2) current BIM solutions have very limited support for risk communication and information management during the project development process. To overcome these limitations, this research proposes a new approach in which two traditional risk management techniques, Risk Breakdown Structure (RBS) and Case-based Reasoning (CBR), are integrated into BIM-based platforms and an active linkage between the risk information and BIM is established to support the project lifecycle. The core motivations behind the proposed solution are: 1) a tailored RBS could be used as a knowledge-based approach to classify, store and manage the information of a risk database in a proper structure, and risk information in the RBS could be linked to BIM for review, visualisation and communication; and 2) knowledge and experience stored in past risk reports could contribute to avoiding similar risks in new situations, and the most relevant cases can be linked to BIM to support decision making during the project lifecycle. The scope of this research is limited to bridge projects; however, the basic methods and principles could also be applied to other types of projects. This research was carried out in three phases. In the first phase, this research analysed the conceptual separation of BIM and the linkage rules between different types of risk and BIM. Specifically, an integrated bridge information model was divided into four Levels of Contents (LOCs) and six technical systems based on the analysis of the Industry Foundation Classes (IFC) specification, a critical review of previous studies and the author’s project experience. Then a knowledge-based risk database was developed through an extensive collection of risk data, a process of data mining, and further assessment and translation of the data. Built on the risk database, a tailored RBS was developed to categorise and manage this risk information, and a set of linkage rules between the tailored RBS and the four LOCs and six technical systems of BIM was established. Secondly, to further implement the linkage rules, a novel method was developed to link BIM, RBS, and Work Breakdown Structure (WBS) into a risk management system. A prototype system was created based on Navisworks and Microsoft SQL Server to support the implementation of the proposed approach. The system allows not only the storage of risk information in a central database but also the linking of related risk information to the BIM model for review, visualisation and simulation. Thirdly, to facilitate the use of previous knowledge and experience for BIM-based risk management, the research proposed an approach combining two Natural Language Processing (NLP) techniques, i.e. Vector Space Model (VSM) and semantic query expansion, and outlined a new framework for the risk case retrieval system. A prototype was developed using the Python programming language to support the implementation of the proposed method. Preliminary testing results show that the proposed system is capable of retrieving relevant cases automatically and returning, for example, the top 10 similar cases. The main contribution of this research is the approach of integrating RBS and CBR into BIM through active linkages. The practical significance of this research is that the proposed approach enables the development of BIM-based risk management software to improve risk identification, analysis, and information management during the project development process. This research provides evidence that traditional techniques can be aligned with BIM for risk management. One significant advantage of the proposed method is that it combines the benefits of both traditional techniques and BIM for lifecycle project risk management with minimal disruption to existing working processes.
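
    The combination of a Vector Space Model with semantic query expansion for retrieving similar risk cases can be sketched as follows. This is a simplified illustration under stated assumptions, not the prototype described in the abstract: the synonym table `SYNONYMS`, the case texts and the function names are hypothetical, and raw term counts are used in place of whatever weighting the original system applies.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a real semantic resource.
SYNONYMS = {
    "collapse": ["failure"],
    "worker": ["labourer", "operative"],
}


def expand(tokens):
    """Append the known synonyms of each query token (simple query expansion)."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


def cosine(u, v):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def retrieve(query, cases, top_k=10):
    """Rank stored cases against the expanded query and return the top_k matches."""
    q_vec = Counter(expand(query.lower().split()))
    scored = [(cosine(q_vec, Counter(text.lower().split())), case_id)
              for case_id, text in cases.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(case_id, score) for score, case_id in scored[:top_k] if score > 0]


# Hypothetical risk case descriptions.
cases = {
    "C1": "scaffold failure injured a labourer during deck construction",
    "C2": "delayed delivery of steel girders",
    "C3": "bridge pier collapse after flooding",
}
results = retrieve("worker collapse", cases)
print(results)
```

    Without the expansion step, case C1 would share no term with the query "worker collapse" and would not be retrieved at all, which is the kind of vocabulary mismatch query expansion is meant to reduce.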

    Which Public Library for the 21st Century? Models and Evaluation in a Comparative Perspective

    The aim of the research project was to analyse, through a comparative methodological approach, the main organisational and functional models of the public library that have developed internationally, in an attempt to identify the distinctive features of the contemporary Italian public library. The first part of the project defined and compared some of the best-known public library models at the international level (public library, médiathèque, biblioteca civica, dreigeteilte Bibliothek, fraktale Bibliothek, Idea Store, Four-spaces model), which arose in historically specific cultural and social contexts, evolved over time and, with the necessary adaptations, found room and spread beyond their original chronological and geographical boundaries. The second part, starting from the most recent debate on the identity of the public library in the 21st century and on its evolution, focused on some of the most successful library projects in Italy, in order to identify features that are distinctive and features that are shared with the experiences and models established outside Italy, and to assess their functions, services, results and social impact in their context. This brought to light the contextual factors that determine the success or failure of each model, providing solid tools for analysing existing libraries and for designing new ones.

    Weighted Word Pairs for query expansion

    This paper proposes a novel query expansion method to improve the accuracy of text retrieval systems. Our method uses minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. This structure is obtained from the relevance feedback through a word-pair selection method based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.
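
    To make the idea of expanding a query with weighted word pairs concrete, here is a deliberately simplified sketch. It does not implement the paper's Probabilistic Topic Model selection; instead it assumes, purely for illustration, that a pair's weight is the fraction of feedback documents in which both words occur. The function names and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def weighted_word_pairs(feedback_docs, top_n=5):
    """Weight each unordered word pair by the share of feedback documents containing both words."""
    pair_counts = Counter()
    for doc in feedback_docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    n_docs = len(feedback_docs)
    # Sort by descending count, then alphabetically to make ties deterministic.
    ranked = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pair, count / n_docs) for pair, count in ranked[:top_n]]


def expand_query(query, pairs):
    """Attach to the query the weighted pairs that share at least one query term."""
    q_terms = set(query.lower().split())
    related = [(pair, weight) for pair, weight in pairs if q_terms & set(pair)]
    return {"terms": sorted(q_terms), "pairs": related}


# Hypothetical relevance-feedback documents.
feedback = ["solar panel energy", "solar energy storage", "wind energy"]
pairs = weighted_word_pairs(feedback)
expanded = expand_query("solar power", pairs)
print(expanded)
```

    A retrieval engine could then score documents on both the original terms and the attached pairs, using the pair weights as boosts.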