17 research outputs found

    FedQAS: Privacy-Aware Machine Reading Comprehension with Federated Learning

    Machine reading comprehension (MRC) of text data is a challenging task in Natural Language Processing (NLP), with a lot of ongoing research fueled by the release of the Stanford Question Answering Dataset (SQuAD) and Conversational Question Answering (CoQA). It is considered to be an effort to teach computers how to “understand” a text and then answer questions about it using deep learning. However, until now, large-scale training on private text data and knowledge sharing has been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation and execution of local model training. We also present the architecture and implementation of the system and provide a reference evaluation based on the SQuAD dataset, to showcase how it overcomes data privacy issues and enables knowledge sharing between alliance members in a federated learning setting.
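
To make the idea of training without pooling data concrete, here is a minimal sketch of federated averaging, a common aggregation rule in federated learning. The abstract does not say which aggregation rule FedQAS uses, so this is an assumption for illustration only. The function name `federated_average` and the flat parameter lists are hypothetical and do not come from the FEDn API.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into one global vector.

    Each client trains locally and shares only its parameters.
    The server weights every client by its local dataset size.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: two clients, two parameters each.
# The second client holds three times as much local data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and dataset sizes reach the aggregation step; the raw text stays with each alliance member, which is the property the abstract emphasizes.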

    Fouille de données et analyse de qualité des règles d’association dans les bases de données massives : Application dans le domaine de la sécurité routière

    Knowledge discovery in databases (KDD), often called data mining, is "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Data mining is an active field of research aiming to exploit the vast amounts of data collected every day in various fields of computer science applications. This multidisciplinary field comes from artificial intelligence, statistics, and databases. In this thesis, we are interested in the problem of extracting association rules by introducing new algorithms and approaches. In general, an association rule is a conditional implication between sets of binary attributes called items. The extraction of such rules consists of two main steps: the extraction of frequent itemsets and the generation of association rules from them. The complexity of each of these steps is exponential: the number of frequent itemsets is exponential, and the number of association rules extracted can be very high, due to the quality measures used. In the literature, association rule extraction faces two main difficulties: response time and memory space. To overcome these difficulties, we propose in this thesis three main contributions, which respectively allow the extraction of relevant association rules, the integration of the spatial component into the extraction process, and the mining of relevant association rules from big data. In the first contribution, we propose an approach for extracting relevant association rules based on multicriteria decision analysis. In the second contribution, we propose an efficient algorithm for extracting spatial predicates, from which frequent itemsets and spatial association rules can be generated, based on the preparation of the spatial context and fuzzy set theory. In the third contribution, we propose a distributed algorithm for the extraction of association rules from Big Data.
Using these contributions, we were able to extract the relevant association rules and reduce the execution time and memory space. In addition, to test the proposed solutions concretely, we designed and developed a software prototype consisting of three interfaces. The first, the ARM interface, is an interactive web interface dedicated to the extraction of association rules. The second, the MCDA interface, is an interactive web interface dedicated to the evaluation and extraction of relevant association rules. The last, Time Series Forecasting, is an interactive web interface dedicated to the prediction of road accidents. These interactive and user-friendly interfaces were developed using the R language and rshiny. Finally, experiments conducted on several road accident databases in Morocco show the feasibility of our contributions.
Knowledge discovery in databases (KDD), also called data mining, "denotes the non-trivial process of extracting implicit, previously unknown, and potentially useful information". Data mining is a rapidly growing research field that aims to exploit the large quantities of data collected every day in various application domains of computer science. This multidisciplinary field draws on artificial intelligence, statistics, and databases. In this work, we address the problem of extracting association rules by introducing new algorithms and multicriteria decision-aid approaches. In general, an association rule is a conditional implication between sets of binary attributes called items. The extraction of such rules is broken down into two main steps: the extraction of frequent itemsets and the generation of association rules from them. In most existing approaches in the literature, association rule extraction faces three major difficulties: the quality of the extracted rules, the spatial aspect of the data, and the response time of the extraction algorithms. To overcome these difficulties, we propose in this thesis integrating multicriteria analysis into the association rule extraction process for quality analysis. Then, to take into account the spatial aspect of the data, and more precisely the estimation of metric distances, we proposed the use of fuzzy logic. We also proposed integrating the FP-growth algorithm into a Big Data environment to extract association rules from massive databases. In addition, to test the proposed solutions concretely, we designed and developed a software prototype consisting of three interactive interfaces. The first, the ARM interface, is a web interface dedicated to the extraction of association rules. The second, the MCDA interface, is a web interface dedicated to analyzing the quality of the extracted association rules. The last, Time Series Forecasting, is a web interface dedicated to predicting road accidents in terms of the number of injuries and deaths. These interactive data exploration interfaces were developed using the R language and rshiny. Finally, experiments conducted on several databases on road accidents in Morocco show the notable feasibility of our contributions.
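
The two-step extraction described in this abstract (frequent itemsets first, then rules) can be sketched as follows. This is a brute-force illustration under stated assumptions, not the thesis's algorithm, which relies on FP-growth in a Big Data environment. The function names, the thresholds, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: map every itemset whose support >= min_support to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    size = 1
    while True:
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                frequent[cset] = support
                found = True
        # If no itemset of this size is frequent, no larger one can be.
        if not found:
            break
        size += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive rules antecedent => consequent from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical toy transactions (not data from the thesis).
transactions = [
    {"rain", "night", "accident"},
    {"rain", "accident"},
    {"night"},
    {"rain", "night", "accident"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), support, confidence)
```

The candidate enumeration in step 1 grows exponentially with the number of items, which is the cost the abstract refers to and the reason practical systems use algorithms such as FP-growth instead.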

    KBot : a Knowledge graph based chatBot for natural language understanding over linked data

    With the rapid progress of the semantic web, a huge amount of structured data has become available on the web in the form of knowledge bases (KBs). Making these data accessible and useful for end-users is one of the main objectives of chatbots over linked data. Building a chatbot over linked data raises several challenges, including understanding user queries, supporting multiple knowledge bases, and handling multiple languages. To address these challenges, we first design and develop an architecture to provide an interactive user interface. Secondly, we propose a machine learning approach based on intent classification and natural language understanding to understand user intents and generate SPARQL queries. In particular, we process a new social network dataset (i.e., myPersonality) and add it to the existing knowledge bases to extend the chatbot's capabilities to analytical queries. The system is flexible, supports multiple knowledge bases and multiple languages, can be extended with a new domain on demand, and allows intuitive creation and execution of different tasks for an extensive range of topics. Furthermore, evaluation and application cases in the chatbot are provided to show how it facilitates interactive semantic data access in different real application scenarios and to showcase the proposed approach for a knowledge graph and data-driven chatbot.

    DM-MCDA: A web-based platform for data mining and multiple criteria decision analysis : A case study on road accident

    Today's ultra-connected world is generating a huge amount of data stored in databases and cloud environments, particularly in transportation. These databases need to be processed and analyzed to extract useful information and present it to transportation managers in a usable form for purposes such as road safety, shipping delays, and shipping optimization. The potential of data mining algorithms is largely untapped. This paper applies large-scale techniques such as association rule analysis, multiple criteria analysis, and time series analysis to improve road safety by identifying hot spots in advance and giving drivers a chance to avoid dangers. We propose DM-MCDA, a framework that uses association rule mining as a preliminary task to extract relationships between variables related to road accidents, and then integrates multiple criteria analysis to help decision-makers choose the most relevant rules. The developed system is flexible and allows intuitive creation and execution of different algorithms for an extensive range of road traffic topics. DM-MCDA can be expanded with new topics on demand, making knowledge extraction more robust and providing meaningful information that could help decision-makers develop suitable policies.

    An improved approach for association rule mining using a multi-criteria decision support system: a case study in road safety

    Purpose: Road accidents have come to be considered a major public health problem worldwide. The aim of many studies is therefore to identify the main factors contributing to the severity of crashes. Methods: This paper examines a large-scale data mining technique known as association rule mining, which can predict future accidents in advance and allow drivers to avoid the dangers. However, this technique produces a very large number of decision rules, preventing decision makers from making their own selection of the most relevant rules. In this context, the integration of a multi-criteria decision analysis approach would be particularly useful for decision makers affected by the redundancy of the extracted rules. Conclusion: An analysis of road accidents in the province of Marrakech (Morocco) between 2004 and 2014 shows that the proposed approach serves this purpose; it may provide meaningful information that could help in developing suitable prevention policies to improve road safety.

    WINFRA : A Web-Based Platform for Semantic Data Retrieval and Data Analytics

    Huge amounts of heterogeneous data are stored in different locations and need to be federated and semantically interconnected for further use. This paper introduces WINFRA, a comprehensive open-access platform for semantic web data and advanced analytics based on natural language processing (NLP) and data mining techniques (e.g., association rules, clustering, classification based on associations). The system is designed to facilitate federated data analysis, knowledge discovery, information retrieval, and new techniques to deal with semantic web and knowledge graph representation. The processing step integrates data from multiple sources virtually by creating virtual databases. Afterwards, the RDF Generator produces RDF files for the different data sources, together with SPARQL queries, to support semantic data search and knowledge graph representation. Furthermore, some application cases are provided to demonstrate how it facilitates advanced data analytics over semantic data and to showcase our proposed approach toward semantic association rules.

    Multi-agent-based modeling for extracting relevant association rules using a multi-criteria analysis approach

    Recently, association rule mining has come to play a vital role in knowledge discovery in databases. In most cases, real datasets yield a very large number of rules, which prevents users from selecting the most relevant ones themselves. The difficult task is mining useful and non-redundant rules. Several approaches have been proposed, such as rule clustering, the informative cover method, and quality measurements. As another way of selecting relevant association rules, we believe it is necessary to integrate a decisional approach within the knowledge discovery process. Therefore, in this paper, we propose an approach to discover a category of relevant association rules based on multi-criteria analysis. In addition, because the general process of association rule extraction is becoming more and more complex, we also propose a multi-agent system for modeling the different processes of our approach. We conclude with an empirical study applied to a set of banking data to illustrate the performance of our approach.