20 research outputs found

    Flexible queries in XML native databases

    To date, most flexible querying systems for native XML databases (DB) exploit the tree structure of their semi-structured data (SSD). However, it has become important to test the efficiency of the Formal Concept Analysis (FCA) formalism on this type of data, since FCA has shown strong performance in the field of information retrieval (IR). FCA-based IR in XML databases relies mainly on the lattice structure: each concept of the lattice can be interpreted as a (response, query) pair. In this work, we provide a new flexible modeling of XML DB based on fuzzy FCA as a first step towards flexible querying of SSD.
    Comment: 5 pages, 1 figure
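    The (response, query) reading of a concept lattice can be made concrete with a minimal sketch: given a binary context linking objects (XML fragments) to attributes (terms), every formal concept is a pair (extent, intent) where the extent is exactly the set of objects matching the intent. The context data and names below are invented for illustration and are not from the paper.

```python
# Minimal Formal Concept Analysis sketch: enumerate the formal concepts
# (extent, intent) of a toy binary context linking XML fragments to terms.
# Each concept reads as a (response, query) pair, as in the abstract.
from itertools import combinations

context = {
    "article1": {"xml", "query"},
    "article2": {"xml", "fuzzy"},
    "article3": {"xml", "query", "fuzzy"},
}
all_attrs = set().union(*context.values())

def intent(objects):
    """Attributes shared by every object in `objects`."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else set(all_attrs)

def extent(attrs):
    """Objects that possess every attribute in `attrs`."""
    return {o for o, a in context.items() if attrs <= a}

def concepts():
    """All (extent, intent) pairs, obtained as closures of object subsets."""
    found = set()
    for r in range(len(context) + 1):
        for combo in combinations(sorted(context), r):
            b = intent(combo)   # query: shared attributes
            a = extent(b)       # response: all matching objects
            found.add((frozenset(a), frozenset(b)))
    return found
```

    On this toy context the lattice has four concepts; for instance, the query {xml, query} has exactly {article1, article3} as its response.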

    Mining Semi-structured Data

    Discovering knowledge from XML documents according to both structure and content features has become challenging, as more and more application contexts require handling both kinds of information in XML data. The challenge is to find a hierarchical structure that combines data levels with their representative structures. In this work, we rely on Formal Concept Analysis-based views to index and query both content and structure. We evaluate the resulting structure in a querying process that searches for answers to user queries.
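    The idea of indexing structure and content together can be sketched with a simple combined index: each posting records both the element path and a term, so a query constrains both at once. The documents, paths, and terms below are invented for the sketch and do not come from the paper.

```python
# Illustrative combined structure+content index for XML fragments:
# keys are (element path, term) pairs, values are matching document ids.

docs = {
    "d1": [("/article/title", "fuzzy databases"),
           ("/article/body", "flexible querying of fuzzy data")],
    "d2": [("/article/title", "xml indexing"),
           ("/article/body", "structure and content views")],
}

def build_index(documents):
    """Map every (path, term) pair to the set of documents containing it."""
    index = {}
    for doc_id, nodes in documents.items():
        for path, text in nodes:
            for term in text.split():
                index.setdefault((path, term), set()).add(doc_id)
    return index

def query(index, path, term):
    """Documents whose element at `path` contains `term`."""
    return index.get((path, term), set())
```

    A structure-only or content-only index could not distinguish "fuzzy" in a title from "fuzzy" in a body; the pair key does.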

    Parallel architectures for fuzzy triadic similarity learning

    In a context of document co-clustering, we define a new similarity measure that iteratively computes similarity while combining fuzzy sets in a tripartite graph. The fuzzy triadic similarity (FT-Sim) model can deal with the uncertainty conveyed by fuzzy sets. Moreover, with the development of the Web and the wide availability of storage space, more and more documents become accessible. Documents may come from multiple sites, which makes similarity computation expensive. This problem motivated us to use parallel computing. In this paper, we introduce parallel architectures able to handle large, multi-source data sets through a sequential, a merging-based, or a splitting-based process. We then perform local and central (or global) computation using the basic FT-Sim measure. The idea behind these architectures is to reduce both time and space complexity through parallel computation.
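    The splitting-based process described above can be sketched as follows: each worker ("site") computes similarities on its local partition of document pairs, and a central step merges the partial results. A fuzzy-Jaccard measure stands in for the paper's FT-Sim; all names and data are illustrative.

```python
# Split-then-merge similarity computation over fuzzy term-membership vectors.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def fuzzy_jaccard(u, v):
    """Similarity of two fuzzy sets given as {term: membership} dicts."""
    terms = set(u) | set(v)
    inter = sum(min(u.get(t, 0.0), v.get(t, 0.0)) for t in terms)
    union = sum(max(u.get(t, 0.0), v.get(t, 0.0)) for t in terms)
    return inter / union if union else 0.0

def local_compute(documents, pairs):
    """Local step: similarities for one partition of document pairs."""
    return {p: fuzzy_jaccard(documents[p[0]], documents[p[1]]) for p in pairs}

def parallel_similarities(documents, n_workers=2):
    """Split pairs across workers, then merge local results centrally."""
    pairs = list(combinations(sorted(documents), 2))
    chunks = [pairs[i::n_workers] for i in range(n_workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(lambda c: local_compute(documents, c), chunks):
            merged.update(part)  # central (global) merge
    return merged
```

    The merge is trivial here because pair similarities are independent; an iterative triadic measure would repeat the local/central cycle until convergence.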

    About Summarization in Large Fuzzy Databases

    Motivated by the increased need for fuzzy data modeling and by the success of systems that generate exact data summaries, we propose in this paper a new approach for generating summaries from fuzzy data, called Fuzzy-SaintEtiQ. This approach extends the SaintEtiQ model to support fuzzy data. It offers the following optimizations: 1) minimization of the expert risk; 2) construction of a more detailed and more precise summary hierarchy; and 3) cooperation with the user by providing fuzzy summaries at different hierarchical levels.
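    The core ingredient of such summaries can be sketched in a few lines: each record value is mapped to linguistic labels with a membership degree, and a summary accumulates, per label, how strongly the data supports it. The labels and membership functions below are invented for illustration; they are not the SaintEtiQ ones.

```python
# Linguistic summarization sketch over a numeric attribute (e.g. age).

def triangular(a, b, c):
    """Triangular membership function on [a, c] with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

LABELS = {"young": triangular(0, 20, 40), "old": triangular(30, 60, 90)}

def summarize(ages):
    """Total membership mass per label (a crude fuzzy cardinality)."""
    return {lab: sum(mu(x) for x in ages) for lab, mu in LABELS.items()}
```

    A hierarchy is obtained by nesting coarser labels above finer ones; this sketch shows only one level.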

    Flexible SQLf query based on fuzzy linguistic summaries

    Data is often partially known, vague, or ambiguous in many real-world applications. To deal with such imprecise information, fuzziness is introduced into the classical model. SQLf is one of the practical languages for flexible fuzzy querying of Fuzzy DataBases (FDB). However, with huge amounts of fuzzy data, the need to work with synthetic views has become a challenge for many DB researchers. The present work deals with flexible SQLf queries based on fuzzy linguistic summaries. We use the fuzzy summaries produced by our Fuzzy-SaintEtiQ approach, which provides a description of objects according to the fuzzy linguistic labels specified as selection criteria.
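    The flexible-selection mechanism can be sketched as follows: an SQLf-style predicate returns a satisfaction degree in [0, 1] instead of a boolean, and the query keeps rows whose degree reaches a threshold (the alpha-cut), ranked by degree. The "cheap" predicate and the data are illustrative, not from the paper.

```python
# SQLf-style flexible selection sketch: fuzzy predicate + alpha-cut + ranking.

def cheap(price):
    """Fuzzy predicate: fully cheap below 10, not cheap at all above 30."""
    if price <= 10:
        return 1.0
    if price >= 30:
        return 0.0
    return (30 - price) / 20

def fuzzy_select(rows, predicate, alpha=0.5):
    """SELECT ... WHERE predicate: keep rows with degree >= alpha, ranked."""
    scored = [(predicate(r["price"]), r) for r in rows]
    kept = [(d, r) for d, r in scored if d >= alpha]
    return sorted(kept, key=lambda dr: -dr[0])
```

    Evaluating such a predicate against precomputed linguistic summaries, rather than raw rows, is what makes the summary-based approach scale.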

    Towards a New Extracting and Querying Approach of Fuzzy Summaries

    The diversification of DB applications has highlighted the limitations of relational database management systems (RDBMS), particularly on the modeling side. In the real world, applications increasingly need to handle imprecise data and to offer flexible querying to their users. Several theoretical solutions have been proposed, but their practical impact has remained negligible, with the exception of a few research prototypes based on the formal model GEFRED. In this chapter, the authors propose a new approach for exploiting fuzzy relational databases (FRDB) described by the GEFRED model. This approach consists of: 1) a new technique for extracting fuzzy data summaries, Fuzzy SAINTETIQ, based on fuzzy data classification and formal concept analysis; 2) an approach for evaluating flexible queries in the FDB context based on the set of fuzzy summaries generated by our fuzzy SAINTETIQ system; and 3) an approach for repairing and substituting unanswered queries.
    Comment: 22 pages, 6 figures, 8 tables. Multidisciplinary Approaches to Service-Oriented Engineering, 2018. arXiv admin note: text overlap with arXiv:1401.049
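    Component 3 above, repairing an unanswered query, can be sketched simply: when no summary reaches the requested satisfaction threshold, relax the threshold to the best achievable degree and substitute the closest summaries. The summary records and degrees are invented for the sketch.

```python
# Repair-and-substitute sketch for an unanswered flexible query.

def answer_or_repair(summaries, predicate, alpha):
    """Return (answers, effective_alpha); relax alpha if nothing qualifies."""
    scored = [(predicate(s), s) for s in summaries]
    answers = [(d, s) for d, s in scored if d >= alpha]
    if answers:
        return sorted(answers, key=lambda x: -x[0]), alpha
    best = max(d for d, _ in scored)  # best achievable satisfaction
    return [(d, s) for d, s in scored if d == best], best
```

    The caller can then tell the user that the query was relaxed from alpha to the effective degree, rather than returning an empty answer.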

    Traitement approximatif des requêtes flexibles avec groupement d'attributs et jointure (Approximate processing of flexible queries with attribute grouping and join)

    This paper addresses the problem of approximate processing of flexible queries of the form SELECT-FROM-WHERE-GROUP BY with a join condition. It offers a flexible framework for online aggregation that favors response time at the expense of result accuracy.
    Comment: in French. 13ème Conférence Francophone sur l'Extraction et la Gestion des Connaissances (EGC), pp. 29-30, 201
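    The response-time/accuracy trade-off of online aggregation can be sketched as follows: rows are consumed incrementally and the per-group running averages can be reported at any point, before the scan finishes. The grouping keys and values are illustrative.

```python
# Online aggregation sketch for a GROUP BY average: a snapshot of the
# current per-group means is available after every consumed row.

def online_group_avg(rows):
    """Yield the current {group: mean} after each (group, value) row."""
    totals, counts = {}, {}
    for group, value in rows:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
        yield {g: totals[g] / counts[g] for g in totals}
```

    Stopping early returns an approximate answer quickly; consuming the whole stream converges to the exact GROUP BY result.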

    Modèle flou d'expression des préférences basé sur les CP-Nets (A fuzzy model for expressing preferences based on CP-Nets)

    This article addresses the problem of expressing preferences in flexible queries, based on a combination of fuzzy logic theory and Conditional Preference Networks (CP-Nets).
    Comment: 2 pages, EGC 201
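    The combination can be sketched minimally: in a CP-Net the preferred value of one variable depends on another, and adding fuzziness attaches an intensity in [0, 1] to each conditional preference. The variables and preference table below are a classic illustrative example, not the paper's model.

```python
# Fuzzy CP-Net sketch: the preferred wine depends on the dish, and each
# preference carries a fuzzy degree (1.0 = fully preferred).

CPT = {
    "fish": {"white": 1.0, "red": 0.2},
    "meat": {"white": 0.3, "red": 0.9},
}

def satisfaction(outcome):
    """Degree to which an outcome respects the conditional preferences."""
    return CPT[outcome["dish"]][outcome["wine"]]

def best_outcome(dish):
    """Pick the child value with the highest fuzzy degree given the parent."""
    wine = max(CPT[dish], key=CPT[dish].get)
    return {"dish": dish, "wine": wine}
```

    In a flexible query, `satisfaction` would rank candidate answers instead of merely ordering them qualitatively as a crisp CP-Net does.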

    Dimensionality reduction with missing values imputation

    In this study, we propose a new statistical approach for high-dimensionality reduction of heterogeneous data that limits the curse of dimensionality and deals with missing values. To handle the latter, we propose to use the Random Forest imputation method. The main purpose is to extract useful information and thus reduce the search space, facilitating the data exploration process. Several illustrative numeric examples, using data from publicly available machine learning repositories, are also included. The experimental component of the study shows the efficiency of the proposed analytical approach.
    Comment: 6 pages, 2 figures, The first Computer Science University of Tunis El Manar PhD Symposium (CUPS'17), Tunisia, May 22-25, 201
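    The two-step pipeline (impute, then reduce) can be sketched with deliberate simplifications: column-mean imputation stands in for the Random Forest imputer, and variance-based feature selection stands in for the reduction step. Both substitutions are for illustration only.

```python
# Impute-then-reduce pipeline sketch on a small table with missing values.

def impute_mean(rows):
    """Replace None with the column mean (stand-in for RF imputation)."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / sum(v is not None for v in c)
             for c in cols]
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

def top_variance_features(rows, k):
    """Indices of the k highest-variance columns (the reduction step)."""
    cols = list(zip(*rows))
    def var(c):
        m = sum(c) / len(c)
        return sum((x - m) ** 2 for x in c) / len(c)
    return sorted(range(len(cols)), key=lambda j: -var(cols[j]))[:k]
```

    A Random Forest imputer would predict each missing value from the other columns instead of using the column mean, and a projection method such as PCA would replace the simple variance filter.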

    Classification non supervisée des données hétérogènes à large échelle (Unsupervised classification of large-scale heterogeneous data)

    When it comes to clustering massive data, response time, disk access, and the quality of the formed classes become major issues for companies. It is in this context that we define a clustering framework for large-scale heterogeneous data that contributes to resolving these issues. The proposed framework is based on, first, descriptive analysis using MCA (Multiple Correspondence Analysis) and, second, the MapReduce paradigm in a large-scale environment. The results are encouraging and demonstrate the efficiency of the hybrid deployment in terms of response quality and time, on both qualitative and quantitative data.
    Comment: 6 pages, in French, 8 figures
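    The MapReduce part of such a framework can be sketched as one clustering iteration: each mapper assigns its partition of points to the nearest centroid and emits partial sums, and the reducer merges the partial sums to recompute centroids. The data is numeric only for simplicity (the MCA step that encodes heterogeneous data into numeric factors is omitted).

```python
# One MapReduce-style k-means iteration: map = local assignment + partial
# sums, reduce = merge partials and recompute centroids.

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[i])))

def map_partition(points, centroids):
    """Emit {cluster: (sum_vector, count)} for one local partition."""
    out = {}
    for pt in points:
        i = nearest(pt, centroids)
        s, n = out.get(i, ([0.0] * len(pt), 0))
        out[i] = ([a + b for a, b in zip(s, pt)], n + 1)
    return out

def reduce_partials(partials, centroids):
    """Merge the mappers' partial sums and recompute each centroid."""
    merged = {}
    for part in partials:
        for i, (s, n) in part.items():
            ms, mn = merged.get(i, ([0.0] * len(s), 0))
            merged[i] = ([a + b for a, b in zip(ms, s)], mn + n)
    new = []
    for i, c in enumerate(centroids):
        s, n = merged.get(i, (c, 0))
        new.append([x / n for x in s] if n else list(c))
    return new
```

    Because only per-cluster sums and counts travel from mappers to the reducer, the shuffle volume is independent of the number of points, which is what makes the deployment scale.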