Flexible queries in XML native databases
To date, most flexible querying systems for native XML databases (DB) are
based on exploiting the tree structure of their semi-structured data (SSD).
However, it has become important to test the efficiency of the Formal Concept
Analysis (FCA) formalism on this type of data, since FCA has shown great
performance in the field of information retrieval (IR). FCA-based IR in XML
databases mainly relies on the lattice structure: each concept of this lattice
can be interpreted as a (response, query) pair. In this work, we provide a new
flexible modeling of XML DB based on fuzzy FCA as a first step towards
flexible querying of SSD.
Comment: 5 pages, 1 figure
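To make the (response, query) reading of a concept lattice concrete, here is a minimal FCA sketch over a toy binary context of XML elements and their features. The data and names are invented for illustration, and concepts are enumerated by brute force rather than by a lattice-construction algorithm:

```python
from itertools import chain, combinations

# Toy formal context: XML elements (objects) x features (attributes).
context = {
    "book1":   {"title", "author", "year"},
    "book2":   {"title", "author"},
    "article": {"title", "year"},
}
all_attrs = set().union(*context.values())

def intent(objects):
    """Attributes shared by every object in the set (read as the 'query')."""
    if not objects:
        return set(all_attrs)
    return set.intersection(*(context[o] for o in objects))

def extent(attrs):
    """Objects possessing every attribute in the set (read as the 'response')."""
    return {o for o, feats in context.items() if attrs <= feats}

def concepts():
    """All formal concepts: pairs (extent, intent) closed under both maps."""
    found = set()
    for attrs in chain.from_iterable(
            combinations(sorted(all_attrs), r) for r in range(len(all_attrs) + 1)):
        e = extent(set(attrs))
        found.add((frozenset(e), frozenset(intent(e))))
    return found

for e, i in sorted(concepts(), key=lambda c: -len(c[0])):
    print(sorted(e), "<->", sorted(i))
```

Each printed pair is one lattice concept: the extent answers the query stated by the intent. A fuzzy extension, as proposed in the abstract, would replace these crisp sets with membership degrees.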
Mining Semi-structured Data
The need to discover knowledge from XML documents according to both
structure and content features has become challenging, due to the growing
number of application contexts in which handling both structure and content
information in XML data is essential. The challenge is thus to find a
hierarchical structure that combines data levels with their representative
structures. In this work, we rely on Formal Concept Analysis-based views to
index and query both content and structure. We evaluate the given structure in
a querying process that allows searching for answers to user queries.
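As a toy illustration of indexing content and structure together (this is not the authors' FCA-based views; the paths, documents, and terms are invented), an inverted index can be keyed by (path, term) pairs so a query constrains both the XML path and the contained term:

```python
# Toy corpus: document id -> {XML path: element text}.
docs = {
    "d1": {"/book/title": "xml mining", "/book/author": "smith"},
    "d2": {"/article/title": "fca views", "/article/abstract": "xml"},
}

# Inverted index keyed by (path, term) pairs.
index = {}  # (path, term) -> set of doc ids
for doc_id, fields in docs.items():
    for path, text in fields.items():
        for term in text.split():
            index.setdefault((path, term), set()).add(doc_id)

def query(path, term):
    """Documents whose element at `path` contains `term`:
    a structural constraint and a content constraint at once."""
    return index.get((path, term), set())

print(query("/book/title", "xml"))      # structure + content both match
print(query("/article/title", "fca"))
```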
Parallel architectures for fuzzy triadic similarity learning
In a context of document co-clustering, we define a new similarity measure
which iteratively computes similarity while combining fuzzy sets in a
three-partite graph. The fuzzy triadic similarity (FT-Sim) model can deal with
the uncertainty offered by fuzzy sets. Moreover, with the development of the
Web and the high availability of storage space, more and more documents are
becoming accessible. Documents can come from multiple sites, which makes
similarity computation expensive. This problem motivated us to use parallel
computing. In this paper, we introduce parallel architectures able to handle
large, multi-source data sets by a sequential, a merging-based, or a
splitting-based process. We then perform local and central (or global)
computing using the basic FT-Sim measure. The idea behind these architectures
is to reduce both time and space complexity through parallel computation.
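The exact FT-Sim recurrence is not reproduced here; the sketch below only illustrates the two ingredients the abstract mentions, a fuzzy-set similarity (min/max as fuzzy intersection/union) and a splitting-based scheme where shards are computed locally and merged centrally. All names and membership values are illustrative:

```python
def partial_sums(mu_a, mu_b, shard):
    """Local computing on one vocabulary shard: fuzzy intersection and
    union masses, using min as t-norm and max as t-conorm."""
    inter = sum(min(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in shard)
    union = sum(max(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in shard)
    return inter, union

def fuzzy_similarity(mu_a, mu_b, shards):
    """Central (global) computing: merge the local partial sums,
    then take a fuzzy-Jaccard ratio."""
    parts = [partial_sums(mu_a, mu_b, s) for s in shards]  # parallelizable step
    inter = sum(p[0] for p in parts)
    union = sum(p[1] for p in parts)
    return inter / union if union else 1.0

# Membership degrees of two documents over a shared term vocabulary.
doc1 = {"fuzzy": 0.9, "parallel": 0.4}
doc2 = {"fuzzy": 0.6, "graph": 0.7}
shards = [{"fuzzy"}, {"parallel", "graph"}]  # vocabulary split across two workers
print(fuzzy_similarity(doc1, doc2, shards))  # 0.6 / 2.0 = 0.3
```

Because the per-shard sums are independent, each worker touches only its shard, which is the space/time saving the architectures aim for.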
About Summarization in Large Fuzzy Databases
Motivated by the increased need for modeling fuzzy data and by the success
of systems for exact data summary generation, we propose in this paper a new
approach, called Fuzzy-SaintEtiQ, for generating summaries from fuzzy data.
This approach extends the SaintEtiQ model to support fuzzy data. It offers the
following optimizations: 1) minimization of the expert risk; 2) construction
of a more detailed and more precise summary hierarchy; and 3) cooperation with
the user by giving him fuzzy summaries at different hierarchical levels.
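The construction of the Fuzzy-SaintEtiQ hierarchy itself is not shown here, but a basic building block of linguistic summarization systems is the Yager-style truth degree of a summary "Q objects are S". The quantifier shape and the membership values below are invented for illustration:

```python
def most(p):
    """Fuzzy quantifier 'most' (hypothetical piecewise-linear shape)."""
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

def truth_of_summary(memberships, quantifier=most):
    """Yager-style truth of 'Q objects are S', computed from the
    membership degree of each object in the summarizer S."""
    p = sum(memberships) / len(memberships)
    return quantifier(p)

# Degrees to which each record's salary is 'high' (illustrative values).
high_salary = [1.0, 0.8, 0.7, 0.2, 0.9]
print(round(truth_of_summary(high_salary), 2))  # truth of 'most salaries are high'
```

A summary hierarchy would attach such linguistic descriptions, at increasing levels of detail, to the nodes of a classification tree over the data.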
Flexible SQLf query based on fuzzy linguistic summaries
In many real-world applications, data is often partially known, vague, or
ambiguous. To deal with such imprecise information, fuzziness is introduced
into the classical model. SQLf is one of the practical languages for flexible
fuzzy querying in Fuzzy DataBases (FDB). However, with a huge amount of fuzzy
data, the need to work with synthetic views has become a challenge for many DB
community researchers. The present work deals with flexible SQLf queries based
on fuzzy linguistic summaries. We use the fuzzy summaries produced by our
Fuzzy-SaintEtiq approach, which provides a description of objects depending on
the fuzzy linguistic labels specified as selection criteria.
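To give a flavor of the flexible selection involved (the label shape, threshold, and relation are hypothetical; SQLf and Fuzzy-SaintEtiQ define their own labels and semantics), a fuzzy predicate over a linguistic label can be evaluated like this:

```python
def young(age):
    """Trapezoidal membership for the linguistic label 'young'
    (hypothetical break points, for illustration only)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15.0

employees = [("Ali", 22), ("Bea", 30), ("Carl", 45)]

# SQLf-style query: SELECT name FROM employees WHERE age IS young,
# keeping rows whose satisfaction degree reaches the threshold.
threshold = 0.5
result = [(name, round(young(age), 2))
          for name, age in employees if young(age) >= threshold]
print(result)  # Ali satisfies fully, Bea partially; Carl is filtered out
```

Unlike a crisp WHERE clause, each answer carries its satisfaction degree, which is what makes ranking and partial matches possible.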
Towards a New Extracting and Querying Approach of Fuzzy Summaries
The diversification of DB applications has highlighted the limitations of
relational database management systems (RDBMS), particularly at the modeling
level. In fact, in the real world we are increasingly faced with situations
where applications need to handle imprecise data and to offer flexible
querying to their users. Several theoretical solutions have been proposed;
however, their practical impact has remained negligible, with the exception of
a few research prototypes based on the formal model GEFRED. In this chapter,
the authors propose a new approach for exploiting fuzzy relational databases
(FRDB) described by the GEFRED model. This approach consists of: 1) a new
technique for extracting fuzzy data summaries, Fuzzy SAINTETIQ, based on the
classification of fuzzy data and formal concept analysis; 2) an approach for
evaluating flexible queries in the FDB context based on the set of fuzzy
summaries generated by our fuzzy SAINTETIQ system; 3) an approach for
repairing and substituting unanswered queries.
Comment: 22 pages, 6 figures, 8 tables. Multidisciplinary Approaches to
Service-Oriented Engineering, 2018. arXiv admin note: text overlap with
arXiv:1401.049
Traitement approximatif des requêtes flexibles avec groupement d'attributs et jointure [Approximate processing of flexible queries with attribute grouping and join]
This paper addresses the problem of approximate processing for flexible
queries of the form SELECT-FROM-WHERE-GROUP BY with a join condition. It
offers a flexible framework for online aggregation that favors response time
at the expense of result accuracy.
Comment: in French. The 13ème Conférence Francophone sur l'Extraction et
la Gestion des Connaissances (EGC), pp. 29-30, 201
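As a generic illustration of online aggregation's time/accuracy trade (a simplified sketch, not the paper's method; it ignores the join and the subtleties of grouping after a join), one can scan a random sample of the rows and keep running per-group aggregates:

```python
import random

def approx_group_avg(rows, sample_rate=0.3, seed=0):
    """Approximate 'SELECT key, AVG(value) ... GROUP BY key' by scanning
    only a random fraction of the rows; a lower sample_rate gives faster
    answers at the cost of less accurate averages."""
    rng = random.Random(seed)
    agg = {}  # key -> [count, running_sum]
    for key, value in rows:
        if rng.random() > sample_rate:
            continue  # skipped row: this is where time is traded for accuracy
        cell = agg.setdefault(key, [0, 0.0])
        cell[0] += 1
        cell[1] += value
    return {k: s / n for k, (n, s) in agg.items()}

rows = [("a", 1.0), ("a", 3.0), ("b", 2.0)] * 100
print(approx_group_avg(rows, sample_rate=0.2))  # close to {'a': 2.0, 'b': 2.0}
```

With sample_rate=1.0 the function degenerates to the exact GROUP BY result, which is the anytime behavior online aggregation exploits.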
Modèle flou d'expression des préférences basé sur les CP-Nets [A fuzzy model for expressing preferences based on CP-Nets]
This article addresses the problem of expressing preferences in flexible
queries, based on a combination of fuzzy logic theory and Conditional
Preference Networks (CP-Nets).
Comment: 2 pages, EGC 201
Dimensionality reduction with missing values imputation
In this study, we propose a new statistical approach for high-dimensionality
reduction of heterogeneous data that limits the curse of dimensionality and
deals with missing values. To handle the latter, we propose to use the Random
Forest imputation method. The main purpose here is to extract useful
information and thus reduce the search space to facilitate the data
exploration process. Several illustrative numeric examples, using data from
publicly available machine learning repositories, are also included. The
experimental component of the study shows the efficiency of the proposed
analytical approach.
Comment: 6 pages, 2 figures, the first Computer Science PhD Symposium of the
University of Tunis El Manar (CUPS'17), Tunisia, May 22-25, 201
Classification non supervisée des données hétérogènes à large échelle [Unsupervised clustering of large-scale heterogeneous data]
When it comes to clustering massive data, response time, disk access, and
the quality of the formed classes become major issues for companies. It is in
this context that we define a clustering framework for large-scale
heterogeneous data that contributes to resolving these issues. The proposed
framework is based on, firstly, descriptive analysis using MCA (Multiple
Correspondence Analysis) and, secondly, the MapReduce paradigm in a
large-scale environment. The results are encouraging and demonstrate the
efficiency of the hybrid deployment in terms of response quality and time, on
both qualitative and quantitative data.
Comment: 6 pages, in French, 8 figures
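The MCA step is omitted here; as a minimal illustration of how one clustering iteration splits over MapReduce (1-D toy data and a k-means-style assignment, not the authors' actual pipeline), the map step assigns points to their nearest centroid and the reduce step recomputes centroids from the merged assignments:

```python
from collections import defaultdict

def map_assign(points, centroids):
    """Map step: emit (nearest-centroid-index, point) pairs for one partition."""
    out = []
    for p in points:
        i = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
        out.append((i, p))
    return out

def reduce_centroids(pairs, k):
    """Reduce step: new centroid = mean of the points assigned to it."""
    groups = defaultdict(list)
    for i, p in pairs:
        groups[i].append(p)
    return [sum(groups[i]) / len(groups[i]) if groups[i] else None
            for i in range(k)]

# Two mapper partitions, merged before the reduce (1-D toy data).
part1, part2 = [1.0, 2.0, 9.0], [10.0, 11.0]
centroids = [0.0, 8.0]
pairs = map_assign(part1, centroids) + map_assign(part2, centroids)
print(reduce_centroids(pairs, k=2))  # [1.5, 10.0]
```

Each mapper reads only its own partition, so disk access and computation scale out with the number of workers, which is the point of the large-scale deployment.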