64,626 research outputs found
A new fuzzy set merging technique using inclusion-based fuzzy clustering
This paper proposes a new method of merging parameterized fuzzy sets based on clustering in the parameters space, taking into account the degree of inclusion of each fuzzy set in the cluster prototypes. The merger method is applied to fuzzy rule base simplification by automatically replacing the fuzzy sets corresponding to a given cluster with that pertaining to cluster prototype. The feasibility and the performance of the proposed method are studied using an application in mobile robot navigation. The results indicate that the proposed merging and rule base simplification approach leads to good navigation performance in the application considered and to fuzzy models that are interpretable by experts. In this paper, we concentrate mainly on fuzzy systems with Gaussian membership functions, but the general approach can also be applied to other parameterized fuzzy sets
A conceptual approach to gene expression analysis enhanced by visual analytics
The analysis of gene expression data is a complex task for biologists wishing to understand the role of genes in the formation of diseases such as cancer. Biologists need greater support when trying to discover, and comprehend, new relationships within their data. In this paper, we describe an approach to the analysis of gene expression data where overlapping groupings are generated by Formal Concept Analysis and interactively analyzed in a tool called CUBIST. The CUBIST workflow involves querying a semantic database and converting the result into a formal context, which can be simplified to make it manageable, before it is visualized as a concept lattice and associated charts
Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Understanding customer buying patterns is of great interest to the retail
industry and has shown to benefit a wide variety of goals ranging from managing
stocks to implementing loyalty programs. Association rule mining is a common
technique for extracting correlations such as "people in the South of France
buy ros\'e wine" or "customers who buy pat\'e also buy salted butter and sour
bread." Unfortunately, sifting through a high number of buying patterns is not
useful in practice, because of the predominance of popular products in the top
rules. As a result, a number of "interestingness" measures (over 30) have been
proposed to rank rules. However, there is no agreement on which measures are
more appropriate for retail data. Moreover, since pattern mining algorithms
output thousands of association rules for each product, the ability for an
analyst to rely on ranking measures to identify the most interesting ones is
crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a
framework that provides analysts with the ability to compare the outcome of
interestingness measures applied to buying patterns in the retail industry. We
report on how we used CAPA to compare 34 measures applied to over 1,800 stores
of Intermarch\'e, one of the largest food retailers in France
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare
For the last years, time-series mining has become a challenging issue for
researchers. An important application lies in most monitoring purposes, which
require analyzing large sets of time-series for learning usual patterns. Any
deviation from this learned profile is then considered as an unexpected
situation. Moreover, complex applications may involve the temporal study of
several heterogeneous parameters. In that paper, we propose a method for mining
heterogeneous multivariate time-series for learning meaningful patterns. The
proposed approach allows for mixed time-series -- containing both pattern and
non-pattern data -- such as for imprecise matches, outliers, stretching and
global translating of patterns instances in time. We present the early results
of our approach in the context of monitoring the health status of a person at
home. The purpose is to build a behavioral profile of a person by analyzing the
time variations of several quantitative or qualitative parameters recorded
through a provision of sensors installed in the home
Pairwise gene GO-based measures for biclustering of high-dimensional expression data
Background: Biclustering algorithms search for groups of genes that share the same
behavior under a subset of samples in gene expression data. Nowadays, the biological
knowledge available in public repositories can be used to drive these algorithms to
find biclusters composed of groups of genes functionally coherent. On the other hand,
a distance among genes can be defined according to their information stored in Gene
Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each
pair of genes which establishes their functional similarity. A scatter search-based
algorithm that optimizes a merit function that integrates GO information is studied in
this paper. This merit function uses a term that addresses the information through a GO
measure.
Results: The effect of two possible different gene pairwise GO measures on the
performance of the algorithm is analyzed. Firstly, three well known yeast datasets with
approximately one thousand of genes are studied. Secondly, a group of human
datasets related to clinical data of cancer is also explored by the algorithm. Most of
these data are high-dimensional datasets composed of a huge number of genes. The
resultant biclusters reveal groups of genes linked by a same functionality when the
search procedure is driven by one of the proposed GO measures. Furthermore, a
qualitative biological study of a group of biclusters show their relevance from a cancer
disease perspective.
Conclusions: It can be concluded that the integration of biological information
improves the performance of the biclustering process. The two different GO measures
studied show an improvement in the results obtained for the yeast dataset. However, if
datasets are composed of a huge number of genes, only one of them really improves
the algorithm performance. This second case constitutes a clear option to explore
interesting datasets from a clinical point of view.Ministerio de Economía y Competitividad TIN2014-55894-C2-
- …