3,296 research outputs found
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system was originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
in our dataset is initially associated with a solubility degree and is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
computational results obtained here, which we stress were achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references
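The dissimilarity space representation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a fixed, hand-picked set of prototype sequences and plain Levenshtein (edit) distance as the dissimilarity measure; both are illustrative choices, not the measure or the optimized prototype selection used by ODSE. The names `levenshtein` and `embed` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (assumed dissimilarity measure)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(sequences, prototypes):
    """Map each sequence to its vector of dissimilarities to the prototypes."""
    return [[levenshtein(s, p) for p in prototypes] for s in sequences]


# Toy amino-acid strings, made up for illustration only.
vectors = embed(["MKV", "MKVL"], prototypes=["MKV", "AAA"])
```

Each row of `vectors` is an ordinary real-valued vector, so any standard vector classifier could be trained on it.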
General fuzzy min-max neural network for clustering and classification
This paper describes a general fuzzy min-max (GFMM) neural network which is a generalization and extension of the fuzzy min-max clustering and classification algorithms of Simpson (1992, 1993). The GFMM method combines supervised and unsupervised learning in a single training algorithm. The fusion of clustering and classification resulted in an algorithm that can be used for pure clustering, pure classification, or hybrid clustering-classification. It finds decision boundaries between classes while clustering patterns that cannot be said to belong to any of the existing classes. As in the original algorithms, hyperbox fuzzy sets are used as a representation of clusters and classes. Learning is usually completed in a few passes and consists of placing and adjusting the hyperboxes in the pattern space; this is an expansion-contraction process. The classification results can be crisp or fuzzy. New data can be included without the need for retraining. While retaining all the interesting features of the original algorithms, a number of modifications to their definition have been made in order to accommodate fuzzy input patterns in the form of lower and upper bounds, combine supervised and unsupervised learning, and improve the effectiveness of operations. A detailed account of the GFMM neural network, a comparison with Simpson's fuzzy min-max neural networks, a set of examples, and an application to leakage detection and identification in water distribution systems are given.
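The hyperbox representation and the expansion step described in the abstract above can be sketched roughly as follows. The abstract gives no formulas, so the ramp-based membership function, the per-dimension size limit `theta`, and the sensitivity parameter `gamma` are assumptions based on commonly cited formulations of fuzzy min-max networks. The contraction step is omitted and all function names are hypothetical.

```python
def ramp(r, gamma):
    """Ramp threshold: 0 below zero, linear in between, capped at 1."""
    return min(1.0, max(0.0, r * gamma))


def membership(x_low, x_up, v, w, gamma=1.0):
    """Degree to which the interval input [x_low, x_up] lies in hyperbox [v, w].

    Returns 1.0 when the input is fully inside and decays with distance
    outside, at a rate set by gamma (assumed form).
    """
    return min(
        min(1.0 - ramp(xu - wi, gamma), 1.0 - ramp(vi - xl, gamma))
        for xl, xu, vi, wi in zip(x_low, x_up, v, w)
    )


def try_expand(x_low, x_up, v, w, theta):
    """Expand hyperbox [v, w] to include the input if no side exceeds theta.

    Returns the new (min point, max point), or None if expansion is refused.
    """
    new_v = [min(vi, xl) for vi, xl in zip(v, x_low)]
    new_w = [max(wi, xu) for wi, xu in zip(w, x_up)]
    if all(nw - nv <= theta for nv, nw in zip(new_v, new_w)):
        return new_v, new_w
    return None


# A crisp point is an interval input whose lower and upper bounds coincide.
inside = membership([0.3, 0.3], [0.3, 0.3], v=[0.2, 0.2], w=[0.4, 0.4])
```

Representing inputs as lower and upper bounds lets the same functions handle both crisp points and interval-valued patterns.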
Unsupervised classification of data streams based on typicality and eccentricity data analytics
In this paper, we propose a novel approach to unsupervised and online data classification. The algorithm is based on the statistical analysis of selected features and the development of a self-evolving fuzzy rule base. It starts learning from an empty rule base and, instead of offline training, it learns “on-the-fly”. It is parameter-free, so the fuzzy rules and the number, size, or radius of the classes do not need to be predefined. It is well suited to the classification of online data streams with real-time constraints. Past data do not need to be stored in memory, since the algorithm is recursive, which makes it efficient in both memory and computation. It is able to handle concept drift and concept evolution due to its evolving nature, which means that not only can rules/classes be updated, but new classes can also be created as new concepts emerge from the data. It can perform fuzzy classification/soft labeling, which is preferred over traditional crisp classification in many areas of application. The algorithm was validated with an industrial pilot plant, where the online calculated period and amplitude of the control signal were used as input to a fault diagnosis application. The approach, however, is generic and can be applied to different problems and with much higher dimensional inputs. The results obtained from the real data are very significant.
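The recursive, memory-free character of the approach described above can be sketched as follows. The abstract does not state its update rules, so the recursive mean and variance updates and the eccentricity expression below are assumptions, written in the form usually given in the typicality and eccentricity literature. The class name `RecursiveEccentricity` is hypothetical, and rule creation and soft labeling are not shown.

```python
class RecursiveEccentricity:
    """Tracks mean and variance recursively and scores each new sample."""

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = None  # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb sample x; return its eccentricity (None for the first sample)."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = list(x)
            return None  # eccentricity is undefined for a single sample
        # Assumed recursive updates: no past samples are stored.
        self.mean = [((k - 1) / k) * m + xi / k for m, xi in zip(self.mean, x)]
        d2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        self.var = ((k - 1) / k) * self.var + d2 / (k - 1)
        if self.var == 0.0:
            return 1.0 / k  # all samples identical so far
        return 1.0 / k + d2 / (k * self.var)


tracker = RecursiveEccentricity()
scores = [tracker.update(sample) for sample in ([0.0], [2.0], [1.0])]
```

A sample near the running mean receives a low eccentricity, while a distant one receives a high value and could be treated as a candidate for a new class. Typicality is taken in that literature as one minus eccentricity.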