Data granulation by the principles of uncertainty
Research in granular modeling has produced a variety of mathematical models,
such as intervals, (higher-order) fuzzy sets, rough sets, and shadowed sets,
all of which are suitable for characterizing so-called information granules.
Modeling the uncertainty of the input data is recognized as a crucial aspect of
information granulation. Moreover, uncertainty is a well-studied concept in
many mathematical settings, such as those of probability theory, fuzzy set
theory, and possibility theory. This suggests that an appropriate
quantification of the uncertainty expressed by the information granule model
could be used to define an invariant property, to be exploited in practical
situations of information granulation. From this perspective, an information
granulation procedure is effective if the uncertainty conveyed by the
synthesized information granule increases monotonically with the uncertainty
of the input data. In this paper, we present a data granulation framework that
builds on the principles of uncertainty introduced by Klir. Since uncertainty
is a mesoscopic descriptor of systems and data, these principles can be
applied regardless of the input data type and of the specific mathematical
setting adopted for the information granules. The proposed framework is
conceived (i) to offer a guideline for the synthesis of information granules
and (ii) to lay the groundwork for comparing and quantitatively assessing
different data granulation procedures. To provide a
suitable case study, we introduce a new data granulation technique based on the
minimum sum of distances, which is designed to generate type-2 fuzzy sets. We
analyze the procedure by performing several experiments on two distinct data
types: feature vectors and labeled graphs. Results show that the uncertainty of
the input data is suitably conveyed by the generated type-2 fuzzy set models.
Comment: 16 pages, 9 figures, 52 references
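The minimum sum of distances (MinSOD) criterion selects, from a set of objects, the element whose total dissimilarity to all the others is smallest (a medoid). The following Python function is a minimal sketch of that selection step only, assuming an arbitrary user-supplied dissimilarity `dist`; the construction of the type-2 fuzzy set on top of it is not reproduced here, and the function name `minsod` is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def minsod(data: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the element of `data` with the minimum sum of distances (MinSOD)
    to all elements of `data`, i.e. a medoid under `dist`."""
    if len(data) == 0:
        raise ValueError("minsod requires a non-empty sequence")
    best_index, best_sod = 0, float("inf")
    for i, x in enumerate(data):
        # Sum of dissimilarities from candidate x to every element (x included).
        sod = sum(dist(x, y) for y in data)
        if sod < best_sod:  # strict '<': ties resolve to the earliest candidate
            best_index, best_sod = i, sod
    return data[best_index]
```

Because `dist` is a parameter, the same function applies to feature vectors, strings, or graphs, provided a dissimilarity is available for them; the quadratic number of distance evaluations is its main cost.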
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system was originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
in our dataset is associated with a solubility degree and is represented as a
sequence of symbols denoting the 20 amino acid residues. The computational
results obtained here, which we stress were achieved without any
context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references
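In a dissimilarity space representation, each input object is encoded as the vector of its dissimilarities to a set of prototype objects, after which any vector-space classifier can be applied. The sketch below assumes the Levenshtein (edit) distance between amino acid strings and a fixed, hand-picked prototype set; ODSE itself optimizes the embedding, which this sketch does not attempt. The 1-nearest-neighbour rule and the names `embed` and `nn_predict` are illustrative stand-ins, not the classifier used in the paper.

```python
import math
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def embed(seqs: Sequence[str], prototypes: Sequence[str],
          dist: Callable[[str, str], float] = levenshtein) -> List[List[float]]:
    """Map each sequence to the vector of its dissimilarities to the prototypes."""
    return [[float(dist(s, p)) for p in prototypes] for s in seqs]


def nn_predict(train_vecs: Sequence[Sequence[float]], train_labels: Sequence[str],
               x: Sequence[float]) -> str:
    """1-nearest-neighbour rule with Euclidean distance in the dissimilarity space."""
    best = min(range(len(train_vecs)), key=lambda i: math.dist(train_vecs[i], x))
    return train_labels[best]
```

The prototypes and labels in any usage of this sketch are toy placeholders, not solubility data; the point is only that, once sequences are embedded, classification proceeds on ordinary real-valued vectors.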
On the long-term correlations and multifractal properties of electric arc furnace time series
In this paper, we study long-term correlations and multifractal properties of
time series of three-phase current signals coming from an industrial electric
arc furnace plant. Implicit sinusoidal trends are detected by considering the
scaling of the fluctuation functions. The time series are then filtered via a
Fourier-based analysis, thereby removing such strong periodicities. In the
filtered time series we detect long-term, positive correlations. The presence
of positive correlations is in agreement with the typical V-I characteristic
(hysteresis) of the electric arc furnace, thus providing a sound physical
justification for the memory effects found in the current time series. The
multifractal signature in the filtered time series is strong enough for them
to be classified as multifractal.
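In standard first-order detrended fluctuation analysis (DFA), the fluctuation function F(s) is computed by integrating the mean-removed series into a profile, splitting the profile into windows of size s, removing a local linear trend in each window, and taking the root mean square of the residuals; the scaling exponent is the slope of log F(s) versus log s. The following is a minimal plain-Python sketch under those assumptions, using non-overlapping windows taken from the start of the series only. It includes neither the Fourier-based filtering step nor the multifractal (q-order) generalization, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dfa_fluctuation(x: Sequence[float], s: int) -> float:
    """First-order DFA fluctuation F(s); trailing samples that do not fill a
    whole window of size s are ignored."""
    n = len(x)
    if s < 3 or s > n:
        raise ValueError("window size must satisfy 3 <= s <= len(x)")
    mean = sum(x) / n
    profile, acc = [], 0.0
    for v in x:  # cumulative sum of the mean-removed series
        acc += v - mean
        profile.append(acc)
    t_mean = (s - 1) / 2
    stt = sum((t - t_mean) ** 2 for t in range(s))
    sq_sum, count = 0.0, 0
    for start in range(0, n - s + 1, s):
        seg = profile[start:start + s]
        y_mean = sum(seg) / s
        # Least-squares linear trend within the window.
        slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg)) / stt
        intercept = y_mean - slope * t_mean
        sq_sum += sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(seg))
        count += s
    return math.sqrt(sq_sum / count)


def scaling_exponent(x: Sequence[float], scales: Sequence[int]) -> float:
    """Least-squares slope of log F(s) versus log s."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(dfa_fluctuation(x, s)) for s in scales]
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - xm) * (b - ym) for a, b in zip(xs, ys))
    return num / sum((a - xm) ** 2 for a in xs)
```

For uncorrelated noise the exponent is close to 0.5, while values above 0.5 indicate long-term positive correlations of the kind reported above.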
Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification
Detecting faults in electrical power grids is of paramount importance, from
both the electricity operator's and the consumer's viewpoints. Modern electric
power grids (smart grids) are equipped with smart sensors that allow gathering
real-time information on the physical status of all the component elements of
the whole infrastructure (e.g., cables and related insulation, transformers,
breakers, and so on). In real-world smart grid systems, additional information
related to the operational status of the grid itself, such as meteorological
information, is usually collected as well.
Designing a suitable recognition (discrimination) model of faults in a
real-world smart grid system is hence a challenging task. This follows from the
heterogeneity of the information that actually determines a typical fault
condition. A second issue is that, when synthesizing a recognition model, in
practice only the conditions of observed faults are usually meaningful.
Therefore, a suitable recognition model should be synthesized by making use of
the observed fault conditions only. In this paper, we deal with the problem of
modeling and recognizing faults in a real-world smart grid system, which
supplies the entire city of Rome, Italy. Recognition of faults is addressed by
following a combined approach of customizing multiple dissimilarity measures
and applying one-class classification techniques. We provide an in-depth study
of the available data and of the models synthesized by the proposed one-class
classifier. We also offer a comprehensive analysis of the fault recognition
results by exploiting a fuzzy set based reliability decision rule.
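One-class classification learns a model from the fault class alone and then decides whether a new pattern belongs to it. The sketch below illustrates the idea with a single fault cluster, a feature-weighted Euclidean dissimilarity (the weights `w` standing in for a customized dissimilarity measure), and a Gaussian membership value used as a reliability score with a fixed threshold. All names, the Gaussian membership, the single-cluster model, and the 0.3 threshold are assumptions made for illustration; the paper's actual dissimilarity measures, clustering, and fuzzy decision rule are not reproduced, and heterogeneous (non-numeric) attributes are ignored.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def weighted_dist(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Feature-weighted Euclidean dissimilarity; `w` customizes the measure."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


@dataclass
class FaultCluster:
    center: List[float]
    radius: float  # largest training distance to the center (assumed spread measure)


def fit_single_cluster(faults: Sequence[Sequence[float]], w: Sequence[float]) -> FaultCluster:
    """Summarize the observed faults (the only class available) by one cluster."""
    dim = len(faults[0])
    center = [sum(f[k] for f in faults) / len(faults) for k in range(dim)]
    radius = max(weighted_dist(f, center, w) for f in faults)
    return FaultCluster(center, max(radius, 1e-12))  # avoid division by zero


def membership(x: Sequence[float], clusters: Sequence[FaultCluster],
               w: Sequence[float]) -> float:
    """Hypothetical Gaussian membership of x to the fault class, in [0, 1]."""
    return max(math.exp(-(weighted_dist(x, c.center, w) / c.radius) ** 2)
               for c in clusters)


def is_fault(x: Sequence[float], clusters: Sequence[FaultCluster],
             w: Sequence[float], threshold: float = 0.3) -> bool:
    """Accept x as a fault only if its membership (reliability) reaches the threshold."""
    return membership(x, clusters, w) >= threshold
```

Treating the membership value as a reliability score, rather than returning only a hard label, lets an operator rank or defer low-confidence decisions.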
A generative model for protein contact networks
In this paper we present a generative model for protein contact networks. The
soundness of the proposed model is investigated by focusing primarily on
mesoscopic properties derived from the spectra of the graph Laplacian. To
complement the analysis, we also study classical topological descriptors, such
as statistics of the shortest paths and the important feature of modularity.
Our experiments show that the proposed model yields a considerable
improvement with respect to two suitably chosen generative mechanisms, mimicking
real protein contact networks more closely in terms of diffusion properties
derived from the Laplacian spectra. However, like the other considered models,
it does not reproduce the shortest-path structure with sufficient accuracy. To
compensate for this drawback, we designed a second step involving a targeted
edge reconfiguration process. The ensemble of reconfigured networks shows
statistically significant improvements.
As a byproduct of our study, we demonstrate that modularity, a well-known
property of proteins, does not entirely explain the actual network architecture
characterizing protein contact networks. In fact, we conclude that modularity,
intended as a quantification of an underlying community structure, should be
considered as an emergent property of the structural organization of proteins.
Interestingly, such a property is suitably optimized in protein contact
networks together with the feature of path efficiency.
Comment: 18 pages, 67 references
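The spectral analysis above rests on the graph Laplacian of a contact network. As a minimal sketch, assuming residues are given as 3-D coordinates (e.g., of alpha carbons) and that two residues are in contact when their distance lies in a fixed range (the 4-8 Å default here is an assumption), the following NumPy code builds the adjacency matrix and returns the eigenvalues of the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}. It implements neither the proposed generative model nor the edge reconfiguration step, and the function names are hypothetical.

```python
import numpy as np


def contact_network(coords, d_min: float = 4.0, d_max: float = 8.0) -> np.ndarray:
    """Adjacency matrix: residues i != j are linked if d_min <= dist <= d_max."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = ((dist >= d_min) & (dist <= d_max)).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def normalized_laplacian_spectrum(adj: np.ndarray) -> np.ndarray:
    """Ascending eigenvalues of I - D^{-1/2} A D^{-1/2}.

    Isolated vertices get a diagonal entry of 1 here (a convention choice)."""
    deg = adj.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)  # symmetric matrix, so eigvalsh applies
```

Eigenvalues of the normalized Laplacian lie in [0, 2], and for graphs without isolated vertices the multiplicity of 0 equals the number of connected components, which makes the spectrum a convenient mesoscopic fingerprint for comparing real and synthetic networks.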
Multifractal Characterization of Protein Contact Networks
Multifractal detrended fluctuation analysis of time series can reveal the
presence of long-range correlations and, at the same time,
characterize the self-similarity of the series. The rich information derivable
from the characteristic exponents and the multifractal spectrum can be further
analyzed to discover important insights about the underlying dynamical process.
In this paper, we employ multifractal analysis techniques in the study of
protein contact networks. To this end, each network is first mapped to three
different time series, each of which is generated by a stationary unbiased
random walk. To capture the peculiarities of the networks at different levels,
we consider three observables at each vertex: the degree, the clustering
coefficient, and the closeness centrality. To compare the results with suitable
references, we also consider instances of three well-known network models and
two typical time series with pure monofractal and multifractal properties. The
first result of notable interest is that time series associated with protein
contact networks exhibit long-range correlations (strong persistence), which
are consistent with signals in between typical monofractal and multifractal
behavior. Subsequently, a suitable embedding of the multifractal spectra allows
us to focus on ensemble properties, which in turn enables further observations
regarding the considered networks. In particular, we highlight the different
roles that small and large fluctuations of the considered observables play in
the characterization of the network topology.
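The mapping from a network to a time series can be sketched as follows: an unbiased random walk moves, at each step, to a neighbour chosen uniformly at random, and the value of a vertex observable is recorded at every visited vertex. Starting the walk from a vertex drawn with probability proportional to its degree (the stationary distribution of an unbiased walk on an undirected graph) makes the walk stationary from the first step. This is a minimal sketch under those assumptions; the adjacency-list format and the function name are hypothetical, and clustering coefficient or closeness centrality values could be passed through the same `observable` dictionary.

```python
import random
from typing import Dict, List, Optional


def random_walk_series(adj: Dict[int, List[int]], observable: Dict[int, float],
                       length: int, seed: Optional[int] = None) -> List[float]:
    """Record `observable` along an unbiased random walk on an undirected graph."""
    rng = random.Random(seed)
    nodes = [v for v in adj if adj[v]]  # vertices with at least one neighbour
    if not nodes:
        raise ValueError("graph has no edges")
    # Start from the stationary distribution: P(v) proportional to degree(v).
    v = rng.choices(nodes, weights=[len(adj[u]) for u in nodes], k=1)[0]
    series = []
    for _ in range(length):
        series.append(observable[v])
        v = rng.choice(adj[v])  # uniform step to a neighbour
    return series
```

The resulting series, one per observable, is what a (multifractal) detrended fluctuation analysis would then take as input.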
An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery
We propose a multi-agent algorithm able to automatically discover relevant
regularities in a given dataset, determining at the same time the set of
configurations of the adopted parametric dissimilarity measure yielding compact
and separated clusters. Each agent operates independently by performing a
Markovian random walk on a suitable weighted graph representation of the input
dataset. Such a weighted graph representation is induced by the specific
parameter configuration of the dissimilarity measure adopted by the agent,
which searches and takes decisions autonomously for one cluster at a time.
Results show that the algorithm is able to discover parameter configurations
that yield a consistent and interpretable collection of clusters. Moreover, we
demonstrate that our algorithm performs comparably to other similar
state-of-the-art algorithms on specific clustering problems.
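The Markovian random walk on a weighted graph induced by a parametric dissimilarity can be sketched as follows, assuming a feature-weighted Euclidean dissimilarity whose weight vector `w` plays the role of one agent's parameter configuration, and transition probabilities proportional to exp(-d/tau) between distinct patterns. The kernel, the scale `tau`, the fully connected graph, and all names are assumptions for illustration; the agents' cluster search, decision making, and parameter optimization are not reproduced.

```python
import math
import random
from typing import List, Optional, Sequence


def weighted_dissimilarity(a: Sequence[float], b: Sequence[float],
                           w: Sequence[float]) -> float:
    """Parametric dissimilarity: Euclidean distance with feature weights w."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def transition_matrix(data: Sequence[Sequence[float]], w: Sequence[float],
                      tau: float = 1.0) -> List[List[float]]:
    """Row-stochastic matrix of a walk on the graph induced by weights w."""
    n = len(data)
    P = []
    for i in range(n):
        # Edge weight exp(-d/tau) between distinct patterns; no self-loops.
        row = [0.0 if i == j else
               math.exp(-weighted_dissimilarity(data[i], data[j], w) / tau)
               for j in range(n)]
        total = sum(row)
        if total == 0.0:
            raise ValueError("all edge weights are zero; try a larger tau")
        P.append([r / total for r in row])
    return P


def agent_walk(P: List[List[float]], start: int, steps: int,
               seed: Optional[int] = None) -> List[int]:
    """Markovian random walk of one agent; returns the visited vertex indices."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(P)), weights=P[path[-1]], k=1)[0])
    return path
```

Changing `w` changes the induced graph and hence where the walk tends to remain, which is the intuition behind letting each agent explore a different dissimilarity configuration.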