Graph ambiguity
In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived from fuzzy set and fuzzy information theories. Our aim is to show that in the domain of graphs, too, it is possible to derive a formulation able to capture the same semantic and mathematical concept. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results show that graph ambiguity is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved.
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system was originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
computational results obtained here, which we stress were achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references
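The dissimilarity space representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the ODSE system: the choice of the Levenshtein (edit) distance as the dissimilarity measure, the function names `edit_distance` and `embed`, and the toy prototype set are all assumptions made for illustration. Each sequence is mapped to the vector of its dissimilarities from a fixed set of prototype sequences, after which any vector-based classifier could be applied.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (assumed dissimilarity measure)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def embed(sequence: str, prototypes: Sequence[str]) -> List[int]:
    """Represent a sequence by its dissimilarities from each prototype."""
    return [edit_distance(sequence, p) for p in prototypes]
```

For example, `embed("ACD", ["ACD", "AC", ""])` yields a three-dimensional vector, one coordinate per prototype. How the prototypes and the dissimilarity parameters are chosen is exactly what a system such as ODSE would optimize, and that step is not shown here.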
Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem
This paper builds upon the fundamental work of Niwa et al. [34], which
provides the unique possibility to analyze the relative aggregation/folding
propensity of the elements of the entire Escherichia coli (E. coli) proteome in
a cell-free standardized microenvironment. The hardness of the problem comes
from the superposition of the driving forces of intra- and inter-molecule
interactions, and it is mirrored by evidence of shifts from folding to
aggregation phenotypes caused by single-point mutations [10]. Here we apply several
state-of-the-art classification methods coming from the field of structural
pattern recognition, with the aim to compare different representations of the
same proteins gathered from the Niwa et al. database; such representations
include sequences and labeled (contact) graphs enriched with chemico-physical
attributes. Through this comparison, we are also able to identify some interesting
general properties of proteins. Notably, (i) we suggest a threshold around 250
residues discriminating "easily foldable" from "hardly foldable" molecules,
consistent with other independent experiments, and (ii) we highlight the
relevance of contact graph spectra for folding behavior discrimination and
characterization of the E. coli solubility data. The soundness of the
experimental results presented in this paper is supported by the statistically
relevant relationships discovered between the chemico-physical description of
proteins and the developed substitution cost matrix used in the various
discrimination systems.
Comment: 17 pages, 3 figures, 46 references
On the long-term correlations and multifractal properties of electric arc furnace time series
In this paper, we study long-term correlations and multifractal properties
elaborated from time series of three-phase current signals coming from an
industrial electric arc furnace plant. Implicit sinusoidal trends are
detected by considering the scaling of the fluctuation functions. The time series
are then filtered via a Fourier-based analysis, thereby removing such strong
periodicities. In the filtered time series we detect long-term, positive
correlations. The presence of positive correlations is in agreement with the
typical V-I characteristic (hysteresis) of the electric arc furnace, thus
providing a sound physical justification for the memory effects found in the current
time series. The multifractal signature in the filtered time series is strong
enough for them to be effectively classified as multifractal.
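The "scaling of the fluctuation functions" referred to in this abstract can be sketched with ordinary detrended fluctuation analysis (DFA), which is the second-order special case of the multifractal variant. The code below is an illustrative sketch under stated assumptions, not the authors' pipeline: it uses linear detrending, non-overlapping windows, and the hypothetical function names `profile`, `fluctuation`, and `scaling_exponent`. A fitted exponent above 0.5 would indicate positive long-term correlations.

```python
import math
from typing import List, Sequence


def profile(x: Sequence[float]) -> List[float]:
    """Cumulative sum of the mean-subtracted series."""
    mean = sum(x) / len(x)
    out, total = [], 0.0
    for v in x:
        total += v - mean
        out.append(total)
    return out


def fluctuation(x: Sequence[float], scale: int) -> float:
    """DFA fluctuation F(scale); requires 2 <= scale <= len(x)."""
    y = profile(x)
    n_windows = len(y) // scale
    t = list(range(scale))
    t_mean = sum(t) / scale
    t_var = sum((ti - t_mean) ** 2 for ti in t)
    sq_sum = 0.0
    for w in range(n_windows):
        seg = y[w * scale:(w + 1) * scale]
        seg_mean = sum(seg) / scale
        # Least-squares linear trend inside the window.
        slope = sum((ti - t_mean) * (si - seg_mean)
                    for ti, si in zip(t, seg)) / t_var
        intercept = seg_mean - slope * t_mean
        sq_sum += sum((si - (slope * ti + intercept)) ** 2
                      for ti, si in zip(t, seg))
    return math.sqrt(sq_sum / (n_windows * scale))


def scaling_exponent(x: Sequence[float], scales: Sequence[int]) -> float:
    """Slope of log F(s) versus log s, fitted by least squares."""
    pts = [(math.log(s), math.log(fluctuation(x, s))) for s in scales]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))
```

Strong periodic trends tend to bend the log-log curve away from a straight line, which is why such trends are usually removed before the exponent is read off.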
Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification
Detecting faults in electrical power grids is of paramount importance, from
both the electricity operator and consumer viewpoints. Modern electric power
grids (smart grids) are equipped with smart sensors that make it possible to gather
real-time information regarding the physical status of all the component
elements belonging to the whole infrastructure (e.g., cables and related
insulation, transformers, breakers and so on). In real-world smart grid
systems, additional information related to the operational
status of the grid itself, such as meteorological information, is usually collected.
Designing a suitable recognition (discrimination) model of faults in a
real-world smart grid system is hence a challenging task. This follows, first, from the
heterogeneity of the information that actually determines a typical fault
condition. Second, when synthesizing a recognition model, in
practice only the conditions of observed faults are usually meaningful.
Therefore, a suitable recognition model should be synthesized by making use of
the observed fault conditions only. In this paper, we deal with the problem of
modeling and recognizing faults in a real-world smart grid system, which
supplies the entire city of Rome, Italy. Recognition of faults is addressed by
following a combined approach of multiple dissimilarity measures customization
and one-class classification techniques. We provide here an in-depth study
related to the available data and to the models synthesized by the proposed
one-class classifier. We also offer a comprehensive analysis of the fault
recognition results by exploiting a fuzzy-set-based reliability decision rule.
Multifractal Characterization of Protein Contact Networks
The multifractal detrended fluctuation analysis of time series is able to
reveal the presence of long-range correlations and, at the same time, to
characterize the self-similarity of the series. The rich information derivable
from the characteristic exponents and the multifractal spectrum can be further
analyzed to discover important insights about the underlying dynamical process.
In this paper, we employ multifractal analysis techniques in the study of
protein contact networks. To this end, initially a network is mapped to three
different time series, each of which is generated by a stationary unbiased
random walk. To capture the peculiarities of the networks at different levels,
we accordingly consider three observables at each vertex: the degree, the
clustering coefficient, and the closeness centrality. To compare the results
with suitable references, we consider also instances of three well-known
network models and two typical time series with pure monofractal and
multifractal properties. The first result of notable interest is that time
series associated with protein contact networks exhibit long-range correlations
(strong persistence), which are consistent with signals in between the typical
monofractal and multifractal behavior. Subsequently, a suitable embedding of
the multifractal spectra allows us to focus on ensemble properties, which in turn
makes it possible to make further observations regarding the considered
networks. In particular, we highlight the different roles that small and large
fluctuations of the considered observables play in the characterization of the
network topology.
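The mapping from a network to a time series described in this abstract can be sketched as follows. This is a minimal illustration under assumptions: "unbiased" is read as choosing the next vertex uniformly among neighbors, "stationary" is read as drawing the starting vertex from the degree-proportional stationary distribution of that walk, and the names `stationary_walk`, `observable_series`, `degree`, and `clustering` are hypothetical. Closeness centrality, the third observable in the abstract, is omitted for brevity.

```python
import random
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def stationary_walk(graph: Graph, steps: int, seed: int = 0) -> List[int]:
    """Visit `steps` vertices (steps >= 1) with an unbiased random walk."""
    rng = random.Random(seed)
    nodes = list(graph)
    degrees = [len(graph[v]) for v in nodes]
    # Start from the stationary distribution: probability proportional to degree.
    node = rng.choices(nodes, weights=degrees, k=1)[0]
    visited = [node]
    for _ in range(steps - 1):
        node = rng.choice(graph[node])  # uniform choice among neighbors
        visited.append(node)
    return visited


def degree(graph: Graph, v: int) -> float:
    return float(len(graph[v]))


def clustering(graph: Graph, v: int) -> float:
    """Fraction of pairs of neighbors of v that are themselves linked."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def observable_series(graph: Graph, visited: List[int],
                      observable: Callable[[Graph, int], float]) -> List[float]:
    """Time series of one vertex observable along the walk."""
    return [observable(graph, v) for v in visited]
```

The resulting series, one per observable, could then be passed to a fluctuation analysis routine; the graph is assumed to be connected and free of isolated vertices.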
A generative model for protein contact networks
In this paper we present a generative model for protein contact networks. The
soundness of the proposed model is investigated by focusing primarily on
mesoscopic properties elaborated from the spectra of the graph Laplacian. To
complement the analysis, we also study classical topological descriptors, such
as statistics of the shortest paths and the important feature of modularity.
Our experiments show that the proposed model results in a considerable
improvement with respect to two suitably chosen generative mechanisms,
mimicking with better approximation real protein contact networks in terms of
diffusion properties elaborated from the Laplacian spectra. However, like
the other models considered, it does not reproduce with sufficient accuracy the
shortest paths structure. To compensate for this drawback, we designed a second
step involving a targeted edge reconfiguration process. The ensemble of
reconfigured networks shows improvements that are statistically significant.
As a byproduct of our study, we demonstrate that modularity, a well-known
property of proteins, does not entirely explain the actual network architecture
characterizing protein contact networks. In fact, we conclude that modularity,
intended as a quantification of an underlying community structure, should be
considered as an emergent property of the structural organization of proteins.
Interestingly, such a property is suitably optimized in protein contact
networks together with the feature of path efficiency.
Comment: 18 pages, 67 references
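One of the "statistics of the shortest paths" mentioned in this abstract, the average shortest path length, can be computed with breadth-first search. The sketch below is only illustrative: it assumes an unweighted, undirected, connected graph stored as adjacency lists, and the function names `bfs_distances` and `average_shortest_path_length` are hypothetical rather than taken from the paper.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Hop distance from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(graph: Graph) -> float:
    """Mean hop distance over all ordered pairs of distinct reachable vertices."""
    total, pairs = 0, 0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs
```

Comparing this statistic between generated and real contact networks is one simple way to check whether a generative model reproduces the shortest-path structure.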
Cancer diagnosis using deep learning: A bibliographic review
In this paper, we first describe the basics of the field of cancer diagnosis, covering the steps of cancer diagnosis and then the typical classification methods used by doctors, to give readers a historical overview of cancer classification techniques. These methods include the Asymmetry, Border, Color and Diameter (ABCD) method, the seven-point detection method, the Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient. Moreover, with all types of audience in mind, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic (ROC) curve, area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, Dice coefficient, average accuracy, and Jaccard index. Because previously used methods are considered inefficient, better and smarter methods for cancer diagnosis are needed. Artificial intelligence is gaining attention as a way to define better diagnostic tools for cancer. In particular, deep neural networks can be successfully used for intelligent image analysis. This study provides the basic framework of how this machine learning works on medical imaging, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann machine (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LSTM), multi-scale convolutional neural network (M-CNN), and multi-instance learning convolutional neural network (MIL-CNN). For each technique, we provide Python code to allow interested readers to experiment with the cited algorithms on their own diagnostic problems.
The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to give researchers who choose to work on implementing deep learning and artificial neural networks for cancer diagnosis a from-scratch knowledge of the state-of-the-art achievements.
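Several of the evaluation criteria listed in this abstract follow directly from the four counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the dictionary layout are assumptions for illustration, and it is not the Python code the review itself provides. It assumes every denominator is nonzero.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate (recall)
        "specificity": tn / (tn + fp),            # true negative rate
        "precision": tp / (tp + fp),              # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),        # equals the Dice coefficient
        "jaccard": tp / (tp + fp + fn),           # intersection over union
    }
```

For binary labels the F1 score and the Dice coefficient coincide, which is why only one of them appears in the result. The ROC curve and AUC additionally need the classifier's continuous scores, not just the counts, and are not covered here.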
An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery
We propose a multi-agent algorithm able to automatically discover relevant
regularities in a given dataset, determining at the same time the set of
configurations of the adopted parametric dissimilarity measure yielding compact
and separated clusters. Each agent operates independently by performing a
Markovian random walk on a suitable weighted graph representation of the input
dataset. Such a weighted graph representation is induced by the specific
parameter configuration of the dissimilarity measure adopted by the agent,
which searches and takes decisions autonomously for one cluster at a time.
Results show that the algorithm is able to discover parameter configurations
that yield a consistent and interpretable collection of clusters. Moreover, we
demonstrate that our algorithm performs comparably to other similar
state-of-the-art algorithms when facing specific clustering problems.
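The idea that a parameter configuration of the dissimilarity measure induces a weighted graph on which an agent performs a Markovian random walk can be sketched as below. Every concrete choice here is an assumption made for illustration and is not taken from the paper: the weighted Euclidean dissimilarity, the exponential kernel with a hypothetical scale `tau`, and the function names. The sketch only builds the transition matrix that such a walk would use.

```python
import math
from typing import List, Sequence


def weighted_dissimilarity(x: Sequence[float], y: Sequence[float],
                           w: Sequence[float]) -> float:
    """Weighted Euclidean dissimilarity; w is the parameter configuration."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def transition_matrix(points: Sequence[Sequence[float]], w: Sequence[float],
                      tau: float = 1.0) -> List[List[float]]:
    """Row-stochastic matrix of a random walk on the induced weighted graph.

    Edge weights decay exponentially with dissimilarity (assumed kernel),
    self-loops are excluded, and at least two points are required.
    """
    n = len(points)
    matrix = []
    for i in range(n):
        row = [0.0 if i == j else
               math.exp(-weighted_dissimilarity(points[i], points[j], w) / tau)
               for j in range(n)]
        total = sum(row)
        matrix.append([v / total for v in row])
    return matrix
```

With this construction, changing `w` changes which points look close to each other, so the same dataset yields different graphs and therefore different walks for agents holding different parameter configurations.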