Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. Our chapter pinpoints the main graph summarization
methods, with particular attention to the most recent approaches and novel
research trends on this topic that previous surveys have not yet covered.
Comment: To appear in the Encyclopedia of Big Data Technologies
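As a concrete illustration of turning a graph into a more compact representation while preserving structure, the following minimal sketch merges nodes that share exactly the same neighbor set into one supernode. This grouping rule is only one simple example chosen here for illustration; it is not taken from the chapter, and the function name `summarize_by_neighborhood` and the adjacency-dictionary input format are assumptions.

```python
from collections import defaultdict


def summarize_by_neighborhood(adj):
    """Merge nodes with identical neighbor sets into supernodes.

    adj: dict mapping each node to the set of its neighbors
         (undirected graph, no self-loops; an assumed input format).
    Returns (supernodes, superedges), both sorted for reproducibility.
    """
    # Group nodes by their exact neighbor set.
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        groups[frozenset(nbrs)].append(node)

    # Each group becomes one supernode, stored as a sorted tuple of members.
    supernodes = sorted(tuple(sorted(members)) for members in groups.values())
    owner = {node: sn for sn in supernodes for node in sn}

    # An edge between two supernodes exists if any of their members are adjacent.
    superedges = set()
    for node, nbrs in adj.items():
        for nbr in nbrs:
            a, b = owner[node], owner[nbr]
            if a != b:
                superedges.add((min(a, b), max(a, b)))
    return supernodes, sorted(superedges)


# A star graph: the three leaves have the same neighborhood {"c"}.
star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
supernodes, superedges = summarize_by_neighborhood(star)
```

On the star graph the three leaves collapse into a single supernode, so four nodes and three edges become two supernodes and one superedge. Because members of a supernode have identical neighborhoods, the original graph can be rebuilt exactly from this summary.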
Can graph-cutting improve microarray gene expression reconstructions?
Microarrays produce high-resolution image data that are, unfortunately, permeated with a great deal of “noise” that must be removed for precision purposes. This paper presents a technique for such a removal process. On completion of this non-trivial task, a new surface (devoid of gene spots) is subtracted from the original to render more precise gene expressions. The graph-cutting technique as implemented has the benefits that only the most appropriate pixels are replaced and that these replacements are replicates rather than estimates. This means that the influence of outliers and other artifacts is handled more appropriately than in previous methods, and that the variability of the final gene expressions is considerably reduced. Experiments are carried out to test the technique against commercial and previously researched reconstruction methods. Final results show that the graph-cutting inspired identification mechanism has a significant positive impact on reconstruction accuracy.
Improved processing of microarray data using image reconstruction techniques
Spotted cDNA microarray data analysis suffers from various problems such as noise from a variety of sources, missing data, inconsistency, and, of course, the presence of outliers. This paper introduces a new method that dramatically reduces the noise when processing the original image data. The proposed approach recreates the microarray slide image, as it would have been with all the genes removed. By subtracting this background recreation from the original, the gene ratios can be calculated with more precision and less influence from outliers and other artifacts that would normally make the analysis of this data more difficult. The new technique is also beneficial, as it does not rely on the accurate fitting of a region to each gene, with its only requirement being an approximate coordinate. In experiments conducted, the new method was tested against one of the mainstream methods of processing spotted microarray images. Our method is shown to produce much less variation in gene measurements. This evidence is supported by clustering results that show a marked improvement in accuracy.
Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods
We quantify the amount of information filtered by different hierarchical
clustering methods on correlations between stock returns comparing it with the
underlying industrial activity structure. Specifically, we apply, for the first
time to financial data, a novel hierarchical clustering approach, the Directed
Bubble Hierarchical Tree, and we compare it with other methods, including the
Linkage and k-medoids. In particular, by taking the industrial sector
classification of stocks as a benchmark partition, we evaluate how the
different methods retrieve this classification. The results show that the
Directed Bubble Hierarchical Tree can outperform other methods, being able to
retrieve more information with fewer clusters. Moreover, we show that the
economic information is hidden at different levels of the hierarchical
structures depending on the clustering method. The dynamical analysis on a
rolling window also reveals that the different methods show different degrees
of sensitivity to events affecting financial markets, like crises. These
results can be of interest for all the applications of clustering methods to
portfolio optimization and risk hedging.
Comment: 31 pages, 17 figures
Hierarchical information clustering by means of topologically embedded graphs
We introduce a graph-theoretic approach to extract clusters and hierarchies
in complex data-sets in an unsupervised and deterministic manner, without the
use of any prior information. This is achieved by building topologically
embedded networks containing the subset of most significant links and analyzing
the network structure. For a planar embedding, this method provides both the
intra-cluster hierarchy, which describes the way clusters are composed, and the
inter-cluster hierarchy which describes how clusters gather together. We
discuss performance, robustness and reliability of this method by first
investigating several artificial data-sets, finding that it can significantly
outperform other established approaches. Then we show that our method can
successfully differentiate meaningful clusters and hierarchies in a variety of
real data-sets. In particular, we find that the application to gene expression
patterns of lymphoma samples uncovers biologically significant groups of genes
which play key roles in diagnosis, prognosis and treatment of some of the most
relevant human lymphoid malignancies.
Comment: 33 pages, 18 figures, 5 tables
Compressive Network Analysis
Modern data acquisition routinely produces massive amounts of network data.
Though many methods and models have been proposed to analyze such data, the
research on network data is largely disconnected from the classical theory of
statistical learning and signal processing. In this paper, we present a new
framework for modeling network data, which connects two seemingly different
areas: network data analysis and compressed sensing. From a nonparametric
perspective, we model an observed network using a large dictionary. In
particular, we consider the network clique detection problem and show
connections between our formulation and a new algebraic tool, namely Radon
basis pursuit in homogeneous spaces. Such a connection allows us to identify
rigorous recovery conditions for clique detection problems. Though this paper
is mainly conceptual, we also develop practical approximation algorithms for
solving empirical problems and demonstrate their usefulness on real-world
datasets.
A Tutorial on Clique Problems in Communications and Signal Processing
Since its first use by Euler on the problem of the seven bridges of
Königsberg, graph theory has shown excellent abilities in solving and
unveiling the properties of multiple discrete optimization problems. The study
of the structure of some integer programs reveals equivalence with graph theory
problems, making a large body of the literature readily available for solving
and characterizing the complexity of these problems. This tutorial presents a
framework for utilizing a particular graph theory problem, known as the clique
problem, for solving communications and signal processing problems. In
particular, the paper aims to illustrate the structural properties of integer
programs that can be formulated as clique problems through multiple examples in
communications and signal processing. To that end, the first part of the
tutorial provides various optimal and heuristic solutions for the maximum
clique, maximum weight clique, and k-clique problems. The tutorial further
illustrates the use of the clique formulation through numerous contemporary
examples in communications and signal processing, mainly in maximum access for
non-orthogonal multiple access networks, throughput maximization using index
and instantly decodable network coding, collision-free radio frequency
identification networks, and resource allocation in cloud-radio access
networks. Finally, the tutorial sheds light on the recent advances of such
applications, and provides technical insights on ways of dealing with mixed
discrete-continuous optimization problems.
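To make the maximum clique problem named in this abstract concrete, here is a minimal sketch of an exact search based on the classical Bron–Kerbosch procedure with pivoting. It is a generic textbook technique used here for illustration and is not claimed to be one of the solutions presented in the tutorial; the function name `maximum_clique` and the adjacency-dictionary input are assumptions. Its worst-case running time is exponential in the number of vertices, so it suits only small graphs.

```python
def maximum_clique(adj):
    """Return one maximum clique of an undirected graph as a sorted list.

    adj: dict mapping each vertex to the set of its neighbors
         (no self-loops; an assumed input format).
    """
    best = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored.
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = sorted(r)
            return
        # Pivot on the vertex with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# A triangle {1, 2, 3} with a path 3-4-5 attached.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
clique = maximum_clique(graph)
```

In this example the triangle formed by vertices 1, 2 and 3 is the only clique of size three, so it is the one returned.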