Graph Summarization

Author: Bonifati Angela
Dumbrava Stefania
Kondylakis Haridimos
Publication date: 01/04/2020

The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. Our chapter pinpoints the main graph summarization methods, with particular emphasis on the most recent approaches and novel research trends on this topic that are not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies.
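
As a rough illustration of what "a more compact representation that preserves structure" can mean, the sketch below merges nodes that share exactly the same neighbor set into one supernode. This is a generic textbook-style grouping, not a method attributed to the chapter; the function name `summarize_by_neighborhood` and the toy graph are assumptions made for this example, and the input is assumed to be an undirected graph without self-loops.

```python
from collections import defaultdict

def summarize_by_neighborhood(graph):
    """Group nodes with identical neighbor sets into supernodes.

    `graph` maps each node to the set of its neighbors (undirected,
    no self-loops). Returns the supernodes and the edges between them.
    """
    groups = defaultdict(list)
    for node, neighbors in graph.items():
        groups[frozenset(neighbors)].append(node)

    supernodes = [tuple(sorted(members)) for members in groups.values()]
    owner = {node: s for s in supernodes for node in s}

    # One superedge stands for every original edge between two groups.
    superedges = {
        frozenset((owner[u], owner[v]))
        for u, neighbors in graph.items()
        for v in neighbors
    }
    return supernodes, superedges

# Hypothetical toy graph: a, b, c each connect to both x and y (6 edges).
toy_graph = {
    "a": {"x", "y"}, "b": {"x", "y"}, "c": {"x", "y"},
    "x": {"a", "b", "c"}, "y": {"a", "b", "c"},
}
supernodes, superedges = summarize_by_neighborhood(toy_graph)
```

On this toy input the six original edges collapse into a single superedge between two supernodes, and the original graph can be rebuilt exactly, because nodes with identical neighbor sets are never adjacent to each other.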

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

Can graph-cutting improve microarray gene expression reconstructions?

Author: Fraser K
Kellam P
Li Y
Liu X
Wang Z
Publication venue: 'Elsevier BV'
Publication date: 01/12/2008

Microarrays produce high-resolution image data that are, unfortunately, permeated with a great deal of “noise” that must be removed for precision purposes. This paper presents a technique for such a removal process. On completion of this non-trivial task, a new surface (devoid of gene spots) is subtracted from the original to render more precise gene expressions. The graph-cutting technique as implemented has the benefits that only the most appropriate pixels are replaced and that these replacements are replicates rather than estimates. As a result, the influence of outliers and other artifacts is handled more appropriately than in previous methods, and the variability of the final gene expressions is considerably reduced. Experiments are carried out to test the technique against commercial and previously researched reconstruction methods. Final results show that the graph-cutting inspired identification mechanism has a significant positive impact on reconstruction accuracy.

Brunel University Research Archive

Improved processing of microarray data using image reconstruction techniques

Author: Liu X.
Magoulas George D.
O'Neill P.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

Spotted cDNA microarray data analysis suffers from various problems such as noise from a variety of sources, missing data, inconsistency, and, of course, the presence of outliers. This paper introduces a new method that dramatically reduces the noise when processing the original image data. The proposed approach recreates the microarray slide image as it would have been with all the genes removed. By subtracting this background recreation from the original, the gene ratios can be calculated with more precision and less influence from outliers and other artifacts that would normally make the analysis of this data more difficult. The new technique is also beneficial because it does not rely on the accurate fitting of a region to each gene; its only requirement is an approximate coordinate. In the experiments conducted, the new method was tested against one of the mainstream methods of processing spotted microarray images. Our method is shown to produce much less variation in gene measurements. This evidence is supported by clustering results that show a marked improvement in accuracy.

CiteSeerX

Crossref

Birkbeck Institutional Research Online

Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

Author: Aste Tomaso
Di Matteo Tiziana
Musmeci Nicolo
Publication venue: 'Infopro Digital Services Ltd'
Publication date: 01/01/2015

We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing it with the underlying industrial activity structure. Specifically, we apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods, including Linkage and k-medoids. In particular, by taking the industrial sector classification of stocks as a benchmark partition, we evaluate how well the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, such as crises. These results can be of interest for all applications of clustering methods to portfolio optimization and risk hedging. Comment: 31 pages, 17 figures.
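
The comparison described above (cluster stocks by return correlations, then check the clusters against a sector classification) can be sketched as follows. This is a minimal illustration using single-linkage clustering and the plain Rand index, both chosen here for brevity; it does not implement the Directed Bubble Hierarchical Tree, and the distance transform, the function names, and the toy return series are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equally long return series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def single_linkage(returns, k):
    """Merge the two closest clusters until only k clusters remain."""
    names = list(returns)
    # Assumed correlation-to-distance transform: d = sqrt(2 * (1 - rho)).
    dist = {
        frozenset((a, b)): sqrt(max(0.0, 2.0 * (1.0 - pearson(returns[a], returns[b]))))
        for a, b in combinations(names, 2)
    }
    clusters = [{n} for n in names]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist[frozenset((a, b))]
                for a in clusters[ij[0]]
                for b in clusters[ij[1]]
            ),
        )
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters

def rand_index(clusters, benchmark):
    """Fraction of item pairs on which clustering and benchmark agree."""
    label = {n: c for c, members in enumerate(clusters) for n in members}
    pairs = list(combinations(label, 2))
    agree = sum(
        (label[a] == label[b]) == (benchmark[a] == benchmark[b])
        for a, b in pairs
    )
    return agree / len(pairs)

# Hypothetical toy data: two "bank" and two "tech" return series.
toy_returns = {
    "BANK_A": [0.010, -0.020, 0.015, -0.010, 0.020],
    "BANK_B": [0.012, -0.018, 0.013, -0.008, 0.022],
    "TECH_A": [-0.010, 0.030, -0.020, 0.025, -0.015],
    "TECH_B": [-0.012, 0.028, -0.018, 0.020, -0.010],
}
toy_sectors = {"BANK_A": "bank", "BANK_B": "bank", "TECH_A": "tech", "TECH_B": "tech"}

toy_clusters = single_linkage(toy_returns, k=2)
toy_score = rand_index(toy_clusters, toy_sectors)
```

On real data one would typically prefer a chance-corrected score and would compare several clustering methods at several cluster counts, which is the kind of comparison the abstract describes.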

arXiv.org e-Print Archive

LSE Research Online

Directory of Open Access Journals

PubMed Central

King's Research Portal

FigShare

Hierarchical information clustering by means of topologically embedded graphs

Author: Aste Tomaso
Di Matteo T.
Song Won-Min
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 20/10/2011

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss the performance, robustness and reliability of this method by first investigating several artificial data-sets, finding that it can significantly outperform other established approaches. Then we show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data-sets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in the diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies. Comment: 33 Pages, 18 Figures, 5 Tables.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Kent Academic Repository

King's Research Portal

FigShare

Compressive Network Analysis

Author: Guibas Leonidas
Jiang Xiaoye
Liu Han
Yao Yuan
Publication date: 01/01/2011

Modern data acquisition routinely produces massive amounts of network data. Though many methods and models have been proposed to analyze such data, research on network data is largely disconnected from the classical theory of statistical learning and signal processing. In this paper, we present a new framework for modeling network data, which connects two seemingly different areas: network data analysis and compressed sensing. From a nonparametric perspective, we model an observed network using a large dictionary. In particular, we consider the network clique detection problem and show connections between our formulation and a new algebraic tool, namely Radon basis pursuit in homogeneous spaces. Such a connection allows us to identify rigorous recovery conditions for clique detection problems. Though this paper is mainly conceptual, we also develop practical approximation algorithms for solving empirical problems and demonstrate their usefulness on real-world datasets.

arXiv.org e-Print Archive

CiteSeerX

Hierarchical information clustering by means of topologically embedded graphs

Author: Aste Tomaso
Di Matteo T.
Song Won-Min
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 10/12/2015

The Australian National University

A Tutorial on Clique Problems in Communications and Signal Processing

Author: Al-Naffouri Tareq Y.
Alouini Mohamed-Slim
Dahrouj Hayssam
Douik Ahmed
Publication date: 25/02/2020

Since its first use by Euler on the problem of the seven bridges of Königsberg, graph theory has shown excellent abilities in solving and unveiling the properties of multiple discrete optimization problems. The study of the structure of some integer programs reveals equivalence with graph theory problems making a large body of the literature readily available for solving and characterizing the complexity of these problems. This tutorial presents a framework for utilizing a particular graph theory problem, known as the clique problem, for solving communications and signal processing problems. In particular, the paper aims to illustrate the structural properties of integer programs that can be formulated as clique problems through multiple examples in communications and signal processing. To that end, the first part of the tutorial provides various optimal and heuristic solutions for the maximum clique, maximum weight clique, and k-clique problems. The tutorial, further, illustrates the use of the clique formulation through numerous contemporary examples in communications and signal processing, mainly in maximum access for non-orthogonal multiple access networks, throughput maximization using index and instantly decodable network coding, collision-free radio frequency identification networks, and resource allocation in cloud-radio access networks. Finally, the tutorial sheds light on the recent advances of such applications, and provides technical insights on ways of dealing with mixed discrete-continuous optimization problems.
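
For readers unfamiliar with the maximum clique problem mentioned above, the following sketch shows one standard exact approach: enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting) and keep the largest. It is a generic textbook algorithm, not necessarily one of the solutions presented in the tutorial; the function names and the toy graph are assumptions made for this example, and the graph is assumed to be undirected with no self-loops.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]

def maximal_cliques(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques using Bron-Kerbosch with pivoting."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            cliques.append(set(r))
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques

def maximum_clique(graph: Graph) -> Set[int]:
    """Return one largest clique (exponential time in the worst case)."""
    return max(maximal_cliques(graph), key=len, default=set())

# Hypothetical toy graph: a triangle 1-2-3 with a tail 3-4-5.
toy_graph: Graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5},
    5: {4},
}
largest = maximum_clique(toy_graph)
```

Exact enumeration like this is practical only for small or sparse graphs, which is one reason heuristic solutions are also of interest for larger instances.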

arXiv.org e-Print Archive

Caltech Authors