Spectrally approximating large graphs with smaller graphs
How does coarsening affect the spectrum of a general graph? We provide
conditions under which the principal eigenvalues and eigenspaces of the
coarsened and original graph Laplacian matrices are close. The achieved approximation is
shown to depend on standard graph-theoretic properties, such as the degree and
eigenvalue distributions, as well as on the ratio between the coarsened and
actual graph sizes. Our results carry implications for learning methods that
utilize coarsening. For the particular case of spectral clustering, they imply
that coarse eigenvectors can be used to derive good quality assignments even
without refinement---this phenomenon was previously observed, but lacked formal
justification. (Comment: 22 pages, 10 figures)
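The comparison the abstract describes can be illustrated with a minimal sketch (not the paper's construction; the toy graph, the partition, and the normalized-indicator coarsening matrix are assumptions): build a small graph Laplacian, coarsen it by merging groups of vertices, and compare the smallest eigenvalues of the two matrices.

```python
import numpy as np

# Toy graph: two 3-node cliques joined by a single edge (6 nodes).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A       # combinatorial Laplacian

# Coarsen by merging each clique into one super-node.
# Rows of P are normalized indicator vectors of the groups.
groups = [[0, 1, 2], [3, 4, 5]]
P = np.zeros((2, 6))
for c, g in enumerate(groups):
    P[c, g] = 1.0 / np.sqrt(len(g))
Lc = P @ L @ P.T                     # coarsened Laplacian (2 x 2)

# Compare the principal (smallest) eigenvalues of the two Laplacians.
print(np.sort(np.linalg.eigvalsh(L))[:2])
print(np.sort(np.linalg.eigvalsh(Lc)))
```

Both matrices have smallest eigenvalue 0 (both graphs are connected), and the coarse Fiedler value tracks the cut between the two cliques, which is the kind of spectral closeness the abstract's conditions formalize.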
Making Laplacians commute
In this paper, we construct multimodal spectral geometry by finding a pair of
closest commuting operators (CCO) to a given pair of Laplacians. The CCOs are
jointly diagonalizable and hence have the same eigenbasis. Our construction
naturally extends classical data analysis tools based on spectral geometry,
such as diffusion maps and spectral clustering. We provide several synthetic
and real examples of applications in dimensionality reduction, shape analysis,
and clustering, demonstrating that our method better captures the inherent
structure of multi-modal data.
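The fact the CCO construction exploits is that commuting symmetric operators share an eigenbasis. A minimal sketch (the path graph and the choice of a polynomial of the Laplacian as the second operator are assumptions for illustration; the paper instead optimizes for the closest commuting pair):

```python
import numpy as np

# Path graph on 4 nodes: Laplacian L1 (distinct eigenvalues).
A = np.diag(np.ones(3), 1)
A = A + A.T
L1 = np.diag(A.sum(axis=1)) - A

# Any polynomial of L1 commutes with it; use it as a stand-in
# for a second, commuting Laplacian-like operator.
L2 = L1 @ L1 + 0.5 * L1
assert np.allclose(L1 @ L2, L2 @ L1)    # the operators commute

# Eigenvectors of L1 therefore also diagonalize L2.
_, V = np.linalg.eigh(L1)
D2 = V.T @ L2 @ V
off = D2 - np.diag(np.diag(D2))
print(np.abs(off).max())                # near zero: jointly diagonal
```

With a shared eigenbasis in hand, spectral tools such as diffusion maps can be applied to both modalities in one coordinate system, which is the point of the construction.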
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
We introduce a stochastic graph-based method for computing relative
importance of textual units for Natural Language Processing. We test the
technique on the problem of Text Summarization (TS). Extractive TS relies on
the concept of sentence salience to identify the most important sentences in a
document or set of documents. Salience is typically defined in terms of the
presence of particular important words or in terms of similarity to a centroid
pseudo-sentence. We consider a new approach, LexRank, for computing sentence
importance based on the concept of eigenvector centrality in a graph
representation of sentences. In this model, a connectivity matrix based on
intra-sentence cosine similarity is used as the adjacency matrix of the graph
representation of sentences. Our system, based on LexRank, ranked first
in more than one task in the recent DUC 2004 evaluation. In this paper we
present a detailed analysis of our approach and apply it to a larger data set
including data from earlier DUC evaluations. We discuss several methods to
compute centrality using the similarity graph. The results show that
degree-based methods (including LexRank) outperform both centroid-based methods
and other systems participating in DUC in most of the cases. Furthermore, the
LexRank with threshold method outperforms the other degree-based techniques
including continuous LexRank. We also show that our approach is quite
insensitive to noise in the data that may result from an imperfect topical
clustering of documents.
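The continuous-LexRank variant described above can be sketched as power iteration on the row-normalized cosine-similarity matrix with PageRank-style damping. This is an illustrative sketch, not the authors' system; the toy bag-of-words vectors and the damping factor are assumptions, and the thresholded variant would binarize the similarity matrix before normalizing.

```python
import numpy as np

# Toy bag-of-words vectors for four "sentences" (hypothetical data).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [1, 0, 1, 2],
], dtype=float)

# Cosine-similarity adjacency matrix of the sentence graph.
norms = np.linalg.norm(X, axis=1)
S = (X @ X.T) / np.outer(norms, norms)

# Continuous LexRank: eigenvector centrality of the row-stochastic
# similarity matrix, with damping d (as in PageRank).
d = 0.85
n = len(S)
M = S / S.sum(axis=1, keepdims=True)      # row-normalize to stochastic
p = np.full(n, 1.0 / n)                   # uniform starting distribution
for _ in range(100):                      # power iteration
    p = (1 - d) / n + d * (M.T @ p)
p /= p.sum()
print(p)    # sentence salience scores; highest = most central sentence
```

Extractive summarization would then select the top-scoring sentences until a length budget is met.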