
1,022 research outputs found

Compressive Embedding and Visualization using Graphs

Author: Paratte Johan
Perraudin Nathanaël
Vandergheynst Pierre
Publication date: 19/02/2017

Visualizing high-dimensional data has been a focus in data analysis communities for decades, which has led to the design of many algorithms, some of which are now considered references (such as t-SNE). In our era of overwhelming data volumes, the scalability of such methods has become increasingly important. In this work, we present a method that makes it possible to apply any visualization or embedding algorithm to very large datasets by considering only a fraction of the data as input and then extending the information to all data points using a graph encoding its global similarity. We show that in most cases, using only $\mathcal{O}(\log(N))$ samples is sufficient to diffuse the information to all $N$ data points. In addition, we propose quantitative methods to measure the quality of embeddings and demonstrate the validity of our technique on both synthetic and real-world datasets.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Multiclass Total Variation Clustering

Author: Bresson Xavier
Laurent Thomas
Uminsky David
von Brecht James H.
Publication date: 01/01/2013

Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches.

arXiv.org e-Print Archive

CiteSeerX

University of San Francisco

Semi-Supervised Kernel PCA

Author: Christian Walder
Lars Kai Hansen
Mathematical Modelling
Morten Mørup
Ricardo Henao
Publication date: 01/01/2010

We present three generalisations of Kernel Principal Components Analysis (KPCA) which incorporate knowledge of the class labels of a subset of the data points. The first, MV-KPCA, penalises within-class variances, similar to Fisher discriminant analysis. The second, LSKPCA, is a hybrid of least squares regression and kernel PCA. The final one, LR-KPCA, is an iteratively reweighted version of the previous which achieves a sigmoid loss function on the labeled points. We provide a theoretical risk bound as well as illustrative experiments on real and toy data sets.

arXiv.org e-Print Archive

CiteSeerX

Online Research Database In Technology

Hypergraph Learning with Line Expansion

Author: Abdelzaher Tarek
Wang Ruijie
Yang Chaoqi
Yao Shuochao
Publication date: 08/09/2020

Previous hypergraph expansions are carried out solely at either the vertex level or the hyperedge level, thereby missing the symmetric nature of data co-occurrence and resulting in information loss. To address the problem, this paper treats vertices and hyperedges equally and proposes a new hypergraph formulation named the line expansion (LE) for hypergraph learning. The new expansion bijectively induces a homogeneous structure from the hypergraph by treating vertex-hyperedge pairs as "line nodes". By reducing the hypergraph to a simple graph, the proposed line expansion makes existing graph learning algorithms compatible with the higher-order structure and has been proven to be a unifying framework for various hypergraph expansions. We evaluate the proposed line expansion on five hypergraph datasets; the results show that our method beats SOTA baselines by a significant margin.
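
As a rough illustration of the "line node" idea in this abstract, the sketch below turns a toy hypergraph into a simple graph whose nodes are vertex-hyperedge pairs. The function name `line_expansion`, the dict-based input format, and the toy data are hypothetical. The adjacency rule (two line nodes are linked when they share a vertex or a hyperedge) is an assumption, since the abstract does not state it, and this is not the authors' implementation.

```python
from itertools import combinations


def line_expansion(hyperedges):
    """Build a simple graph whose nodes are (vertex, hyperedge) pairs.

    hyperedges: dict mapping a hyperedge id to an iterable of vertices.
    Returns (nodes, edges); each edge is a frozenset of two line nodes.

    ASSUMPTION: two line nodes are adjacent when they share the vertex
    or share the hyperedge. The abstract does not spell this rule out.
    """
    # One line node per incidence (vertex v belongs to hyperedge e).
    nodes = sorted((v, e) for e, members in hyperedges.items() for v in set(members))
    edges = set()
    for a, b in combinations(nodes, 2):
        if a[0] == b[0] or a[1] == b[1]:
            edges.add(frozenset((a, b)))
    return nodes, edges


# Toy hypergraph (hypothetical data): e1 = {a, b, c}, e2 = {b, d}.
toy = {"e1": ["a", "b", "c"], "e2": ["b", "d"]}
nodes, edges = line_expansion(toy)
print(len(nodes), "line nodes,", len(edges), "edges")
```

The resulting simple graph could then be passed to any ordinary graph learning routine, which is the compatibility the abstract refers to.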

arXiv.org e-Print Archive

spa: Semi-Supervised Semi-Parametric Graph-Based Estimation in R

Author: Mark Culp

In this paper, we present an R package that combines feature-based (X) data and graph-based (G) data for prediction of the response Y. In this particular case, Y is observed for a subset of the observations (labeled) and missing for the remainder (unlabeled). We examine an approach for fitting Y = Xβ + f(G) where β is a coefficient vector and f is a function over the vertices of the graph. The procedure is semi-supervised in nature (trained on the labeled and unlabeled sets), requiring iterative algorithms for fitting this estimate. The package provides several key functions for fitting and evaluating an estimator of this type. The package is illustrated on a text analysis data set, where the observations are text documents (papers), the response is the category of paper (either applied or theoretical statistics), the X information is the name of the journal in which the paper resides, and the graph is a co-citation network, with each vertex an observation and each edge the number of times that the two papers cite a common paper. An application involving classification of protein location using a protein interaction graph and an application involving classification on a manifold with part of the feature data converted to a graph are also presented.

Research Papers in Economics