4,483 research outputs found
ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions
Networks are powerful data structures, but are challenging to work with for
conventional machine learning methods. Network Embedding (NE) methods attempt
to resolve this by learning vector representations for the nodes, for
subsequent use in downstream machine learning tasks.
Link Prediction (LP) is one such downstream machine learning task that is an
important use case and popular benchmark for NE methods. Unfortunately, while
NE methods perform exceedingly well at this task, they are lacking in
transparency as compared to simpler LP approaches.
We introduce ExplaiNE, an approach to offer counterfactual explanations for
NE-based LP methods, by identifying existing links in the network that explain
the predicted links. ExplaiNE is applicable to a broad class of NE algorithms.
An extensive empirical evaluation for the NE method `Conditional Network
Embedding' in particular demonstrates its accuracy and scalability
FFT-Based Deep Learning Deployment in Embedded Systems
Deep learning has delivered its powerfulness in many application domains,
especially in image and speech recognition. As the backbone of deep learning,
deep neural networks (DNNs) consist of multiple layers of various types with
hundreds to thousands of neurons. Embedded platforms are now becoming essential
for deep learning deployment due to their portability, versatility, and energy
efficiency. The large model size of DNNs, while providing excellent accuracy,
also burdens the embedded platforms with intensive computation and storage.
Researchers have investigated on reducing DNN model size with negligible
accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN
training and inference model suitable for embedded platforms with reduced
asymptotic complexity of both computation and storage, making our approach
distinguished from existing approaches. We develop the training and inference
algorithms based on FFT as the computing kernel and deploy the FFT-based
inference model on embedded platforms achieving extraordinary processing speed.Comment: Design, Automation, and Test in Europe (DATE) For source code, please
contact Mahdi Nazemi at <[email protected]
VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
\method, an efficient method to minimize the description cost, and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.Comment: SIAM International Conference on Data Mining (SDM) 201
Detecting and quantifying causal associations in large nonlinear time series datasets
Identifying causal relationships and quantifying their strength from observational time series data are key problems in disciplines dealing with complex dynamical systems such as the Earth system or the human body. Data-driven causal inference in such systems is challenging since datasets are often high dimensional and nonlinear with limited sample sizes. Here, we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm to estimate causal networks from large-scale time series datasets. We validate the method on time series of well-understood physical mechanisms in the climate system and the human heart and using large-scale synthetic datasets mimicking the typical properties of real-world data. The experiments demonstrate that our method outperforms state-of-the-art techniques in detection power, which opens up entirely new possibilities to discover and quantify causal networks from time series across a range of research fields
Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
Graph-based Semi-supervised learning (SSL) algorithms have been successfully
used in a large number of applications. These methods classify initially
unlabeled nodes by propagating label information over the structure of graph
starting from seed nodes. Graph-based SSL algorithms usually scale linearly
with the number of distinct labels (m), and require O(m) space on each node.
Unfortunately, there exist many applications of practical significance with
very large m over large graphs, demanding better space and time complexity. In
this paper, we propose MAD-SKETCH, a novel graph-based SSL algorithm which
compactly stores label distribution on each node using Count-min Sketch, a
randomized data structure. We present theoretical analysis showing that under
mild conditions, MAD-SKETCH can reduce space complexity at each node from O(m)
to O(log m), and achieve similar savings in time complexity as well. We support
our analysis through experiments on multiple real world datasets. We observe
that MAD-SKETCH achieves similar performance as existing state-of-the-art
graph- based SSL algorithms, while requiring smaller memory footprint and at
the same time achieving up to 10x speedup. We find that MAD-SKETCH is able to
scale to datasets with one million labels, which is beyond the scope of
existing graph- based SSL algorithms.Comment: 9 page
- …