Data Analysis with Intersection Graphs
This paper presents a new framework for multivariate data analysis, based on graph theory, using intersection graphs [1]. We have named this approach DAIG – Data Analysis with Intersection Graphs. This new framework represents data vectors as paths on a graph, which has a number of advantages over the classical table representation of data. In this representation, each node is an atom of information, i.e. a (variable, value) pair, associated with the set of observations in which that pair occurs. An edge exists between a pair of nodes whenever the intersection of their respective sets is not empty. We show that this representation of data as an intersection graph allows an easy and intuitive geometric interpretation of data observations, groups of observations, and the results of multivariate data analysis techniques such as biplots, principal components, cluster analysis, or multidimensional scaling. These all appear as paths on the graph, relating variables, values and observations. This approach allows for a compact and memory-efficient representation of data that contains many missing values or multi-valued attributes. The basic principles and advantages of this approach are presented with an example of its application to a simple toy problem. The main features of this methodology are illustrated with the aid of software specifically developed for this purpose.
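As an illustration of the representation described above, here is a minimal Python sketch (not the DAIG software mentioned in the abstract) that maps each (variable, value) atom to its set of observations and links two atoms whenever those sets intersect. The function name `build_intersection_graph`, the one-dict-per-observation input format, treating `None` as a missing value, and treating a list or set value as a multi-valued attribute are all assumptions made for this example.

```python
from itertools import combinations


def build_intersection_graph(rows):
    """Hypothetical helper: rows is a list of dicts (variable -> value).

    Returns (atoms, edges): atoms maps each (variable, value) pair to the set
    of observation indices where it occurs; edges maps each pair of atoms with
    a non-empty intersection to the shared observation indices.
    """
    atoms = {}
    for obs_id, row in enumerate(rows):
        for var, val in row.items():
            if val is None:
                # Assumption: a missing value creates no atom and costs no memory.
                continue
            # Assumption: a list/tuple/set value is a multi-valued attribute.
            values = val if isinstance(val, (list, tuple, set, frozenset)) else [val]
            for v in values:
                atoms.setdefault((var, v), set()).add(obs_id)

    edges = {}
    # Sort by repr so that atoms with mixed value types can still be ordered.
    for a, b in combinations(sorted(atoms, key=repr), 2):
        shared = atoms[a] & atoms[b]
        if shared:  # edge iff the two observation sets intersect
            edges[(a, b)] = shared
    return atoms, edges


rows = [
    {"colour": "red", "size": "S"},
    {"colour": "red", "size": "L", "tags": ["a", "b"]},
    {"colour": "blue", "size": None},
]
atoms, edges = build_intersection_graph(rows)
```

In this toy table the atoms of observation 1 are pairwise connected, so any ordering of them traces a path through the graph, while the missing `size` of observation 2 produces no node and leaves `("colour", "blue")` isolated.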
Moment-based parameter estimation in binomial random intersection graph models
Binomial random intersection graphs can be used as parsimonious statistical
models of large and sparse networks, with one parameter for the average degree
and another for transitivity, the tendency of neighbours of a node to be
connected. This paper discusses the estimation of these parameters from a
single observed instance of the graph, using moment estimators based on
observed degrees and frequencies of 2-stars and triangles. The observed data
set is assumed to be a subgraph induced by a set of nodes sampled from
the full set of nodes. We prove the consistency of the proposed estimators
by showing that the relative estimation error is small with high probability
for . As a byproduct, our analysis confirms that the
empirical transitivity coefficient of the graph is with high probability close
to the theoretical clustering coefficient of the model. Comment: 15 pages, 6 figures
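The estimators above are built from degree, 2-star and triangle counts. The Python sketch below samples a binomial random intersection graph under a common definition (each of `n` nodes independently holds each of `m` attributes with probability `p`, and two nodes are adjacent iff they share an attribute) and computes those counts together with the empirical transitivity coefficient, 3 × triangles / 2-stars. It deliberately stops short of the paper's moment estimators, whose formulas are not reproduced here; the function names `binomial_rig` and `moment_statistics` are hypothetical.

```python
import random
from itertools import combinations


def binomial_rig(n, m, p, seed=None):
    """Sample a binomial random intersection graph as an adjacency dict."""
    rng = random.Random(seed)
    holders = [[] for _ in range(m)]  # attribute -> nodes holding it
    for node in range(n):
        for attr in range(m):
            if rng.random() < p:
                holders[attr].append(node)
    adj = {v: set() for v in range(n)}
    # Every pair of nodes sharing an attribute becomes adjacent.
    for group in holders:
        for u, v in combinations(group, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def moment_statistics(adj):
    """Mean degree, number of 2-stars, number of triangles, transitivity."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    # A 2-star is an unordered pair of neighbours of a centre vertex.
    two_stars = sum(d * (d - 1) // 2 for d in degrees)
    # Each triangle is found once from each of its three vertices.
    closed = sum(
        1
        for v, nbrs in adj.items()
        for a, b in combinations(sorted(nbrs), 2)
        if b in adj[a]
    )
    triangles = closed // 3
    transitivity = 3 * triangles / two_stars if two_stars else 0.0
    return {
        "mean_degree": sum(degrees) / len(adj),
        "two_stars": two_stars,
        "triangles": triangles,
        "transitivity": transitivity,
    }


# A triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
stats = moment_statistics(example)
```

To mimic the sampling assumption in the abstract, one would first restrict `adj` to the subgraph induced by a random subset of nodes (keeping only neighbours inside the subset) before computing the statistics.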
An environment for studying the impact of spatialising sonified graphs on data comprehension
We describe AudioCave, an environment for exploring the impact of spatialising sonified graphs on a set of numerical data comprehension tasks. Its design builds on findings regarding the effectiveness of sonified graphs for numerical data overview and discovery by visually impaired and blind students. We demonstrate its use as a test bed for comparing an approach in which a single sonified numerical datum is accessed at a time with one in which multiple sonified numerical data can be accessed concurrently. Results from this experiment show that concurrent access makes the multivariate data comprehension tasks we set easier to tackle. AudioCave also demonstrates how the spatialisation of the sonified graphs provides opportunities for sharing the representation. We present two experiments in which users work together on assigned data comprehension tasks by sharing the data representation.
Reasoning about Independence in Probabilistic Models of Relational Data
We extend the theory of d-separation to cases in which data instances are not
independent and identically distributed. We show that applying the rules of
d-separation directly to the structure of probabilistic models of relational
data inaccurately infers conditional independence. We introduce relational
d-separation, a theory for deriving conditional independence facts from
relational models. We provide a new representation, the abstract ground graph,
that enables a sound, complete, and computationally efficient method for
answering d-separation queries about relational models, and we present
empirical results that demonstrate its effectiveness. Comment: 61 pages, substantial revisions to formalisms, theory, and related
work
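For reference, the sketch below implements ordinary d-separation for a single DAG over i.i.d. variables, the baseline that relational d-separation extends. It uses the standard ancestral moral graph criterion: restrict to ancestors of X ∪ Y ∪ Z, moralize, delete Z, and test whether X can still reach Y. It is not the paper's relational d-separation or abstract ground graph; as the abstract notes, applying this test directly to the structure of a relational model can infer conditional independence inaccurately. The function name `d_separated` and the parent-set input format are assumptions for this example.

```python
from collections import deque


def d_separated(parents, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs in a DAG.

    parents maps each node to the set of its parents; nodes without an
    entry are treated as having no parents.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # 1. Keep only ancestors of xs | ys | zs (including the nodes themselves).
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, ()))

    # 2. Moralize: undirected parent-child edges, plus edges between co-parents.
    nbrs = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, ()) if p in relevant]
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                nbrs[a].add(b)
                nbrs[b].add(a)

    # 3. Delete zs and search from xs; reaching any y means "not separated".
    seen = set(xs - zs)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in nbrs[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                queue.append(w)
    return True


collider = {"C": {"A", "B"}}          # A -> C <- B
chain = {"B": {"A"}, "C": {"B"}}      # A -> B -> C
```

The collider case shows why conditioning matters: A and B are independent marginally but become dependent once their common child C is observed.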
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives, from
traversal-based and ranking algorithms to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock achieves on average at least an order-of-magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library. Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of the PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU"
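To make the idea of a bulk-synchronous, data-centric abstraction over a vertex frontier concrete, here is a small sequential Python sketch of breadth-first search written as repeated frontier operations: one step expands the current frontier along its edges, and a second step prunes the result into the next frontier. The names `advance` and `filter_frontier` are hypothetical and only loosely modeled on frontier operators; this is not Gunrock's API, and it omits everything that makes Gunrock fast on a GPU (parallel execution, load balancing, and the optimization strategies the abstract mentions).

```python
def advance(adj, frontier, visit):
    """Expand each frontier vertex along its out-edges.

    Returns the destinations of all edges (u, v) for which visit(u, v) is True.
    """
    out = []
    for u in frontier:
        for v in adj.get(u, ()):
            if visit(u, v):
                out.append(v)
    return out


def filter_frontier(candidates, keep):
    """Drop duplicates and vertices failing the predicate keep(v)."""
    seen, out = set(), []
    for v in candidates:
        if v not in seen and keep(v):
            seen.add(v)
            out.append(v)
    return out


def bfs_depths(adj, source):
    """BFS as a sequence of bulk-synchronous frontier iterations."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:  # one iteration per BFS level
        level += 1
        candidates = advance(adj, frontier, lambda u, v: v not in depth)
        frontier = filter_frontier(candidates, lambda v: v not in depth)
        for v in frontier:
            depth[v] = level
    return depth


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
depths = bfs_depths(graph, 0)
```

Here vertex 3 is reached twice in the second iteration and the filter step keeps a single copy, which is the role a separate pruning operator plays in frontier-based designs.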