Scaling Up Probabilistic Circuits by Latent Variable Distillation
Probabilistic Circuits (PCs) are a unified framework for tractable
probabilistic models that support efficient computation of various
probabilistic queries (e.g., marginal probabilities). One key challenge is to
scale PCs to model large and high-dimensional real-world datasets: we observe
that as the number of parameters in PCs increases, their performance
immediately plateaus. This phenomenon suggests that the existing optimizers
fail to exploit the full expressive power of large PCs. We propose to overcome
this bottleneck by latent variable distillation: we leverage less tractable
but more expressive deep generative models to provide extra supervision over
the latent variables of PCs. Specifically, we extract information from
Transformer-based generative models to assign values to latent variables of
PCs, providing guidance to PC optimizers. Experiments on both image and
language modeling benchmarks (e.g., ImageNet and WikiText-2) show that latent
variable distillation substantially boosts the performance of large PCs
compared to their counterparts without latent variable distillation. In
particular, on the image modeling benchmarks, PCs achieve competitive
performance against some of the widely-used deep generative models, including
variational autoencoders and flow-based models, opening up new avenues for
tractable generative modeling.
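The tractable queries the abstract mentions can be illustrated with a toy probabilistic circuit (structure and weights are invented for this sketch, not taken from the paper): marginalizing a variable just sets its indicator leaves to 1, so any marginal costs a single bottom-up pass.

```python
# A minimal illustrative probabilistic circuit over two binary variables
# X1, X2. Marginalizing a variable = setting its indicator leaves to 1,
# which is why marginals are computable in one feed-forward pass.

def leaf(x, value):
    # Indicator leaf: 1.0 if x matches value, or if x is None (marginalized).
    return 1.0 if x is None or x == value else 0.0

def circuit(x1, x2):
    # A sum (mixture) node over two product nodes; mixture weights sum to 1.
    # The latent variable is which mixture component generated the sample.
    p_a = leaf(x1, 1) * leaf(x2, 1)   # component A: X1=1, X2=1
    p_b = leaf(x1, 0) * leaf(x2, 0)   # component B: X1=0, X2=0
    return 0.3 * p_a + 0.7 * p_b

full = circuit(1, 1)          # joint P(X1=1, X2=1) = 0.3
marginal = circuit(1, None)   # marginal P(X1=1) = 0.3, same single pass
print(full, marginal)
```

Latent variable distillation amounts to supervising that hidden "which component" choice with hints extracted from a more expressive model, rather than leaving it entirely to the optimizer.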
PONDER - A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope
This paper describes a new real-time versatile backend, the Pulsar Ooty Radio
Telescope New Digital Efficient Receiver (PONDER), which has been designed to
operate along with the legacy analog system of the Ooty Radio Telescope (ORT).
PONDER makes use of current state-of-the-art computing hardware, a
Graphical Processing Unit (GPU) and sufficiently large disk storage to support
high time resolution real-time data of pulsar observations, obtained by
coherent dedispersion over a bandpass of 16 MHz. Four different modes for
pulsar observations are implemented in PONDER to provide standard reduced data
products, such as time-stamped integrated profiles and dedispersed time series,
allowing faster avenues to scientific results for a variety of pulsar studies.
Additionally, PONDER supports general modes of interplanetary
scintillation (IPS) measurements and very long baseline interferometry data
recording. The IPS mode yields a single-polarisation correlated time series of
solar wind scintillation over a bandwidth (16 MHz) about four times larger than
that of the legacy system, as well as its fluctuation spectrum with high
temporal and frequency resolutions. The key point is that all the above modes
operate in real time. This paper presents the design aspects of PONDER and
outlines the design methodology for future similar backends. It also explains
the principal operations of PONDER, illustrates its capabilities for a variety
of pulsar and IPS observations and demonstrates its usefulness for a variety of
astrophysical studies using the high sensitivity of the ORT.
Comment: 25 pages, 14 figures, accepted by Experimental Astronomy
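PONDER performs coherent dedispersion, which removes the dispersive phase within each channel before detection. The simpler incoherent variant sketched below (plain Python with invented frequencies and DM, not PONDER's GPU code) illustrates the cold-plasma delay being corrected in either case.

```python
# Sketch of incoherent dedispersion: shift each frequency channel by the
# cold-plasma dispersion delay so a pulse aligns across the band.
# All numbers are illustrative; the ORT observes near 326.5 MHz.

def dispersion_delay_s(dm, f_mhz, f_ref_mhz):
    # Delay (seconds) at f_mhz relative to f_ref_mhz for dispersion measure dm
    # (pc cm^-3), using the standard cold-plasma dispersion constant.
    return 4.1488e3 * dm * (f_mhz ** -2 - f_ref_mhz ** -2)

def dedisperse(filterbank, freqs_mhz, dm, t_sample_s):
    # filterbank[channel][sample]; align every channel to the top frequency.
    f_ref = max(freqs_mhz)
    out = []
    for chan, f in zip(filterbank, freqs_mhz):
        shift = round(dispersion_delay_s(dm, f, f_ref) / t_sample_s)
        out.append(chan[shift:] + chan[:shift])  # circular shift for brevity
    return out

# Across a 16 MHz band around 326.5 MHz, DM = 10 pc/cm^3 smears the pulse
# by roughly 40 ms, which is what the backend must undo in real time:
print(dispersion_delay_s(10.0, 318.5, 334.5))
```

Coherent dedispersion instead applies the inverse dispersion filter to the raw voltage data in the Fourier domain, recovering time resolution that the channelized shift-and-add above cannot.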
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many
high-performance graph algorithms as well as for some linear solvers, such as
algebraic multigrid. The scaling of existing parallel implementations of SpGEMM
is heavily bound by communication. Even though 3D (or 2.5D) algorithms have
been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi
matrices, those algorithms had not been implemented in practice and their
complexities had not been analyzed for the general case. In this work, we
present the first ever implementation of the 3D SpGEMM formulation that also
exploits multiple (intra-node and inter-node) levels of parallelism, achieving
significant speedups over the state-of-the-art publicly available codes at all
levels of concurrency. We extensively evaluate our implementation and
identify bottlenecks that should be the subject of further research.
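A common serial kernel behind SpGEMM implementations is Gustavson's row-by-row algorithm with a sparse accumulator; the pure-Python sketch below (dict-of-dicts matrices, single-threaded, not the paper's code) shows the per-row accumulation that parallel formulations distribute across processes and threads.

```python
# Gustavson-style SpGEMM on dict-of-dicts sparse matrices:
# A[i][k] holds the nonzero value at row i, column k.

def spgemm(A, B):
    C = {}
    for i, row in A.items():
        acc = {}                          # sparse accumulator for row i of C
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:                           # keep C sparse: skip empty rows
            C[i] = acc
    return C

A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 2: {1: 5.0}}
print(spgemm(A, B))   # {0: {1: 14.0}}
```

In a 3D distributed setting, blocks of A and B are replicated across a third processor-grid dimension so that each process runs this kernel on local blocks, trading memory for reduced communication.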
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.
Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other authors
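The hybrid recipe (constraint-based skeleton first, score-guided search second) can be caricatured in a few lines of Python. The scoring function below is a deliberately trivial stand-in for a data-driven Bayesian score such as BDeu, and the whole sketch is illustrative rather than H2PC itself.

```python
# Toy second phase of a hybrid structure learner: given a skeleton from the
# constraint phase, greedily flip edge orientations while preserving
# acyclicity, accepting the best score-improving move each step.

def has_cycle(edges):
    # DFS cycle detection on the directed graph given by `edges`.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    state = {}  # node -> 0 (on current path) or 1 (fully explored)
    def visit(u):
        state[u] = 0
        for v in adj.get(u, []):
            if state.get(v) == 0 or (v not in state and visit(v)):
                return True
        state[u] = 1
        return False
    return any(visit(u) for u in adj if u not in state)

def orient_skeleton(skeleton, score):
    # Start from an arbitrary orientation; hill-climb by flipping edges.
    dag = [tuple(e) for e in skeleton]
    improved = True
    while improved:
        improved = False
        for i, (u, v) in enumerate(dag):
            flipped = dag[:i] + [(v, u)] + dag[i + 1:]
            if not has_cycle(flipped) and score(flipped) > score(dag):
                dag, improved = flipped, True
                break
    return dag

# Stand-in score: reward edges pointing into 'Z' (a real learner scores
# candidate structures against data, e.g. with BDeu or BIC).
score = lambda dag: sum(v == 'Z' for _, v in dag)
print(orient_skeleton([('X', 'Z'), ('Z', 'Y')], score))
```

Restricting the greedy search to the learned skeleton is what keeps the search space small enough for high-dimensional problems such as multi-label learning.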
Simulating spin systems on IANUS, an FPGA-based computer
We describe the hardwired implementation of algorithms for Monte Carlo
simulations of a large class of spin models. We have implemented these
algorithms as VHDL codes and we have mapped them onto a dedicated processor
based on a large FPGA device. The measured performance of one such processor is
comparable to that of O(100) carefully programmed high-end PCs, and even
better for some selected spin models. We describe here the codes that we are
currently executing on the IANUS massively parallel FPGA-based system.
Comment: 19 pages, 8 figures; submitted to Computer Physics Communications
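For context on the kind of kernel such a machine hardwires, here is a plain-Python Metropolis update for the 2D Ising model, a textbook member of the spin-model class discussed above. Parameters are illustrative; an FPGA implementation would evaluate many such spin updates in parallel per clock cycle.

```python
# Metropolis Monte Carlo for the 2D Ising model on an L x L periodic
# lattice with coupling J = 1. Each sweep proposes a flip at every site.

import math
import random

def metropolis_sweep(spins, beta, rng):
    L = len(spins)
    for i in range(L):
        for j in range(L):
            # Sum of the four nearest neighbours (periodic boundaries).
            nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                  + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            dE = 2.0 * spins[i][j] * nn   # energy cost of flipping this spin
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                spins[i][j] = -spins[i][j]

rng = random.Random(0)
L = 16
spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
for _ in range(100):
    metropolis_sweep(spins, beta=0.6, rng=rng)   # beta above critical: orders
m = abs(sum(map(sum, spins))) / L ** 2
print(m)   # magnetization per spin
```

The inner loop is trivially simple but must run astronomically many times, which is why mapping it to dedicated FPGA logic pays off.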
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for the development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize the PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex object manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in
speedups of 2x to more than 50x compared to equivalent implementations on Spark.
Comment: 48 pages, including references and appendix
Investigation of Air Transportation Technology at Princeton University, 1989-1990
The Air Transportation Technology Program at Princeton University proceeded along six avenues during the past year: microburst hazards to aircraft; machine-intelligent, fault-tolerant flight control; computer-aided heuristics for piloted flight; stochastic robustness for flight control systems; neural networks for flight control; and computer-aided control system design. These topics are briefly discussed, and an annotated bibliography of publications that appeared between January 1989 and June 1990 is given.