Quilting Stochastic Kronecker Product Graphs to Generate Multiplicative Attribute Graphs
We describe the first sub-quadratic sampling algorithm for the Multiplicative
Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close
connection between MAGM and the Kronecker Product Graph Model (KPGM) of
Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices
to sample a small number of KPGM graphs and quilt them together. Under a
restricted set of technical conditions our algorithm runs in O((log2 n)^3 |E|) time, where n is the number of nodes and |E| is the number of edges
in the sampled graph. We demonstrate the scalability of our algorithm via
extensive empirical evaluation; we can sample a MAGM graph with 8 million nodes
and 20 billion edges in under 6 hours.
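The connection between the two models can be illustrated with a minimal sketch. It assumes 2x2 initiator matrices and binary node attributes; the function names (`kpgm_edge_prob`, `magm_edge_prob`, `sample_naive`) are hypothetical and chosen only for this example. In a KPGM the edge probability is a product of initiator entries indexed by the bits of the two node ids; in a MAGM the same product is indexed by each node's attribute vector, so the models agree when node i's attributes are the bits of i. The sampler shown is the naive quadratic one, not the quilting algorithm of the abstract.

```python
import random


def bits(i, d):
    """Return the d lowest bits of i, least significant first."""
    return [(i >> k) & 1 for k in range(d)]


def kpgm_edge_prob(theta, d, i, j):
    """Entry (i, j) of the d-fold Kronecker power of a 2x2 matrix theta."""
    p = 1.0
    for k in range(d):
        p *= theta[(i >> k) & 1][(j >> k) & 1]
    return p


def magm_edge_prob(thetas, attrs_i, attrs_j):
    """MAGM edge probability: one 2x2 matrix per binary attribute."""
    p = 1.0
    for theta, a, b in zip(thetas, attrs_i, attrs_j):
        p *= theta[a][b]
    return p


def sample_naive(prob, n, rng):
    """Naive O(n^2) sampler: one independent coin flip per node pair."""
    return [(i, j) for i in range(n) for j in range(n)
            if rng.random() < prob(i, j)]


if __name__ == "__main__":
    theta = [[0.9, 0.5], [0.5, 0.1]]  # illustrative values, not from the paper
    d = 3
    n = 1 << d
    edges = sample_naive(lambda i, j: kpgm_edge_prob(theta, d, i, j),
                         n, random.Random(0))
    print(len(edges), "edges sampled on", n, "nodes")
```

Because every pair of nodes is visited, this sketch costs time quadratic in the number of nodes regardless of how sparse the graph is, which is the cost a sub-quadratic sampler avoids.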
Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face the daunting task of
researching different design flows and learning the intricacies of
specific software from various manufacturers in
hardware/software co-design. Creating a
scalable hardware/software co-design platform has therefore become a key
strategic element for developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram of
oriented gradients (HOG) feature extractor and a support vector machine (SVM)
classifier on a programmable device for pedestrian tracking.
We report a hardware resource analysis and also analyse the
precision and success rates of pedestrian tracking on nine
open-access image data sets. Finally, our proposed
design flow can be used for any real-time image-processing-related
products on programmable ZYNQ-based embedded
systems; it reduces design time and provides a
scalable solution for embedded image processing products.
BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing
Matrix-matrix multiplication is a key computational kernel for numerous
applications in science and engineering, with ample parallelism and data
locality that lends itself well to high-performance implementations. Many
matrix multiplication-dependent applications can use reduced-precision integer
or fixed-point representations to increase their performance and energy
efficiency while still offering adequate quality of results. However, precision
requirements may vary between different application phases or depend on input
data, rendering constant-precision solutions ineffective. We present BISMO, a
vectorized bit-serial matrix multiplication overlay for reconfigurable
computing. BISMO utilizes the excellent binary-operation performance of FPGAs
to offer a matrix multiplication performance that scales with required
precision and parallelism. We characterize the resource usage and performance
of BISMO across a range of parameters to build a hardware cost model, and
demonstrate a peak performance of 6.5 TOPS on the Xilinx PYNQ-Z1 board.
Comment: To appear at FPL'1
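The general idea of bit-serial matrix multiplication can be sketched in software as follows. This is a minimal illustration for unsigned integers only, not BISMO's hardware design; the function names and the `a_bits`/`b_bits` parameters are assumptions made for this example. Each operand is split into binary bit planes, each pair of planes is multiplied using AND and popcount, and the partial results are summed with weight 2^(i+j).

```python
def bit_plane_rows(matrix, bit):
    """Pack bit `bit` of each row into one int (one bit per column)."""
    return [sum(((v >> bit) & 1) << c for c, v in enumerate(row))
            for row in matrix]


def bit_serial_matmul(a, b, a_bits, b_bits):
    """Multiply unsigned integer matrices a (m x k) and b (k x n).

    a_bits and b_bits give the number of bits needed to represent
    the entries of a and b respectively.
    """
    n_rows, n_cols = len(a), len(b[0])
    b_t = [list(col) for col in zip(*b)]  # columns of b, stored as rows
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(a_bits):
        a_plane = bit_plane_rows(a, i)
        for j in range(b_bits):
            b_plane = bit_plane_rows(b_t, j)
            for r in range(n_rows):
                for c in range(n_cols):
                    # Binary dot product: AND, then count the set bits.
                    ones = bin(a_plane[r] & b_plane[c]).count("1")
                    result[r][c] += ones << (i + j)
    return result


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]          # entries fit in 3 bits
    b = [[7, 8], [9, 10], [11, 12]]     # entries fit in 4 bits
    print(bit_serial_matmul(a, b, a_bits=3, b_bits=4))
```

In this sketch the number of binary matrix products is `a_bits * b_bits`, so the work grows with the precision requested rather than being fixed in advance.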