58 research outputs found
Investigation of the Permeability of Soil-rock Mixtures Using Lattice Boltzmann Simulations
Based on the discrete element method and a proposed virtual slicing technique for three-dimensional discrete element models, random pore-structural models of soil-rock mixtures (SRMs) are constructed and voxelized. Then, the three-dimensional lattice Boltzmann method is introduced to simulate the seepage flow in SRMs at the pore scale. Finally, the influences of rock content, rock size, rock shape and rock orientation on the simulated permeability of SRMs are comprehensively investigated. The results show that the permeability of SRMs decreases remarkably with increasing rock content. When the other conditions remain unchanged, the permeability of SRMs increases with increasing rock size. The permeability of SRMs with bar-shaped rocks is smaller than that of SRMs with block-shaped rocks, but larger than that of SRMs with slab-shaped rocks. The rock orientation has a certain influence on the permeability of SRMs, and the amount of variation changes with the rock shape: when the rocks are bar-shaped, the permeability decreases slightly as the major axes of the rocks change from parallel to perpendicular with respect to the direction of main flow; when the rocks are slab-shaped, the permeability decreases more significantly as the slab planes of the rocks change from parallel to perpendicular with respect to the direction of main flow.
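The pore-scale seepage simulation described above can be illustrated with a minimal lattice Boltzmann sketch. This is a 2D D2Q9 single-relaxation-time model with a random obstacle field standing in for the rocks and a body force standing in for the pressure gradient; the geometry, resolution and parameters are illustrative assumptions, not the paper's 3D setup:

```python
import numpy as np

# D2Q9 lattice velocities and weights
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
opp = [0, 3, 4, 1, 2, 7, 8, 5, 6]          # opposite directions for bounce-back

nx, ny = 48, 24
tau = 0.8                                  # BGK relaxation time
nu = (tau - 0.5) / 3.0                     # lattice kinematic viscosity
g = 1e-5                                   # body force along x (stands in for a pressure gradient)

rng = np.random.default_rng(0)
solid = rng.random((nx, ny)) < 0.15        # random obstacle voxels ("rocks")

f = np.tile(w, (nx, ny, 1)).astype(float)  # start from rest
for _ in range(2000):
    rho = f.sum(axis=2)
    ux = f @ c[:, 0] / rho + tau * g       # force folded into equilibrium velocity (rho ~ 1)
    uy = f @ c[:, 1] / rho
    cu = 3.0 * (ux[..., None] * c[:, 0] + uy[..., None] * c[:, 1])
    feq = w * rho[..., None] * (1 + cu + 0.5 * cu**2 - 1.5 * (ux**2 + uy**2)[..., None])
    f += (feq - f) / tau                   # BGK collision
    for i in range(9):                     # streaming with periodic wrap
        f[:, :, i] = np.roll(f[:, :, i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    f[solid] = f[solid][:, opp]            # full-way bounce-back on solid nodes

rho = f.sum(axis=2)
ux = f @ c[:, 0] / rho
ux[solid] = 0.0
k = nu * ux.mean() / g                     # Darcy's law: <u> = k g / nu
print(f"permeability (lattice units): {k:.4g}")
```

Permeability follows from Darcy's law once the flow approaches steady state; the paper's study varies the obstacle field (content, size, shape, orientation) and repeats this kind of measurement in 3D.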
SPEC2: SPECtral SParsE CNN Accelerator on FPGAs
To accelerate inference of Convolutional Neural Networks (CNNs), various
techniques have been proposed to reduce computation redundancy. Converting
convolutional layers into frequency domain significantly reduces the
computation complexity of the sliding window operations in space domain. On the
other hand, weight pruning techniques address the redundancy in model
parameters by converting dense convolutional kernels into sparse ones. To
obtain high-throughput FPGA implementation, we propose SPEC2 -- the first work
to prune and accelerate spectral CNNs. First, we propose a systematic pruning
algorithm based on the Alternating Direction Method of Multipliers (ADMM). The
offline pruning iteratively sets the majority of spectral weights to zero,
without using any handcrafted heuristics. Then, we design an optimized pipeline
architecture on FPGA that has efficient random access into the sparse kernels
and exploits various dimensions of parallelism in convolutional layers.
Overall, SPEC2 achieves high inference throughput with extremely low
computation complexity and negligible accuracy degradation. We demonstrate
SPEC2 by pruning and implementing LeNet and VGG16 on the Xilinx Virtex
platform. After pruning 75% of the spectral weights, SPEC2 achieves 0% accuracy
loss for LeNet, and <1% accuracy loss for VGG16. The resulting accelerators
achieve up to 24x higher throughput, compared with the state-of-the-art FPGA
implementations for VGG16.
Comment: This is a 10-page conference paper at the 26th IEEE International
Conference on High Performance Computing, Data, and Analytics (HiPC).
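The ADMM pruning step can be sketched on a toy problem. In SPEC2 the weight update is gradient descent on the network's task loss; here a quadratic proxy loss with a closed-form update stands in, and the kernel size, pruning ratio and penalty `rho` are illustrative assumptions:

```python
import numpy as np

def project_sparse(x, keep):
    """Euclidean projection onto {at most `keep` nonzeros}: keep the largest magnitudes."""
    flat = x.ravel()
    z = np.zeros_like(flat)
    idx = np.argsort(np.abs(flat))[-keep:]
    z[idx] = flat[idx]
    return z.reshape(x.shape)

rng = np.random.default_rng(1)
w_spatial = rng.standard_normal((5, 5))
target = np.fft.fft2(w_spatial, s=(8, 8))  # spectral kernel the pruned weights should stay near
keep = int(0.25 * target.size)             # prune 75% of the spectral weights

rho = 1.0
Z = np.zeros_like(target)                  # sparse copy of the weights
U = np.zeros_like(target)                  # scaled dual variable
for _ in range(50):
    # W-step: closed-form prox of the quadratic proxy loss 0.5*||W - target||^2
    W = (target + rho * (Z - U)) / (1 + rho)
    # Z-step: projection onto the sparsity constraint (no handcrafted heuristics)
    Z = project_sparse(W + U, keep)
    # dual update drives W and Z toward agreement
    U += W - Z
```

The alternation lets the (proxy) loss term and the hard sparsity constraint be handled separately, which is the structural idea behind the paper's offline pruning.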
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
We investigate various prompting strategies for enhancing personalized
recommendation performance with large language models (LLMs) through input
augmentation. Our proposed approach, termed LLM-Rec, encompasses four distinct
prompting strategies: (1) basic prompting, (2) recommendation-driven prompting,
(3) engagement-guided prompting, and (4) recommendation-driven +
engagement-guided prompting. Our empirical experiments show that incorporating
the augmented input text generated by the LLM leads to improved recommendation
performance. The recommendation-driven and engagement-guided prompting
strategies are found to elicit the LLM's understanding of global and local item
characteristics. This finding highlights the importance of leveraging diverse
prompts and input-augmentation techniques to enhance the recommendation
capabilities of LLMs.
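A rough sketch of how the four prompting variants might be assembled for input augmentation; the exact prompt wording and the strategy names passed to `build_prompt` are hypothetical, not the paper's prompts:

```python
def build_prompt(description, strategy, neighbor_texts=()):
    """Compose an input-augmentation prompt for one of the four (assumed) LLM-Rec variants."""
    parts = [f"Item description: {description}"]
    if strategy in ("recommendation", "recommendation+engagement"):
        # recommendation-driven: steer the LLM toward globally appealing traits
        parts.append("Write a paragraph that would help recommend this item, "
                     "emphasizing what makes it appealing.")
    if strategy in ("engagement", "recommendation+engagement"):
        # engagement-guided: ground the LLM in locally similar, engaged-with items
        parts.append("Users who engaged with this item also engaged with: "
                     + "; ".join(neighbor_texts))
    if strategy == "basic":
        parts.append("Paraphrase the description above.")
    return "\n".join(parts)
```

The string returned here would be sent to the LLM, and the generated text combined with the original item description before being fed to the downstream recommendation model.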
Quickly Finding a Truss in a Haystack
The k-truss of a graph is a subgraph in which each edge is tightly connected to the remaining elements of the k-truss; it can also represent an important community in the graph. Finding the k-truss of a graph can be done in polynomial time, in contrast to finding other subgraphs such as cliques. While there are numerous formulations and algorithms for finding the maximal k-truss of a graph, many of these tend to be computationally expensive and do not scale well. Many algorithms are iterative and rerun static-graph triangle counting in each iteration. In this work we present a novel algorithm for finding both the k-truss of a graph (for a given k) and the maximal k-truss, using a dynamic graph formulation. Our algorithm has two main benefits. 1) Unlike many algorithms that rerun static-graph triangle counting after the removal of nonconforming edges, we use a new dynamic graph formulation that updates only the edges affected by each removal. Because our updates are local, we do only a fraction of the work of the other algorithms. 2) Our algorithm is extremely scalable and detects deleted triangles concurrently, in contrast to past sequential approaches. While the algorithm is architecture independent, we show a CUDA-based implementation for NVIDIA GPUs. In numerous instances, our new algorithm is anywhere from 100X to 10000X faster than the Graph Challenge benchmark. Furthermore, it shows significant speedups, in some cases over 70X, over a recently developed sequential and highly optimized algorithm.
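The local-update idea behind the dynamic formulation can be sketched as a peeling algorithm: when an edge with support below k-2 is removed, only the edges that shared a triangle with it are re-examined, instead of recounting all triangles. This sequential Python sketch captures that bookkeeping, not the paper's concurrent CUDA implementation:

```python
from collections import defaultdict

def k_truss(edges, k):
    """Peel edges whose support (triangle count) is below k-2, updating only affected edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # initial support: number of triangles through each edge
    sup = {tuple(sorted((u, v))): len(adj[u] & adj[v]) for u, v in edges}
    queue = [e for e, s in sup.items() if s < k - 2]
    while queue:
        u, v = queue.pop()
        if (u, v) not in sup:
            continue                      # already removed
        # dynamic update: each triangle (u, v, w) dies, so only the two
        # edges it shares with w need their support decremented
        for w in adj[u] & adj[v]:
            for e in (tuple(sorted((u, w))), tuple(sorted((v, w)))):
                if e in sup:
                    sup[e] -= 1
                    if sup[e] < k - 2:
                        queue.append(e)
        del sup[(u, v)]
        adj[u].discard(v)
        adj[v].discard(u)
    return set(sup)
```

Because each removal touches only the triangles it participates in, the total work tracks the number of destroyed triangles rather than a full recount per iteration.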
SCE: Scalable Network Embedding from Sparsest Cut
Large-scale network embedding aims to learn a latent representation for each
node in an unsupervised manner, capturing the inherent properties and
structural information of the underlying graph. In this field, many popular
approaches are influenced by the skip-gram model from natural language
processing. Most of them use a contrastive objective to train an encoder which
forces the embeddings of similar pairs to be close and those of negative
samples to be far apart. A key to the success of such contrastive learning
methods is how positive and negative samples are drawn. While negative samples
generated by straightforward random sampling are often satisfactory, how to
draw positive samples remains a hot topic.
In this paper, we propose SCE for unsupervised network embedding only using
negative samples for training. Our method is based on a new contrastive
objective inspired by the well-known sparsest cut problem. To solve the
underlying optimization problem, we introduce a Laplacian smoothing trick,
which uses graph convolutional operators as low-pass filters for smoothing node
representations. The resulting model consists of a GCN-type structure as the
encoder and a simple loss function. Notably, our model does not use positive
samples but only negative samples for training, which not only makes the
implementation and tuning much easier, but also reduces the training time
significantly.
Finally, extensive experimental studies on real world data sets are
conducted. The results clearly demonstrate the advantages of our new model in
both accuracy and scalability compared to strong baselines such as GraphSAGE,
G2G and DGI.
Comment: KDD 202
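The Laplacian smoothing trick can be sketched as repeated multiplication by the symmetrically normalized adjacency matrix, which acts as a low-pass graph filter; the negative-pair repulsion term at the end is an assumed stand-in for SCE's actual sparsest-cut-inspired objective, shown only to illustrate training on negative samples alone:

```python
import numpy as np

def smooth(adj, feats, layers=2):
    """Low-pass filtering via A_hat = D^{-1/2} (A + I) D^{-1/2}, applied `layers` times."""
    n = adj.shape[0]
    a = adj + np.eye(n)                 # add self-loops
    d = a.sum(axis=1)
    a_hat = a / np.sqrt(np.outer(d, d))  # symmetric normalization
    h = feats.copy()
    for _ in range(layers):
        h = a_hat @ h                   # each pass averages a node with its neighbors
    return h

# two disconnected triangles: smoothing pulls each triangle's embeddings together
adj = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    adj[u, v] = adj[v, u] = 1.0
h = smooth(adj, np.eye(6))

# negative-only repulsion (assumed loss form): penalize closeness of random node pairs
neg_pairs = [(0, 3), (1, 4), (2, 5)]
loss = sum(np.exp(-np.sum((h[u] - h[v]) ** 2)) for u, v in neg_pairs)
```

Smoothing alone pushes connected nodes together, so the encoder only needs a repulsive term on negative samples to keep unrelated nodes apart, which is why no positive sampling is required.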