120 research outputs found
Efficient Management of Non Redundant Rules in Large Pattern Bases: a Bitmap Approach
Knowledge Discovery from Databases has a growing impact nowadays, and various tools are now available to extract knowledge efficiently (in time and memory space) from huge databases. Nevertheless, these systems generally produce large pattern bases, and managing them rapidly becomes intractable. Few works have focused on pattern base management systems, and research in this domain is quite recent. This paper comes within that context, dealing with a particular class of patterns: association rules. More precisely, we present how we have efficiently implemented the search for non-redundant rules thanks to a representation of rules as bitmap arrays. Experiments show that this technique dramatically increases the gain in time and space, allowing us to manage large pattern bases.
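The core trick of a bitmap representation is that itemset containment reduces to a single AND-and-compare on machine words. The sketch below is a hedged illustration, not the paper's implementation: the exact redundancy criterion used here (a rule is redundant when another rule with equal support and confidence has a more general antecedent and a larger consequent) is one common definition, and all names are assumptions.

```python
# Hedged sketch: bitmap subset tests for pruning redundant association rules.
# Items are mapped to bit positions; an itemset is a Python int bitmask.

def to_bitmap(itemset, item_index):
    """Encode a set of items as an integer bitmask."""
    mask = 0
    for item in itemset:
        mask |= 1 << item_index[item]
    return mask

def is_subset(a, b):
    """True if bitmap a is a subset of bitmap b (single AND + compare)."""
    return a & b == a

def dominates(r1, r2):
    """r1 makes r2 redundant: same support and confidence, a more general
    antecedent, a larger (or equal) consequent, and not the identical rule."""
    a1, c1, s1, f1 = r1
    a2, c2, s2, f2 = r2
    return (s1 == s2 and f1 == f2
            and is_subset(a1, a2) and is_subset(c2, c1)
            and (a1, c1) != (a2, c2))

def non_redundant(rules):
    """Keep only rules not dominated by any other rule in the base.
    Each rule is (antecedent_mask, consequent_mask, support, confidence)."""
    return [r for r in rules if not any(dominates(q, r) for q in rules)]
```

For example, with items a, b, c on bits 0, 1, 2, the rule {a, b} -> {c} is pruned when {a} -> {b, c} has the same support and confidence, since its antecedent is more general and its consequent larger.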
Tree Contraction, Connected Components, Minimum Spanning Trees: a GPU Path to Vertex Fitting
Standard parallel computing operations are considered in the context of algorithms for solving 3D graph problems with applications, e.g., in vertex finding in HEP. Exploiting GPUs for tree accumulation and graph algorithms is challenging: GPUs offer extreme computational power and high memory-access bandwidth, but their model of fine-grained parallelism may not suit the irregular distribution of linked representations of graph data structures. Achieving data-race-free computations may demand serialization through atomic transactions, which inevitably produces poor parallel performance. A Minimum Spanning Tree algorithm for GPUs is presented, its implementation discussed, and its efficiency evaluated on GPU and multicore architectures.
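Borůvka's algorithm is the MST formulation commonly mapped to GPUs, because each round's cheapest-edge search and component contraction are data-parallel. The sequential sketch below illustrates that round structure; it is a hedged baseline, not the paper's implementation, and assumes a connected graph with distinct edge weights (the standard Borůvka caveat).

```python
# Hedged sketch: Borůvka's MST with union-find contraction. Each round,
# every component picks its cheapest outgoing edge, then components merge;
# on a GPU both steps become parallel reductions over the edge list.

def boruvka_mst(n, edges):
    """n vertices, edges as (weight, u, v). Returns total MST weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, components = 0.0, n
    while components > 1:
        # Cheapest outgoing edge per component (a min-reduction in parallel).
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, ru, rv)
        # Contract: add each selected edge that still joins two components.
        progressed = False
        for w, ru, rv in cheapest.values():
            ru, rv = find(ru), find(rv)
            if ru != rv:
                parent[ru] = rv
                total += w
                components -= 1
                progressed = True
        if not progressed:  # guard against a disconnected graph
            break
    return total
```

Because the number of components at least halves every round, the algorithm runs in O(log n) rounds, which is what makes it attractive for wide parallel architectures.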
Semisupervised Autoencoder for Sentiment Analysis
In this paper, we investigate the use of autoencoders for modeling textual
data. Traditional autoencoders suffer in at least two respects: scalability
with the high dimensionality of the vocabulary, and sensitivity to
task-irrelevant words. We address these problems by introducing supervision via
the loss function of autoencoders. In particular, we first train a linear
classifier on the labeled data, then define a loss for the autoencoder with the
weights learned from the linear classifier. To reduce the bias introduced by a
single classifier, we define a posterior probability distribution on the
weights of the classifier and derive the marginalized loss of the autoencoder
with a Laplace approximation. We show that our choice of loss function can be
rationalized from the perspective of Bregman divergence, which justifies the
soundness of our model. We evaluate the effectiveness of our model on six
sentiment analysis datasets and show that it significantly outperforms all
competing methods in classification accuracy. We also show that our model is
able to take advantage of unlabeled data to improve performance, and that it
successfully learns highly discriminative feature maps, which explains its
superior performance. Comment: To appear in AAAI 201
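The pipeline the abstract describes, first a linear classifier, then a classifier-informed autoencoder loss, can be sketched as below. This is a hedged toy illustration, not the paper's model: it uses plain logistic regression and weights each feature's squared reconstruction error by the classifier weight magnitude, skipping the posterior/Laplace marginalization; all names are assumptions.

```python
# Hedged numpy sketch: supervision enters the autoencoder through a
# per-feature weighting derived from a linear classifier, so reconstruction
# capacity concentrates on task-relevant input dimensions.
import numpy as np

def train_linear_classifier(X, y, lr=0.1, steps=200):
    """Logistic regression by gradient descent; returns the weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)     # gradient of the log loss
    return w

def weighted_autoencoder_loss(X, W_enc, W_dec, theta):
    """Squared reconstruction error, each feature scaled by |theta|."""
    H = np.tanh(X @ W_enc)                   # encoder
    X_hat = H @ W_dec                        # linear decoder
    return float(np.mean(((X - X_hat) ** 2) * np.abs(theta)))
```

In this toy version, a feature the classifier finds uninformative gets a near-zero weight, so the autoencoder is free to ignore it, which is the intuition behind discarding task-irrelevant words.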
Scaling Expected Force: Efficient Identification of Key Nodes in Network-based Epidemic Models
Centrality measures are fundamental tools of network analysis as they
highlight the key actors within the network. This study focuses on a newly
proposed centrality measure, Expected Force (EF), and its use in identifying
spreaders in network-based epidemic models. We found that EF effectively
predicts the spreading power of nodes and identifies key nodes and immunization
targets. However, its high computational cost presents a challenge for its use
in large networks. To overcome this limitation, we propose two parallel
scalable algorithms for computing EF scores: the first algorithm is based on
the original formulation, while the second one focuses on a cluster-centric
approach to improve efficiency and scalability. Our implementations
significantly reduce computation time, allowing for the detection of key nodes
at large scales. Performance analysis on synthetic and real-world networks
demonstrates that the GPU implementation of our algorithm can efficiently scale
to networks with up to 44 million edges by exploiting modern parallel
architectures, achieving speed-ups of up to 300x, and of 50x on average,
compared to the simple parallel solution.
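Expected Force (after Lawyer, 2015) is, roughly, the entropy of the normalized out-degrees of all infection clusters reachable in two transmissions from a seed node. The sketch below is a simplified sequential variant, a hedged baseline of the quantity the abstract's parallel algorithms compute, not their implementation: it enumerates two-transmission clusters without the original's full weighting over transmission orderings, and all names are assumptions.

```python
# Hedged sketch: Expected Force of a seed node on a simple undirected graph,
# given as an adjacency dict {node: set_of_neighbours}.
from itertools import combinations
from math import log

def expected_force(adj, seed):
    neigh = adj[seed]
    # Two transmissions: either two distinct neighbours of the seed...
    clusters = [frozenset({seed, a, b}) for a, b in combinations(neigh, 2)]
    # ...or one neighbour plus one of that neighbour's other neighbours.
    for a in neigh:
        for b in adj[a]:
            if b != seed and b not in neigh:
                clusters.append(frozenset({seed, a, b}))
    # Cluster out-degree: edges from infected nodes to susceptible nodes.
    degs = [sum(1 for u in cl for v in adj[u] if v not in cl)
            for cl in clusters]
    total = sum(degs)
    # Entropy of the normalized out-degree distribution.
    return -sum((d / total) * log(d / total) for d in degs if d)
```

The cost comes from the cluster enumeration, which grows quickly with node degree; that combinatorial blow-up is exactly what the paper's GPU and cluster-centric formulations target.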