120 research outputs found

    Efficient Management of Non Redundant Rules in Large Pattern Bases: a Bitmap Approach

    Knowledge Discovery from Databases has an increasing impact nowadays, and various tools are now available to extract knowledge efficiently (in both time and memory space) from huge databases. Nevertheless, these systems generally produce large pattern bases, and managing those bases rapidly becomes intractable. Few works have focused on pattern base management systems, and research in this domain is still very new. This paper falls within that context and deals with a particular class of patterns: association rules. More precisely, we present how we efficiently implemented the search for non-redundant rules by representing rules as bitmap arrays. Experiments show that this technique yields dramatic gains in time and space, allowing us to manage large pattern bases.
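A minimal sketch of why bitmaps help here, under stated assumptions: each item is mapped to a bit position, so an itemset becomes an integer and the subset test X ⊆ Y becomes a single bitwise AND. The redundancy criterion used below (a rule X → Y is redundant if another rule X' → Y' has the same support and confidence with X' ⊆ X and Y ⊆ Y', i.e. keep minimal antecedents and maximal consequents) is a common definition assumed for illustration; the abstract does not state the authors' exact criterion. The names `Rule`, `to_bitmap`, `is_subset` and `non_redundant` are hypothetical, and the pairwise scan is a simplification, not the paper's implementation.

```python
from typing import Dict, Iterable, List, NamedTuple


class Rule(NamedTuple):
    antecedent: int   # bitmap: bit k set <=> item with index k is in the antecedent
    consequent: int   # bitmap for the consequent
    support: int      # absolute support count of antecedent ∪ consequent
    confidence: float


def to_bitmap(items: Iterable[str], index: Dict[str, int]) -> int:
    """Encode an itemset as an integer bitmap using a fixed item -> bit mapping."""
    bits = 0
    for item in items:
        bits |= 1 << index[item]
    return bits


def is_subset(a: int, b: int) -> bool:
    """Test a ⊆ b with one bitwise AND instead of a set traversal."""
    return a & b == a


def non_redundant(rules: List[Rule]) -> List[Rule]:
    """Keep rules that have a minimal antecedent and a maximal consequent.

    Assumed criterion: r = X -> Y is redundant if another rule X' -> Y'
    has the same support and confidence with X' ⊆ X and Y ⊆ Y'.
    """
    kept = []
    for r in rules:
        redundant = any(
            (s.antecedent, s.consequent) != (r.antecedent, r.consequent)
            and s.support == r.support
            and s.confidence == r.confidence
            and is_subset(s.antecedent, r.antecedent)
            and is_subset(r.consequent, s.consequent)
            for s in rules
        )
        if not redundant:
            kept.append(r)
    return kept


index = {"a": 0, "b": 1, "c": 2}
rules = [
    Rule(to_bitmap("a", index), to_bitmap("b", index), 2, 1.0),   # a -> b
    Rule(to_bitmap("a", index), to_bitmap("bc", index), 2, 1.0),  # a -> bc
    Rule(to_bitmap("ab", index), to_bitmap("c", index), 2, 1.0),  # ab -> c
    Rule(to_bitmap("c", index), to_bitmap("a", index), 2, 0.5),   # c -> a
]
result = non_redundant(rules)
```

Here `a -> b` and `ab -> c` are both subsumed by `a -> bc` (same support and confidence, smaller or equal antecedent, larger or equal consequent), so only `a -> bc` and `c -> a` survive. Comparing floating-point confidences with `==` is acceptable only because the toy values are exact; a real system would more likely compare integer support counts.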

    Tree Contraction, Connected Components, Minimum Spanning Trees: a GPU Path to Vertex Fitting

    Standard parallel computing operations are considered in the context of algorithms for solving 3D graph problems, which have applications in, e.g., vertex finding in high-energy physics (HEP). Exploiting GPUs for tree accumulation and graph algorithms is challenging: GPUs offer extreme computational power and high memory-access bandwidth, but their model of fine-grained parallelism may not suit the irregular distribution of linked graph data structures. Achieving data-race-free computations may require serialization through atomic transactions, which inevitably produces poor parallel performance. A Minimum Spanning Tree algorithm for GPUs is presented, its implementation is discussed, and its efficiency is evaluated on GPU and multicore architectures.
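The abstract does not name the MST algorithm used. Borůvka's algorithm is a common choice for parallel MST because each round consists of the operations the title lists: every component selects its lightest outgoing edge (a data-parallel scan), and components are then merged along those edges (connected components / tree contraction). The following is a sequential sketch of Borůvka rounds, offered as an assumption about the general technique rather than the paper's GPU implementation; `boruvka_mst` is a hypothetical name.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def boruvka_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Minimum spanning forest via Borůvka rounds (sequential simulation)."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst: List[Edge] = []
    while True:
        # Phase 1: each component picks its lightest outgoing edge. On a GPU
        # this loop would run in parallel, with atomics resolving concurrent
        # updates to cheapest[r]. Ties are broken by edge index so that the
        # selected edges can never form a cycle.
        cheapest = [-1] * n
        for i, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                j = cheapest[r]
                if j == -1 or (w, i) < (edges[j][0], j):
                    cheapest[r] = i
        # Phase 2 (contraction): merge components along the selected edges.
        merged = False
        for r in range(n):
            i = cheapest[r]
            if i == -1:
                continue
            _, u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:  # both endpoints may have selected the same edge
                parent[ru] = rv
                mst.append(edges[i])
                merged = True
        if not merged:
            return mst


edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
mst = boruvka_mst(4, edges)
```

Each round at least halves the number of components, so the loop runs O(log n) rounds; the sequential inner loops are where a GPU version would spend its parallelism, and the concurrent writes to `cheapest` are exactly where the atomic serialization mentioned above would appear.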

    Semisupervised Autoencoder for Sentiment Analysis

    In this paper, we investigate the use of autoencoders for modeling textual data. Traditional autoencoders suffer from at least two problems: scalability with the high dimensionality of the vocabulary, and the handling of task-irrelevant words. We address these problems by introducing supervision via the loss function of the autoencoder. In particular, we first train a linear classifier on the labeled data, then define a loss for the autoencoder using the weights learned by the linear classifier. To reduce the bias introduced by a single classifier, we define a posterior probability distribution over the weights of the classifier and derive the marginalized loss of the autoencoder with a Laplace approximation. We show that our choice of loss function can be rationalized from the perspective of Bregman divergence, which justifies the soundness of our model. We evaluate the effectiveness of our model on six sentiment analysis datasets and show that it significantly outperforms all the competing methods in classification accuracy. We also show that our model is able to take advantage of unlabeled data to improve performance. We further show that our model learns highly discriminative feature maps, which explains its superior performance.
    Comment: To appear in AAAI 201
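To make the Bregman-divergence remark concrete, here is an illustrative sketch, not the paper's derived loss: the Bregman divergence is D_F(x, y) = F(x) − F(y) − ⟨∇F(y), x − y⟩, and choosing the (convex, though not strictly convex) generator F(z) = (θᵀz)², where θ are linear-classifier weights, makes it reduce to (θᵀ(x − y))². Reconstruction errors on words with zero classifier weight then cost nothing, which is one way a classifier-aware loss can ignore task-irrelevant words. The generator, the weights `theta`, and the helper names are assumptions for illustration; the Laplace-approximation marginalization is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bregman(F: Callable[[Vector], float],
            grad_F: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return F(x) - F(y) - dot(grad_F(y), diff)


def classifier_generator(theta: Vector) -> Tuple[Callable, Callable]:
    """Generator F(z) = (theta^T z)^2 built from linear-classifier weights."""
    def F(z: Vector) -> float:
        return dot(theta, z) ** 2

    def grad_F(z: Vector) -> List[float]:
        # Gradient of (theta^T z)^2 is 2 (theta^T z) theta.
        scale = 2.0 * dot(theta, z)
        return [scale * t for t in theta]

    return F, grad_F


theta = [2.0, 0.0, -1.0]   # hypothetical weights: the second word is task-irrelevant
x = [1.0, 5.0, 0.0]        # input bag-of-words vector
x_hat = [0.5, 0.0, 0.0]    # reconstruction that badly misses the second word
F, grad_F = classifier_generator(theta)
loss = bregman(F, grad_F, x, x_hat)
```

The squared Euclidean error here would be 25.25, dominated by the irrelevant second word; the classifier-induced divergence is 1.0, driven only by the first word, whose classifier weight is nonzero.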

    Parametric model-based clustering

    Scaling Expected Force: Efficient Identification of Key Nodes in Network-based Epidemic Models

    Centrality measures are fundamental tools of network analysis, as they highlight the key actors within a network. This study focuses on a recently proposed centrality measure, Expected Force (EF), and its use in identifying spreaders in network-based epidemic models. We found that EF effectively predicts the spreading power of nodes and identifies key nodes and immunization targets. However, its high computational cost makes it challenging to use on large networks. To overcome this limitation, we propose two scalable parallel algorithms for computing EF scores: the first is based on the original formulation, while the second uses a cluster-centric approach to improve efficiency and scalability. Our implementations significantly reduce computation time, allowing key nodes to be detected at large scale. Performance analysis on synthetic and real-world networks demonstrates that the GPU implementation of our algorithm scales efficiently to networks with up to 44 million edges by exploiting modern parallel architectures, achieving speed-ups of up to 300x, and 50x on average, compared to the simple parallel solution.
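The abstract does not define EF. As a simplified sketch based on our reading of the original Expected Force definition (an assumption, details may differ): for a seed node, enumerate the clusters of infected nodes that can exist after two transmission events, compute each cluster's degree as the number of edges leaving it, normalize those degrees into a distribution, and take its entropy. Here we count distinct 3-node clusters and edges to nodes outside the cluster; the original formulation may enumerate transmission orderings differently. `expected_force` is a hypothetical name. Because each seed is processed independently, the per-node computation is embarrassingly parallel, which is what makes parallel and GPU versions attractive.

```python
import math
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def expected_force(adj: Graph, seed: int) -> float:
    """Entropy of normalized cluster degrees after two transmissions (simplified)."""
    # Distinct 3-node clusters {seed, a, b}: a is infected by the seed,
    # then b is infected by either the seed or a.
    clusters = set()
    for a in adj[seed]:
        for b in adj[seed] | adj[a]:
            if b not in (seed, a):
                clusters.add(frozenset((seed, a, b)))
    # Cluster degree: number of edges from the cluster to outside nodes.
    degrees = [sum(1 for u in c for v in adj[u] if v not in c) for c in clusters]
    total = sum(degrees)
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log(d / total) for d in degrees if d > 0)


# Star graph: hub 0 connected to leaves 1..4.
star: Graph = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ef_hub = expected_force(star, 0)   # 6 clusters, each of degree 2 -> log 6
ef_leaf = expected_force(star, 1)  # 3 clusters, each of degree 2 -> log 3
```

The hub scores higher than a leaf, as expected for a node with greater spreading power. The nested enumeration grows quickly with node degree, which illustrates the computational cost the abstract describes.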