39,864 research outputs found

    Threesomes, Degenerates, and Love Triangles

    Full text link
    The 3SUM problem is to decide, given a set of nn real numbers, whether any three sum to zero. It is widely conjectured that a trivial O(n2)O(n^2)-time algorithm is optimal and over the years the consequences of this conjecture have been revealed. This 3SUM conjecture implies Ω(n2)\Omega(n^2) lower bounds on numerous problems in computational geometry and a variant of the conjecture implies strong lower bounds on triangle enumeration, dynamic graph algorithms, and string matching data structures. In this paper we refute the 3SUM conjecture. We prove that the decision tree complexity of 3SUM is O(n3/2logn)O(n^{3/2}\sqrt{\log n}) and give two subquadratic 3SUM algorithms, a deterministic one running in O(n2/(logn/loglogn)2/3)O(n^2 / (\log n/\log\log n)^{2/3}) time and a randomized one running in O(n2(loglogn)2/logn)O(n^2 (\log\log n)^2 / \log n) time with high probability. Our results lead directly to improved bounds for kk-variate linear degeneracy testing for all odd k3k\ge 3. The problem is to decide, given a linear function f(x1,,xk)=α0+1ikαixif(x_1,\ldots,x_k) = \alpha_0 + \sum_{1\le i\le k} \alpha_i x_i and a set ARA \subset \mathbb{R}, whether 0f(Ak)0\in f(A^k). We show the decision tree complexity of this problem is O(nk/2logn)O(n^{k/2}\sqrt{\log n}). Finally, we give a subcubic algorithm for a generalization of the (min,+)(\min,+)-product over real-valued matrices and apply it to the problem of finding zero-weight triangles in weighted graphs. We give a depth-O(n5/2logn)O(n^{5/2}\sqrt{\log n}) decision tree for this problem, as well as an algorithm running in time O(n3(loglogn)2/logn)O(n^3 (\log\log n)^2/\log n)

    A New Approach to Speeding Up Topic Modeling

    Full text link
    Latent Dirichlet allocation (LDA) is a widely-used probabilistic topic modeling paradigm, and recently finds many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Usually batch LDA algorithms require repeated scanning of the entire corpus and searching the complete topic space. To process massive corpora having a large number of topics, the training iteration of batch LDA algorithms is often inefficient and time-consuming. To accelerate the training speed, ABP actively scans the subset of corpus and searches the subset of topic space for topic modeling, therefore saves enormous training time in each iteration. To ensure accuracy, ABP selects only those documents and topics that contribute to the largest residuals within the residual belief propagation (RBP) framework. On four real-world corpora, ABP performs around 1010 to 100100 times faster than state-of-the-art batch LDA algorithms with a comparable topic modeling accuracy.Comment: 14 pages, 12 figure

    Finding groups in data: Cluster analysis with ants

    Get PDF
    Wepresent in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. This algorithm discovers automatically clusters in numerical data without prior knowledge of possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of a swarm intelligent system, and on the effects on the final clustering by using during the classification differentmetrics of dissimilarity: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as e.g. k-means, etc. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focus on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated in a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms performs well when compared to another techniques
    corecore