2 research outputs found

    Efficient Sequential and Parallel Algorithms for Estimating Higher Order Spectra

    Polyspectral estimation is a problem of great importance in the analysis of nonlinear time series, with applications in biomedical signal processing, communications, geophysics, image, radar, sonar and speech processing, etc. Higher order spectra (HOS) have been used in unsupervised and supervised clustering in big data scenarios, in testing for Gaussianity, to suppress Gaussian noise, to characterize nonlinearities in time series data, and so on. Any algorithm for computing the $k$th order spectra of a time series of length $n$ needs $\Omega(n^{k-1})$ time, since the output size will be $\Omega(n^{k-1})$ as well. Given that we live in an era of big data, $n$ could be very large. In this case, sequential algorithms might take unacceptable amounts of time. Thus it is essential to develop parallel algorithms. There is also room for improving existing sequential algorithms. In addition, parallel algorithms in the literature are nongeneric. In this paper we offer generic sequential algorithms for computing higher order spectra that are asymptotically faster than any published algorithm for HOS. Further, we offer memory efficient algorithms. We also present optimal parallel implementations of these algorithms on parallel computing models such as the PRAM and the mesh. We provide experimental results on our sequential and parallel algorithms. Our parallel implementation achieves very good speedups. Comment: 12 pages, 4 figures, conferenc
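
The output-size lower bound above can be made concrete for the third-order case ($k = 3$, the bispectrum). The following is a minimal sketch, assuming the common direct single-segment estimate $B(f_1, f_2) = X(f_1)\,X(f_2)\,\overline{X(f_1 + f_2)}$ with frequencies taken modulo $n$. It is not the algorithm from the paper; the names `dft` and `bispectrum` are hypothetical, and the naive transform is used only to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def bispectrum(x):
    """Direct estimate B[f1][f2] = X[f1] * X[f2] * conj(X[(f1 + f2) mod n])."""
    n = len(x)
    X = dft(x)
    return [
        [X[f1] * X[f2] * X[(f1 + f2) % n].conjugate() for f2 in range(n)]
        for f1 in range(n)
    ]


# Illustrative input only; any real-valued series works.
series = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 1.5, -2.0]
B = bispectrum(series)

# The table holds n * n = n^(k-1) entries for k = 3, so merely writing
# the output already takes time proportional to n^(k-1).
entries = sum(len(row) for row in B)
```

Even with a fast transform in place of `dft`, filling the table still costs time proportional to its size, which is the point of the bound.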

    Novel Algorithms for Some Fundamental Big Data Problems

    In this digital era, data sets are growing rapidly. Storing, processing, and analyzing large volumes of data require efficient techniques. These techniques deal with big data problems by providing time-efficient methods, effective external memory algorithms, parallel and high-performance solutions, and so on. This thesis studies three important areas of big data problems and presents state-of-the-art approaches to address them. The first part of this thesis discusses the k-mer counting problem. A massive number of bioinformatics applications require counting the k-length substrings of long, genetically important strings. Genome assembly, repeat detection, multiple sequence alignment, error detection, and many other related applications use a k-mer counter as a building block. To be useful in such applications, algorithms for counting k-mers in large data sets must be very fast and efficient. We propose a novel trie-based algorithm for this k-mer counting problem. In the second part, we present algorithms for the record linkage problem. Integrating data from multiple sources is a crucial and challenging problem. We present efficient sequential and parallel algorithms for this problem that can handle any number of datasets. Our methods employ single linkage as well as complete linkage hierarchical clustering. The last part covers three algorithmically challenging problems. The first is the minimum spanning tree problem. Finding minimum spanning trees (MST) in various types of networks is a well-studied problem in both theory and practice. We have devised a very efficient algorithm that combines ideas from randomized selection, Kruskal’s algorithm and Prim’s algorithm. The second problem is higher order spectra analysis of nonlinear time series, which has applications in biomedical signal processing, communications, geophysics, speech processing, etc. We address this problem by providing space- and time-efficient sequential and parallel algorithms. The third problem is the closest l-mers problem. Algorithms for finding the closest l-mers have been used in solving the (l, d)-motif search problem. We describe exact as well as very fast approximate algorithms for computing a group of three l-mers having the minimum combined distance among all possible such combinations.
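
To make the k-mer counting problem concrete, here is a minimal baseline sketch that counts every length-k substring with a hash table. It assumes a plain in-memory string and is not the trie-based algorithm proposed in the thesis; the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every length-k substring (k-mer) of seq with a hash table."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A string of length n has n - k + 1 k-mers; none if n < k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Illustrative input only.
counts = count_kmers("ACGTACGT", 3)
```

A baseline like this stores each distinct k-mer as its own key, which becomes costly on genome-scale inputs; that cost is the motivation for more compact structures such as tries.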