    Error Correction Using Probabilistic Language Models

    Error correction has applications in a variety of domains, given the prevalence of errors of various kinds and the need to correct them programmatically as accurately as possible. For example, error correction is used in portable mobile devices to fix typographical errors in keypad input. It can also be useful in lower-level applications, such as fixing errors in storage media or network transmission errors. The precision and influence of such techniques vary with the requirements and the capabilities of the correction technique, but they are essential to the effective functioning of the application. This research focuses on techniques that provide error correction given the location of the erroneous token. The errors are essentially erasures: missing bits in a stream of binary data whose locations are known. The basic idea behind these techniques is to build contextual information from an error-free training corpus and to use the resulting models to suggest alternatives that could replace the erroneous tokens. We look into two models: the topic-based LDA (Latent Dirichlet Allocation) model and the N-gram model. We also propose an efficient mechanism to process such errors which offers exponential speed-ups. Using these models, we achieve up to 5% improvement in accuracy compared to a standard word distribution model, using minimal domain knowledge.
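
The N-gram idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes word-level tokens rather than bits, a bigram (N = 2) model, and add-one smoothing, and the names `train_bigrams` and `suggest` are hypothetical. Given a known erasure position, it ranks vocabulary words by how well they fit between the left and right neighbours.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus_sentences):
    """Count unigrams and bigrams from an error-free training corpus."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            bigrams[left][right] += 1
    return unigrams, bigrams


def suggest(tokens, erased_index, unigrams, bigrams, k=3):
    """Rank candidate replacements for the token at a known erasure position.

    Assumption: each candidate is scored by the product of add-one-smoothed
    bigram estimates with its left and right neighbours, when they exist.
    """
    vocab_size = len(unigrams)
    left = tokens[erased_index - 1] if erased_index > 0 else None
    right = tokens[erased_index + 1] if erased_index + 1 < len(tokens) else None

    def score(candidate):
        s = 1.0
        if left is not None:
            s *= (bigrams[left][candidate] + 1) / (unigrams[left] + vocab_size)
        if right is not None:
            s *= (bigrams[candidate][right] + 1) / (unigrams[candidate] + vocab_size)
        return s

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(unigrams, key=lambda w: (-score(w), w))[:k]


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
unigrams, bigrams = train_bigrams(corpus)
# The token at index 1 is erased; its location is known in advance.
candidates = suggest(["the", None, "sat", "on"], 1, unigrams, bigrams)
```

On this toy corpus the top-ranked candidate is "cat", since it is the only word seen both after "the" and before "sat". A real system would need a far larger corpus and a higher-order or topic-based model, as the abstract describes.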

    Engineering Parallel String Sorting

    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines.
    Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
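
For readers unfamiliar with multikey quicksort, one of the algorithms the abstract above mentions parallelizing, the following is a minimal sequential sketch. It is not the paper's implementation: it is single-threaded, allocates new lists rather than sorting in place, and the function name `multikey_quicksort` is chosen here for illustration. It partitions the strings three ways on the character at the current depth and advances the depth only for the "equal" partition.

```python
def multikey_quicksort(strings, depth=0):
    """Sort strings by three-way partitioning on the character at `depth`.

    Invariant: all strings passed to a call share their first `depth`
    characters, so only the character at position `depth` is compared.
    """
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # -1 marks "end of string" and sorts before every real character.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(strings[len(strings) // 2])
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    # Strings in `equal` agree on one more character; if the pivot is the
    # end-of-string marker they are identical and need no further sorting.
    if pivot >= 0:
        equal = multikey_quicksort(equal, depth + 1)

    return multikey_quicksort(less, depth) + equal + multikey_quicksort(greater, depth)


words = ["sort", "sample", "string", "", "merge", "sorting", "sam", "radix"]
result = multikey_quicksort(words)
```

Because each character is inspected only at its own depth, shared prefixes are not re-compared, which is the property that makes this family of algorithms attractive for string data. The recursion depth grows with string length, so this sketch suits short keys only.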

    Accelerating data retrieval steps in XML documents
