Error Correction Using Probabilistic Language Models
Error correction has applications in a variety of domains, given the prevalence of errors of various kinds and the need to correct them programmatically and as accurately as possible. For example, error correction is used on mobile devices to fix typographical errors in keypad input. It can also be useful in lower-level applications, such as fixing errors in storage media or network transmission errors. The precision and the influence of such techniques vary with the requirements and the capabilities of the correction technique, but they are an essential part of the application's effective functioning.
This research focuses on techniques for error correction given the location of the erroneous token. The errors are essentially erasures: missing bits in a stream of binary data whose locations are known. The basic idea behind these techniques is to build contextual information from an error-free training corpus and use the resulting models to suggest alternatives that could replace the erroneous tokens. We look into two models: the topic-based LDA (Latent Dirichlet Allocation) model and the N-gram model. We also propose an efficient mechanism to process such errors which offers exponential speed-ups. Using these models, we achieve up to 5% improvement in accuracy compared to a standard word distribution model, using minimal domain knowledge.
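To make the N-gram idea concrete, the following is a minimal sketch of filling an erasure at a known position, not the authors' implementation. It assumes word-level tokens, a bigram model with add-alpha smoothing, and a tiny toy corpus; the function names `train_bigrams` and `fill_erasure` and the parameter `alpha` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count unigrams and adjacent-word bigrams in an error-free corpus."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            bigrams[left][right] += 1
    return unigrams, bigrams


def fill_erasure(tokens, position, unigrams, bigrams, alpha=1.0):
    """Return the vocabulary word that best fits the erased position.

    The position is known in advance (an erasure), so only the
    replacement has to be chosen, using the left and right neighbours.
    """
    vocab_size = len(unigrams)
    left = tokens[position - 1] if position > 0 else None
    right = tokens[position + 1] if position + 1 < len(tokens) else None

    def prob(prev, word):
        # Add-alpha smoothed estimate of P(word | prev); the unigram
        # count of `prev` serves as the (approximate) denominator.
        return (bigrams[prev][word] + alpha) / (unigrams[prev] + alpha * vocab_size)

    def score(candidate):
        s = 1.0
        if left is not None:
            s *= prob(left, candidate)
        if right is not None:
            s *= prob(candidate, right)
        return s

    # Iterate in sorted order so ties are broken deterministically.
    return max(sorted(unigrams), key=score)


corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]
unigrams, bigrams = train_bigrams(corpus)
damaged = ["the", "cat", "?", "on", "the", "mat"]  # token 2 is erased
suggestion = fill_erasure(damaged, 2, unigrams, bigrams)
```

Scoring every vocabulary word is the naive approach; the efficient mechanism mentioned in the abstract is not described here, so it is not reproduced.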
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.
Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
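As an illustration of one of the building blocks named above, here is a minimal sequential sketch of multikey quicksort (three-way radix quicksort). It is not the parallel variant from the paper: it assumes Python `str` inputs compared by code point, builds new lists instead of sorting in place, and the name `multikey_quicksort` is hypothetical.

```python
def multikey_quicksort(strings):
    """Sort strings by partitioning on one character position at a time."""

    def char_at(s, depth):
        # -1 stands for end-of-string, so shorter strings sort first.
        return ord(s[depth]) if depth < len(s) else -1

    def sort(items, depth):
        # All strings in `items` share their first `depth` characters.
        if len(items) <= 1:
            return items
        pivot = char_at(items[len(items) // 2], depth)
        less = [s for s in items if char_at(s, depth) < pivot]
        equal = [s for s in items if char_at(s, depth) == pivot]
        greater = [s for s in items if char_at(s, depth) > pivot]
        # Strings in `equal` also agree at `depth`, so move to the next
        # character, unless they have all ended and are identical.
        if pivot != -1:
            equal = sort(equal, depth + 1)
        return sort(less, depth) + equal + sort(greater, depth)

    return sort(list(strings), 0)


words = ["banana", "band", "ban", "apple", "bandana", "app", "", "apple"]
result = multikey_quicksort(words)
```

Because the sketch recurses once per shared character, very long common prefixes can exceed Python's recursion limit; a production version would work in place and switch to a simpler sort for small subproblems.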