36 research outputs found

Unsupervised Solution Post Identification from Discussion Forums

Authors: Padmanabhan Deepak; Visweswariah Karthik
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

Queen's University Belfast Research Portal

Universal lossless source coding with the Burrows Wheeler transform

Authors: Effros Michelle; Kulkarni Sanjeev R.; Verdú Sergio; Visweswariah Karthik
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

The Burrows-Wheeler transform (BWT, 1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
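For readers unfamiliar with the transform itself, the following is a minimal sketch of the textbook BWT and its inverse, showing why the transformation is reversible. It is not one of the codes analyzed in the paper. The function names and the `$` end-of-string sentinel are assumptions made for illustration, and the sorted-rotations approach is chosen for clarity rather than efficiency.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Forward BWT: sort all rotations of text+sentinel, keep the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Inverse BWT: rebuild the sorted rotation table one column at a time."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort the rows.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

The output groups characters that share a following context, which is the ordering that the subsequent lossless code tries to exploit.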

CiteSeerX

Caltech Authors

Two-part segmentation of text documents.

Authors: Deepak P.; Sani Sadiq; Visweswariah Karthik; Wiratunga Nirmalie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/10/2012

We consider the problem of segmenting text documents that have a two-part structure such as a problem part and a solution part. Documents of this genre include incident reports that typically involve a description of events relating to a problem followed by those pertaining to the solution that was tried. Segmenting such documents into their two component parts would render them usable in knowledge reuse frameworks such as Case-Based Reasoning. This segmentation problem presents a hard case for traditional text segmentation due to the lexical inter-relatedness of the segments. We develop a two-part segmentation technique that can harness a corpus of similar documents to model the behavior of the two segments and their inter-relatedness using language models and translation models respectively. In particular, we use separate language models for the problem and solution segment types, whereas the inter-relatedness between segment types is modeled using an IBM Model 1 translation model. We model documents as being generated starting from the problem part that comprises words sampled from the problem language model, followed by the solution part whose words are sampled either from the solution language model or from a translation model conditioned on the words already chosen in the problem part. We show, through an extensive set of experiments on real-world data, that our approach outperforms the state-of-the-art text segmentation algorithms in the accuracy of segmentation, and that such improved accuracy translates well to improved usability in Case-Based Reasoning systems. We also analyze the robustness of our technique to varying amounts and types of noise and empirically illustrate that our technique is quite noise tolerant, and degrades gracefully with increasing amounts of noise.
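The generative story in this abstract can be illustrated with a toy scoring routine that picks the split point with the highest likelihood. This is a sketch under stated assumptions, not the authors' implementation: the fixed mixture weight `alpha`, the probability floor `FLOOR`, the exhaustive search over split points, and all toy probabilities below are hypothetical, and the abstract does not say how the models are trained or combined.

```python
import math

FLOOR = 1e-6  # hypothetical floor for unseen words and word pairs


def solution_word_prob(word, problem_words, solution_lm, translation, alpha):
    """Mix the solution LM with a Model 1 style translation term."""
    lm = solution_lm.get(word, FLOOR)
    # Model 1 averages t(word | p) uniformly over the problem-part words.
    tm = sum(translation.get((word, p), FLOOR) for p in problem_words)
    tm /= len(problem_words)
    return (1 - alpha) * lm + alpha * tm


def split_log_likelihood(doc, split, problem_lm, solution_lm, translation, alpha=0.5):
    """Log-likelihood of doc when words before `split` form the problem part."""
    problem, solution = doc[:split], doc[split:]
    score = sum(math.log(problem_lm.get(w, FLOOR)) for w in problem)
    score += sum(
        math.log(solution_word_prob(w, problem, solution_lm, translation, alpha))
        for w in solution
    )
    return score


def best_split(doc, problem_lm, solution_lm, translation, alpha=0.5):
    """Try every split that leaves both parts non-empty; return the best one."""
    return max(
        range(1, len(doc)),
        key=lambda s: split_log_likelihood(
            doc, s, problem_lm, solution_lm, translation, alpha
        ),
    )


# Toy models (made-up numbers, for illustration only).
toy_problem_lm = {"printer": 0.4, "jam": 0.3, "error": 0.3}
toy_solution_lm = {"replace": 0.4, "tray": 0.3, "restart": 0.3}
toy_translation = {("replace", "jam"): 0.8, ("restart", "error"): 0.7}
toy_doc = ["printer", "jam", "error", "replace", "tray", "restart"]

boundary = best_split(toy_doc, toy_problem_lm, toy_solution_lm, toy_translation)
```

On the toy document the chosen boundary falls between "error" and "replace", because any other split forces a word to be scored by a model that assigns it only the floor probability.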

Crossref

Open Access Institutional Repository at Robert Gordon University

Rapid Adaptation With Linear Combinations Of Rank-One Matrices

Authors: Karthik Visweswariah; Ramesh Gopinath; Vaibhava Goel
Publication date: 01/01/2002

Linear transforms are often used to adapt the acoustic models in speech recognition systems. When there is very little (5-10 seconds) acoustic data, adaptation suffers from unreliable parameter estimation. Typically this problem is handled by imposing a diagonal or block-diagonal structure on the transform. This paper proposes using transforms that are linear combinations of rank-one matrices. This approach is applied to the adaptation of the Gaussian means, Gaussian covariances and the acoustic features. Experimental results with varying amounts of adaptation data indicate that for the same number of parameters, our new parameterization performs significantly better than simpler transform parameterizations (diagonal and/or block-diagonal).
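The parameterization named in this abstract can be written as A = Σ_k λ_k u_k v_kᵀ, where only the coefficients λ_k would need to be estimated from the adaptation data if the rank-one bases are held fixed. The sketch below only builds and applies such a transform; how the paper chooses the bases or estimates the coefficients is not stated in the abstract, so the helper names and the example bases are assumptions. The example uses unit-vector bases, for which the combination reduces to an ordinary diagonal transform.

```python
def outer(u, v):
    """Rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def combine_rank_one(coeffs, bases):
    """Build A = sum_k coeffs[k] * u_k v_k^T from (u_k, v_k) pairs."""
    rows, cols = len(bases[0][0]), len(bases[0][1])
    A = [[0.0] * cols for _ in range(rows)]
    for lam, (u, v) in zip(coeffs, bases):
        for i, row in enumerate(outer(u, v)):
            for j, value in enumerate(row):
                A[i][j] += lam * value
    return A


def apply_transform(A, x):
    """Matrix-vector product A x, e.g. applied to a feature or mean vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def unit(i, n):
    """Standard basis vector e_i of length n."""
    return [1.0 if j == i else 0.0 for j in range(n)]


# Special case: bases e_k e_k^T give a diagonal transform diag(coeffs).
diag_bases = [(unit(k, 3), unit(k, 3)) for k in range(3)]
A_diag = combine_rank_one([2.0, 0.5, 1.0], diag_bases)
adapted = apply_transform(A_diag, [1.0, 2.0, 3.0])
```

With K bases the transform has K free coefficients regardless of the matrix dimension, which is what allows a comparison against diagonal and block-diagonal transforms at an equal parameter count.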

CiteSeerX

Crossref