
    Randomized Maximum Entropy Language Models

    We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1], [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELM that would otherwise be prohibitively large to estimate or store.
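    The two randomized structures described above can be sketched in a few lines of Python. The sketch below is not the authors' implementation; the array sizes, the MD5-based hashing, and the feature-string format are illustrative assumptions. It only shows how a hashed weight array can replace the feature-to-weight dictionary and how a Bloom filter can replace explicit feature storage, at the cost of occasional false positives and hash collisions.

        # Illustrative sketch only: feature hashing plus a Bloom filter for a
        # maximum entropy LM's feature weights. Sizes and hashing choices are
        # assumptions, not taken from the paper.
        import hashlib

        NUM_WEIGHTS = 2 ** 20        # slots in the hashed weight array
        NUM_BLOOM_BITS = 2 ** 22     # bits in the Bloom filter
        NUM_BLOOM_HASHES = 3         # hash functions for the Bloom filter

        weights = [0.0] * NUM_WEIGHTS
        bloom_bits = bytearray(NUM_BLOOM_BITS // 8)

        def _hash(feature: str, seed: int, modulus: int) -> int:
            """Deterministic seeded hash of a feature string."""
            digest = hashlib.md5(f"{seed}:{feature}".encode()).hexdigest()
            return int(digest, 16) % modulus

        def bloom_add(feature: str) -> None:
            """Record an observed training feature in the Bloom filter."""
            for seed in range(NUM_BLOOM_HASHES):
                bit = _hash(feature, seed, NUM_BLOOM_BITS)
                bloom_bits[bit // 8] |= 1 << (bit % 8)

        def bloom_contains(feature: str) -> bool:
            """Membership test: false positives possible, false negatives not."""
            for seed in range(NUM_BLOOM_HASHES):
                bit = _hash(feature, seed, NUM_BLOOM_BITS)
                if not bloom_bits[bit // 8] & (1 << (bit % 8)):
                    return False
            return True

        def feature_weight(feature: str) -> float:
            """Weight lookup with no feature-to-index dictionary."""
            if not bloom_contains(feature):
                return 0.0           # unseen features contribute nothing
            return weights[_hash(feature, 0, NUM_WEIGHTS)]

    During training one would call bloom_add for every observed feature and update weights at the hashed index; distinct features that collide simply share a slot, which the abstract's experiments indicate is tolerable in practice.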

    Committee-Based Sample Selection for Probabilistic Classifiers

    In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by "sample selection". In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
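    As a concrete illustration of the committee idea, the toy Python sketch below draws two model variants from a Dirichlet-style distribution over relative frequencies and keeps only the unlabeled examples on which the variants disagree. The single-feature "classifier", the gamma-based sampling, and the example data are assumptions made for illustration; they stand in for, but do not reproduce, the paper's tagging models.

        # Toy sketch of two-member committee-based sample selection; the
        # model and the sampling scheme are illustrative assumptions.
        import random
        from collections import defaultdict

        random.seed(0)
        counts = defaultdict(lambda: defaultdict(int))   # counts[label][feature]

        def draw_committee_member():
            """Sample one model variant conditioned on the labeled data
            (Dirichlet-perturbed relative frequencies per label)."""
            model = {}
            for label, feats in counts.items():
                noisy = {f: random.gammavariate(c + 1.0, 1.0) for f, c in feats.items()}
                total = sum(noisy.values())
                model[label] = {f: v / total for f, v in noisy.items()}
            return model

        def predict(model, feature):
            """Most probable label for a feature under one committee member."""
            scores = {lab: probs.get(feature, 1e-6) for lab, probs in model.items()}
            return max(scores, key=scores.get)

        def select_for_labeling(unlabeled):
            """Keep the examples on which two sampled variants disagree."""
            m1, m2 = draw_committee_member(), draw_committee_member()
            return [f for f in unlabeled if predict(m1, f) != predict(m2, f)]

        # Toy usage: examples with ambiguous evidence (like "run") are the
        # ones most likely to be selected for labeling.
        for feat, lab in [("run", "VERB"), ("run", "NOUN"), ("dog", "NOUN"), ("eat", "VERB")]:
            counts[lab][feat] += 1
        print(select_for_labeling(["run", "dog", "eat", "cat"]))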

    A network model of interpersonal alignment in dialog

    In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutors’ dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations.
    Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysis
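    A minimal sketch of the utterance-wise, two-layer construction is given below, assuming a simple word co-occurrence graph per interlocutor. The paper's mutual-information measure over graphs is not reproduced; the edge-set overlap used here is only a crude stand-in for a structural similarity score, and all names in the sketch are illustrative.

        # Two-layer dialog graph updated utterance by utterance; the Jaccard
        # edge overlap is a stand-in for the paper's graph-based measure.
        from itertools import combinations

        layers = {"A": set(), "B": set()}    # one edge set per interlocutor

        def add_utterance(speaker: str, utterance: str) -> None:
            """Update one layer with word co-occurrences from a new utterance."""
            words = sorted(set(utterance.lower().split()))
            for w1, w2 in combinations(words, 2):
                layers[speaker].add((w1, w2))

        def alignment_score() -> float:
            """Structural similarity of the two lexica (edge-set Jaccard overlap)."""
            a, b = layers["A"], layers["B"]
            return len(a & b) / len(a | b) if (a or b) else 0.0

        add_utterance("A", "the red block on the left")
        add_utterance("B", "yes the red block")
        print(round(alignment_score(), 3))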

    The Value of Help Bits in Randomized and Average-Case Complexity

    "Help bits" are some limited trusted information about an instance or instances of a computational problem that may reduce the computational complexity of solving that instance or instances. In this paper, we study the value of help bits in the settings of randomized and average-case complexity. Amir, Beigel, and Gasarch (1990) show that for constant kk, if kk instances of a decision problem can be efficiently solved using less than kk bits of help, then the problem is in P/poly. We extend this result to the setting of randomized computation: We show that the decision problem is in P/poly if using ℓ\ell help bits, kk instances of the problem can be efficiently solved with probability greater than 2ℓ−k2^{\ell-k}. The same result holds if using less than k(1−h(α))k(1 - h(\alpha)) help bits (where h(⋅)h(\cdot) is the binary entropy function), we can efficiently solve (1−α)(1-\alpha) fraction of the instances correctly with non-vanishing probability. We also extend these two results to non-constant but logarithmic kk. In this case however, instead of showing that the problem is in P/poly we show that it satisfies "kk-membership comparability," a notion known to be related to solving kk instances using less than kk bits of help. Next we consider the setting of average-case complexity: Assume that we can solve kk instances of a decision problem using some help bits whose entropy is less than kk when the kk instances are drawn independently from a particular distribution. Then we can efficiently solve an instance drawn from that distribution with probability better than 1/21/2. Finally, we show that in the case where kk is super-logarithmic, assuming kk-membership comparability of a decision problem, one cannot prove that the problem is in P/poly by a "black-box proof.