Randomized Maximum Entropy Language Models
We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1] [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELM that would otherwise be prohibitively large to estimate or store.
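A minimal sketch of the two randomized structures this abstract combines, assuming details the abstract does not give (the hash function, structure sizes, and scoring interface are illustrative, not the paper's implementation): feature weights live in a fixed-size array indexed by a hash of the feature string, and a Bloom filter stands in for an explicit feature set.

```python
# Sketch only: hashed weight storage plus Bloom-filter membership for a
# MaxEnt LM. Hash choices, sizes, and the API are illustrative assumptions.
import hashlib


def feature_hash(feature: str, seed: int, modulus: int) -> int:
    """Deterministic hash of a feature string for a given seed."""
    digest = hashlib.sha256(f"{seed}:{feature}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    """Set membership with false positives but no false negatives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def add(self, feature: str) -> None:
        for seed in range(self.num_hashes):
            i = feature_hash(feature, seed, self.num_bits)
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, feature: str) -> bool:
        for seed in range(self.num_hashes):
            i = feature_hash(feature, seed, self.num_bits)
            if not (self.bits[i // 8] >> (i % 8)) & 1:
                return False
        return True


class HashedMaxEntLM:
    """MaxEnt scorer with no feature dictionary: weights indexed by hash."""

    def __init__(self, num_weights: int, bloom: BloomFilter):
        self.weights = [0.0] * num_weights  # colliding features share a slot
        self.bloom = bloom                  # records which features were seen

    def score(self, active_features: list[str]) -> float:
        # Sum weights only for features the filter claims were seen in
        # training; a rare false positive adds one spurious weight.
        return sum(
            self.weights[feature_hash(f, 0, len(self.weights))]
            for f in active_features
            if f in self.bloom
        )
```

The claim the experiments then test is that both error sources (hash collisions merging weights, Bloom false positives admitting unseen features) are rare and unbiased enough not to hurt perplexity or WER.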
Committee-Based Sample Selection for Probabilistic Classifiers
In many real-world learning tasks, it is expensive to acquire a sufficient
number of labeled examples for training. This paper investigates methods for
reducing annotation cost by 'sample selection'. In this approach, during
training the learning program examines many unlabeled examples and selects for
labeling only those that are most informative at each stage. This avoids
redundantly labeling examples that contribute little new information. Our work
follows on previous research on Query By Committee, extending the
committee-based paradigm to the context of probabilistic classification. We
describe a family of empirical methods for committee-based sample selection in
probabilistic classification models, which evaluate the informativeness of an
example by measuring the degree of disagreement between several model variants.
These variants (the committee) are drawn randomly from a probability
distribution conditioned by the training set labeled so far. The method was
applied to the real-world natural language processing task of stochastic
part-of-speech tagging. We find that all variants of the method achieve a
significant reduction in annotation cost, although their computational
efficiency differs. In particular, the simplest variant, a two member committee
with no parameters to tune, gives excellent results. We also show that sample
selection yields a significant reduction in the size of the model used by the
tagger.
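A toy sketch of the simplest variant described above, the two-member committee with no parameters to tune. The linear classifier, the Gaussian parameter noise (a stand-in for sampling model variants from a distribution around the current estimate), and all names here are assumptions, not the paper's tagging setup.

```python
# Two-member committee selection: keep an unlabeled example only when two
# randomly drawn model variants disagree on its label.
import random


def perturb(weights, scale=0.1):
    """Draw one committee member from a distribution around the estimate."""
    return [w + random.gauss(0.0, scale) for w in weights]


def predict(weights, features):
    """Hard label of a linear classifier: 1 if the score is positive."""
    return int(sum(w * x for w, x in zip(weights, features)) > 0)


def select_for_labeling(weights, unlabeled_pool, budget):
    """Pick up to `budget` examples on which two random variants disagree."""
    selected = []
    for features in unlabeled_pool:
        committee = [perturb(weights), perturb(weights)]
        if predict(committee[0], features) != predict(committee[1], features):
            # Disagreement marks the example as informative; agreement means
            # labeling it would likely add little new information.
            selected.append(features)
            if len(selected) == budget:
                break
    return selected


# Toy usage: examples near the decision boundary tend to be selected.
pool = [[1.0, 0.05], [1.0, 2.0], [1.0, -0.02], [1.0, -3.0]]
print(select_for_labeling([0.0, 1.0], pool, budget=2))
```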
A network model of interpersonal alignment in dialog
In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors' dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors' lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutors' dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations.
Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysis
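As a rough illustration of one ingredient of such a measure (not the paper's operationalization, which is built on time-aligned network series and dedicated graph models), one can estimate the mutual information between the edge indicators of two lexical layers over a shared vocabulary; all names here are illustrative.

```python
# Treat each interlocutor's dialog lexicon as one layer: a set of word-pair
# edges (stored as alphabetically sorted tuples) over a shared vocabulary.
import math
from itertools import combinations


def layer_mutual_information(edges_a, edges_b, vocabulary):
    """Mutual information (bits) between edge indicators of two layers."""
    pairs = list(combinations(sorted(vocabulary), 2))
    joint = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for pair in pairs:
        joint[int(pair in edges_a), int(pair in edges_b)] += 1
    n = len(pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        if count == 0:
            continue
        p_ab = count / n
        p_a = (joint[a, 0] + joint[a, 1]) / n   # marginal of layer A
        p_b = (joint[0, b] + joint[1, b]) / n   # marginal of layer B
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy usage: two speakers sharing part of their dialog lexicon.
vocab = {"well", "hello", "there", "right"}
speaker_a = {("hello", "there"), ("right", "well")}
speaker_b = {("hello", "there")}
print(layer_mutual_information(speaker_a, speaker_b, vocab))
```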
The Value of Help Bits in Randomized and Average-Case Complexity
"Help bits" are some limited trusted information about an instance or
instances of a computational problem that may reduce the computational
complexity of solving that instance or instances. In this paper, we study the
value of help bits in the settings of randomized and average-case complexity.
Amir, Beigel, and Gasarch (1990) show that for constant $k$, if $k$ instances
of a decision problem can be efficiently solved using less than $k$ bits of
help, then the problem is in P/poly. We extend this result to the setting of
randomized computation: we show that the decision problem is in P/poly if,
using $\ell$ help bits, $k$ instances of the problem can be efficiently solved
with probability greater than $2^{\ell-k}$. The same result holds if, using
less than $(1-h(\delta))k$ help bits (where $h(\cdot)$ is the binary entropy
function), we can efficiently solve a $(1-\delta)$ fraction of the $k$
instances correctly with non-vanishing probability. We also extend these two
results to non-constant but logarithmic $k$. In this case, however, instead of
showing that the problem is in P/poly we show that it satisfies
"$k$-membership comparability," a notion known to be related to solving $k$
instances using less than $k$ bits of help.
Next we consider the setting of average-case complexity: Assume that we can
solve $k$ instances of a decision problem using some help bits whose entropy
is less than $k$ when the $k$ instances are drawn independently from a
particular distribution. Then we can efficiently solve an instance drawn from
that distribution with probability better than $1/2$.
Finally, we show that in the case where $k$ is super-logarithmic, assuming
$k$-membership comparability of a decision problem, one cannot prove that the
problem is in P/poly by a "black-box proof."
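For intuition about why thresholds of this shape are the natural ones, here is the standard guessing baseline, reconstructed for this listing rather than quoted from the paper (the symbols $k$, $\ell$, $\delta$, $h$ follow the statement above): a solver that ignores its inputs already succeeds on all $k$ instances with probability $2^{\ell-k}$, so only a strictly larger success probability carries information about the problem.

```latex
% Trivial baseline: encode \ell of the k answers in the help bits and guess
% the remaining k - \ell answers uniformly at random.
\begin{align*}
  \Pr[\text{all } k \text{ instances correct}]
    \;\ge\; 1 \cdot 2^{-(k-\ell)} \;=\; 2^{\ell-k}.
\end{align*}
% Likewise, guessing alone gets a (1-\delta) fraction correct with probability
% about 2^{-(1-h(\delta))k}, and \ell help bits can improve any strategy by a
% factor of at most 2^{\ell}; hence the (1-h(\delta))k threshold.
```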