
    Randomized Maximum Entropy Language Models

    We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1], [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELM that would otherwise be prohibitively large to estimate or store.
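    The two randomized structures described above can be sketched in a few lines of Python. The sketch below is not the authors' implementation; the array sizes, the MD5-based hashing, and the feature-string format are illustrative assumptions. It only shows how a hashed weight array can replace the feature-to-weight dictionary and how a Bloom filter can replace explicit feature storage, at the cost of occasional false positives and hash collisions.

        # Illustrative sketch only: feature hashing plus a Bloom filter for a
        # maximum entropy LM's feature weights. Sizes and hashing choices are
        # assumptions, not taken from the paper.
        import hashlib

        NUM_WEIGHTS = 2 ** 20        # slots in the hashed weight array
        NUM_BLOOM_BITS = 2 ** 22     # bits in the Bloom filter
        NUM_BLOOM_HASHES = 3         # hash functions for the Bloom filter

        weights = [0.0] * NUM_WEIGHTS
        bloom_bits = bytearray(NUM_BLOOM_BITS // 8)

        def _hash(feature: str, seed: int, modulus: int) -> int:
            """Deterministic seeded hash of a feature string."""
            digest = hashlib.md5(f"{seed}:{feature}".encode()).hexdigest()
            return int(digest, 16) % modulus

        def bloom_add(feature: str) -> None:
            """Record an observed training feature in the Bloom filter."""
            for seed in range(NUM_BLOOM_HASHES):
                bit = _hash(feature, seed, NUM_BLOOM_BITS)
                bloom_bits[bit // 8] |= 1 << (bit % 8)

        def bloom_contains(feature: str) -> bool:
            """Membership test: false positives possible, false negatives not."""
            for seed in range(NUM_BLOOM_HASHES):
                bit = _hash(feature, seed, NUM_BLOOM_BITS)
                if not bloom_bits[bit // 8] & (1 << (bit % 8)):
                    return False
            return True

        def feature_weight(feature: str) -> float:
            """Weight lookup with no feature-to-index dictionary."""
            if not bloom_contains(feature):
                return 0.0           # unseen features contribute nothing
            return weights[_hash(feature, 0, NUM_WEIGHTS)]

    During training one would call bloom_add for every observed feature and update weights at the hashed index; distinct features that collide simply share a slot, which the abstract's experiments indicate is tolerable in practice.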

    Committee-Based Sample Selection for Probabilistic Classifiers

    In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by "sample selection". In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
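    As a concrete illustration of the committee idea, the toy Python sketch below draws two model variants from a Dirichlet-style distribution over relative frequencies and keeps only the unlabeled examples on which the variants disagree. The single-feature "classifier", the gamma-based sampling, and the example data are assumptions made for illustration; they stand in for, but do not reproduce, the paper's tagging models.

        # Toy sketch of two-member committee-based sample selection; the
        # model and the sampling scheme are illustrative assumptions.
        import random
        from collections import defaultdict

        random.seed(0)
        counts = defaultdict(lambda: defaultdict(int))   # counts[label][feature]

        def draw_committee_member():
            """Sample one model variant conditioned on the labeled data
            (Dirichlet-perturbed relative frequencies per label)."""
            model = {}
            for label, feats in counts.items():
                noisy = {f: random.gammavariate(c + 1.0, 1.0) for f, c in feats.items()}
                total = sum(noisy.values())
                model[label] = {f: v / total for f, v in noisy.items()}
            return model

        def predict(model, feature):
            """Most probable label for a feature under one committee member."""
            scores = {lab: probs.get(feature, 1e-6) for lab, probs in model.items()}
            return max(scores, key=scores.get)

        def select_for_labeling(unlabeled):
            """Keep the examples on which two sampled variants disagree."""
            m1, m2 = draw_committee_member(), draw_committee_member()
            return [f for f in unlabeled if predict(m1, f) != predict(m2, f)]

        # Toy usage: examples with ambiguous evidence (like "run") are the
        # ones most likely to be selected for labeling.
        for feat, lab in [("run", "VERB"), ("run", "NOUN"), ("dog", "NOUN"), ("eat", "VERB")]:
            counts[lab][feat] += 1
        print(select_for_labeling(["run", "dog", "eat", "cat"]))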

    A network model of interpersonal alignment in dialog

    In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutors’ dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations.
    Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysis
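    A minimal sketch of the utterance-wise, two-layer construction is given below, assuming a simple word co-occurrence graph per interlocutor. The paper's mutual-information measure over graphs is not reproduced; the edge-set overlap used here is only a crude stand-in for a structural similarity score, and all names in the sketch are illustrative.

        # Two-layer dialog graph updated utterance by utterance; the Jaccard
        # edge overlap is a stand-in for the paper's graph-based measure.
        from itertools import combinations

        layers = {"A": set(), "B": set()}    # one edge set per interlocutor

        def add_utterance(speaker: str, utterance: str) -> None:
            """Update one layer with word co-occurrences from a new utterance."""
            words = sorted(set(utterance.lower().split()))
            for w1, w2 in combinations(words, 2):
                layers[speaker].add((w1, w2))

        def alignment_score() -> float:
            """Structural similarity of the two lexica (edge-set Jaccard overlap)."""
            a, b = layers["A"], layers["B"]
            return len(a & b) / len(a | b) if (a or b) else 0.0

        add_utterance("A", "the red block on the left")
        add_utterance("B", "yes the red block")
        print(round(alignment_score(), 3))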

    The Value of Help Bits in Randomized and Average-Case Complexity

    "Help bits" are some limited trusted information about an instance or instances of a computational problem that may reduce the computational complexity of solving that instance or instances. In this paper, we study the value of help bits in the settings of randomized and average-case complexity. Amir, Beigel, and Gasarch (1990) show that for constant kk, if kk instances of a decision problem can be efficiently solved using less than kk bits of help, then the problem is in P/poly. We extend this result to the setting of randomized computation: We show that the decision problem is in P/poly if using ℓ\ell help bits, kk instances of the problem can be efficiently solved with probability greater than 2ℓ−k2^{\ell-k}. The same result holds if using less than k(1−h(α))k(1 - h(\alpha)) help bits (where h(⋅)h(\cdot) is the binary entropy function), we can efficiently solve (1−α)(1-\alpha) fraction of the instances correctly with non-vanishing probability. We also extend these two results to non-constant but logarithmic kk. In this case however, instead of showing that the problem is in P/poly we show that it satisfies "kk-membership comparability," a notion known to be related to solving kk instances using less than kk bits of help. Next we consider the setting of average-case complexity: Assume that we can solve kk instances of a decision problem using some help bits whose entropy is less than kk when the kk instances are drawn independently from a particular distribution. Then we can efficiently solve an instance drawn from that distribution with probability better than 1/21/2. Finally, we show that in the case where kk is super-logarithmic, assuming kk-membership comparability of a decision problem, one cannot prove that the problem is in P/poly by a "black-box proof.