
    A fast genomic dictionary

    Thesis (S.B. and M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (pp. 252-254). By Valentin I. Spitkovsky.

    Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

    Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete system achieves 48.6% accuracy (directed dependency accuracy, macro-averaged over all 19 languages in the 2006/7 CoNLL data), more than 5% higher than the previous state of the art.
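    The two operator types can be illustrated on a toy problem. The Python below is a minimal sketch under stated assumptions, not the authors' grammar-induction system: the objective `f`, the deterministic coordinate-ascent `hill_climb`, resetting chosen coordinates to zero as a stand-in for "discarding parts of a learned model", and equal-weight averaging as a stand-in for "constructing mixtures" are all hypothetical choices made only for illustration. Both operators keep the best solution seen, so neither can return something worse than its inputs.

```python
import math
from typing import Callable, List, Set

Vector = List[float]
Objective = Callable[[Vector], float]


def hill_climb(f: Objective, x: Vector, step: float = 0.5, min_step: float = 1e-3) -> Vector:
    """Deterministic coordinate ascent (hypothetical local optimizer).

    Tries +step and -step on each coordinate, accepts any improvement,
    and halves the step when no move helps; stops at a local optimum
    up to the resolution min_step.
    """
    x = list(x)
    best = f(x)
    while step > min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = x[:i] + [x[i] + delta] + x[i + 1:]
                val = f(cand)
                if val > best:
                    x, best, improved = cand, val, True
                    break
        if not improved:
            step /= 2  # no single-coordinate move helps: refine the step size
    return x


def transform(f: Objective, x: Vector, drop: Set[int]) -> Vector:
    """Transform: a non-random restart from an already-found optimum.

    Assumption: "discarding part of the model" is modeled as resetting
    the coordinates in `drop` to 0 before re-optimizing. Returns the
    better of the original optimum and the re-optimized one.
    """
    restart = [0.0 if i in drop else v for i, v in enumerate(x)]
    y = hill_climb(f, restart)
    return max(x, y, key=f)


def join(f: Objective, a: Vector, b: Vector) -> Vector:
    """Join: merge two candidates into an equal-weight mixture (an assumption),
    re-optimize from it, and keep the best of the three solutions."""
    mix = [(u + v) / 2 for u, v in zip(a, b)]
    c = hill_climb(f, mix)
    return max(a, b, c, key=f)


def f(x: Vector) -> float:
    """Hypothetical multimodal objective: many local optima, global peak at the origin."""
    return sum(math.cos(3 * v) - 0.1 * v * v for v in x)


# Usage: two hill-climbing runs that get stuck in different local optima.
a = hill_climb(f, [2.0, -2.1])
b = hill_climb(f, [-1.9, 2.2])
t = transform(f, a, drop={0})  # restart from a with coordinate 0 discarded
j = join(f, a, b)              # merge the two optima and re-optimize
print(f(a), f(b), f(t), f(j))
```

    In this toy setting each plain hill-climbing run stalls away from the origin, while both the transformed restart and the joined mixture climb to higher optima. Keeping the best of the inputs and outputs is one simple design choice that makes such operators safe to chain into larger networks.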