58 research outputs found

    Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing

    This thesis focuses on unsupervised dependency parsing: parsing sentences of a language into dependency trees without access to training data for that language. Unlike most prior work, which uses unsupervised learning to estimate the parsing parameters, we estimate the parameters by supervised training on synthetic languages. Our parsing framework has three major components: synthetic language generation gives a rich set of training languages by mix-and-match over the real languages; surface-form feature extraction maps an unparsed corpus of a language into a fixed-length vector that serves as the syntactic signature of that language; and, finally, language-agnostic parsing incorporates the syntactic signature during parsing, so that the decision on each word token depends on the general syntax of the target language. The fundamental question we are trying to answer is whether useful information about the syntax of a language can be inferred from its surface-form evidence (an unparsed corpus). This is the same question that has been implicitly asked by previous papers on unsupervised parsing, which assume that only an unparsed corpus is available for the target language. We show that, indeed, useful features of the target language can be extracted automatically from an unparsed corpus that consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. This thesis contains several large-scale experiments requiring hundreds of thousands of CPU-hours. To our knowledge, this is the largest study of unsupervised parsing yet attempted. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous works’ interpretable typological features, which require parsed corpora or expert categorization of the language.

    Modeling Weather Impact on a Secondary Electrical Grid

    Weather can cause problems for underground electrical grids by increasing the probability of serious “manhole events” such as fires and explosions. In this work, we compare two approaches: a model that incorporates weather features associated with the dates of serious events into a single logistic regression, and a more complex approach that uses three interdependent log-linear models for weather, baseline manhole vulnerability, and vulnerability of manholes to weather. The latter approach more naturally incorporates the dependencies between the weather, structure properties, and structure vulnerability.

    Galactic Dependencies

    Dataset for TACL submission "The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages". The scripts and model parameters for replicating this dataset are available at https://github.com/gdtreebank/gdtreebank

    Agarose-Degrading Characteristics of a Deep-Sea Bacterium Vibrio Natriegens WPAGA4 and Its Cold-Adapted GH50 Agarase Aga3420

    To date, characterizations of GH50 agarases from Vibrio species have rarely been reported compared with those of GH16 agarases. In this study, a deep-sea strain, WPAGA4, was isolated and identified as Vibrio natriegens based on the similarity of its 16S rRNA gene sequence, its average nucleotide identity values, and digital DNA–DNA hybridization. Two circular chromosomes in V. natriegens WPAGA4 were assembled. A total of 4561 coding genes, 37 rRNA, 131 tRNA, and 59 other non-coding RNA genes were predicted in the genome of V. natriegens WPAGA4. An agarase gene belonging to the GH50 family was annotated in the genome sequence and expressed in E. coli cells. The optimum temperature and pH of the recombinant Aga3420 (rAga3420) were 40 °C and 7.0, respectively. Neoagarobiose (NA2) was the only product of agarose degradation by rAga3420. rAga3420 showed favorable stability following incubation at 10–30 °C for 50 min. The Km, Vmax, and kcat values of rAga3420 were 2.8 mg/mL, 78.1 U/mg, and 376.9 s⁻¹, respectively. rAga3420 displayed cold-adapted properties, as 59.7% and 41.2% of the relative activity remained at 10 °C and 0 °C, respectively. This property allows V. natriegens WPAGA4 to degrade and metabolize agarose in cold deep-sea environments and makes rAga3420 a suitable enzyme for NA2 production, with industrial potential in the medical and cosmetic fields.

    Effect of Sulfuric Acid Corrosion on Flotation Performance of Calcite by Changing Surface Roughness

    Surface roughness is a crucial factor that affects the flotation performance of minerals. In this study, the effect of sulfuric acid corrosion on the surface roughness and flotation of calcite was investigated through microflotation tests, scanning electron microscopy with energy-dispersive spectroscopy (SEM–EDS), atomic force microscopy (AFM), Fourier transform infrared (FT-IR) spectroscopy, and contact angle analysis. Microflotation test results show that sulfuric acid treatment has a serious negative effect on the floatability of calcite. When the sulfuric acid dosage was 4 mL (3 mol/L), the flotation recovery of calcite was reduced to less than 19%. SEM–EDS and AFM results verified that the sulfuric acid treatment significantly changed the surface morphology of calcite, reduced the average surface roughness and surface area, and reduced the number of active Ca²⁺ sites on the calcite surface. As characterized by FT-IR and contact angle analyses, the sulfuric acid treatment enhanced the hydrophilicity of the calcite surface and reduced the amount of sodium oleate adsorbed on it. Consequently, sulfuric acid corrosion reduces the average surface roughness of calcite and has a serious negative effect on its flotation performance.