174 research outputs found

    Data-Driven Morphological Analysis for Uralic Languages

    This paper describes an initial set of experiments in data-driven morphological analysis of Uralic languages. The paper differs from previous work in that our work covers both lemmatization and generating ambiguous analyses. While hand-crafted finite-state transducers represent the state of the art in morphological analysis for most Uralic languages, we believe that there is a place for data-driven approaches, especially with respect to making up for lack of completeness in the lexicon. We present results for nine Uralic languages that show that, at least for basic nominal morphology for six out of the nine languages, data-driven methods can achieve an F-score of over 90%, providing results that approach those of finite-state techniques. We also compare our system to an earlier approach to Finnish data-driven morphological analysis (Silfverberg and Hulden, 2018) and show that our system outperforms this baseline. Peer reviewed

    Weird inflects but OK: Making sense of morphological generation errors

    We conduct a manual error analysis of the CoNLL-SIGMORPHON 2017 Shared Task on Morphological Reinflection. In this task, systems are given a word in citation form (e.g., hug) and asked to produce the corresponding inflected form (e.g., the simple past hugged). This design lets us analyze errors much like we might analyze children's production errors. We propose an error taxonomy and use it to annotate errors made by the top two systems across twelve languages. Many of the observed errors are related to inflectional patterns sensitive to inherent linguistic properties such as animacy or affect; many others are failures to predict truly unpredictable inflectional behaviors. We also find nearly one quarter of the residual "errors" reflect errors in the gold data. © 2019 Association for Computational Linguistics. Peer reviewed

    Yet Another Format of Universal Dependencies for Korean

    In this study, we propose a morpheme-based scheme for Korean dependency parsing and apply the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation and the necessity of adopting the morpheme-based format, and develop scripts that convert between the original format used by Universal Dependencies and the proposed morpheme-based format automatically. The effectiveness of the proposed format for Korean dependency parsing is then verified with both statistical and neural models, including UDPipe and Stanza, with our carefully constructed morpheme-based word embedding for Korean. morphUD outperforms parsing results for all Korean UD treebanks, and we also present detailed error analyses. Comment: COLING2022, Poster

    HFST—Framework for Compiling and Applying Morphologies

    HFST–Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library. Peer reviewed

    Microsatellite markers spanning the apple (Malus x domestica Borkh.) genome

    A new set of 148 apple microsatellite markers has been developed and mapped on the apple reference linkage map Fiesta x Discovery. One hundred and seventeen markers were developed from genomic libraries enriched with the repeats GA, GT, AAG, AAC and ATC; 31 were developed from EST sequences. Markers derived from sequences containing dinucleotide repeats were generally more polymorphic than sequences containing trinucleotide repeats. Eight additional SSRs from published apple, pear, and Sorbus torminalis SSRs, whose position on the apple genome was unknown, have also been mapped. Transfer of SSRs across Maloideae species proved efficient, with 41% of the markers successfully transferred. For all 156 SSRs, the primer sequences, repeat type, map position, and quality of the amplification products are reported. Also presented are allele sizes, ranges, and number of SSRs found in a set of nine cultivars. All this information, together with that of the previous CH-SSR series, can be searched at the apple SSR database (http://www.hidras.unimi.it), to which updates and comments can be added. A large number of apple ESTs containing SSR repeats are available and should be used for the development of new apple SSRs. The apple SSR database is also meant to become an international platform for coordinating this effort. The increased coverage of the apple genome with SSRs allowed the selection of a set of 86 reliable, highly polymorphic SSRs that are well scattered across the apple genome. These SSRs cover about 85% of the genome with an average distance of one marker per 15c

    Functional analysis and expression profiling of HcrVf1 and HcrVf2 for development of scab resistant cisgenic and intragenic apples

    Apple scab resistance genes, HcrVf1 and HcrVf2, were isolated including their native promoter, coding and terminator sequences. Two fragment lengths (short and long) of the native gene promoters and the strong apple rubisco gene promoter (PMdRbc) were used for both HcrVf genes to test their effect on expression and phenotype. The scab susceptible cultivar ‘Gala’ was used for plant transformations and after selection of transformants, they were micrografted onto apple seedling rootstocks for scab disease tests. Apple transformants were also tested for HcrVf expression by quantitative RT-PCR (qRT-PCR). For HcrVf1 the long native promoter gave significantly higher expression than the short one; in the case of HcrVf2 the difference between the two was not significant. The apple rubisco gene promoter proved to give the highest expression of both HcrVf1 and HcrVf2. The top four expanding leaves were used initially for inoculation with monoconidial isolate EU-B05 which belongs to race 1 of V. inaequalis. Later six other V. inaequalis isolates were used to study the resistance spectra of the individual HcrVf genes. The scab disease assays showed that HcrVf1 did not give resistance against any of the isolates tested regardless of the expression level. The HcrVf2 gene appeared to be the only functional gene for resistance against Vf avirulent isolates of V. inaequalis. HcrVf2 did not provide any resistance to Vf virulent strains, not even in the case of overexpression. In conclusion, transformants carrying the apple-derived HcrVf2 gene in a cisgenic as well as in an intragenic configuration were able to reach scab resistance levels comparable to the Vf resistant control cultivar obtained by classical breeding, cv. ‘Santana’