    Towards an open-source universal-dependency treebank for Erzya

    This article describes the first steps towards an open-source dependency treebank for Erzya based on the Universal Dependencies (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public-domain original Erzya sources, which ensures that it remains freely available and extensible. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are manually annotated for dependency structure. The article presents several issues in Erzya dependency syntax and shows how they are analyzed in the UD framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research. Peer reviewed.
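
    UD treebanks are conventionally distributed in the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with "#", and a blank line between sentences. The abstract does not state the file format, so the following is only a minimal sketch, assuming CoNLL-U input, of how sentence and token counts and the manually annotated head/relation pairs could be read. The function name parse_conllu and the toy English sentence are hypothetical illustrations, not material from the Erzya treebank.

```python
from typing import Iterable, List, Tuple

# (id, form, head, deprel) for one syntactic word
Token = Tuple[int, str, int, str]


def parse_conllu(lines: Iterable[str]) -> List[List[Token]]:
    """Parse CoNLL-U lines into sentences of (id, form, head, deprel) tuples.

    Hypothetical helper: comment lines ('#') are ignored, and multiword-token
    ranges such as '3-4' and empty nodes such as '5.1' are skipped, so only
    basic syntactic words are returned.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node
        # Column 7 (index 6) is HEAD, column 8 (index 7) is DEPREL.
        current.append((int(token_id), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)  # file may lack a final blank line
    return sentences


# Toy CoNLL-U fragment (hypothetical, not from the treebank).
sample = """# sent_id = toy-1
# text = Texts are annotated.
1\tTexts\ttext\tNOUN\t_\t_\t3\tnsubj:pass\t_\t_
2\tare\tbe\tAUX\t_\t_\t3\taux:pass\t_\t_
3\tannotated\tannotate\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

sentences = parse_conllu(sample.splitlines())
print(len(sentences), sum(len(s) for s in sentences))  # 1 4
```

    Whether the 6661 tokens reported above count surface tokens or syntactic words is not stated; this sketch counts syntactic words only.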

    A hitchhiker's guide to the cullin ubiquitin ligases: SCF and its kin

    The SCF (Skp1–Cullin–F-box) E3 ubiquitin ligase family was discovered through genetic requirements for cell cycle progression in budding yeast. In these multisubunit enzymes, an invariant core complex, composed of the Skp1 linker protein, the Cdc53/Cul1 scaffold protein and the Rbx1/Roc1/Hrt1 RING domain protein, engages one of a suite of substrate adaptors called F-box proteins that in turn recruit substrates for ubiquitination by an associated E2 enzyme. The cullin–RING domain–adaptor architecture has diversified through evolution, such that in total many hundreds of distinct SCF and SCF-like complexes enable degradation of myriad substrates. Substrate recognition by adaptors often depends on posttranslational modification of the substrate, which thus places substrate stability under dynamic regulation by intracellular signaling events. SCF complexes control cell proliferation through degradation of critical regulators such as cyclins, CDK inhibitors and transcription factors. A plethora of other processes in development and disease are controlled by other SCF-like complexes, including those based on Cul2–SOCS-box adaptor protein and Cul3–BTB domain adaptor protein combinations. Recent structural insights into SCF-like complexes have begun to illuminate aspects of substrate recognition and catalytic reaction mechanisms.

    The Relevance of the Source Language in Transfer Learning for ASR

    This study presents new experiments on Zyrian Komi speech recognition. We use DeepSpeech to train ASR models on a language documentation corpus that contains both contemporary and archival recordings. Earlier studies have shown that transfer learning from English and a domain-matched Komi language model both improve the character error rate (CER) and word error rate (WER). In this study we experiment with transfer learning from a more relevant source language, Russian, and with including Russian text in the language model. The motivation is that Russian and Komi are contemporary contact languages, and Russian is regularly present in the corpus. We found that, despite the close contact between Russian and Komi, the larger English speech corpus yielded better performance when used as the source language. Additionally, simply updating the DeepSpeech version improved the CER by 3.9% compared with the earlier studies, which is an important step in the development of Komi ASR. Peer reviewed.
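
    The results above are reported as CER and WER. Both are conventionally defined as the Levenshtein edit distance (substitutions + deletions + insertions) between the recognizer's hypothesis and the reference transcript, divided by the length of the reference in characters (CER) or whitespace-separated words (WER). The following is a minimal sketch of that standard definition, not DeepSpeech's own evaluation code; the function names and example strings are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference item r
                curr[j - 1] + 1,         # insert hypothesis item h
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters here."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(round(cer("the cat sat", "the bat sat"), 3))  # 0.091
print(round(wer("the cat sat", "the bat sat"), 3))  # 0.333
```

    A single wrong character costs one edit out of eleven characters but one whole word out of three, which is why WER is typically much higher than CER for the same output.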

    Rapid regulation of protein activity in fission yeast

    Background: The fission yeast Schizosaccharomyces pombe is widely used as a model organism for the study of a broad range of eukaryotic cellular processes such as the cell cycle, genome stability and cell morphology. Despite the availability of an extensive set of genetic, molecular biological, biochemical and cell biological tools for analysing protein function in fission yeast, studies are often hampered by the lack of an effective method for rapidly regulating protein level or protein activity. Results: To regulate protein function, we made use of a previous finding that the hormone binding domain of steroid receptors can be used as a regulatory cassette to place the activity of heterologous proteins under hormonal control. The approach is based on fusing the protein of interest to the hormone binding domain (HBD) of the estrogen receptor (ER). The HBD tag recruits the Hsp90 complex, which can render the fusion protein inactive. Upon addition of estradiol, the protein is quickly released from the Hsp90 complex and thereby activated. We tagged four different proteins with the HBD and characterised the induction of their activity, and show that the tag allowed effective regulation of the activity of two of these proteins. Conclusion: The estradiol-regulatable hormone binding domain provides a means to regulate the function of some, though not all, fission yeast proteins. This system may allow very rapid and reversible activation of the protein of interest. It will therefore be a powerful tool that opens experimental approaches in fission yeast that were previously not possible. Since fission yeast is a widely used model organism, this will be valuable in many areas of research.

    Dependency parsing of code-switching data with cross-lingual feature representations

    Partanen N, KyungTae L, Rießler M, Poibeau T. Dependency parsing of code-switching data with cross-lingual feature representations. In: Pirinen TA, Rießler M, Rueter J, Trosterud T, Tyers FM, eds. Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages. Helsinki: Association for Computational Linguistics; 2018: 1-17.

    An OCR system for the Unified Northern Alphabet

    Partanen N, Rießler M. An OCR system for the Unified Northern Alphabet. In: Pirinen TA, Kaalep H-J, Tyers FM, Association for Computational Linguistics, eds. The fifth International Workshop on Computational Linguistics for Uralic Languages. Tartu: Association for Computational Linguistics; 2019: 77-89.
    This paper presents experiments conducted to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 minority languages (Uralic and non-Uralic) spoken in the Soviet Union. The developed model reaches a character accuracy of more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also provide general guidelines for the amount of training data needed to bootstrap an OCR system under similar conditions.