
    Learning morphology with Morfette

    Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora. The system is composed of two learning modules which are trained to predict morphological tags and lemmas using a Maximum Entropy classifier. A third module dynamically combines the predictions of the Maximum Entropy models and outputs a probability distribution over tag-lemma pair sequences. The lemmatization module recasts lemmatization as a classification task by using class labels which encode mappings from wordforms to lemmas. Experimental evaluation results and error analysis on three morphologically rich languages show that the system achieves high accuracy with no language-specific feature engineering or additional resources.
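
    The idea of treating lemmatization as classification can be illustrated with a minimal sketch. The abstract does not say how the class labels are encoded, so the suffix-rewrite label used here (number of characters to strip, suffix to append) is an assumption chosen for brevity, and the function names `suffix_label` and `apply_label` are hypothetical, not part of Morfette.

```python
from typing import Tuple

Label = Tuple[int, str]  # (characters to strip from the end, suffix to append)


def suffix_label(form: str, lemma: str) -> Label:
    """Encode a wordform -> lemma mapping as a class label (assumed scheme)."""
    # Length of the longest common prefix of form and lemma.
    k = 0
    while k < min(len(form), len(lemma)) and form[k] == lemma[k]:
        k += 1
    return (len(form) - k, lemma[k:])


def apply_label(form: str, label: Label) -> str:
    """Apply a predicted label to a wordform to recover its lemma."""
    strip, append = label
    # Slice with an explicit end index so that strip == 0 keeps the whole form.
    return form[: len(form) - strip] + append


# The label learned from one pair generalizes to unseen wordforms
# that inflect the same way.
label = suffix_label("studies", "study")
print(label, apply_label("flies", label))
```

    Because many wordforms share the same label, a classifier only has to choose among a limited set of rewrite classes rather than generate lemmas character by character.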

    Challenges in identifying and interpreting organizational modules in morphology

    Form is a rich concept that agglutinates information about the proportions and topological arrangement of body parts. Modularity is readily measurable in both features, the variation of proportions (variational modules) and the organization of topology (organizational modules). The study of variational modularity and of organizational modularity faces similar challenges regarding the identification of meaningful modules and the validation of generative processes; however, most studies in morphology focus solely on variational modularity, while organizational modularity is much less understood. A possible cause for this bias is the successful development in the last twenty years of morphometrics, and especially geometric morphometrics, to study patterns of variation. This contrasts with the lack of a similar mathematical framework to deal with patterns of organization. Recently, a new mathematical framework has been proposed to study the organization of gross anatomy using tools from Network Theory, so-called Anatomical Network Analysis (AnNA). In this essay, I explore the potential use of this new framework, and the challenges it faces in identifying and validating biologically meaningful modules in morphological systems, by providing working examples of a complete analysis of modularity of the human skull and upper limb. Finally, I suggest further directions of research that may bridge the gap between variational and organizational modularity studies, and discuss how alternative modeling strategies of morphological systems using networks can benefit from each other.

    SKOPE: A connectionist/symbolic architecture of spoken Korean processing

    Spoken language processing requires speech and natural language integration. Moreover, spoken Korean calls for a unique processing methodology due to its linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic spoken Korean processing engine, which emphasizes that: 1) connectionist and symbolic techniques must be selectively applied according to their relative strengths and weaknesses, and 2) the linguistic characteristics of Korean must be fully considered for phoneme recognition, speech and language integration, and morphological/syntactic processing. The design and implementation of SKOPE demonstrate how connectionist/symbolic hybrid architectures can be constructed for spoken agglutinative language processing. SKOPE also presents many novel ideas for speech and language processing. The phoneme recognition, morphological analysis, and syntactic analysis experiments show that SKOPE is a viable approach for spoken Korean processing. Comment: 8 pages, latex, use aaai.sty & aaai.bst, bibfile: nlpsp.bib, to be presented at IJCAI95 workshops on new approaches to learning for natural language processin

    A MT System from Turkmen to Turkish employing finite state and statistical methods

    In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment for a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation processes in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using FSTs, can be easily extended to those languages.

    Comparison of musculoskeletal networks of the primate forelimb

    Anatomical network analysis is a framework for quantitatively characterizing the topological organization of anatomical structures, thus providing a way to compare structural integration and modularity among species. Here we apply this approach to study the macroevolution of the forelimb in primates, a structure whose proportions and functions vary widely within this group. We analyzed musculoskeletal network models in 22 genera, including members of all major extant primate groups and three outgroup taxa, after an extensive literature survey and dissections. The modules of the proximal limb are largely similar among taxa, but those of the distal limb show substantial variation. Some network parameters are similar within phylogenetic groups (e.g., non-primates, strepsirrhines, New World monkeys, and hominoids). Reorganization of the modules in the hominoid hand compared to other primates may relate to functional changes such as coordination of individual digit movements, increased pronation/supination, and knuckle-walking. Surprisingly, humans are one of the few taxa we studied in which the thumb musculoskeletal structures do not form an independent anatomical module. This difference may be caused by the loss in humans of some intrinsic muscles associated with the digits or the acquisition of additional muscles that integrate the thumb more closely with surrounding structures.

    Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation

    In this paper, we compare the rule-based and data-driven approaches in the context of Spanish-to-Basque Machine Translation. The rule-based system we consider has been developed specifically for Spanish-to-Basque machine translation, and is tuned to this language pair. By contrast, the data-driven system we use is generic, and has not been specifically designed to deal with Basque. Spanish-to-Basque Machine Translation is a challenge for data-driven approaches for at least two reasons. First, there is a lack of bilingual data on which a data-driven MT system can be trained. Second, Basque is a morphologically rich agglutinative language, and translating into Basque requires generating a large amount of morphological information, a difficult task for a generic system not specifically tuned to Basque. We present the results of a series of experiments, obtained on two different corpora, one being “in-domain” and the other one “out-of-domain” with respect to the data-driven system. We show that n-gram based automatic evaluation and edit-distance-based human evaluation yield two different sets of results. According to BLEU, the data-driven system outperforms the rule-based system on the in-domain data, while according to the human evaluation, the rule-based approach achieves higher scores for both corpora.
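
    To make the notion of an edit-distance-based score concrete, here is a minimal sketch. The abstract does not specify the exact metric or how human post-edits were collected, so the word-level Levenshtein distance normalized by the length of the corrected sentence is an assumption, and the names `word_edit_distance` and `edit_rate` are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the hyp words seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h from hyp
                            curr[j - 1] + 1,      # insert r into hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_rate(hyp: str, ref: str) -> float:
    """Edits per reference word (assumed normalization); lower is better."""
    ref_words = ref.split()
    return word_edit_distance(hyp.split(), ref_words) / max(len(ref_words), 1)


# MT output versus a (hypothetical) human-corrected version of it.
print(edit_rate("the cat sat", "the cat sat down"))
```

    A score of this kind counts how much correction an output needs, whereas BLEU rewards n-gram overlap with a reference, which is one reason the two evaluations can rank systems differently.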