2,432 research outputs found

    Adverbial clauses and adverbial concord

    This paper speculates that the merge site of an adverbial clause, i.e. its external syntax, is determined by its derivational history, i.e. its internal syntax. Starting from the distinction between central adverbial clauses and peripheral adverbial clauses, it is first shown that the degree of integration of an adverbial clause correlates with its internal syntax, i.e. the availability of left peripheral functional material. The correlation can be informally stated as follows: "the more structure is manifested in the adverbial clause, the higher it is merged". This paper develops a derivational account for this correlation. The proposal adopts the movement derivation of adverbial clauses, according to which, like relative clauses, adverbial clauses are derived by movement of a specialized IP-related operator (aspectual, temporal, modal, etc.) to the left periphery. The paper explores observations drawn from the traditional literature on Japanese grammar (Minami 1974; Noda 1989; 2002) to the effect that the amount of TP-internal functional structure in an adverbial clause also correlates with the presence of specialized functional particles in the matrix clause with which the clause merges. Specifically, we explore Japanese data discussed in Endo (2011; 2012). It is proposed that the merger of an adverbial clause with the associated main clause is determined by the label of the adverbial clause, itself the result of the movement derivation.

    Report on first selection of resources

    The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.

    Mapping Late Hokusai Research: Digitizing and Publishing Bilingual Research Data

    The initiative “Late Hokusai: Thought, Technique, Society” took place at the British Museum (BM) and SOAS, University of London (2016–2019). As part of its activities, it built a linked-data platform prototype on ResearchSpace. The prototype offers a redesigned process for how museum researchers and users find, research with, discuss and expand bilingual data about early modern Japanese artist Katsushika Hokusai (1760–1849) and instigated a discussion about what a collaborative research platform for the Hokusai research community could look like. While Japanese resource specialists have long recognized the complexity of Japanese script as a challenge for multilingual research and collection platforms, the processes for and results of integrating Japanese source data into bi- or multilingual museum databases remained unsatisfactory. This paper revisits the challenges posed by “non-Latin script” (NLS) in museum databases in the case of the Hokusai research platform at the British Museum, which integrated Japanese and English languages. It localizes the issues arising from working with Japanese source data in the Latin script project environment and accompanies the museum researchers’ tasks regarding the correct input, rendering and display of the source script at each step: 1) object analysis, 2) registering NLS metadata, 3) processing NLS information and 4) visualizing LS and NLS information for general and specialist audiences. After assessing these practices, the paper critically reflects on selected approaches, successes, and shortcomings experienced while creating such a prototype. By sharing its experiences, the project hopes to aid prospective research projects on a similar path regarding project setup and documentation. Furthermore, it advocates the sustainability of research practices according to data reusability parameters. 

    Word order affects the time course of sentence formulation in Tzeltal

    The scope of planning during sentence formulation is known to be flexible, as it can be influenced by speakers' communicative goals and language production pressures (among other factors). Two eye-tracked picture description experiments tested whether the time course of formulation is also modulated by grammatical structure and thus whether differences in linear word order across languages affect the breadth and order of conceptual and linguistic encoding operations. Native speakers of Tzeltal [a primarily verb–object–subject (VOS) language] and Dutch [a subject–verb–object (SVO) language] described pictures of transitive events. Analyses compared speakers' choice of sentence structure across events with more accessible and less accessible characters as well as the time course of formulation for sentences with different word orders. Character accessibility influenced subject selection in both languages in subject-initial and subject-final sentences, ruling against a radically incremental formulation process. In Tzeltal, subject-initial word orders were preferred over verb-initial orders when event characters had matching animacy features, suggesting a possible role for similarity-based interference in influencing word order choice. Time course analyses revealed a strong effect of sentence structure on formulation: In subject-initial sentences, in both Tzeltal and Dutch, event characters were largely fixated sequentially, while in verb-initial sentences in Tzeltal, relational information received priority over encoding of either character during the earliest stages of formulation. The results show a tight parallelism between grammatical structure and the order of encoding operations carried out during sentence formulation

    An Efficient Method for Generating Synthetic Data for Low-Resource Machine Translation – An empirical study of Chinese, Japanese to Vietnamese Neural Machine Translation

    Data sparsity is one of the challenges for low-resource language pairs in Neural Machine Translation (NMT). Previous works have presented different approaches for data augmentation, but they mostly require additional resources and produce low-quality dummy data in low-resource settings. This paper proposes a simple and effective novel method for generating synthetic bilingual data without using external resources as in previous approaches. Moreover, recent work has shown that multilingual translation or transfer learning can boost translation quality in low-resource situations. However, for logographic languages such as Chinese or Japanese, this approach is still limited due to the differences in translation units in the vocabularies. Although Japanese texts contain Kanji characters that are derived from Chinese characters, and the two are quite similar in shape and meaning, the word orders of these languages diverge considerably. Our study investigates these effects in machine translation. In addition, a combined pre-trained model is leveraged to demonstrate its efficacy on translation tasks in a higher-resource scenario. Our experiments show improvements of up to +6.2 and +7.8 BLEU over bilingual baseline systems on two low-resource translation tasks, Chinese to Vietnamese and Japanese to Vietnamese.

    Syllable Segmentation, Normalization, and Lexicographic Sorting of Myanmar-Language Text Using Formal Methods

    National University Corporation Nagaoka University of Technology

    Sacral agenesis: a pilot whole exome sequencing and copy number study

    Background: Caudal regression syndrome (CRS) or sacral agenesis is a rare congenital disorder characterized by a constellation of congenital caudal anomalies affecting the caudal spine and spinal cord, the hindgut, the urogenital system, and the lower limbs. CRS is a complex condition, attributed to an abnormal development of the caudal mesoderm, likely caused by the effect of interacting genetic and environmental factors. A well-known risk factor is maternal type 1 diabetes. Method: Whole exome sequencing and copy number variation (CNV) analyses were conducted on 4 Caucasian trios to identify de novo and inherited rare mutations. Results: In this pilot study, exome sequencing and copy number variation (CNV) analyses implicate a number of candidate genes, including SPTBN5, MORN1, ZNF330, CLTCL1 and PDZD2. De novo mutations were found in SPTBN5, MORN1 and ZNF330 and inherited predicted damaging mutations in PDZD2 (homozygous) and CLTCL1 (compound heterozygous). Importantly, predicted damaging mutations in PTEN (heterozygous), in its direct regulator GLTSCR2 (compound heterozygous) and in VANGL1 (heterozygous) were identified. These genes had previously been linked with the CRS phenotype. Two CNV deletions, one de novo (chr3q13.13) and one homozygous (chr8p23.2), were detected in one of our CRS patients. These deletions overlapped with CNVs previously reported in patients with similar phenotype. Conclusion: Despite the genetic diversity and the complexity of the phenotype, this pilot study identified genetic features common across CRS patients

    Head Labeling Preference and Language Change

    This dissertation explores a cross-linguistic trend of a diachronic loss of obligatory syntactic movement. This includes the loss of phrasal movement, as in the new observation regarding the unidirectionality of wh-dependency changes, which always proceed from fronting to in-situ (e.g. Old to Modern Japanese, Archaic to Modern Mandarin, or Latin to Modern Romance), as well as the loss of head-movement, e.g. V-to-T in English or Swedish. I propose a unified explanation for these changes based on the preference for head-phrase {H,YP} configurations from the perspective of labeling (Chomsky 2013). I argue that the pressures imposed by the Labeling Algorithm to maximize head-phrase configurations and minimize {XP,YP} as well as {X,Y} mergers (which are dispreferred from the standpoint of labeling) make the latter fragile and prone to loss. I extend this analysis to traditional grammaticalization and additional phenomena, e.g., change of word order and the loss of traditional rightward adjunction. I also investigate specifiers which are more resistant to diachronic change, in particular cases involving multiple movement, and show that the loss of movement goes through a single wh-movement stage. I also explore the motivation for the existence of movement in general, discussing its semantic and interface-based triggers. Additionally, I propose an account of V2 where V2 involves two distinct configurations with distinct syntactic mechanisms and licensing conditions, with only one of them being subject to diachronic loss. I also explore the connection between historical change and language acquisition by investigating errors of omission in the acquisition of reflexive clitics in Polish. I confirm the connection between acquisition and diachronic change through the history of SE-reflexives in Russian, a broader pattern of acquisition in both monolingual and bilingual children, and mixed language varieties, and show that all these phenomena provide support for the labeling-based structural preferences argued for in the thesis.