18 research outputs found

    A commentary on "The now-or-never bottleneck: a fundamental constraint on language", by Christiansen and Chater (2016)

    In a recent article, Christiansen and Chater (2016) present a fundamental constraint on language, namely a now-or-never bottleneck that arises from our fleeting memory, and explore its implications (e.g., chunk-and-pass processing), outlining a framework that promises to unify different areas of research. Here we explore additional support for this constraint and suggest further connections from quantitative linguistics and information theory.

    The sum of edge lengths in random linear arrangements

    Spatial networks are networks whose nodes are located in a space equipped with a metric. Typically, the space is two-dimensional and, until recently, the metric usually considered was the Euclidean distance. In spatial networks, the cost of a link depends on the edge length, i.e. the distance between the nodes that define the edge. Hypothesizing that there is pressure to reduce the length of the edges of a network requires a null model, e.g., a random layout of the vertices of the network. Here we investigate the properties of the distribution of the sum of edge lengths in random linear arrangements of vertices, which has many applications in different fields. A random linear arrangement is an ordering of the nodes of a network in which all possible orderings are equally likely. The distance between two vertices is one plus the number of intermediate vertices in the ordering. Compact formulae for the first and second moments about zero, as well as the variance of the sum of edge lengths, are obtained for arbitrary graphs and trees. We also analyze the evolution of that variance in Erdős-Rényi graphs and its scaling in uniformly random trees. Various developments and applications for future research are suggested.
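
    As a rough illustration of the null model described in this abstract, the sketch below enumerates every linear arrangement of a small graph and averages the sum of edge lengths exactly. The abstract does not quote its formulae; the comparison value m(n+1)/3 (m edges, n vertices) is the standard first moment for uniformly random orderings and is brought in here as an assumption. The function names and the example star tree are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def sum_of_edge_lengths(edges, position):
    """Sum of |pos(u) - pos(v)| over all edges for one linear arrangement."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def exact_mean_sum(n, edges):
    """Average the sum of edge lengths over all n! orderings (small n only)."""
    total = 0
    count = 0
    for order in permutations(range(n)):
        # position[v] is the slot that vertex v occupies in this ordering
        position = {vertex: index for index, vertex in enumerate(order)}
        total += sum_of_edge_lengths(edges, position)
        count += 1
    return Fraction(total, count)


# Hypothetical example: a star tree on 5 vertices with hub 0.
n = 5
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]

enumerated = exact_mean_sum(n, star_edges)
# Assumed closed form for the first moment: m * (n + 1) / 3
closed_form = Fraction(len(star_edges) * (n + 1), 3)
print(enumerated, closed_form)
```

    Exhaustive enumeration is only practical for very small n; it is used here so that the comparison is exact rather than a sampling estimate.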

    The risks of mixing dependency lengths from sequences of different length

    Mixing dependency lengths from sequences of different length is a common practice in language research. However, the empirical distribution of dependency lengths of sentences of the same length differs from that of sentences of varying length. Moreover, the distribution of dependency lengths depends on sentence length, both for real sentences and under the null hypothesis that dependencies connect vertices located in random positions of the sequence. This suggests that certain results, such as the distribution of syntactic dependency lengths obtained by mixing dependencies from sentences of varying length, could be a mere consequence of that mixing. Furthermore, differences in the global averages of dependency length (mixing lengths from sentences of varying length) between two languages do not by themselves imply that one language optimizes dependency lengths better than the other, because those differences could be due to differences in the distribution of sentence lengths and other factors. Comment: Language and referencing have been improved; Eqs. 7, 11, B7 and B8 have been corrected.
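
    The mixing effect described in this abstract can be illustrated with a small worked example. The sketch below assumes that every sentence of n words carries n - 1 dependencies and that, under the random-position null hypothesis, the expected length of a dependency is (n + 1)/3; neither figure is quoted in the abstract, and the two corpora are invented for illustration. Both corpora are equally "unoptimized", yet their pooled means differ purely because their sentence-length distributions differ.

```python
from fractions import Fraction


def null_mean_length(n):
    """Assumed expected dependency length in a random ordering of n words."""
    return Fraction(n + 1, 3)


def pooled_null_mean(sentence_lengths):
    """Mean dependency length when dependencies from all sentences are pooled."""
    # Each sentence of n words is assumed to contribute n - 1 dependencies.
    total_length = sum((n - 1) * null_mean_length(n) for n in sentence_lengths)
    total_dependencies = sum(n - 1 for n in sentence_lengths)
    return total_length / total_dependencies


# Hypothetical corpora: same two sentence lengths, different proportions.
corpus_a = [5, 5, 5, 20]
corpus_b = [5, 20, 20, 20]

print(float(pooled_null_mean(corpus_a)))
print(float(pooled_null_mean(corpus_b)))
```

    Comparing sentences of the same length, rather than pooled averages, removes this source of difference.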

    On Mean Dependency Distance as a Metric of Translation Quality Assessment

    This paper adopts a quantitative approach to a linguistic study within the theoretical framework of dependency grammar. Translation is a process in which source language and target language interact with each other. The present study explores the feasibility of mean dependency distance as a metric for automated translation quality assessment. We hypothesized that translations at different quality levels differ significantly in mean dependency distance. The data were drawn from the written translations in the Parallel Corpus of Chinese EFL Learners, which is composed of translations by Chinese EFL learners on various topics. The translations were human-scored to determine their quality levels, according to which they were categorized. Our results indicate that: (1) senior students perform better in translation than junior students, and the mean dependency distance of translations from the senior group is significantly shorter than that of the junior group; (2) high-quality translations yield a shorter mean dependency distance than low-quality translations; (3) the mean dependency distance of translations is moderately correlated with the human score. These results suggest that mean dependency distance has potential for differentiating translations of different quality.
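
    The abstract does not spell out how mean dependency distance is computed. A minimal sketch, assuming the common definition (the average absolute difference between the positions of each dependent and its head, with the root excluded), might look as follows. The function name and the example parse are hypothetical and are not taken from the study's corpus.

```python
def mean_dependency_distance(heads):
    """Mean |head position - dependent position| over all non-root words.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no incoming dependency.
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        raise ValueError("sentence has no dependencies")
    return sum(distances) / len(distances)


# Hypothetical parse of "She quickly ate red apples":
#   She -> ate, quickly -> ate, ate = root, red -> apples, apples -> ate
heads = [3, 3, 0, 5, 3]
print(mean_dependency_distance(heads))
```

    A per-text score, as would be needed to correlate with human ratings, could then be obtained by pooling the distances of all sentences in a translation; whether the study pools or averages per sentence is not stated in the abstract.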

    The linear arrangement library: A new tool for research on syntactic dependency structures

    The new and growing field of Quantitative Dependency Syntax has emerged at the crossroads between Dependency Syntax and Quantitative Linguistics. One of the main concerns in this field is the statistical patterns of syntactic dependency structures. These structures, grouped in treebanks, are the source for statistical analyses in these and related areas; dozens of scores devised over the years are the tools of a new industry to search for patterns and perform other sorts of analyses. The plethora of such metrics and their increasing complexity require sharing the source code of the programs used to perform such analyses. However, such code is often not shared with the scientific community or is tested following unknown standards. Here we present a new open-source tool, the Linear Arrangement Library (LAL), which caters especially to the needs of inexperienced programmers. This tool enables the calculation of these metrics on single syntactic dependency structures, treebanks, and collections of treebanks, combining ease of use with great flexibility. LAL has been designed to be efficient, easy to use (while satisfying the needs of all levels of programming expertise), reliable (thanks to thorough testing), and to unite research from different traditions, geographic areas, and research fields. LAP is supported by Secretaria d’Universitats i Recerca de la Generalitat de Catalunya and the Social European Fund. RFC and LAP are supported by the grant TIN2017-89244-R from MINECO (Ministerio de Economía, Industria y Competitividad). RFC is also supported by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). JLE is funded by the grant PID2019-109137GB-C22 from MINECO. Peer Reviewed. Postprint (published version).

    Language transfer in L2 academic writings: a dependency grammar approach

    Dependency distance (DD) is an important factor in language processing and can affect the ease with which a sentence is understood. Previous studies have investigated the role of DD in L2 writing, but little is known about how the native language influences DD in L2 academic writing. This study is probably the first to investigate, through a large dataset of over 400 million words, whether the native language of L2 writers influences the DD in their academic writing. Using a dataset of over 2.2 million abstracts of articles downloaded from Scopus in the fields of Arts & Humanities and Social Sciences, the study analyzes the DD patterns, parsed by the latest version of the syntactic parser Stanford CoreNLP 4.5.5, in the academic writing of L2 learners from different language backgrounds. It is found that native languages influence the DD of English L2 academic writing. When the mean dependency distance (MDD) of a native language is much longer than that of native English, the MDD of its speakers' English L2 academic writing will be much longer than that of native English academic writing. The findings of this study will deepen our insights into the influence of native language transfer on L2 academic writing, potentially shaping pedagogical strategies in L2 academic writing education.

    Does Scale-Free Syntactic Network Emerge in Second Language Learning?

    Language is a complex system in whose operation many properties may emerge spontaneously. Using a complex network approach, existing studies have found that, in first language (L1) acquisition, a syntactic complex network featuring the scale-free and small-world properties emerges at the age of 24 months. For foreign language (L2) learning, however, researchers have not reached a consensus on whether a syntactic network with these two properties emerges. This study therefore adopts a complex network approach to L2 learning in an attempt to answer this question. Nine networks are constructed on the basis of English compositions by Chinese students. The properties of these networks reveal that a syntactic network featuring these two properties, instead of emerging suddenly at a certain point, exists from the very beginning of the Chinese students' L2 learning and persists throughout the entire process, which differs from what has been found in L1 acquisition. The reason is probably that the already established L1 syntactic system provides a foundation for L2 syntactic learning, and L2 learners tend to use the entrenched L1 syntactic network to generate L2 syntactic structures. L2 syntactic learning is thus characterized not by the sudden emergence of a syntactic system but by a gradual approximation to the target language, with its own unique properties. For the first time, this study provides a tentative answer to the question of L2 syntactic emergence from the perspective of complex networks, and provides a macroscopic description of the L2 syntactic developmental trajectory.

    Dependency Distance and Hierarchical Distance in Children's Compositions

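    University of Tsukuba. Conference: Language Resources Workshop 2021; venue: online; dates: September 13-14, 2021; organizer: Center for Corpus Development, National Institute for Japanese Language and Linguistics. We examined the distributions of dependency distance and hierarchical distance (dependency depth) in the bunsetsu dependency structures of children's compositions. The frequency distributions of the sum of dependency distances and the sum of hierarchical distances both follow a log-normal distribution, and the mean dependency distance and mean hierarchical distance, obtained by dividing those sums by the number of bunsetsu minus one, are distributed similarly. Because mean dependency distance and mean hierarchical distance grow with the number of bunsetsu, we carried out a linear mixed model analysis with μ = (a_f + a_r) log(n/2), treating school grade as a random effect. The fixed effect was larger for the latter, showing that, overall, sentences are lengthened by using deeper dependencies rather than longer ones. The random effects further show that long dependencies are used relatively often from the lower to the middle grades of elementary school and relatively less from the upper grades onward, and that across almost all grades deeper dependencies are used more as the grade rises.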