
    The ‘Galilean Style in Science’ and the Inconsistency of Linguistic Theorising

    Chomsky’s principle of epistemological tolerance says that in theoretical linguistics contradictions between the data and the hypotheses may be temporarily tolerated in order to protect the explanatory power of the theory. The paper raises the following problem: what kinds of contradictions may be tolerated between the data and the hypotheses in theoretical linguistics? First, a model of paraconsistent logic is introduced which differentiates between weak and strong contradiction. As a second step, a case study is carried out which exemplifies that the principle of epistemological tolerance may be interpreted as the tolerance of weak contradiction. The third step of the argumentation focuses on another case study which exemplifies that the principle of epistemological tolerance must not be interpreted as the tolerance of strong contradiction. The reason for the latter insight is the unreliability and uncertainty of introspective data. From this finding the author concludes that it is the integration of different data types that may lead to the improvement of current theoretical linguistics, and that this integration requires a novel methodology which, for the time being, is not available.
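    A hedged illustration of the weak/strong distinction (a minimal sketch, not the paper's own formalism; the plausibility values are an assumption): suppose every statement $p$ carries a plausibility value $|p| \in [0,1]$, with $|p| = 1$ reserved for statements held as certainly true.

        \[
        \text{strong contradiction:}\quad |p| = 1 \;\text{and}\; |\neg p| = 1
        \]
        \[
        \text{weak contradiction:}\quad 0 < |p| < 1 \;\text{and}\; 0 < |\neg p| < 1
        \]

    Under a paraconsistent consequence relation, a weak contradiction does not trivialize the theory (no ex contradictione quodlibet), which is why it can be tolerated while the supporting data, e.g. introspective judgments, remain uncertain; a strong contradiction admits no such reading.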

    Patterns versus Characters in Subword-aware Neural Language Modeling

    Words in some natural languages can have a composite structure. Elements of this structure include the root (which may itself be composite), and prefixes and suffixes with which various nuances and relations to other words can be expressed. Thus, in order to build a proper word representation one must take into account its internal structure. From a corpus of texts we extract a set of frequent subwords, and from the latter set we select patterns, i.e. subwords which encapsulate information on character $n$-gram regularities. The selection is made using the pattern-based Conditional Random Field model with $l_1$ regularization. Further, for every word we construct a new sequence over an alphabet of patterns. The new alphabet's symbols capture a stronger local statistical context than characters do; therefore they allow better representations in $\mathbb{R}^n$ and are better building blocks for word representation. In the task of subword-aware language modeling, pattern-based models outperform character-based analogues by 2-20 perplexity points. Also, a recurrent neural network in which a word is represented as a sum of embeddings of its patterns is on par with a competitive and significantly more sophisticated character-based convolutional architecture.
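    A minimal sketch of the summation idea only, with a hand-picked, hypothetical pattern vocabulary; the paper obtains its patterns from a pattern-based CRF with $l_1$ regularization, which is not reproduced here:

        import numpy as np

        # Hypothetical pattern vocabulary; in the paper it would come from
        # a pattern-based CRF with l1 regularization over frequent subwords.
        patterns = ["un", "believ", "able", "<w>", "</w>"]
        pattern_to_id = {p: i for i, p in enumerate(patterns)}

        rng = np.random.default_rng(0)
        embeddings = rng.normal(size=(len(patterns), 8))  # one vector per pattern

        def greedy_segment(word, vocab):
            """Cover the word greedily with the longest matching patterns."""
            segments, i = [], 0
            while i < len(word):
                for j in range(len(word), i, -1):  # longest match first
                    if word[i:j] in vocab:
                        segments.append(word[i:j])
                        i = j
                        break
                else:
                    i += 1  # no pattern covers this character; skip it
            return segments

        def word_vector(word):
            """Represent a word as the sum of its pattern embeddings."""
            segs = ["<w>"] + greedy_segment(word, pattern_to_id) + ["</w>"]
            return sum(embeddings[pattern_to_id[s]] for s in segs)

        print(greedy_segment("unbelievable", pattern_to_id))  # ['un', 'believ', 'able']
        print(word_vector("unbelievable").shape)              # (8,)

    Summing subword embeddings in this way mirrors, for example, fastText's treatment of character n-grams; the difference claimed here lies in how the subword vocabulary is selected.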

    Aspects of the Theory of Syntax (Special technical report no. 11)

    Formulation of transformational grammar - syntax theory.

    Phase transition in a sexual age-structured model of learning foreign languages

    The understanding of language competition helps us to predict the extinction and survival of languages spoken by minorities. A simple agent-based model of a sexual population, based on the Penna model, is built in order to find out under which circumstances one language dominates the others. The model assumes that only young people learn foreign languages. The simulations show a first-order phase transition in which the ratio between the numbers of speakers of different languages is the order parameter and the mutation rate is the control parameter.
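    A deliberately simplified sketch of such dynamics (hypothetical parameters; not the Penna bit-string model itself, which additionally tracks genetic ageing): only young agents adopt the currently dominant language, and the mutation rate randomly flips languages.

        import random

        MUTATION_RATE = 0.05   # control parameter
        LEARNING_AGE = 10      # only agents younger than this learn
        MAX_AGE = 60
        N, STEPS = 2000, 500

        agents = [{"age": random.randint(0, MAX_AGE),
                   "lang": random.choice("AB")} for _ in range(N)]

        for _ in range(STEPS):
            frac_a = sum(a["lang"] == "A" for a in agents) / N
            for a in agents:
                a["age"] += 1
                if a["age"] < LEARNING_AGE:
                    # the young adopt the currently dominant language
                    a["lang"] = "A" if random.random() < frac_a else "B"
                if random.random() < MUTATION_RATE:
                    a["lang"] = "B" if a["lang"] == "A" else "A"
                if a["age"] > MAX_AGE:  # death; a newborn inherits a random agent's language
                    a["age"] = 0
                    a["lang"] = random.choice(agents)["lang"]

        frac_a = sum(a["lang"] == "A" for a in agents) / N
        print(f"order parameter (fraction speaking A): {frac_a:.2f}")

    Sweeping MUTATION_RATE and recording the final speaker ratio is the kind of experiment in which a first-order transition would show up as a discontinuous jump in the order parameter.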

    Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures

    The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. In order to test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one of the drawbacks of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data with the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a greater degree of LDD. Even though SPk languages are simple, the presence of LDDs has a significant impact on the performance of recurrent neural architectures, which makes such datasets prime candidates for benchmarking tasks.
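    A minimal sketch of how such data can be generated (assumptions, not the authors' pipeline: SP_k grammaticality is checked here by forbidding subsequences of length k, and strings are rejection-sampled):

        import random

        def contains_subsequence(string, subseq):
            """True iff subseq occurs in string as a possibly non-contiguous subsequence."""
            it = iter(string)
            return all(ch in it for ch in subseq)

        def sample_sp_strings(alphabet, forbidden, length, n, grammatical=True):
            """Rejection-sample n strings that avoid (or violate) every forbidden subsequence."""
            out = []
            while len(out) < n:
                s = "".join(random.choices(alphabet, k=length))
                avoids_all = not any(contains_subsequence(s, f) for f in forbidden)
                if avoids_all == grammatical:
                    out.append(s)
            return out

        alphabet = "abcd"
        forbidden = ["ab"]  # SP_2: 'a' may never be followed, at any distance, by 'b'
        pos = sample_sp_strings(alphabet, forbidden, length=20, n=3)
        neg = sample_sp_strings(alphabet, forbidden, length=20, n=3, grammatical=False)
        print("grammatical:", pos)
        print("ungrammatical:", neg)

    Increasing k, lengthening the strings, or choosing rarer forbidden subsequences stretches the dependency across more timesteps, which is the experimental control the abstract describes.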