3 research outputs found

    A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

    The present paper introduces a novel object of study: a language fractal structure. We hypothesize that a set of embeddings of all n-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all n.) The paper estimates intrinsic (genuine) dimensions of language fractal structures for the Russian and English languages. To this end, we employ methods based on (1) topological data analysis and (2) a minimum spanning tree of a data graph for the cloud of points considered (Steele theorem). For both languages, for all n, the intrinsic dimensions appear to be non-integer values (typical for fractal sets), close to 9 for both Russian and English. Comment: Preprint. Under review

    Formal concept analysis for evaluating intrinsic dimension of a natural language

    Some results of a computational experiment for determining the intrinsic dimension of linguistic varieties for the Bengali and Russian languages are presented. Sets of words and sets of bigrams in these languages were considered separately. The method used was based on formal concept analysis algorithms. It was found that the intrinsic dimensions of these languages are significantly less than the dimensions used in popular neural network models in natural language processing. Comment: Preprint, 10th International Conference on Pattern Recognition and Machine Intelligence (PReMI 2023)