354 research outputs found

    Non-Standard Words as Features for Text Categorization

    This paper presents the categorization of Croatian texts using Non-Standard Words (NSWs) as features. Non-Standard Words include numbers, dates, acronyms, abbreviations, currency amounts, etc. NSWs in the Croatian language are determined according to the Croatian NSW taxonomy. For this research, 390 text documents were collected to form the SKIPEZ collection, which has 6 classes: official, literary, informative, popular, educational and scientific. The text categorization experiment was conducted on three representations of the SKIPEZ collection: in the first, the frequencies of NSWs are used as features; in the second, statistical measures of NSWs (variance, coefficient of variation, standard deviation, etc.) are used as features; the third combines the first two feature sets. The Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms were used in the experiments. The best results are achieved with the first feature set (NSW frequencies), with a categorization accuracy of 87%. This suggests that NSWs should be considered as features in highly inflectional languages such as Croatian. NSW-based features reduce the dimensionality of the feature space without standard lemmatization procedures, and therefore the bag-of-NSWs should be considered for further categorization experiments on Croatian texts. Comment: IEEE 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2014), pp. 1415-1419, 201
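The three feature representations described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say over which values the statistical measures are computed, so the sketch assumes they are taken over a document's per-category NSW counts, and the function names and category labels are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def frequency_features(nsw_counts, categories):
    """First representation: one NSW frequency per category (0 if absent)."""
    return [nsw_counts.get(category, 0) for category in categories]


def statistical_features(frequencies):
    """Second representation (assumed): variance, standard deviation and
    coefficient of variation of the per-category NSW frequencies."""
    average = mean(frequencies)
    std_dev = pstdev(frequencies)
    coeff_of_variation = std_dev / average if average else 0.0
    return [pvariance(frequencies), std_dev, coeff_of_variation]


def combined_features(nsw_counts, categories):
    """Third representation: the first two feature sets concatenated."""
    frequencies = frequency_features(nsw_counts, categories)
    return frequencies + statistical_features(frequencies)


# Hypothetical category labels and counts for one document.
categories = ["number", "date", "acronym", "abbreviation"]
document_counts = {"number": 2, "date": 4, "acronym": 4}
print(combined_features(document_counts, categories))
```

Each resulting vector would then be passed to a classifier; the classifiers named in the abstract are not reproduced here.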

    Comparison of the language networks from literature and blogs

    In this paper we present a comparison of linguistic networks constructed from literature and blog texts. The networks are built as directed and weighted co-occurrence networks of words: words are nodes, and a link is established between two nodes if the words co-occur directly within a sentence. The network structure is compared at the global (network) level in terms of average node degree, average shortest path length, diameter, clustering coefficient, density and number of components. Furthermore, we analyze the local (node) level by comparing the rank plots of in- and out-degree, strength and selectivity. The selectivity-based results indicate that the structure of the networks constructed from literature differs from that of the networks constructed from blogs.
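The network construction described in this abstract can be sketched as below. It is a minimal sketch, not the authors' implementation: tokenization is a plain whitespace split, and selectivity is assumed to be strength divided by degree (the average weight per link), a definition the abstract itself does not spell out. All function names are hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_network(sentences):
    """Return {(source, target): weight} for words that are directly
    adjacent within a sentence; links never cross sentence boundaries."""
    weights = defaultdict(int)
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            weights[(left, right)] += 1
    return dict(weights)


def node_measures(weights):
    """Compute in/out degree, strength and selectivity for every node."""
    measures = defaultdict(lambda: {
        "in_degree": 0, "out_degree": 0,
        "in_strength": 0, "out_strength": 0,
    })
    for (source, target), weight in weights.items():
        measures[source]["out_degree"] += 1
        measures[source]["out_strength"] += weight
        measures[target]["in_degree"] += 1
        measures[target]["in_strength"] += weight
    for node in measures.values():
        # Assumed definition: selectivity = strength / degree.
        for side in ("in", "out"):
            degree = node[side + "_degree"]
            strength = node[side + "_strength"]
            node[side + "_selectivity"] = strength / degree if degree else 0.0
    return dict(measures)


network = build_cooccurrence_network(["the cat sat", "the cat ran", "a cat sat"])
for word, values in sorted(node_measures(network).items()):
    print(word, values)
```

Sorting nodes by any of these values yields the rank plots that the abstract compares between the two text types.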

    Multilayer Network of Language: a Unified Framework for Structural Analysis of Linguistic Subsystems

    Recently, the focus of complex networks research has shifted from the analysis of isolated properties of a system toward a more realistic modeling of multiple phenomena: multilayer networks. Motivated by the success of the multilayer approach in social, transport and trade systems, we propose the introduction of multilayer networks for language. The multilayer network of language is a unified framework for modeling linguistic subsystems and their structural properties, enabling the exploration of their mutual interactions. Various aspects of natural language systems can be represented as complex networks, whose vertices depict linguistic units, while links model their relations. The multilayer network of language is defined by three aspects: the network construction principle, the linguistic subsystem and the language of interest. More precisely, we construct word-level (syntax, co-occurrence and its shuffled counterpart) and subword-level (syllables and graphemes) network layers from five variations of the original text (in the modeled language). The obtained results suggest that there are substantial differences between the network structures of different language subsystems, which remain hidden when an isolated layer is explored. The word-level layers share structural properties regardless of the language (e.g. Croatian or English), while the syllabic subword level expresses more language-dependent structural properties. The preserved weighted overlap quantifies the similarity of word-level layers in weighted and directed networks. Moreover, the analysis of motifs reveals a close topological structure of the syntactic and syllabic layers for both languages. The findings corroborate that the multilayer network framework is a powerful, consistent and systematic approach to modeling several linguistic subsystems simultaneously, and hence to providing a more unified view of language.

    Complex Networks Measures for Differentiation between Normal and Shuffled Croatian Texts

    This paper studies the properties of Croatian texts via complex networks. We present network properties of normal and shuffled Croatian texts for two shuffling principles: on the sentence level and on the text level. In both experiments we preserved the vocabulary size and the word and sentence frequency distributions. Additionally, in the first shuffling approach we preserved the sentence structure of the text and the number of words per sentence. The obtained results show that degree rank distributions exhibit no substantial deviation in shuffled networks, and that strength rank distributions are preserved because the word frequencies are the same. Therefore, the standard approach to studying the structure of linguistic co-occurrence networks shows no clear difference between the topologies of normal and shuffled texts. Finally, we show that the in- and out-selectivity values from shuffled texts are consistently below the selectivity values calculated from normal texts. Our results corroborate that the node selectivity measure can capture structural differences between original and shuffled Croatian texts.
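The two shuffling principles in this abstract can be sketched as below. This is one reading, not the authors' procedure: sentence-level shuffling is assumed to permute words inside each sentence, and text-level shuffling is assumed to permute the pooled words of the whole text. How sentences are re-formed after text-level shuffling is not stated, so the sketch returns a flat word sequence. Function names and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def shuffle_within_sentences(sentences, seed=0):
    """Sentence-level shuffling (assumed): permute the words inside each
    sentence, keeping the number of sentences and words per sentence."""
    rng = random.Random(seed)
    shuffled = []
    for words in sentences:
        permuted = list(words)
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled


def shuffle_whole_text(sentences, seed=0):
    """Text-level shuffling (assumed): pool all words of the text and
    permute them, returning one flat word sequence."""
    rng = random.Random(seed)
    pooled = [word for words in sentences for word in words]
    rng.shuffle(pooled)
    return pooled


text = [["the", "cat", "sat"], ["a", "dog", "ran", "away"]]
by_sentence = shuffle_within_sentences(text)
by_text = shuffle_whole_text(text)

# Both variants keep the word frequency distribution of the original text.
original_counts = Counter(word for words in text for word in words)
print(Counter(word for words in by_sentence for word in words) == original_counts)
print(Counter(by_text) == original_counts)
```

Because word frequencies are unchanged in both variants, measures driven by frequency alone cannot separate original from shuffled text, which is consistent with the abstract's remark about strength rank distributions.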

    Plant and fungi interaction mechanisms in endotrophic mycorrhiza

    Endotrophic mycorrhiza is a symbiosis between a fungus and a plant in which the fungus grows partly inside the plant root. In arbuscular mycorrhiza (AM), fungi from the phylum Glomeromycota form arbuscules inside the plant root. These are tree-like branched hyphae whose role is nutrient exchange between fungus and plant. Primarily, the transfer of phosphate and nitrogen to the plant is improved, while carbon fixed by photosynthesis is transferred to the fungus. During evolution, the AM "genetic program" served as the basis for other symbioses involving the plant root, which is why nitrogen-fixing bacteria use the same signalling pathway as AM. However, the molecular mechanisms that lead to AM are less well known than those that lead to symbiosis with nitrogen-fixing bacteria, so endotrophic mycorrhiza has been intensively investigated in the past few years. As a result, strigolactones have been discovered: signal molecules released by the plant that induce fungal hyphae to branch toward the plant. Signals released by the fungus before it colonizes the plant (Myc factors) have also been discovered, although their exact composition and the plant receptors they bind to are not yet known. The plant itself participates actively in successful root colonization, preparing the intracellular environment for the fungal hyphae by forming a prepenetration apparatus (PPA). Important genes involved in the signalling pathway (SYM) have also been identified: SYMRK kinase, the CASTOR and POLLUX cation channels, CCaMK and CYCLOPS. It has also been discovered that oscillations in calcium concentration play an important role in establishing this symbiotic relationship.