
    Non-Standard Words as Features for Text Categorization

    This paper presents the categorization of Croatian texts using Non-Standard Words (NSWs) as features. Non-Standard Words include numbers, dates, acronyms, abbreviations, currency amounts, etc. NSWs in the Croatian language are determined according to the Croatian NSW taxonomy. For the purpose of this research, 390 text documents were collected to form the SKIPEZ collection, which has 6 classes: official, literary, informative, popular, educational and scientific. The text categorization experiment was conducted on three different representations of the SKIPEZ collection: in the first representation, the frequencies of NSWs are used as features; in the second, statistical measures of NSWs (variance, coefficient of variation, standard deviation, etc.) are used as features; the third combines the first two feature sets. Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms were used in the experiments. The best results are achieved using the first feature set (NSW frequencies), with a categorization accuracy of 87%. This suggests that NSWs should be considered as features in highly inflectional languages such as Croatian. NSW-based features reduce the dimensionality of the feature space without standard lemmatization procedures, and the bag-of-NSWs should therefore be considered for further Croatian text categorization experiments. Comment: IEEE 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2014), pp. 1415-1419, 201
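
The three representations described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four regex patterns are a toy stand-in for the Croatian NSW taxonomy (which is not reproduced here), the function names are hypothetical, and the abstract does not say exactly how the statistical measures were computed, so population variance, standard deviation and coefficient of variation over the per-document frequency vector are used as one plausible reading.

```python
import re
import statistics

# Hypothetical, simplified NSW patterns (NOT the paper's taxonomy).
# Order matters: more specific categories are matched first.
NSW_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),
    "currency": re.compile(r"\b\d+(?:[.,]\d+)?\s?(?:kn|EUR|USD)\b"),
    "acronym": re.compile(r"\b[A-Z]{2,}\b"),
    "number": re.compile(r"\b\d+(?:[.,]\d+)?\b"),
}


def nsw_frequencies(text):
    """Representation 1: count of each NSW category in one document."""
    counts = {}
    remaining = text
    for name, pattern in NSW_PATTERNS.items():
        counts[name] = len(pattern.findall(remaining))
        # Blank out matched spans so later patterns do not recount them.
        remaining = pattern.sub(" ", remaining)
    return counts


def nsw_statistics(freqs):
    """Representation 2 (assumed form): summary statistics of the counts."""
    values = list(freqs.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return {
        "variance": statistics.pvariance(values),
        "std": std,
        # Coefficient of variation; defined as 0 when the mean is 0.
        "cv": std / mean if mean else 0.0,
    }


def combined_features(text):
    """Representation 3: the union of the two feature sets above."""
    freqs = nsw_frequencies(text)
    return {**freqs, **nsw_statistics(freqs)}


if __name__ == "__main__":
    sample = "HNB je 12.05.2014 objavio 100 kn i 3 izvješća."
    print(combined_features(sample))
```

Feature dictionaries of this kind could then be passed to any of the classifiers the abstract lists; the sketch stops at feature extraction because the paper's experimental setup is not given here.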

    Mining Images in Biomedical Publications: Detection and Analysis of Gel Diagrams

    Authors of biomedical publications use gel images to report experimental results such as protein-protein interactions or protein expressions under different conditions. Gel images offer a concise way to communicate such findings, not all of which need to be explicitly discussed in the article text. This fact, together with the abundance of gel images and the patterns they share, makes them prime candidates for automated image mining and parsing. We introduce an approach for the detection of gel images and present a workflow to analyze them. We are able to detect gel segments and panels with high accuracy, and we present preliminary results for the identification of gene names in these images. While we cannot provide a complete solution at this point, we present evidence that this kind of image mining is feasible. Comment: arXiv admin note: substantial text overlap with arXiv:1209.148

    Mixing Metaphors In The Cerebral Hemispheres: What Happens When Careers Collide?

    Are processes of figurative comparison and figurative categorization different? An experiment combining alternative-sense and matched-sense metaphor priming with a divided visual field assessment technique sought to isolate processes of comparison and categorization in the 2 cerebral hemispheres. For target metaphors presented in the right visual field/left cerebral hemisphere (RVF/LH), only matched-sense primes were facilitative. Literal primes and alternative-sense primes had no effect on comprehension time compared to the unprimed baseline. The effects of matched-sense primes were additive with the rated conventionality of the targets. For target metaphors presented to the left visual field/right cerebral hemisphere (LVF/RH), matched-sense primes were again additively facilitative. However, alternative-sense primes, though facilitative overall, seemed to eliminate the preexisting advantages of conventional target metaphor senses in the LVF/RH in favor of metaphoric senses similar to those of the primes. These findings are consistent with tightly controlled categorical coding in the LH and coarse, flexible, context-dependent coding in the RH. (PsycINFO Database Record (c) 2013 APA, all rights reserved) (journal abstract)