Non-Standard Words as Features for Text Categorization
This paper presents categorization of Croatian texts using Non-Standard Words
(NSW) as features. Non-Standard Words are: numbers, dates, acronyms,
abbreviations, currency, etc. NSWs in the Croatian language are determined
according to the Croatian NSW taxonomy. For the purpose of this research, 390 text
documents were collected and formed the SKIPEZ collection with 6 classes:
official, literary, informative, popular, educational and scientific. A text
categorization experiment was conducted on three different representations of
the SKIPEZ collection: in the first representation, the frequencies of NSWs are
used as features; in the second representation, the statistical measures of NSWs
(variance, coefficient of variation, standard deviation, etc.) are used as
features; while the third representation combines the first two feature sets.
Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms
were used in text categorization experiments. The best categorization results
are achieved using the first feature set (NSW frequencies) with a
categorization accuracy of 87%. This suggests that NSWs should be
considered as features in highly inflectional languages, such as Croatian. NSW
based features reduce the dimensionality of the feature space without standard
lemmatization procedures, and therefore the bag-of-NSWs should be considered
for further Croatian text categorization experiments.
Comment: IEEE 37th International Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO 2014), pp. 1415-1419,
201
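The first two feature representations described in the abstract above (NSW frequencies, and statistical measures derived from them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the regular expressions in NSW_PATTERNS are simplified stand-ins for the Croatian NSW taxonomy, the function names nsw_frequencies and nsw_statistics are hypothetical, and computing the statistics over the per-category counts of a single document is only one plausible reading, since the abstract does not say how the measures were calculated.

```python
import re
import statistics

# Hypothetical, simplified NSW patterns (NOT the paper's taxonomy).
# Order matters: more specific categories are matched and blanked out first,
# so that a date or a currency amount is not counted again as a plain number.
NSW_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\.?"),
    "currency": re.compile(r"\b\d+(?:[.,]\d+)?\s?(?:kn|EUR|USD)\b"),
    "number": re.compile(r"\b\d+(?:[.,]\d+)?\b"),
    "acronym": re.compile(r"\b[A-ZČĆŠŽĐ]{2,}\b"),
    "abbreviation": re.compile(r"\b(?:npr|itd|tj|dr|prof)\."),
}


def nsw_frequencies(text):
    """Count NSW occurrences per category (first representation)."""
    counts = {}
    remaining = text
    for name, pattern in NSW_PATTERNS.items():
        # subn returns the rewritten string and the number of replacements,
        # which doubles as the match count for this category.
        remaining, counts[name] = pattern.subn(" ", remaining)
    return counts


def nsw_statistics(counts):
    """Summary statistics over the category counts (second representation).

    Assumption: population variance and standard deviation are used.
    """
    values = list(counts.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return {
        "variance": statistics.pvariance(values),
        "std_dev": std,
        # The coefficient of variation is undefined for a zero mean; use 0.0.
        "coeff_variation": std / mean if mean else 0.0,
    }


def combined_features(text):
    """Third representation: frequencies and statistics in one feature dict."""
    counts = nsw_frequencies(text)
    return {**counts, **nsw_statistics(counts)}
```

Calling combined_features on each document would yield one small, fixed-length feature vector per text, which could then be passed to any of the classifiers the abstract lists.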
Mining Images in Biomedical Publications: Detection and Analysis of Gel Diagrams
Authors of biomedical publications use gel images to report experimental
results such as protein-protein interactions or protein expressions under
different conditions. Gel images offer a concise way to communicate such
findings, not all of which need to be explicitly discussed in the article text.
This fact together with the abundance of gel images and their shared common
patterns makes them prime candidates for automated image mining and parsing. We
introduce an approach for the detection of gel images, and present a workflow
to analyze them. We are able to detect gel segments and panels with high
accuracy, and present preliminary results for the identification of gene names
in these images. While we cannot provide a complete solution at this point, we
present evidence that this kind of image mining is feasible.
Comment: arXiv admin note: substantial text overlap with arXiv:1209.148
Mixing Metaphors In The Cerebral Hemispheres: What Happens When Careers Collide?
Are processes of figurative comparison and figurative categorization different? An experiment combining alternative-sense and matched-sense metaphor priming with a divided visual field assessment technique sought to isolate processes of comparison and categorization in the 2 cerebral hemispheres. For target metaphors presented in the right visual field/left cerebral hemisphere (RVF/LH), only matched-sense primes were facilitative. Literal primes and alternative-sense primes had no effect on comprehension time compared to the unprimed baseline. The effects of matched-sense primes were additive with the rated conventionality of the targets. For target metaphors presented to the left visual field/right cerebral hemisphere (LVF/RH), matched-sense primes were again additively facilitative. However, alternative-sense primes, though facilitative overall, seemed to eliminate the preexisting advantages of conventional target metaphor senses in the LVF/RH in favor of metaphoric senses similar to those of the primes. These findings are consistent with tightly controlled categorical coding in the LH and coarse, flexible, context-dependent coding in the RH.
(PsycINFO Database Record (c) 2013 APA, all rights reserved) (journal abstract