Exploring The Value Of Folksonomies For Creating Semantic Metadata

Authors: Al-Khalifa Hend S.; Davis Hugh C.
Publication date: 01/01/2007

Finding good keywords to describe resources is an ongoing problem: typically we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing web resources. This paper explores the value of folksonomy tags as a potential source of keyword metadata by examining the relationship between folksonomies, community-produced annotations, and keywords extracted by machines. The experiment was carried out in two ways: subjectively, by asking two human indexers to evaluate the quality of the keywords generated by both systems; and automatically, by measuring the percentage of overlap between the folksonomy tag set and the machine-generated keyword set. The results of this experiment show that the folksonomy tags agree more closely with the human-generated keywords than the automatically generated keywords do. The results also show that the trained indexers preferred the semantics of folksonomy tags to those of keywords extracted automatically. These results can be considered evidence for the strong relationship of folksonomies to the human indexer's mindset, demonstrating that folksonomies used in the del.icio.us bookmarking service are a potential source for generating semantic metadata to annotate web resources.
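The automatic evaluation mentioned in this abstract, the percentage of overlap between two keyword sets, can be illustrated with a minimal sketch. The abstract does not say how the overlap was normalised, so the choice of denominator here (the size of the machine-generated set), the case-insensitive matching, and the function name `overlap_percentage` are all assumptions made for illustration, not the authors' method.

```python
def overlap_percentage(folksonomy_tags, machine_keywords):
    """Percentage of machine keywords that also occur among the folksonomy tags.

    Assumption: terms are compared case-insensitively after trimming
    whitespace, and the machine keyword set is the denominator.
    """
    tags = {t.strip().lower() for t in folksonomy_tags if t.strip()}
    keywords = {k.strip().lower() for k in machine_keywords if k.strip()}
    if not keywords:
        return 0.0  # nothing to compare against
    return 100.0 * len(tags & keywords) / len(keywords)


# Hypothetical example data, not taken from the paper.
tags = ["Semantic", "web", "metadata", "tagging"]
keywords = ["metadata", "web", "ontology", "annotation"]
print(overlap_percentage(tags, keywords))  # 50.0
```

Dividing by the size of the union instead would give the Jaccard index; which normalisation suits a given comparison depends on whether the two sets are meant to be treated symmetrically.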

CiteSeerX

Southampton (e-Prints Soton)

Scholarly Hypertext: Self-Represented Complexity

Author: Kolb David
Publication date: 01/01/1997

Scholarly hypertexts involve argument and explicit self-questioning, and can be distinguished from both informational and literary hypertexts. After making these distinctions, the essay presents general principles about attention, some suggestions for self-representational multi-level structures that would enhance scholarly inquiry, and a wish list of software capabilities to support such structures. The essay concludes with a discussion of possible conflicts between scholarly inquiry and hypertext.

PhilPapers

CiteSeerX

Non-Standard Words as Features for Text Categorization

Authors: Beliga Slobodan; Martinčić-Ipšić Sanda
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/11/2014

This paper presents the categorization of Croatian texts using Non-Standard Words (NSWs) as features. Non-Standard Words are numbers, dates, acronyms, abbreviations, currency, etc. NSWs in the Croatian language are determined according to the Croatian NSW taxonomy. For the purpose of this research, 390 text documents were collected to form the SKIPEZ collection, which has 6 classes: official, literary, informative, popular, educational and scientific. A text categorization experiment was conducted on three different representations of the SKIPEZ collection: in the first representation, the frequencies of NSWs are used as features; in the second representation, statistical measures of NSWs (variance, coefficient of variation, standard deviation, etc.) are used as features; the third representation combines the first two feature sets. Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms were used in the text categorization experiments. The best categorization results are achieved using the first feature set (NSW frequencies), with a categorization accuracy of 87%. This suggests that NSWs should be considered as features in highly inflectional languages, such as Croatian. NSW-based features reduce the dimensionality of the feature space without standard lemmatization procedures, and therefore the bag-of-NSWs should be considered for further Croatian text categorization experiments.

Comment: IEEE 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2014), pp. 1415-1419, 201
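The first representation described in this abstract, NSW frequencies used as features, can be sketched roughly as follows. The regular expressions, the category names, and the function `nsw_frequencies` are hypothetical stand-ins chosen for illustration; they are not the Croatian NSW taxonomy used in the paper, which the abstract does not spell out.

```python
import re

# Hypothetical patterns for illustration only; NOT the Croatian NSW taxonomy.
# Order matters: more specific categories are matched and removed first so
# that, for example, the digits inside a date are not also counted as numbers.
NSW_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),
    "currency": re.compile(r"\b\d+(?:[.,]\d+)?\s?(?:kn|EUR|USD)\b"),
    "acronym": re.compile(r"\b[A-Z]{2,}\b"),
    "number": re.compile(r"\b\d+(?:[.,]\d+)?\b"),
}


def nsw_frequencies(text):
    """Count occurrences of each (assumed) NSW category in a text."""
    counts = {}
    remaining = text
    for name, pattern in NSW_PATTERNS.items():
        counts[name] = len(pattern.findall(remaining))
        # Blank out what was matched so later patterns cannot re-count it.
        remaining = pattern.sub(" ", remaining)
    return counts


sample = "HNB je 16.11.2014. objavio tecaj: 100 EUR je 760,50 kn, a 3 banke su ga promijenile."
print(nsw_frequencies(sample))
# {'date': 1, 'currency': 2, 'acronym': 1, 'number': 1}
```

Each document then becomes a short vector of counts, one entry per NSW category, which is what keeps the feature space small compared with a bag-of-words representation; such vectors could be passed to any of the classifiers the abstract lists.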

arXiv.org e-Print Archive

Crossref

Narrative and Hypertext 2011 Proceedings: a workshop at ACM Hypertext 2011, Eindhoven

Publication venue: 'University of Southampton'
Publication date: 05/03/2012

Southampton (e-Prints Soton)