
Comparing the hierarchy of author given tags and repository given tags in a large document archive

Author: Palla Gergely
Pollner Péter
Tibély Gergely
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/06/2015

Folksonomies (large databases arising from collaborative tagging of items by independent users) are becoming an increasingly important way of categorizing information. In these systems users can tag items with free words, resulting in a tripartite item-tag-user network. Although there are no prescribed relations between tags, the way users think about the different categories presumably has some built-in hierarchy, in which more specific concepts are descendants of more general categories. Several applications would benefit from knowledge of this hierarchy. Here we apply a recent method to examine the differences and similarities between hierarchies resulting from tags given by independent individuals and from tags given by a centrally managed repository system. The results from our method showed substantial differences between the lower parts of the hierarchies and, in contrast, a relatively high similarity at their top.
Comment: 10 pages

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)
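The tripartite item-tag-user network described in the first abstract can be illustrated with a minimal Python sketch. It assumes toy (user, item, tag) triples and projects the network onto tags by counting how many items carry each pair of tags. The data and the function name `tag_cooccurrence` are hypothetical, and this is only the kind of co-occurrence input such methods start from, not the hierarchy method used in the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy folksonomy: each (user, item, tag) triple is one hyperedge
# of the tripartite user-item-tag network.
triples = [
    ("u1", "paper1", "physics"),
    ("u1", "paper1", "networks"),
    ("u2", "paper1", "networks"),
    ("u2", "paper2", "physics"),
    ("u2", "paper2", "optics"),
    ("u3", "paper3", "networks"),
    ("u3", "paper3", "graphs"),
]

def tag_cooccurrence(triples):
    """Count, for each unordered tag pair, the number of items carrying both tags."""
    tags_per_item = {}
    for _user, item, tag in triples:
        # Users are collapsed: a tag counts once per item, however many users gave it.
        tags_per_item.setdefault(item, set()).add(tag)
    counts = Counter()
    for tags in tags_per_item.values():
        # Sorting makes each pair key canonical, e.g. ("networks", "physics").
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts

print(tag_cooccurrence(triples))
```

Counting per item rather than per (user, item) pair is a design choice: weighting each pair by the number of users who assigned both tags is an equally plausible alternative.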

Image classification by visual bag-of-words refinement and reduction

Author: Lu Zhiwu
Wang Liwei
Wen Ji-Rong
Publication venue: 'Elsevier BV'
Publication date: 18/01/2015

This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks of the visual BOW model, which has been widely used for image classification. Although very influential in the literature, the traditional visual BOW model has two distinct drawbacks. Firstly, for efficiency purposes, the visual vocabulary is commonly constructed by directly clustering the low-level visual feature vectors extracted from local keypoints, without considering the high-level semantics of images. That is, the visual BOW model still suffers from the semantic gap, and thus may lead to significant performance degradation in more challenging tasks (e.g. social image classification). Secondly, thousands of visual words are typically generated to obtain better performance on a relatively large image dataset. Due to such a large vocabulary, the subsequent image classification may take a considerable amount of time. To overcome the first drawback, we develop a graph-based method for visual BOW refinement by exploiting the tags (easy to access although noisy) of social images. More notably, for efficient image classification, we further reduce the refined visual BOW model to a much smaller size through semantic spectral clustering. Extensive experimental results show the promising performance of the proposed framework for visual BOW refinement and reduction.

arXiv.org e-Print Archive
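The standard visual BOW representation that the second abstract starts from can be sketched in Python. Each local descriptor is assigned to its nearest visual word (cluster centre), and the image becomes a normalised histogram of word counts. A second function shows what "reducing" a vocabulary means mechanically: bins of merged words are summed. The vocabulary here is hand-picked rather than learned by clustering, and the word-to-group mapping is a hypothetical stand-in for the output of the paper's semantic spectral clustering, which is not reproduced.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word (cluster centre) closest in Euclidean distance."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(descriptor, vocabulary[k]))

def bow_histogram(descriptors, vocabulary):
    """Quantise local descriptors to visual words and return an L1-normalised histogram."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def reduce_histogram(hist, word_to_group, n_groups):
    """Merge visual words into n_groups coarser words by summing their bins."""
    reduced = [0.0] * n_groups
    for word, value in enumerate(hist):
        reduced[word_to_group[word]] += value
    return reduced

# Toy 2-D descriptors and a 3-word vocabulary (real descriptors, e.g. SIFT, are 128-D).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 5.0)]
hist = bow_histogram(descriptors, vocabulary)
print(hist)                                  # [0.25, 0.5, 0.25]
# Hypothetical mapping: words 0 and 1 merge into group 0, word 2 becomes group 1.
print(reduce_histogram(hist, [0, 0, 1], 2))  # [0.75, 0.25]
```

The nearest-word search is linear in the vocabulary size for every descriptor, which is one concrete reason a vocabulary of thousands of words is costly.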

Comparing the hierarchy of keywords in on-line news portals

Author: A Clauset
A Trusina
AL Barabási
B Corominas-Murtra
C Cattuto
C Goessmann
CV Damme
D Czégel
D Pumain
David Sousa-Rodrigues
DW McShea
E Mones
E Ravasz
ET Wimberley
F Floeck
FJ Brandenburg
G Ghosal
G Palla
G Tibély
Gergely Palla
Gergely Tibély
H Fushing
H Hirata
HW Ma
J Wickens
JI Perotti
K Juszczyszyn
L Lu
M Batty
M Fattore
M Kaiser
M Nagy
N Eldredge
P Heymann
P Mika
P Pollner
P Spyns
Peter Csermely
PR Krugman
Péter Pollner
R Guimerà
R Lambiotte
S Valverde
SN Dorogovtsev
V Zlatić
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

The tagging of on-line content with informative keywords is a widespread phenomenon, from scientific article repositories through blogs to on-line news portals. In most cases, the tags on a given item are free words chosen independently by the authors. Therefore, the relations among keywords in a collection of news items are unknown. However, in most cases the topics and concepts described by these keywords form a latent hierarchy, with the more general topics and categories at the top and more specialised ones at the bottom. Here we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorised low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

ELTE Digital Institutional Repository (EDIT)

FigShare
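The abstract above does not spell out the co-occurrence-based extraction method, so the following Python sketch uses a deliberately simple, hypothetical heuristic in its place. Each keyword's parent is the co-occurring keyword that is more frequent overall and shares the most items with it. Keywords with no more frequent co-occurring neighbour become roots. The function name `build_hierarchy` and the toy news items are assumptions made for illustration. This is not the method applied in the paper.

```python
from collections import Counter
from itertools import combinations

def build_hierarchy(tagged_items):
    """Return a child -> parent map (None for roots) from a list of keyword sets.

    Hypothetical heuristic: a keyword's parent is the more frequent keyword
    it co-occurs with most often; ties go to the higher frequency, then name.
    """
    freq = Counter()  # number of items carrying each keyword
    co = Counter()    # number of items carrying each sorted keyword pair
    for tags in tagged_items:
        tags = set(tags)
        freq.update(tags)
        for a, b in combinations(sorted(tags), 2):
            co[(a, b)] += 1
    parent = {}
    for t in freq:
        candidates = [
            (co[tuple(sorted((t, u)))], freq[u], u)
            for u in freq
            if u != t and freq[u] > freq[t] and co[tuple(sorted((t, u)))] > 0
        ]
        parent[t] = max(candidates)[2] if candidates else None
    return parent

# Hypothetical keyword sets of seven news items.
items = [
    {"politics", "election"},
    {"politics", "election"},
    {"politics", "economy"},
    {"economy", "tax"},
    {"politics"},
    {"sport", "football"},
    {"sport"},
]
print(build_hierarchy(items))
```

On this toy input, "politics" and "sport" come out as roots, "tax" hangs under "economy", and "economy" under "politics". Because the heuristic can produce several roots, its output is a forest rather than a single tree.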

DCU 250 Arabic dependency bank: an LFG gold standard resource for the Arabic Penn treebank

Author: Akrout Amine
Al-Raheb Yafa
Dichy J.
van Genabith Josef
Publication date: 01/01/2006

This paper describes the construction of a dependency bank gold standard for Arabic, the DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). Parsers and grammatical and lexical resources extracted automatically from treebanks need to be evaluated against established gold-standard resources. Gold standards have been developed for various languages, but to our knowledge no such resource has yet been constructed for Arabic. The construction of the DCU 250 marks the first step towards the creation of an automatic LFG f-structure annotation algorithm for the ATB, and towards the extraction of Arabic grammatical and lexical resources.

Irish Universities

DCU Online Research Access Service
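Evaluation against a dependency bank is commonly done by flattening structures into (relation, head, dependent) triples and scoring the predicted triples against the gold ones with precision, recall and F-score. The abstract does not state which metric is used with DCU 250, so the Python sketch below is a generic illustration under that assumption. The function name `triple_scores` and the transliterated example triples are hypothetical and are not taken from the resource.

```python
def triple_scores(gold, test):
    """Precision, recall and F1 of predicted dependency triples against a gold standard."""
    gold, test = set(gold), set(test)
    matched = len(gold & test)  # exact matches of (relation, head, dependent)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical triples for one sentence; not taken from DCU 250.
gold = {("subj", "kataba", "al-walad"), ("obj", "kataba", "risala"),
        ("tense", "kataba", "past"), ("num", "al-walad", "sg")}
test = {("subj", "kataba", "al-walad"), ("obj", "kataba", "risala"),
        ("tense", "kataba", "past"), ("num", "risala", "sg")}
print(triple_scores(gold, test))  # (0.75, 0.75, 0.75)
```

Scores are usually aggregated over all sentences by pooling matched, predicted and gold counts (micro-averaging) rather than averaging per-sentence F-scores.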