115,198 research outputs found
Comparing the hierarchy of author given tags and repository given tags in a large document archive
Folksonomies - large databases arising from collaborative tagging of items by
independent users - are becoming an increasingly important way of categorizing
information. In these systems users can tag items with free words, resulting in
a tripartite item-tag-user network. Although there are no prescribed relations
between tags, the way users think about the different categories presumably has
some built in hierarchy, in which more special concepts are descendants of some
more general categories. Several applications would benefit from the knowledge
of this hierarchy. Here we apply a recent method to check the differences and
similarities of hierarchies resulting from tags given by independent
individuals and from tags given by a centrally managed repository system. The
results from out method showed substantial differences between the lower part
of the hierarchies, and in contrast, a relatively high similarity at the top of
the hierarchies.Comment: 10 page
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset. Due
to such large vocabulary size, the subsequent image classification may take
sheer amount of time. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction
Comparing the hierarchy of keywords in on-line news portals
The tagging of on-line content with informative keywords is a widespread
phenomenon from scientific article repositories through blogs to on-line news
portals. In most of the cases, the tags on a given item are free words chosen
by the authors independently. Therefore, relations among keywords in a
collection of news items is unknown. However, in most cases the topics and
concepts described by these keywords are forming a latent hierarchy, with the
more general topics and categories at the top, and more specialised ones at the
bottom. Here we apply a recent, cooccurrence-based tag hierarchy extraction
method to sets of keywords obtained from four different on-line news portals.
The resulting hierarchies show substantial differences not just in the topics
rendered as important (being at the top of the hierarchy) or of less interest
(categorised low in the hierarchy), but also in the underlying network
structure. This reveals discrepancies between the plausible keyword association
frameworks in the studied news portals
DCU 250 Arabic dependency bank: an LFG gold standard resource for the Arabic Penn treebank
This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is necessary to evaluate against established gold standard resources. Gold standards for various languages have been developed, but to our knowledge, such a resource has not yet been constructed for Arabic. The construction of the DCU 250 marks the first step
towards the creation of an automatic LFG f-structure annotation algorithm for the ATB,
and for the extraction of Arabic grammatical and lexical resources
- âŠ