1,406 research outputs found
Rules for Inducing Hierarchies from Social Tagging Data
Automatic generation of hierarchies from social tags is a challenging task. We identified three rules from the literature (set inclusion, graph centrality, and an information-theoretic condition) and proposed two new rules, fuzzy set inclusion and probabilistic association, to induce hierarchical relations. We proposed a hierarchy generation algorithm that can incorporate each rule with different data representations, i.e., resource-based and Probabilistic Topic Model-based representations. The learned hierarchies were compared to some of the widely used reference concept hierarchies. We found that the probabilistic association and set inclusion rules helped produce better-quality hierarchies according to the evaluation metrics.
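As a rough illustration of the set inclusion rule named in this abstract, the sketch below treats a tag as more general than another when most of the resources annotated with the narrower tag also carry the broader one. The function name `set_inclusion_relations`, the `threshold` parameter, and the toy data are assumptions made for illustration; the paper's actual rule and algorithm may differ.

```python
from collections import defaultdict


def set_inclusion_relations(annotations, threshold=0.8):
    """Return (parent, child) pairs under a simple set inclusion rule.

    annotations: iterable of (resource, tag) pairs.
    threshold: assumed cut-off for the share of the child's resources
               that must also be annotated with the parent.
    """
    # Map each tag to the set of resources it annotates.
    resources = defaultdict(set)
    for resource, tag in annotations:
        resources[tag].add(resource)

    relations = []
    for child, child_res in resources.items():
        for parent, parent_res in resources.items():
            # A parent must annotate strictly more resources than the child.
            if child == parent or len(parent_res) <= len(child_res):
                continue
            overlap = len(child_res & parent_res) / len(child_res)
            if overlap >= threshold:
                relations.append((parent, child))
    return sorted(relations)


# Hypothetical toy data, not taken from the paper.
annotations = [
    ("r1", "programming"), ("r2", "programming"), ("r3", "programming"),
    ("r1", "python"), ("r2", "python"),
    ("r3", "java"),
    ("r4", "cooking"),
]
relations = set_inclusion_relations(annotations)
print(relations)
```

With this toy data, "programming" comes out as the parent of both "python" and "java", while "cooking" stays unattached because it shares no resources with any other tag.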
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate the content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will aid
users in visualizing and browsing social content and help them organize
their own content. However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, compared to previous work, the approach produces
larger, more accurate folksonomies and, in addition, scales better.
Comment: 10 pages, To appear in the Proceedings of ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD) 201
Tagging, Folksonomy & Co - Renaissance of Manual Indexing?
This paper gives an overview of current trends in manual indexing on the Web.
Along with a general rise of user generated content there are more and more
tagging systems that allow users to annotate digital resources with tags
(keywords) and share their annotations with other users. Tagging is frequently
seen in contrast to traditional knowledge organization systems or as something
completely new. This paper shows that tagging is better seen as a
popular form of manual indexing on the Web. The difference between controlled
and free indexing blurs with sufficient feedback mechanisms. A revised typology of
tagging systems is presented that includes different user roles and knowledge
organization systems with hierarchical relationships and vocabulary control. A
detailed bibliography of current research in collaborative tagging is included.
Comment: Preprint. 12 pages, 1 figure, 54 references
Extracting tag hierarchies
Tagging items with descriptive annotations or keywords is a very natural way
to compress and highlight information about the properties of the given entity.
Over the years several methods have been proposed for extracting a hierarchy
between the tags for systems with a "flat", egalitarian organization of the
tags, which is very common when the tags correspond to free words given by
numerous independent people. Here we present a complete framework for automated
tag hierarchy extraction based on tag occurrence statistics. Along with
proposing new algorithms, we are also introducing different quality measures
enabling the detailed comparison of competing approaches from different
aspects. Furthermore, we set up a synthetic, computer-generated benchmark
providing a versatile tool for testing, with a couple of tunable parameters
capable of generating a wide range of test beds. Besides the computer-generated
input we also use real data in our studies, including a biological example with
a pre-defined hierarchy between the tags. The encouraging similarity between
the pre-defined and reconstructed hierarchy, as well as the seemingly
meaningful hierarchies obtained for other real systems indicate that tag
hierarchy extraction is a very promising direction for further research with a
great potential for practical applications.
Comment: 25 pages with 21 pages of supporting information, 25 figures
The horse before the cart: improving the accuracy of taxonomic directions when building tag hierarchies
Content on the Web is huge and constantly growing, and building taxonomies for such content can help with navigation and organisation, but building taxonomies manually is costly and time-consuming. An alternative is to allow users to construct folksonomies: collective social classifications. Yet, folksonomies are inconsistent and their use for searching and browsing is limited. Approaches have been suggested for acquiring implicit hierarchical structures from folksonomies; however, these approaches suffer from the ‘popularity-generality’ problem, in that popularity is assumed to be a proxy for generality, i.e. high-level taxonomic terms will occur more often than low-level ones. To tackle this problem, we propose in this paper an improved approach. It is based on the Heymann–Benz algorithm, and works by checking the taxonomic directions against a corpus of text. Our results show that popularity works as a proxy for generality in at most 90.91% of cases, but this can be improved to 95.45% using our approach, which should translate to higher-quality tag hierarchy structure.
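The ‘popularity-generality’ assumption described in this abstract can be sketched as follows: each related tag pair is oriented from the more frequent tag to the less frequent one. This only illustrates the heuristic the abstract criticises. It is neither the Heymann–Benz algorithm nor the authors' corpus-based check, and the function name `orient_by_popularity` and the toy data are hypothetical.

```python
from collections import Counter


def orient_by_popularity(tag_assignments, related_pairs):
    """Orient each related pair as (parent, child) using tag frequency only.

    tag_assignments: iterable of (resource, tag) pairs.
    related_pairs: iterable of (tag_a, tag_b) pairs assumed to be related.
    """
    counts = Counter(tag for _, tag in tag_assignments)
    directed = []
    for a, b in related_pairs:
        if counts[a] == counts[b]:
            continue  # popularity gives no direction for ties
        parent, child = (a, b) if counts[a] > counts[b] else (b, a)
        directed.append((parent, child))
    return directed


# Hypothetical toy data: "bebop" happens to be used more often than "jazz".
tag_assignments = [
    ("r1", "music"), ("r2", "music"), ("r3", "music"),
    ("r1", "jazz"),
    ("r2", "bebop"), ("r3", "bebop"),
]
related_pairs = [("jazz", "music"), ("jazz", "bebop")]
directed = orient_by_popularity(tag_assignments, related_pairs)
print(directed)
```

In this toy data the heuristic correctly places "music" above "jazz", but it also places "bebop" above "jazz" simply because "bebop" is used more often. That is the kind of reversed direction a check against an external text corpus is meant to catch.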
Knowledge Base Enrichment by Relation Learning from Social Tagging Data
There has been considerable interest in transforming unstructured social tagging data into structured knowledge for semantic-based retrieval and recommendation. Research in this line mostly exploits data co-occurrence and often overlooks the complex and ambiguous meanings of tags. Furthermore, there have been few comprehensive evaluation studies regarding the quality of the discovered knowledge. We propose a supervised learning method to discover subsumption relations from tags. The key to this method is quantifying the probabilistic association among tags to better characterise their relations. We further develop an algorithm to organise tags into hierarchies based on the learned relations. Experiments were conducted using a large, publicly available dataset, Bibsonomy, and three popular, human-engineered or data-driven knowledge bases: DBpedia, Microsoft Concept Graph, and ACM Computing Classification System. We performed a comprehensive evaluation using different strategies: relation-level, ontology-level, and knowledge base enrichment based evaluation. The results clearly show that the proposed method can extract knowledge of better quality than the existing methods against the gold standard knowledge bases. The proposed approach can also enrich knowledge bases with new subsumption relations, having the potential to significantly reduce time and human effort for knowledge base maintenance and ontology evolution.
Hierarchical networks of scientific journals
Academic journals are the repositories of mankind’s gradually accumulating knowledge of the surrounding world. Just as knowledge is organized into classes ranging from major disciplines, subjects and fields, to increasingly specific topics, journals can also be categorized into groups using various metrics. In addition, they can be ranked according to their overall influence. However, according to recent studies, the impact, prestige and novelty of journals cannot be characterized by a single parameter such as, for example, the impact factor. To increase understanding of journal impact, the knowledge gap we set out to explore in our study is the evaluation of journal relevance using complex multi-dimensional measures. Thus, for the first time, our objective is to organize journals into multiple hierarchies based on citation data. The two approaches we use are designed to address this problem from different perspectives. We use a measure related to the notion of m-reaching centrality and find a network that shows a journal’s level of influence in terms of the direction and efficiency with which information spreads through the network. We find we can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. In this case, in a self-organized way, the journals become branches according to the major scientific fields, where the local structure of the branches reflects the hierarchy within the given field, with usually the most prominent journal (according to other measures) in the field chosen by the algorithm as the local root, and more specialized journals positioned deeper in the branch. This can make the navigation within different scientific fields and sub-fields very simple, and equivalent to navigating in the different branches of the nested hierarchy. We expect this to be particularly helpful, for example, when choosing the most appropriate journal for a given manuscript. According to our results, the two alternative hierarchies show a somewhat different, but also consistent, picture of the intricate relations between scientific journals, and, as such, they also provide a new perspective on how scientific knowledge is organized into networks.
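The abstract mentions a measure related to m-reaching centrality without defining it. One common reading, assumed here, is the fraction of other nodes that can be reached from a node in at most m steps along directed edges. The sketch below computes that quantity with a depth-limited breadth-first search; the function name, the edge direction, and the toy graph are hypothetical and may not match the measure used in the paper.

```python
from collections import deque


def m_reach_centrality(edges, m):
    """Fraction of other nodes reachable from each node within m steps.

    edges: iterable of (source, target) pairs of a directed graph.
    m: maximum number of steps (assumed definition, see lead-in).
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    n = len(graph)
    scores = {}
    for start in graph:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == m:
                continue  # do not expand beyond m steps
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        # Exclude the start node itself from the reachable count.
        scores[start] = (len(seen) - 1) / (n - 1) if n > 1 else 0.0
    return scores


# Hypothetical toy graph; an edge A -> B means information flows from A to B.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")]
scores = m_reach_centrality(edges, m=2)
print(scores)
```

In this toy graph, node A reaches three of the four other nodes within two steps and so gets the highest score, which fits the idea of a root-like, influential node sitting at the top of a hierarchy.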