TiFi: Taxonomy Induction for Fictional Domains [Extended version]
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases and highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains, such as Lord of the Rings, The Simpsons, or Greek Mythology, with very high precision, and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
Predicting and Explaining Human Semantic Search in a Cognitive Model
Recent work has attempted to characterize the structure of semantic memory
and the search algorithms which, together, best approximate human patterns of
search revealed in a semantic fluency task. There are a number of models that
seek to capture semantic search processes over networks, but they vary in the
cognitive plausibility of their implementation. Existing work has also
neglected to consider the constraints that the incremental process of language
acquisition must place on the structure of semantic memory. Here we present a
model that incrementally updates a semantic network, with limited computational
steps, and replicates many patterns found in human semantic fluency using a
simple random walk. We also perform thorough analyses showing that a
combination of both structural and semantic features is correlated with human
performance patterns.
Comment: To appear in proceedings for CMCL 201
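The idea of producing fluency-style output from a simple random walk over a semantic network can be sketched as follows. This is only an illustrative sketch, not the authors' model: the toy network, the function name `fluency_walk`, the rule that a word is "produced" on its first visit, and the fixed step budget are all assumptions made for the example.

```python
import random


def fluency_walk(graph, start, steps, seed=0):
    """Random walk over an adjacency-list graph.

    A word is emitted the first time the walk visits it (assumed
    convention for this sketch); revisits produce no output.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    current = start
    produced = [start]
    seen = {start}
    for _ in range(steps):
        neighbours = graph.get(current, [])
        if not neighbours:
            break  # dead end: the walk cannot continue
        current = rng.choice(neighbours)  # uniform step to a neighbour
        if current not in seen:
            seen.add(current)
            produced.append(current)
    return produced


# Hypothetical toy semantic network of animal words.
toy_network = {
    "dog": ["cat", "wolf"],
    "cat": ["dog", "lion"],
    "wolf": ["dog", "lion"],
    "lion": ["cat", "wolf", "zebra"],
    "zebra": ["lion"],
}

words = fluency_walk(toy_network, "dog", steps=50)
print(words)
```

The emitted list never repeats a word, which mirrors the way repeated items are not counted in a fluency task; the order depends on the seed and the network structure.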
Semantic modelling of user interests based on cross-folksonomy analysis
The continued increase in Web usage, in particular participation in folksonomies, reveals a trend towards a more dynamic and interactive Web where individuals can organise and share resources. Tagging has emerged as the de facto standard for the organisation of such resources, providing a versatile and reactive knowledge management mechanism that users find easy to use and understand. It is common nowadays for users to have multiple profiles in various folksonomies, thus distributing their tagging activities. In this paper, we present a method for the automatic consolidation of user profiles across two popular social networking sites, and subsequent semantic modelling of their interests utilising Wikipedia as a multi-domain model. We evaluate how much can be learned from such sites, and in which domains the knowledge acquired is focussed. Results show that far richer interest profiles can be generated for users when multiple tag-clouds are combined.
Reciprocity in Social Networks with Capacity Constraints
Directed links -- representing asymmetric social ties or interactions (e.g.,
"follower-followee") -- arise naturally in many social networks and other
complex networks, giving rise to directed graphs (or digraphs) as basic
topological models for these networks. Reciprocity, defined for a digraph as
the percentage of edges with a reciprocal edge, is a key metric that has been
used in the literature to compare different directed networks and provide
"hints" about their structural properties: for example, are reciprocal edges
generated randomly by chance or are there other processes driving their
generation? In this paper we study the problem of maximizing achievable
reciprocity for an ensemble of digraphs with the same prescribed in- and
out-degree sequences. We show that the maximum reciprocity hinges crucially on
the in- and out-degree sequences, which may be intuitively interpreted as
constraints on some "social capacities" of nodes and impose fundamental limits
on achievable reciprocity. We show that it is NP-complete to decide the
achievability of a simple upper bound on maximum reciprocity, and provide
conditions for achieving it. We demonstrate that many real networks exhibit
reciprocities surprisingly close to the upper bound, which implies that users
in these social networks are in a sense more "social" than suggested by the
empirical reciprocity alone in that they are more willing to reciprocate,
subject to their "social capacity" constraints. We find some surprising linear
relationships between empirical reciprocity and the bound. We also show that a
particular type of small network motif, which we call 3-paths, is the major
source of loss in reciprocity for real networks.
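The reciprocity measure defined above (the share of edges that have a reciprocal edge) can be computed directly from an edge list, as in the sketch below. The second function is an assumption: it computes one natural degree-based upper bound, the sum over nodes of min(in-degree, out-degree) divided by the number of edges, which holds because every reciprocated edge uses one in-slot and one out-slot at the same node. The abstract does not state which "simple upper bound" the authors use, so this may differ from theirs. The function names and the toy edge list are hypothetical.

```python
from collections import Counter


def _clean(edges):
    """Drop duplicate edges and self-loops (a simplifying assumption)."""
    return {(u, v) for u, v in edges if u != v}


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = _clean(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


def degree_bound(edges):
    """Assumed bound: sum_i min(in_i, out_i) / number of edges."""
    edge_set = _clean(edges)
    if not edge_set:
        return 0.0
    out_deg = Counter(u for u, _ in edge_set)
    in_deg = Counter(v for _, v in edge_set)
    nodes = set(out_deg) | set(in_deg)
    return sum(min(in_deg[n], out_deg[n]) for n in nodes) / len(edge_set)


# Hypothetical follower graph: a and b follow each other.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")]
print(reciprocity(edges))   # 0.5
print(degree_bound(edges))  # 0.75
```

In this toy graph two of the four edges are reciprocated, so the empirical reciprocity is 0.5, while the degree sequences alone would permit at most 0.75 under the assumed bound.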
Exploring The Value Of Folksonomies For Creating Semantic Metadata
Finding good keywords to describe resources is an ongoing problem: typically we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing web resources. This paper explores the value of folksonomy tags as a potential source of keyword metadata by examining the relationship between folksonomies, community-produced annotations, and keywords extracted by machines. The experiment was carried out in two ways: subjectively, by asking two human indexers to evaluate the quality of the generated keywords from both systems; and automatically, by measuring the percentage of overlap between the folksonomy tag set and the machine-generated keyword set. The results of this experiment show that the folksonomy tags agree more closely with the human-generated keywords than the automatically generated keywords do. The results also show that the trained indexers preferred the semantics of folksonomy tags over those of automatically extracted keywords. These results can be considered evidence for the strong relationship of folksonomies to the human indexer’s mindset, demonstrating that folksonomies used in the del.icio.us bookmarking service are a potential source for generating semantic metadata to annotate web resources.
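The automatic part of the evaluation, measuring the percentage of overlap between a folksonomy tag set and a machine-generated keyword set, might look like the sketch below. The abstract does not say how overlap is normalised, so the choice of intersection over union (Jaccard) here is an assumption, as are the function names, the lowercasing step, and the sample terms.

```python
def normalise(terms):
    """Lowercase and trim terms, dropping empty strings."""
    return {t.strip().lower() for t in terms if t.strip()}


def overlap_percentage(tags, keywords):
    """Shared terms as a percentage of all distinct terms (assumed Jaccard)."""
    a, b = normalise(tags), normalise(keywords)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical data for a single bookmarked resource.
folksonomy_tags = ["Python", "tutorial", "web"]
machine_keywords = ["python", "programming", "web", "guide"]

print(overlap_percentage(folksonomy_tags, machine_keywords))  # 40.0
```

Two of the five distinct terms are shared, giving 40%. Dividing by the size of one of the two sets instead of the union would be an equally plausible reading of the abstract.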