17,349 research outputs found

    TiFi: Taxonomy Induction for Fictional Domains [Extended version]

    No full text
    Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin

    Predicting and Explaining Human Semantic Search in a Cognitive Model

    Full text link
    Recent work has attempted to characterize the structure of semantic memory and the search algorithms which, together, best approximate human patterns of search revealed in a semantic fluency task. There are a number of models that seek to capture semantic search processes over networks, but they vary in the cognitive plausibility of their implementation. Existing work has also neglected to consider the constraints that the incremental process of language acquisition must place on the structure of semantic memory. Here we present a model that incrementally updates a semantic network, with limited computational steps, and replicates many patterns found in human semantic fluency using a simple random walk. We also perform thorough analyses showing that a combination of both structural and semantic features are correlated with human performance patterns.Comment: To appear in proceedings for CMCL 201

    Semantic modelling of user interests based on cross-folksonomy analysis

    Get PDF
    The continued increase in Web usage, in particular participation in folksonomies, reveals a trend towards a more dynamic and interactive Web where individuals can organise and share resources. Tagging has emerged as the de-facto standard for the organisation of such resources, providing a versatile and reactive knowledge management mechanism that users find easy to use and understand. It is common nowadays for users to have multiple profiles in various folksonomies, thus distributing their tagging activities. In this paper, we present a method for the automatic consolidation of user profiles across two popular social networking sites, and subsequent semantic modelling of their interests utilising Wikipedia as a multi-domain model. We evaluate how much can be learned from such sites, and in which domains the knowledge acquired is focussed. Results show that far richer interest profiles can be generated for users when multiple tag-clouds are combine

    Reciprocity in Social Networks with Capacity Constraints

    Full text link
    Directed links -- representing asymmetric social ties or interactions (e.g., "follower-followee") -- arise naturally in many social networks and other complex networks, giving rise to directed graphs (or digraphs) as basic topological models for these networks. Reciprocity, defined for a digraph as the percentage of edges with a reciprocal edge, is a key metric that has been used in the literature to compare different directed networks and provide "hints" about their structural properties: for example, are reciprocal edges generated randomly by chance or are there other processes driving their generation? In this paper we study the problem of maximizing achievable reciprocity for an ensemble of digraphs with the same prescribed in- and out-degree sequences. We show that the maximum reciprocity hinges crucially on the in- and out-degree sequences, which may be intuitively interpreted as constraints on some "social capacities" of nodes and impose fundamental limits on achievable reciprocity. We show that it is NP-complete to decide the achievability of a simple upper bound on maximum reciprocity, and provide conditions for achieving it. We demonstrate that many real networks exhibit reciprocities surprisingly close to the upper bound, which implies that users in these social networks are in a sense more "social" than suggested by the empirical reciprocity alone in that they are more willing to reciprocate, subject to their "social capacity" constraints. We find some surprising linear relationships between empirical reciprocity and the bound. We also show that a particular type of small network motifs that we call 3-paths are the major source of loss in reciprocity for real networks

    Exploring The Value Of Folksonomies For Creating Semantic Metadata

    No full text
    Finding good keywords to describe resources is an on-going problem: typically we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well populated source of unstructured tags describing web resources. This paper explores the value of the folksonomy tags as potential source of keyword metadata by examining the relationship between folksonomies, community produced annotations, and keywords extracted by machines. The experiment has been carried-out in two ways: subjectively, by asking two human indexers to evaluate the quality of the generated keywords from both systems; and automatically, by measuring the percentage of overlap between the folksonomy set and machine generated keywords set. The results of this experiment show that the folksonomy tags agree more closely with the human generated keywords than those automatically generated. The results also showed that the trained indexers preferred the semantics of folksonomy tags compared to keywords extracted automatically. These results can be considered as evidence for the strong relationship of folksonomies to the human indexer’s mindset, demonstrating that folksonomies used in the del.icio.us bookmarking service are a potential source for generating semantic metadata to annotate web resources
    corecore