Search CORE

10,679 research outputs found

TiFi: Taxonomy Induction for Fictional Domains [Extended version]

Author: Chu C.
Razniewski S.
Weikum G.
Publication venue
Publication date: 01/01/2019
Field of study

Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin

MPG.PuRe

Learning and Using Taxonomies For Fast Visual Categorization

Author: Griffin Gregory
Perona Pietro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

The computational complexity of current visual categorization algorithms scales linearly at best with the number of categories. The goal of classifying simultaneously N_(cat) = 10^4 - 10^5 visual categories requires sub-linear classification costs. We explore algorithms for automatically building classification trees which have, in principle, log N_(cat) complexity. We find that a greedy algorithm that recursively splits the set of categories into the two minimally confused subsets achieves 5-20 fold speedups at a small cost in classification performance. Our approach is independent of the specific classification algorithm used. A welcome by-product of our algorithm is a very reasonable taxonomy of the Caltech-256 dataset

CiteSeerX

Crossref

Caltech Authors

A User-Centered Concept Mining System for Query and Document Understanding at Tencent

Author: Guo Weidong
Lai Kunfeng
Lin Jinghong
Liu Bang
Niu Di
Wang Chaoyue
Xu Shunnan
Xu Yu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/05/2019
Field of study

Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which are not representing the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity conforming to user interests, by mining a large amount of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach could extract concepts of higher quality compared to several other existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser.Comment: Accepted by KDD 201

arXiv.org e-Print Archive

Crossref

Experiments on applying relaxation labeling to map multilingual hierarchies

Author: Daude Ventura Jordi
Padró Lluís
Rigau Claramunt German
Publication venue
Publication date: 01/01/1999
Field of study

This paper explores the automatic construction of a multilingual Lexical Knowledge Base from preexisting lexical resources. This paper presents a new approach for linking already existing hierarchies. The Relaxation labeling algorithm is used to select --among all the candidate connections proposed by a bilingual dictionary-- the right conection for each node in the taxonomy.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Unsupervised learning of visual taxonomies

Author: Bart Evgeniy
Perona Pietro
Porteous Ian
Welling Max
Publication venue
Publication date: 01/01/2008
Field of study

As more images and categories become available, organizing them becomes crucial. We present a novel statistical method for organizing a collection of images into a treeshaped hierarchy. The method employs a non-parametric Bayesian model and is completely unsupervised. Each image is associated with a path through a tree. Similar images share initial segments of their paths and therefore have a smaller distance from each other. Each internal node in the hierarchy represents information that is common to images whose paths pass through that node, thus providing a compact image representation. Our experiments show that a disorganized collection of images will be organized into an intuitive taxonomy. Furthermore, we find that the taxonomy allows good image categorization and, in this respect, is superior to the popular LDA model

CiteSeerX

Crossref

Caltech Authors

Predicting Network Attacks Using Ontology-Driven Inference

Author: Ansarinia Morteza
Salahi Ahmad
Publication venue
Publication date: 01/01/2012
Field of study

Graph knowledge models and ontologies are very powerful modeling and re asoning tools. We propose an effective approach to model network attacks and attack prediction which plays important roles in security management. The goals of this study are: First we model network attacks, their prerequisites and consequences using knowledge representation methods in order to provide description logic reasoning and inference over attack domain concepts. And secondly, we propose an ontology-based system which predicts potential attacks using inference and observing information which provided by sensory inputs. We generate our ontology and evaluate corresponding methods using CAPEC, CWE, and CVE hierarchical datasets. Results from experiments show significant capability improvements comparing to traditional hierarchical and relational models. Proposed method also reduces false alarms and improves intrusion detection effectiveness.Comment: 9 page

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

Author: Amarilli Antoine
Amsterdamer Yael
Milo Tova
Publication venue
Publication date: 16/12/2013
Field of study

We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, that measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, that measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations.Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen

arXiv.org e-Print Archive

CiteSeerX