
23,121 research outputs found

Automatic Taxonomy Construction from Keywords via Scalable Bayesian Rose Trees

Author: Haixun Wang (Member, IEEE)
Yangqiu Song (Member, IEEE)
Shixia Liu (Senior Member, IEEE)
Xueqing Liu
Publication date: 31/03/2020

Abstract: In this paper, we study the challenging problem of deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-world applications because i) keywords give users the flexibility and ease to characterize a specific domain; and ii) in many applications, such as online advertisements, the domain of interest is already represented by a set of keywords. However, it is impossible to create a taxonomy from a keyword set alone. We argue that additional knowledge and context are needed. To this end, we first use a general-purpose knowledge base and keyword search to supply the required knowledge and context. Then we develop a Bayesian approach to build a hierarchical taxonomy for a given set of keywords. We reduce the complexity of previous hierarchical clustering approaches from O(n² log n) to O(n log n) using a nearest-neighbor-based approximation, so that we can derive a domain-specific taxonomy from one million keyword phrases in less than an hour. Finally, we conduct comprehensive large-scale experiments to show the effectiveness and efficiency of our approach. A real-life example of building an insurance-related Web search query taxonomy illustrates the usefulness of our approach for specific domains.
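The nearest-neighbor approximation mentioned in this abstract can be illustrated with a simplified sketch. It is not the authors' Bayesian rose tree algorithm: as an assumption, it swaps the Bayesian merge score for plain cosine similarity between hypothetical bag-of-context-term vectors, and it builds binary merges rather than multi-way rose trees. What it shows is the core idea of scoring each cluster only against its k nearest neighbors instead of against all pairs. The names `knn_agglomerate`, `cosine`, `add`, and the toy keyword vectors are all hypothetical. For clarity the initial neighbor lists are computed by brute force, which is itself quadratic; a scalable version would need an approximate nearest-neighbor index.

```python
import heapq
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: each keyword phrase is a sparse vector of context terms
# (for example, terms gathered from a knowledge base or from search snippets).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either vector is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum; a merged cluster is represented by the sum of its children."""
    out = dict(a)
    for t, w in b.items():
        out[t] = out.get(t, 0.0) + w
    return out


def knn_agglomerate(vectors: List[Vector], k: int = 2) -> List[Tuple[int, int, int]]:
    """Greedy bottom-up merging that scores each cluster only against its k nearest neighbours.

    Returns merges as (left_id, right_id, new_id); leaves are numbered 0..n-1.
    Pairs with zero similarity are never merged, so the result may be a forest.
    """
    clusters: Dict[int, Vector] = dict(enumerate(vectors))  # live cluster id -> vector
    neighbours: Dict[int, List[int]] = {}
    heap: List[Tuple[float, int, int]] = []  # (-similarity, smaller id, larger id)

    def push_neighbours(cid: int, candidates: Iterable[int]) -> None:
        scored = [(cosine(clusters[cid], clusters[o]), o)
                  for o in candidates if o != cid and o in clusters]
        top = sorted((p for p in scored if p[0] > 0.0), reverse=True)[:k]
        neighbours[cid] = [o for _, o in top]
        for sim, o in top:
            heapq.heappush(heap, (-sim, min(cid, o), max(cid, o)))

    # Brute-force k-NN lists for the leaves: quadratic, kept only for clarity.
    for i in list(clusters):
        push_neighbours(i, list(clusters))

    merges: List[Tuple[int, int, int]] = []
    next_id = len(vectors)
    while heap:
        _, a, b = heapq.heappop(heap)
        if a not in clusters or b not in clusters:
            continue  # stale pair: one side has already been merged
        new_id, next_id = next_id, next_id + 1
        clusters[new_id] = add(clusters.pop(a), clusters.pop(b))
        merges.append((a, b, new_id))
        # The new cluster is only compared with its children's former neighbours.
        inherited = set(neighbours.pop(a, [])) | set(neighbours.pop(b, []))
        push_neighbours(new_id, inherited)
    return merges


keywords = [
    {"car": 1.0, "insurance": 1.0},    # "car insurance"
    {"auto": 1.0, "insurance": 1.0},   # "auto insurance"
    {"life": 1.0, "insurance": 1.0},   # "life insurance"
    {"pizza": 1.0, "delivery": 1.0},   # "pizza delivery"
    {"pizza": 1.0, "recipe": 1.0},     # "pizza recipe"
]
print(knn_agglomerate(keywords, k=2))  # [(0, 1, 5), (2, 5, 6), (3, 4, 7)]
```

The three insurance phrases end up under one subtree and the two pizza phrases under another, with no cross-group merge because their similarity is zero. Neighbor lists of untouched clusters are not refreshed after a merge, which is one of the accuracy trade-offs this kind of approximation accepts in exchange for fewer similarity computations.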

CiteSeerX

Concept Extraction and Clustering for Topic Digital Library Construction

Author: Chengzhi Zhang
Dan Wu
Publication date: 01/12/2008

This paper introduces a new approach to building a topic digital library using concept extraction and document clustering. First, documents in a specific domain are automatically identified using a document classification approach. Then, the keywords of each document are extracted using a machine learning approach. The keywords are used to cluster the document subset, and the clustering result forms the taxonomy of that subset. Finally, the taxonomy is manually adjusted into a hierarchical structure for user navigation. The topic digital library is constructed by combining full-text retrieval with the hierarchical navigation function.
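The keyword-based clustering step described above can be sketched as follows. The abstract does not say which clustering algorithm is used, so this is an assumption: single-link grouping of documents whose extracted keyword sets have a Jaccard similarity at or above a threshold, with each cluster named after its most frequent keywords. The functions `jaccard`, `cluster_by_keywords`, and `label`, the threshold value, and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(docs: Dict[str, Set[str]], threshold: float = 0.3) -> List[List[str]]:
    """Single-link clustering: documents whose keyword sets overlap enough end up together."""
    parent = {d: d for d in docs}  # union-find forest over document ids

    def find(d: str) -> str:
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())


def label(cluster: List[str], docs: Dict[str, Set[str]], top: int = 2) -> List[str]:
    """Name a taxonomy node after its most frequent keywords (ties broken alphabetically)."""
    counts = Counter(k for d in cluster for k in docs[d])
    return [k for k, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]]


docs = {
    "d1": {"ontology", "taxonomy", "thesaurus"},
    "d2": {"taxonomy", "ontology", "classification"},
    "d3": {"retrieval", "indexing", "ranking"},
    "d4": {"retrieval", "ranking", "query"},
}
clusters = cluster_by_keywords(docs)
print(clusters)                            # [['d1', 'd2'], ['d3', 'd4']]
print([label(c, docs) for c in clusters])  # [['ontology', 'taxonomy'], ['ranking', 'retrieval']]
```

The automatically labeled clusters would then be the raw material for the manual adjustment into a navigation hierarchy that the abstract describes.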

E-LIS

A User-Centered Concept Mining System for Query and Document Understanding at Tencent

Author: Guo Weidong
Lai Kunfeng
Lin Jinghong
Liu Bang
Niu Di
Wang Chaoyue
Xu Shunnan
Xu Yu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/05/2019

Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at a granularity that matches user interests by mining a large amount of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles, and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser. We performed extensive offline evaluations to demonstrate that our approach extracts concepts of higher quality than several other existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
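The abstract does not describe how ConcepT stores or applies its topic-concept-instance taxonomy, so the following is only an illustrative sketch of that three-level shape. The `Taxonomy` class, its methods, and the sample data are hypothetical, and naive case-insensitive substring matching stands in for the paper's document-tagging techniques.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Taxonomy:
    """Hypothetical three-level topic -> concept -> instance hierarchy."""
    concept_topic: Dict[str, str] = field(default_factory=dict)           # concept -> topic
    instance_concepts: Dict[str, Set[str]] = field(default_factory=dict)  # instance -> concepts

    def add(self, topic: str, concept: str, instances: List[str]) -> None:
        """Attach a concept to a topic and register the instances that belong to it."""
        self.concept_topic[concept] = topic
        for inst in instances:
            # An instance may belong to several concepts.
            self.instance_concepts.setdefault(inst, set()).add(concept)

    def tag(self, text: str) -> List[Tuple[str, str]]:
        """Return sorted (topic, concept) pairs for every known instance found in the text."""
        lowered = text.lower()
        tags = {
            (self.concept_topic[c], c)
            for inst, concepts in self.instance_concepts.items()
            if inst.lower() in lowered
            for c in concepts
        }
        return sorted(tags)


tax = Taxonomy()
tax.add("Cars", "compact SUVs", ["Honda CR-V", "Toyota RAV4"])
tax.add("Cars", "hybrid cars", ["Toyota Prius", "Toyota RAV4"])
print(tax.tag("Comparing the Toyota RAV4 and the Honda CR-V"))
# [('Cars', 'compact SUVs'), ('Cars', 'hybrid cars')]
```

Keeping instance-to-concept links as sets lets one instance sit under several concepts, which is what allows a single mention to contribute more than one tag.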

arXiv.org e-Print Archive

Crossref

Automatic Annotation of Images from the Practitioner Perspective

Author: Enser Peter G.B.
Lewis Paul
Sandom Christine J.
Publication date: 01/01/2005

This paper describes an ongoing project which seeks to contribute to a wider understanding of the realities of bridging the semantic gap in visual image retrieval. A comprehensive survey of the means by which real image retrieval transactions are realised is being undertaken. An image taxonomy has been developed in order to provide a framework within which account may be taken of the plurality of image types, user needs and forms of textual metadata. Significant limitations exhibited by current automatic annotation techniques are discussed, and a possible way forward using ontologically supported automatic content annotation is briefly considered as a potential means of mitigating these limitations.

Southampton (e-Prints Soton)

An evaluation of pedagogically informed parameterised questions for self assessment

Author: Davis Hugh
Gilbert Lester
Sitthisak Onjira
Publication venue: 'Informa UK Limited'
Publication date: 01/09/2008

Self-assessment is a crucial component of learning. Learners can learn by asking themselves questions and attempting to answer them. However, creating effective questions is time-consuming because it may require considerable resources and the skill of critical thinking. Questions need careful construction to accurately represent the intended learning outcome and the subject matter involved. There are very few systems currently available which generate questions automatically, and these are confined to specific domains. This paper presents a system for automatically generating questions from a competency framework, based on a sound pedagogical and technological approach. This makes it possible to guide learners in developing questions for themselves, and to provide authoring templates which speed the creation of new questions for self-assessment. This novel design and implementation involves an ontological database that represents the intended learning outcome to be assessed across a number of dimensions, including level of cognitive ability and subject matter. The system generates a list of all the questions that are possible from a given learning outcome, which may then be used to test for understanding, and so could determine the degree to which learners actually acquire the desired knowledge. The way in which the system has been designed and evaluated is discussed, along with its educational benefits.
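The idea of generating every question that is possible from a learning outcome described along several dimensions can be sketched as a cross product of those dimensions. This is not the paper's ontological database or template format, neither of which the abstract specifies: the cognitive levels, question stems, topics, and the function `generate_questions` below are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical authoring templates: question stems per level of cognitive ability.
TEMPLATES: Dict[str, List[str]] = {
    "remember": ["Define {topic}.", "State one property of {topic}."],
    "understand": ["Explain how {topic} works."],
    "apply": ["Use {topic} on a small example and show each step."],
}


def generate_questions(
    levels: List[str],
    topics: List[str],
    templates: Dict[str, List[str]] = TEMPLATES,
) -> List[Tuple[str, str]]:
    """Enumerate every (level, question) pair for a learning outcome.

    The outcome is modelled as the cognitive levels it targets and the
    subject-matter items it covers; each combination is filled into every
    stem registered for that level.
    """
    questions: List[Tuple[str, str]] = []
    for level, topic in product(levels, topics):
        for stem in templates.get(level, []):
            questions.append((level, stem.format(topic=topic)))
    return questions


qs = generate_questions(["remember", "apply"], ["binary search", "a hash table"])
for level, question in qs:
    print(f"[{level}] {question}")
print(len(qs))  # 6
```

Adding a new template for a level immediately yields questions for every subject-matter item at that level, which mirrors the abstract's point that templates speed up the authoring of new self-assessment questions.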

Southampton (e-Prints Soton)

Exploring Maintainability Assurance Research for Service- and Microservice-Based Systems: Directions and Differences

Author: Bogner Justus
Wagner Stefan
Weller Adrian
Zimmermann Alfred
Publication venue: OASIcs - OpenAccess Series in Informatics. Joint Post-proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019)
Publication date: 01/01/2020

To ensure sustainable software maintenance and evolution, a diverse set of activities and concepts like metrics, change impact analysis, or antipattern detection can be used. Special maintainability assurance techniques have been proposed for service- and microservice-based systems, but it is difficult to get a comprehensive overview of this publication landscape. We therefore conducted a systematic literature review (SLR) to collect and categorize maintainability assurance approaches for service-oriented architecture (SOA) and microservices. Our search strategy led to the selection of 223 primary studies from 2007 to 2018, which we categorized with a threefold taxonomy: a) architectural (SOA, microservices, both), b) methodical (method or contribution of the study), and c) thematic (maintainability assurance subfield). We discuss the distribution among these categories and present different research directions as well as exemplary studies per thematic category. The primary finding of our SLR is that, while very few approaches have been suggested for microservices so far (24 of 223, ≈11%), we identified several thematic categories where existing SOA techniques could be adapted for the maintainability assurance of microservices.

Dagstuhl Research Online Publication Server