A Topic Modeling Guided Approach for Semantic Knowledge Discovery in e-Commerce
The task of mining large unstructured text archives, extracting useful patterns, and then organizing them into a knowledgebase has attracted great attention due to its vast array of immediate applications in business. Businesses thus demand new and efficient algorithms for leveraging potentially useful patterns from heterogeneous data sources that produce huge volumes of unstructured data. Because of their ability to bring out hidden themes from large text repositories, topic modeling algorithms have attracted significant attention in the recent past. This paper proposes an efficient and scalable method, guided by topic modeling, for extracting concepts and relationships from e-commerce product descriptions and organizing them into a knowledgebase. Semantic graphs can be generated from such a knowledgebase, on which a meaning-aware product discovery experience can be built for potential buyers. Extensive experiments using the proposed unsupervised algorithms with e-commerce product descriptions collected from the open web show that our proposed method outperforms some of the existing methods of leveraging concepts and relationships, so that efficient knowledgebase construction is possible.
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which do not represent the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
A literature survey of methods for analysis of subjective language
Subjective language is used to express attitudes and opinions towards things, ideas and people. While content- and topic-centred natural language processing is now part of everyday life, analysis of the subjective aspects of natural language has until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in the area.
Exploratory Search on Mobile Devices
The goal of this thesis is to provide a general framework (MobEx) for exploratory search, especially on mobile devices. The central part is the design, implementation, and evaluation of several core modules for on-demand unsupervised information extraction that are well suited for exploratory search on mobile devices and that make up the MobEx framework. These core processing elements, combined with a multitouch-able user interface specially designed for two families of mobile devices, i.e. smartphones and tablets, have finally been implemented in a research prototype. The initial information request, in the form of a query topic description, is issued online by a user to the system. The system then retrieves web snippets by using standard search engines. These snippets are passed through a chain of NLP components which perform on-demand or ad-hoc interactive Query Disambiguation, Named Entity Recognition, and Relation Extraction tasks. By on-demand or ad-hoc we mean that the components are capable of performing their operations on an unrestricted open domain within special time constraints. The result of the whole process is a topic graph containing the detected associated topics as nodes and the extracted relationships as labelled edges between the nodes. The Topic Graph is presented to the user in different ways depending on the size of the device she is using. Various evaluations have been conducted that help us to understand the potential and limitations of the framework and the prototype.
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
is becoming a major application domain for IE.