Natural language processing
Beginning with the basic issues of NLP, this chapter charts the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
BibRank: Automatic Keyphrase Extraction Platform Using Metadata
Automatic Keyphrase Extraction involves identifying essential phrases in a
document. These keyphrases are crucial in various tasks such as document
classification, clustering, recommendation, indexing, searching, summarization,
and text simplification. This paper introduces a platform that integrates
keyphrase datasets and facilitates the evaluation of keyphrase extraction
algorithms. The platform includes BibRank, an automatic keyphrase extraction
algorithm that leverages a rich dataset obtained by parsing bibliographic data
in BibTeX format. BibRank combines innovative weighting techniques with
positional, statistical, and word co-occurrence information to extract
keyphrases from documents. The platform proves valuable for researchers and
developers seeking to enhance their keyphrase extraction algorithms and advance
the field of natural language processing.
Comment: 12 pages, 4 figures, 8 tables
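As a rough illustration of how positional and word co-occurrence information can be combined when ranking keyphrase candidates, the sketch below runs a position-biased PageRank over a word co-occurrence graph. It is not BibRank itself: the function name `rank_words`, the window size, the damping factor, and the tiny stopword list are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is", "for", "on", "with"}

def rank_words(text, window=3, damping=0.85, iterations=50):
    """Rank words by a position-biased PageRank over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    if not words:
        return []

    # Undirected co-occurrence graph: words within `window` tokens are linked.
    neighbors = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w][words[j]] += 1.0
                neighbors[words[j]][w] += 1.0

    # Positional bias: words that appear early (and often) get more restart mass.
    bias = defaultdict(float)
    for pos, w in enumerate(words):
        bias[w] += 1.0 / (pos + 1)
    total = sum(bias.values())
    bias = {w: b / total for w, b in bias.items()}

    # Power iteration for the biased PageRank scores.
    scores = dict(bias)
    for _ in range(iterations):
        updated = {}
        for w in bias:
            incoming = sum(
                scores[u] * weight / sum(neighbors[u].values())
                for u, weight in neighbors[w].items()
            )
            updated[w] = (1 - damping) * bias[w] + damping * incoming
        scores = updated

    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_words(abstract_text)[:5]` would give five top-scoring words; merging adjacent high-scoring words into multi-word phrases is left out of this sketch.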
Human-competitive automatic topic indexing
Topic indexing is the task of identifying the main topics covered by a document. These are useful for many purposes: as subject headings in libraries, as keywords in academic publications and as tags on the web. Knowing a document's topics helps people judge its relevance quickly. However, assigning topics manually is labor intensive. This thesis shows how to generate them automatically in a way that competes with human performance.
Three kinds of indexing are investigated: term assignment, a task commonly performed by librarians, who select topics from a controlled vocabulary; tagging, a popular activity of web users, who choose topics freely; and a new method of keyphrase extraction, where topics are equated to Wikipedia article names. A general two-stage algorithm is introduced that first selects candidate topics and then ranks them by significance based on their properties. These properties draw on statistical, semantic, domain-specific and encyclopedic knowledge. They are combined using a machine learning algorithm that models human indexing behavior from examples.
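The two-stage idea described above (select candidates, then rank them by their properties) can be sketched as follows. This is a minimal illustration, not the thesis's method: the function names are hypothetical, only two properties (frequency and position of first occurrence) are used, and fixed hand-set weights stand in for the machine-learned combination.

```python
import re

def select_candidates(text, vocabulary):
    """Stage 1: keep controlled-vocabulary terms that occur in the text.

    Returns a dict mapping each matched term to its character offsets.
    """
    lowered = text.lower()
    candidates = {}
    for term in vocabulary:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        offsets = [m.start() for m in re.finditer(pattern, lowered)]
        if offsets:
            candidates[term] = offsets
    return candidates

def rank_candidates(candidates, text_length, weights=(1.0, 1.0)):
    """Stage 2: score each candidate from its properties and sort.

    The weights are assumed constants; a learned model would replace them.
    """
    w_freq, w_pos = weights
    scored = []
    for term, offsets in candidates.items():
        frequency = len(offsets)
        earliness = 1.0 - offsets[0] / max(text_length, 1)  # 1.0 = very start
        scored.append((w_freq * frequency + w_pos * earliness, term))
    return [term for _, term in sorted(scored, reverse=True)]

text = "Topic indexing assigns topics. Indexing is costly, so indexing is automated."
vocabulary = ["indexing", "topics", "libraries"]
ranked = rank_candidates(select_candidates(text, vocabulary), len(text))
```

Keeping the two stages as separate functions means the candidate source (controlled vocabulary, free tags, or Wikipedia article names) can change without touching the ranking step.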
This approach is evaluated by comparing automatically generated topics to those assigned by professional indexers and by amateurs. We claim that the algorithm is human-competitive because it chooses topics that are as consistent with those assigned by humans as their topics are with each other. The approach is generalizable, requires little training data and applies across different domains and languages.
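The human-competitiveness claim rests on comparing two numbers: how consistent the algorithm's topics are with each human's, and how consistent the humans are with one another. The abstract does not name the consistency measure, so the sketch below assumes a Dice-style overlap, 2C / (A + B), where C is the number of shared topics and A and B are the sizes of the two topic sets. All function names are hypothetical.

```python
from itertools import combinations

def consistency(topics_a, topics_b):
    """Dice-style overlap between two topic sets (assumed measure)."""
    a, b = set(topics_a), set(topics_b)
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))

def mean_pairwise_consistency(topic_sets):
    """Average consistency over all pairs of human indexers (needs >= 2 sets)."""
    pairs = list(combinations(topic_sets, 2))
    return sum(consistency(a, b) for a, b in pairs) / len(pairs)

def consistency_with_humans(candidate_topics, human_topic_sets):
    """Average consistency of one topic set against each human's set."""
    return sum(
        consistency(candidate_topics, h) for h in human_topic_sets
    ) / len(human_topic_sets)
```

Under this reading, an algorithm would count as human-competitive on a document when `consistency_with_humans(...)` is at least as high as `mean_pairwise_consistency(...)` for the same set of indexers.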
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which do not represent the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
Ideas Matchmaking for Supporting Innovators and Entrepreneurs
In this paper we present a system that crawls Web content related to entrepreneurship and technology and matches it with ideas proposed by users on the Innovvoice platform. We argue that such a service is a valuable component of an ideabator platform, supporting innovators and potential entrepreneurs.
Keyphrase Generation: A Multi-Aspect Survey
Extractive keyphrase generation research has been around since the nineties,
but the more advanced abstractive approach based on the encoder-decoder
framework and sequence-to-sequence learning has been explored only recently. In
fact, more than a dozen abstractive methods have been proposed in the last
three years, producing meaningful keyphrases and achieving state-of-the-art
scores. In this survey, we examine various aspects of the extractive keyphrase
generation methods and focus mostly on the more recent abstractive methods that
are based on neural networks. We pay particular attention to the mechanisms
that have driven the refinement of the latter. A huge collection of scientific
article metadata and the corresponding keyphrases is created and released for
the research community. We also present various keyphrase generation and text
summarization research patterns and trends of the last two decades.
Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th
Conference of the Open Innovations Association FRUCT, Helsinki, Finland