13 research outputs found

    High-Precision Extraction of Emerging Concepts from Scientific Literature

    Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification cannot keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data (https://github.com/allenai/ForeCite).
    Comment: Accepted to SIGIR 2020
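The intuition above lends itself to a compact sketch: for each candidate term, look for one "origin" paper that a disproportionate fraction of later term-mentioning papers cite. The function below is a hypothetical simplification (the names `forecite_style_score` and `term_papers` are ours, not the paper's), not the released ForeCite implementation.

```python
def forecite_style_score(term_papers, citations):
    """Toy version of the ForeCite intuition.

    term_papers: paper ids mentioning a candidate term, in chronological order.
    citations:   dict mapping paper id -> set of paper ids it cites.

    Returns (origin_paper, score), where score is the fraction of later
    term-mentioning papers that cite the best candidate origin paper.
    """
    best = (None, 0.0)
    for i, origin in enumerate(term_papers):
        later = term_papers[i + 1:]
        if not later:
            continue
        # fraction of subsequent term-mentioning papers citing this candidate
        cite_frac = sum(origin in citations.get(p, set()) for p in later) / len(later)
        if cite_frac > best[1]:
            best = (origin, cite_frac)
    return best
```

A term whose every subsequent mention cites one early paper scores 1.0; a term with diffuse citations scores near zero, which is what drives the method's high precision.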

    RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation

    Keyword extraction is used for summarizing the content of a document and supports efficient document retrieval; it is as such an indispensable part of modern text-based systems. We explore how load centrality, a graph-theoretic measure applied to graphs derived from a given text, can be used to efficiently identify and rank keywords. By introducing meta vertices (aggregates of existing vertices) and systematic redundancy filters, the proposed method performs on par with the state of the art on the keyword extraction task on 14 diverse datasets. The proposed method is unsupervised, interpretable, and can also be used for document visualization.
    Comment: The final authenticated publication is available online at https://doi.org/10.1007/978-3-030-31372-2_2
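As a rough illustration of the graph-based pipeline described above, the sketch below builds a token co-occurrence graph from raw text and ranks tokens by degree centrality. This is a deliberate simplification: RaKUn itself uses load centrality plus meta-vertex aggregation and redundancy filters, none of which are reproduced here.

```python
from collections import defaultdict

def rank_keywords_by_centrality(text, window=2):
    """Rank tokens of `text` by degree centrality in a sliding-window
    co-occurrence graph (a stand-in for RaKUn's load centrality)."""
    tokens = text.lower().split()
    adj = defaultdict(set)  # undirected adjacency: token -> neighbor tokens
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                adj[tokens[i]].add(tokens[j])
                adj[tokens[j]].add(tokens[i])
    # higher degree = token co-occurs with more distinct tokens
    return sorted(adj, key=lambda t: len(adj[t]), reverse=True)
```

Repeated tokens collapse into one vertex, so a recurring term accumulates neighbors from every occurrence, loosely mirroring the effect of meta vertices.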

    Exploring Technical Phrase Frames from Research Paper Titles

    This paper proposes a method for exploring technical phrase frames by extracting word n-grams that match our information needs and interests from research paper titles. Technical phrase frames, the outcome of our method, are phrases with wildcards that may be substituted for any technical term. Our method first extracts word trigrams from research paper titles and constructs a co-occurrence graph of the trigrams. Even by simply applying the PageRank algorithm to the co-occurrence graph, we obtain trigrams that can be regarded as technical key phrases at the higher ranks in terms of PageRank score. In contrast, our method assigns weights to the edges of the co-occurrence graph based on the Jaccard similarity between trigrams and then applies the weighted PageRank algorithm. Consequently, we obtain widely different but more interesting results. While the top-ranked trigrams obtained by unweighted PageRank have a self-contained meaning, those obtained by our method are technical phrase frames, i.e., word sequences that form a complete technical phrase only after a technical word (or words) is put before and/or after them. We claim that our method is a useful tool for discovering important phraseological patterns, which can expand query keywords to improve information retrieval performance and can also serve as candidate phrasings in technical writing to make research papers attractive.
    29th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015; Gwangju; South Korea; 25 March 2015 through 27 March 2015
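The weighted-PageRank step described above can be sketched in a few lines: edges between trigrams are weighted by the Jaccard similarity of their word sets, and a power-iteration PageRank distributes each node's rank in proportion to edge weight. This is an illustrative reimplementation under those assumptions, not the authors' code.

```python
def jaccard(a, b):
    """Jaccard similarity of two trigrams, treated as word sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def weighted_pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank.

    edges: dict node -> dict of {neighbor: weight}; rank flows along
    out-edges in proportion to weight.
    """
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            incoming = 0.0
            for m in nodes:
                out = edges.get(m, {})
                total = sum(out.values())
                if n in out and total > 0:
                    incoming += rank[m] * out[n] / total
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank
```

In the paper's setting, `nodes` would be the extracted trigrams and each edge weight would be `jaccard(t1, t2)` for co-occurring trigrams, so frames sharing wildcard-like contexts reinforce each other.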

    Reducing the dependency of having prior domain knowledge for effective online information retrieval

    Sometimes Internet users struggle to find what they are looking for on the Internet due to information overload. Search engines intend to identify documents related to a given keyphrase and provide suggestions. Having some background knowledge about a topic or a domain helps in building effective search keyphrases that lead to accurate information retrieval results. This is further pronounced among students who rely on the Internet to learn about a new topic: students might not have the required background knowledge to build effective keyphrases and find what they are looking for. In this research, we address this problem and aim to help students find relevant information online. This research furthers existing literature by enhancing information retrieval frameworks through keyphrase assignment, aiming to expose students to new terminologies and thereby reducing the dependency on having background knowledge about the domain under study. We evaluated this framework and identified how it can be enhanced to suggest more effective search keyphrases. Our proposed suggestion is to introduce a keyphrase ranking mechanism that improves the keyphrase assignment part of the framework by taking into consideration the part of speech of the generated keyphrases. To evaluate the proposed approach, various datasets were downloaded and processed. The results obtained show that our proposed approach produces more effective keyphrases than the existing framework.
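A minimal sketch of the part-of-speech idea: score candidate keyphrases higher when they end in a noun and contain no verbs or adverbs. Everything here is hypothetical; the stub `POS` lexicon stands in for a real tagger, and the scoring weights are invented for illustration, so the paper's actual ranking mechanism may differ.

```python
# Stub lexicon in place of a real POS tagger; unknown words default to NOUN.
POS = {"neural": "ADJ", "network": "NOUN", "training": "NOUN",
       "improves": "VERB", "quickly": "ADV", "deep": "ADJ"}

def pos_score(phrase):
    """Score a candidate keyphrase from the POS tags of its words."""
    tags = [POS.get(w, "NOUN") for w in phrase.split()]
    score = 0
    if tags[-1] == "NOUN":
        score += 2               # noun-final phrases make good keyphrases
    score -= tags.count("VERB")  # penalize verbs inside the phrase
    score -= tags.count("ADV")   # and adverbs
    return score

def rank_keyphrases(phrases):
    """Order candidate keyphrases best-first by POS-based score."""
    return sorted(phrases, key=pos_score, reverse=True)
```

The point of the sketch is only that POS structure alone can reorder candidates: a noun phrase outranks a verb-headed fragment without any frequency statistics.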

    Keyphrase Generation: A Multi-Aspect Survey

    Extractive keyphrase generation research has been around since the nineties, but the more advanced abstractive approach based on the encoder-decoder framework and sequence-to-sequence learning has been explored only recently. In fact, more than a dozen abstractive methods have been proposed in the last three years, producing meaningful keyphrases and achieving state-of-the-art scores. In this survey, we examine various aspects of the extractive keyphrase generation methods and focus mostly on the more recent abstractive methods based on neural networks. We pay particular attention to the mechanisms that have driven the refinement of the latter. A large collection of scientific article metadata and the corresponding keyphrases is created and released for the research community. We also present various keyphrase generation and text summarization research patterns and trends of the last two decades.
    Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th Conference of the Open Innovations Association FRUCT, Helsinki, Finland

    Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach

    The extraction of high-quality keywords and the summarisation of documents at a high level have become more difficult in current research due to technological advancements and the exponential expansion of textual data and digital sources. Using features for keyphrase extraction to obtain high-quality keywords and high-level summaries has therefore become more popular. A new unsupervised keyphrase concentrated area (KCA) identification approach is proposed in this study as a feature for keyphrase extraction: it is corpus-, domain- and language-independent; free of document-length constraints; and usable by both supervised and unsupervised techniques. The proposed system has three phases: data pre-processing, data processing, and KCA identification. The system employs various text pre-processing methods before transferring the acquired datasets to the data processing step; the pre-processed data is subsequently used during the KCA identification step, where statistical approaches, curve plotting, and curve fitting are applied. The proposed system is then tested and evaluated using benchmark datasets collected from various sources. To demonstrate the effectiveness, merits, and significance of our proposed approach, we compared it with other proposed techniques. The experimental results on eleven (11) datasets show that the proposed approach effectively recognizes the KCA in articles and significantly enhances current keyphrase extraction methods across various text sizes, languages, and domains.
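One way to picture the KCA step, under loose assumptions: histogram the positions of known keyphrase occurrences over the document, smooth the resulting curve (a moving average below stands in for the paper's curve-fitting step), and report the span of bins whose density exceeds the mean. The function name and thresholding rule are ours, for illustration only.

```python
def keyphrase_concentrated_area(positions, doc_len, bins=10):
    """Toy KCA sketch: return (first_bin, last_bin) of the region where
    smoothed keyphrase density exceeds the mean, or None if no such bin.

    positions: character (or token) offsets of keyphrase occurrences.
    doc_len:   total document length in the same units.
    """
    counts = [0] * bins
    for p in positions:
        counts[min(int(p / doc_len * bins), bins - 1)] += 1
    # 3-bin moving average as a crude substitute for curve fitting
    smooth = [(counts[max(i - 1, 0)] + counts[i] + counts[min(i + 1, bins - 1)]) / 3
              for i in range(bins)]
    mean = sum(smooth) / bins
    dense = [i for i, v in enumerate(smooth) if v > mean]
    return (min(dense), max(dense)) if dense else None
```

For a document whose keyphrases cluster near the beginning, the returned span covers the opening bins, which is exactly the kind of positional feature the abstract says supervised and unsupervised extractors can both consume.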