Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification
Many text mining tasks, such as text retrieval, text summarization, and text comparison, depend on the extraction of representative keywords from the main text. Most existing keyword extraction algorithms are based on a discrete bag-of-words representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for keyword extraction evaluation, based on information gain and on cross-validated Support Vector Machine (SVM) classification, which are valuable when human-annotated keywords are not available. We used a standard benchmark dataset and an in-house patent dataset to evaluate the performance of PKEA. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with frequency-based ranking, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that our proposed algorithm provides a promising way to extract keywords from patent texts for patent classification.
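The core idea of embedding-based keyword extraction can be illustrated with a minimal sketch: score each word by the cosine similarity between its word vector and the document's mean vector. The tiny 2-d vectors below are illustrative stand-ins for trained Skip-gram embeddings, and the centroid-similarity scoring rule is an assumption; PKEA's exact scoring and its SVM-based evaluation are not reproduced here.

```python
import math

def rank_keywords(doc_tokens, embeddings, top_n=5):
    """Rank document tokens by cosine similarity between each token's
    vector and the mean (centroid) vector of the document."""
    vecs = [embeddings[t] for t in doc_tokens if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = {t: cosine(embeddings[t], centroid)
              for t in set(doc_tokens) if t in embeddings}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

# Toy 2-d "embeddings"; a real system would load trained Skip-gram vectors.
emb = {
    "lidar":  [0.9, 0.1],
    "radar":  [0.8, 0.2],
    "sensor": [0.85, 0.15],
    "the":    [0.1, 0.9],
}
doc = ["the", "lidar", "sensor", "the", "radar"]
print(rank_keywords(doc, emb, top_n=2))  # content words outrank "the"
```

Content words that cluster together dominate the centroid, so function words like "the" fall to the bottom of the ranking even without an explicit stopword list.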
Automatic keyword assignment system for medical research articles using nearest-neighbor searches
Assigning accurate keywords to research articles is an increasingly important concern. Keywords should be selected meticulously to describe the article well, since they play an important role in matching readers with research articles and thus in reaching a larger audience; improper selection of keywords can therefore reduce an article's readership. Hence, we designed and developed an automatic keyword assignment system (AKAS) for research articles based on the k-nearest neighbor (k-NN) and threshold nearest neighbor (t-NN) methods combined with an information retrieval system (IRS), a corpus-based approach that queries the IRS over the Medline dataset in PubMed. First, AKAS accepts the abstract of a research article, or any particular text, as a query to the IRS. Next, the IRS returns a ranked list of articles for the given query. Then, we select a set of documents from this list using two different methods, k-NN and t-NN, which take the first k documents and the documents whose similarity exceeds the threshold value t, respectively. To evaluate the proposed system, we conducted a set of experiments on a selected subset of 458,594 PubMed articles. We then measured the performance of AKAS by comparing its suggestions with the original keywords assigned by the authors. The results show that our system suggests keywords that match the author-assigned ones with an F-score above 55%. We present both methods and the experimental results in detail.
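The k-NN and t-NN selection steps described above can be sketched in a few lines. The document identifiers, similarity scores, and keywords below are hypothetical, and aggregating neighbor keywords by frequency is an assumption about how the selected documents' keywords are combined into suggestions.

```python
from collections import Counter

def select_knn(ranked, k):
    """k-NN: take the first k documents from the IRS ranked list."""
    return [doc for doc, _score in ranked[:k]]

def select_tnn(ranked, t):
    """t-NN: take every document whose similarity exceeds threshold t."""
    return [doc for doc, score in ranked if score > t]

def suggest_keywords(neighbor_docs, doc_keywords, top_n=3):
    """Pool the keywords of the selected neighbors and return the
    most frequent ones as suggestions for the query article."""
    counts = Counter(kw for d in neighbor_docs for kw in doc_keywords[d])
    return [kw for kw, _ in counts.most_common(top_n)]

# Hypothetical ranked list returned by the IRS for a query abstract.
ranked = [("d1", 0.91), ("d2", 0.84), ("d3", 0.55), ("d4", 0.40)]
doc_keywords = {
    "d1": ["diabetes", "insulin"],
    "d2": ["diabetes", "glucose"],
    "d3": ["insulin", "obesity"],
    "d4": ["cardiology"],
}
print(suggest_keywords(select_knn(ranked, 2), doc_keywords))   # k-NN, k=2
print(suggest_keywords(select_tnn(ranked, 0.5), doc_keywords))  # t-NN, t=0.5
```

Note the trade-off the two selectors embody: k-NN always returns a fixed-size neighborhood, while t-NN adapts the neighborhood size to how many genuinely similar articles exist for the query.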
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information classification and retrieval problems, such as e-Discovery or contextual navigation. It can also be formulated in the artificial-life framework, a.k.a. Conway's "Game of Life". In contrast to Conway's approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC, we have built a mARC-based Internet search engine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search in terms of both performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries.
The Power of Patents: Leveraging Text Mining and Social Network Analysis to Forecast IoT Trends
Technology has become an indispensable competitive tool as science and technology have progressed throughout history, and organizations can compete on an equal footing by adopting technology appropriately. A technology lifecycle begins with the initiation phase and eventually reaches saturation after passing through the maturity phase. Once a technology reaches saturation, it is removed or replaced by another, which makes investing in it during this phase unjustifiable. Technology forecasting is therefore a critical tool for research and development to determine the future direction of technology. Based on registered patents, this study examined the trends of IoT technologies. A total of 3697 patents related to the Internet of Things from the last six years of patenting were gathered via lens.org for this purpose. The main individuals and companies were identified by constructing the IoT patent registration cooperation network, and the main groups active in patent registration were identified with a community detection technique. The patents were then divided into six technology categories: Safety and Security, Information Services, Public Safety and Environment Monitoring, Collaborative Aware Systems, Smart Homes/Buildings, and Smart Grid. Their technical maturity was then identified and examined using the SigmaPlot program. Based on the findings, Information Services, Smart Homes/Buildings, and Smart Grid technologies are in the saturation stage, while Safety and Security, Public Safety and Environment Monitoring, and Collaborative Aware Systems are in the maturity stage.
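The lifecycle-stage analysis rests on fitting an S-curve to cumulative patent counts and reading off where on the curve a technology currently sits. The sketch below, using hypothetical counts, fits a logistic curve by a crude grid search (the study used SigmaPlot for this step, and the fraction-of-ceiling thresholds for the four stages are assumptions).

```python
import math

def logistic(t, L, k, t0):
    """Logistic (Pearl) growth curve, commonly used for technology lifecycles."""
    return L / (1.0 + math.exp(-k * (t - t0)))

def fit_logistic(years, counts):
    """Crude grid search for (L, k, t0) minimising squared error.
    A real analysis would use a proper optimiser such as scipy curve_fit."""
    best, best_err = None, float("inf")
    top = max(counts)
    for L in [top * f for f in (1.0, 1.2, 1.5, 2.0)]:
        for k in [0.2 * i for i in range(1, 16)]:
            for t0 in years:
                err = sum((logistic(t, L, k, t0) - c) ** 2
                          for t, c in zip(years, counts))
                if err < best_err:
                    best, best_err = (L, k, t0), err
    return best

def stage(L, k, t0, t):
    """Map the fitted curve's value at year t to a lifecycle stage
    (threshold choices are assumptions for illustration)."""
    frac = logistic(t, L, k, t0) / L
    if frac < 0.1:
        return "initiation"
    if frac < 0.5:
        return "growth"
    if frac < 0.9:
        return "maturity"
    return "saturation"

# Hypothetical cumulative patent counts over six years of filings.
years = [2017, 2018, 2019, 2020, 2021, 2022]
counts = [40, 120, 310, 520, 610, 640]
L, k, t0 = fit_logistic(years, counts)
print(stage(L, k, t0, 2022))  # counts have flattened near the ceiling
```

Because the counts level off near their maximum, the fitted curve places the final year high on the S-curve, which is exactly the "saturation" signal the study uses to flag technologies no longer worth new investment.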
A tree based keyphrase extraction technique for academic literature
Automatic keyphrase extraction techniques aim to extract quality keyphrases that summarize a document at a higher level. Among existing techniques, some are domain-specific and require application-domain knowledge, some are based on higher-order statistical methods and are computationally expensive, and some require large amounts of training data, which are rare for many applications. To overcome these issues, this thesis proposes a new unsupervised automatic keyphrase extraction technique, named TeKET (Tree-based Keyphrase Extraction Technique), which is domain-independent, employs limited statistical knowledge, and requires no training data. The proposed technique also introduces a new variant of the binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. Depending on the candidate keyphrases, the KePhEx tree structure is expanded, shrunk, or maintained. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes the degree of cohesiveness of a given node with respect to the root; it is used to extract final keyphrases from the resultant tree in a flexible manner and to rank keyphrases alongside term frequency. The effectiveness of the proposed technique is evaluated experimentally on a benchmark corpus, SemEval-2010, with 244 training and test articles in total, and compared with other relevant unsupervised techniques, taking representatives of both statistical approaches (such as Term Frequency-Inverse Document Frequency and YAKE) and graph-based approaches (PositionRank, CollabRank (SingleRank), TopicRank, and MultipartiteRank) into account. Three evaluation metrics, namely precision, recall, and F1 score, are considered in the experiments. The obtained results demonstrate the improved performance of the proposed technique over other similar techniques in terms of precision, recall, and F1 score.
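The candidate-extraction and frequency-ranking steps common to such unsupervised pipelines can be sketched as follows. The KePhEx tree and the Cohesiveness Index are not specified in the abstract, so this sketch is not TeKET itself: it splits candidates on stopwords and punctuation (a RAKE-style heuristic, used here as an assumption) and ranks them by mean term frequency only.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "for", "on", "we"}

def candidate_phrases(text):
    """Split the text at punctuation and stopwords, keeping contiguous
    runs of content words as candidate keyphrases."""
    phrases = []
    for chunk in re.split(r"[.,;:!?]", text.lower()):
        current = []
        for w in re.findall(r"[a-z]+", chunk):
            if w in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(w)
        if current:
            phrases.append(tuple(current))
    return phrases

def rank_by_tf(text, top_n=3):
    """Score each candidate phrase by the mean term frequency of its words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    tf = Counter(words)
    scores = {p: sum(tf[w] for w in p) / len(p)
              for p in set(candidate_phrases(text))}
    return [" ".join(p) for p, _ in
            sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]]

text = ("Keyphrase extraction summarizes a document. "
        "Tree based keyphrase extraction builds a tree of candidate keyphrases.")
print(rank_by_tf(text, top_n=2))
```

In TeKET, this frequency signal is combined with the CI computed over the KePhEx tree; the sketch shows only the shared preprocessing and the term-frequency half of that ranking.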
- …