694 research outputs found
AN EVOLUTIONARY APPROACH TO BIBLIOGRAPHIC CLASSIFICATION
This dissertation is research in the domain of information science and specifically, the organization and representation of information. The research has implications for classification of scientific books, especially as dissemination of information becomes more rapid and science becomes more diverse due to increases in multi-, inter-, trans-disciplinary research, which focus on phenomena, in contrast to traditional library classification schemes based on disciplines.The literature review indicates 1) human socio-cultural groups have many of the same properties as biological species, 2) output from human socio-cultural groups can be and has been the subject of evolutionary relationship analyses (i.e., phylogenetics), 3) library and information science theorists believe the most favorable and scientific classification for information packages is one based on common origin, but 4) library and information science classification researchers have not demonstrated a book classification based on evolutionary relationships of common origin.The research project supports the assertion that a sensible book classification method can be developed using a contemporary biological classification approach based on common origin, which has not been applied to a collection of books until now. Using a sample from a collection of earth-science digitized books, the method developed includes a text-mining step to extract important terms, which were converted into a dataset for input into the second step—the phylogenetic analysis. Three classification trees were produced and are discussed. Parsimony analysis, in contrast to distance and likelihood analyses, produced a sensible book classification tree. Also included is a comparison with a classification tree based on a well-known contemporary library classification scheme (the Library of Congress Classification).Final discussions connect this research with knowledge organization and information retrieval, information needs beyond science, and this type of research in context of a unified science of cultural evolution
BibRank: Automatic Keyphrase Extraction Platform Using~Metadata
Automatic Keyphrase Extraction involves identifying essential phrases in a
document. These keyphrases are crucial in various tasks such as document
classification, clustering, recommendation, indexing, searching, summarization,
and text simplification. This paper introduces a platform that integrates
keyphrase datasets and facilitates the evaluation of keyphrase extraction
algorithms. The platform includes BibRank, an automatic keyphrase extraction
algorithm that leverages a rich dataset obtained by parsing bibliographic data
in BibTeX format. BibRank combines innovative weighting techniques with
positional, statistical, and word co-occurrence information to extract
keyphrases from documents. The platform proves valuable for researchers and
developers seeking to enhance their keyphrase extraction algorithms and advance
the field of natural language processing.Comment: 12 pages , 4 figures, 8 table
The ACL OCL Corpus: advancing Open science in Computational Linguistics
We present a scholarly corpus from the ACL Anthology to assist Open
scientific research in the Computational Linguistics domain, named as ACL OCL.
Compared with previous ARC and AAN versions, ACL OCL includes structured
full-texts with logical sections, references to figures, and links to a large
knowledge resource (semantic scholar). ACL OCL contains 74k scientific papers,
together with 210k figures extracted up to September 2022. To observe the
development in the computational linguistics domain, we detect the topics of
all OCL papers with a supervised neural model. We observe ''Syntax: Tagging,
Chunking and Parsing'' topic is significantly shrinking and ''Natural Language
Generation'' is resurging. Our dataset is open and available to download from
HuggingFace in https://huggingface.co/datasets/ACL-OCL/ACL-OCL-Corpus
A knowledge graph embeddings based approach for author name disambiguation using literals
Scholarly data is growing continuously containing information about the articles from a plethora of venues including conferences, journals, etc. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize these data and make them accessible have also led to many challenges such as exploration of scholarly articles, ambiguous authors, etc. This study more specifically targets the problem of Author Name Disambiguation (AND) on Scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. This framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and finally, (3) hierarchical Agglomerative Clustering. Extensive experiments have been conducted on two newly created KGs: (i) KG containing information from Scientometrics Journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines of 8–14% in terms of F1 score and shows competitive performances on a challenging benchmark such as AMiner. The code and the datasets are publicly available through Github (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855) respectively
A knowledge graph embeddings based approach for author name disambiguation using literals
Scholarly data is growing continuously containing information about the articles from a plethora of venues including conferences, journals, etc. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize these data and make them accessible have also led to many challenges such as exploration of scholarly articles, ambiguous authors, etc. This study more specifically targets the problem of Author Name Disambiguation (AND) on Scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. This framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and finally, (3) hierarchical Agglomerative Clustering. Extensive experiments have been conducted on two newly created KGs: (i) KG containing information from Scientometrics Journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines of 8–14% in terms of F1 score and shows competitive performances on a challenging benchmark such as AMiner. The code and the datasets are publicly available through Github (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855) respectively
Recommended from our members
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of re-search areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles yielding a significant improvement over alternative methods
- …