Metadata Extraction from References of Different Styles
Metadata extraction is the process of describing the extrinsic and intrinsic qualities of a resource such as a document, image, or video, including obtaining data from references. References form an essential part of electronic scholarly publications. A reference acknowledges individuals for the creative and intellectual works that one has used in one's research. It can also be used to locate particular sources and combat plagiarism. A reference style dictates the information necessary for a reference and how that information is ordered. Accurate and automatic reference metadata generation provides scalability, interoperability and usability for the digital libraries of both public and private institutions and their collections. Accurate reference metadata extraction is an intriguing task for researchers who want to collect data on scientific publications; therefore, this research work proposes metadata extraction from references of different styles using regular expressions. This work accurately extracts metadata such as author, article title, volume, year of publication and institution from references, limited to six referencing styles.
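As a rough illustration of the regular-expression approach described in the abstract above, the following sketch handles a single APA-like layout. The pattern, the function name `extract_reference_metadata`, and the sample reference are all invented for illustration; they are not taken from the paper, which covers six styles and whose actual patterns are not given here.

```python
import re

# Hypothetical pattern for ONE APA-like layout (an assumption, not the paper's):
#   Authors (Year). Title. Journal, Volume(Issue), pages.
APA_LIKE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"  # authors, then (year).
    r"(?P<title>.+?)\.\s+"                           # title up to the first period
    r"(?P<journal>[^,]+),\s+"                        # journal name up to a comma
    r"(?P<volume>\d+)"                               # volume number
)


def extract_reference_metadata(reference: str):
    """Return a dict of named fields, or None if the layout does not match."""
    match = APA_LIKE.match(reference.strip())
    return match.groupdict() if match else None


# Invented sample reference, used only to exercise the pattern.
example = (
    "Smith, J., & Doe, A. (2015). Parsing citations with patterns. "
    "Journal of Examples, 12(3), 45-67."
)
print(extract_reference_metadata(example))
```

A real system would presumably need one such pattern per referencing style, plus a fallback for entries that match none of them.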
Searching and Visualization of References in Research Documents
This research aims to develop a module for information retrieval that can trace references from bibliography entries of research documents, specifically those based on Bogor Agricultural University (IPB)'s writing guidelines. A total of 242 research documents in PDF from the Department of Computer Science IPB were used to generate parsing patterns to extract the bibliography entries. With modified ParaTools, automatic extraction of bibliography entries was performed on text files generated from the PDF files. The entries are stored in a database that is used to visualize author relationships as graphs. This module is supplemented by an information retrieval system based on the Sphinx search system and also provides information on authors' publications and citations. Evaluation showed that (1) bibliography entry extraction missed only 5.37% of bibliography entries, caused by incorrect bibliography formatting, (2) 91.54% of bibliography entry attributes could be identified correctly, and (3) 90.31% of entries were successfully connected to other documents.
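The abstract above does not say how author relationships are derived from the stored entries. One plausible reading, assumed here purely for illustration, is a co-authorship graph whose edge weights count shared entries. The function name `coauthor_edges` and the input shape (one list of author names per entry) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(entries):
    """Count how often each unordered author pair appears in the same entry.

    `entries` is assumed to be an iterable of author-name lists, one per
    bibliography entry (a hypothetical shape, not the paper's schema).
    """
    edges = Counter()
    for authors in entries:
        # Sort and de-duplicate so (A, B) and (B, A) map to the same key.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Placeholder author labels, not real data.
sample_entries = [["A", "B"], ["B", "A", "C"]]
print(coauthor_edges(sample_entries))
```

The resulting weighted edge list could then be handed to any graph drawing tool for visualization.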
AI EDAM special issue: advances in implemented shape grammars: solutions and applications
This paper introduces the special issue "Advances in Implemented Shape Grammars: Solutions and Applications" and frames the topic of computer implementations of shape grammars, both with a theoretical and an applied focus. This special issue focuses on the current state of the art regarding computer implementations of shape grammars and opens a discussion about how those systems can evolve in the coming years so that they can be used in real-life design scenarios. This paper presents a brief state of the art of shape grammar implementation and an overview of the papers included in the current special issue, categorized under technical design, interpreters and interface design, and use cases. The paper ends with a comprehensive outlook into the future of shape grammar implementations.
Embedding a Creativity Support Tool within Computer Graphics Research
We describe the Dr Inventor creativity support tool that aims to support and even enhance the creativity of active research scientists, by discovering un-noticed analogical similarities between publications. The tool combines text processing, lexical analysis and computational cognitive modelling to find comparisons with the greatest potential for a creative impact on the system users. A multi-year corpus of publications is used to drive the creativity of the system, with a central graph matching algorithm being adapted to identify the best analogy between any pair of papers. Dr Inventor has been developed for use by computer graphics researchers, with a particular focus on publications from the SIGGRAPH conference series and it uses this context in three main ways. Firstly, the pragmatic context of creativity support requires the identification of comparisons that are unlike pre-existing information. Secondly, the suggested inferences are assessed for quality within the context of a corpus of graphics publications. Finally, expert users from this discipline were asked to identify the qualities of greatest concern to them, which then guided the subsequent evaluation task.
B!SON: A Tool for Open Access Journal Recommendation
Finding a suitable open access journal to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders' conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.
Language Models for Citation Classification
Authors reference academic works for a variety of reasons. As a result, not all citations in a research article have the same purpose. The need to understand and distinguish these citation purposes led to the development of automated approaches that consider semantic cues in the form of the context surrounding the citations. Identifying the semantic aspects of citations has proven valuable in various applications including research assessment, information retrieval, document summarisation, and more.
While automated citation classification has been in progress since the early 2000s, current efforts to determine citation types based on their contexts remain largely domain-specific. Besides, there is a lack of standard benchmarks for evaluating models for citation classification. Extracting valuable metadata related to the reason behind citation in scientific articles, particularly across multiple domains, is laborious and researchers still lack consensus on what should be the optimal context size for effective detection of citation function. The current methods heavily rely on the amount of annotated data used for training, making them data-centric. The emergence of self-supervised language models, which efficiently learn contextual relationships from vast unannotated datasets, has brought about substantial changes in the realm of Natural Language Processing in recent years. Despite these advancements, the few-shot predictive capability of the language models remains under-utilised in this field.
This thesis addresses the above shortcomings of citation classification. We systematically and comprehensively review the existing methodologies used by the previous works and identify the research gap and the potential future works. This meta-analysis forms the foundation for the research problems addressed in Chapters 3, 4, 5 and 6.
Initially, we introduce a novel benchmark in the form of an open shared task competition for multi-disciplinary citation classification in Chapter 3. The methods submitted to this shared task highlight the superiority of deep learning-based approaches and hint at the importance of incorporating additional context to enhance the performance of citation classification models.
Secondly, we create a new open access feature-enriched multi-disciplinary citation classification dataset to overcome the challenges associated with extracting meta-data from both citing and cited articles in Chapter 4. The feature extraction process, utilising multiple sources and the missing meta-data values, indicates the complexities involved in extracting features for a heterogeneous dataset.
In Chapter 5, we assess domain-specific and multi-disciplinary datasets by fine-tuning pre-trained scientific language models on them, specifically exploring various fixed citation context windows. We introduce a new method for automatically extracting dynamic context windows in an unsupervised manner. Both sets of experiments emphasise the significance of additional context in citation context classification. Moreover, the experimental results also show the domain dependence of the citation context window, providing evidence for the benefit of extracting context dynamically.
Lastly, Chapter 6 presents novel prompting strategies for scientific and general-purpose language models to reduce the dependence on labelled citation classification datasets. The analysis of model performance under zero- and few-shot settings reveals the effectiveness of large language models with minimal supervision, particularly when employing the newly proposed dynamic citation context-based prompting strategy.
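To make the notion of a fixed citation context window from the abstract above concrete, the sketch below selects the citing sentence plus a set number of neighbouring sentences. The function name `fixed_context_window`, its parameters, and the sentence-list input are assumptions made for illustration; the thesis's actual window definitions and its dynamic, unsupervised method are not reproduced here.

```python
def fixed_context_window(sentences, cite_index, before=1, after=1):
    """Return the citing sentence with `before`/`after` neighbouring sentences.

    `sentences` is assumed to be a pre-split list of sentences and
    `cite_index` the position of the sentence containing the citation.
    The window is clipped at the start and end of the document.
    """
    start = max(0, cite_index - before)
    end = min(len(sentences), cite_index + after + 1)
    return sentences[start:end]


# Placeholder sentences, not real data.
doc = ["s0", "s1", "s2 [CITE]", "s3"]
print(fixed_context_window(doc, cite_index=2, before=1, after=1))
```

Varying `before` and `after` gives the different fixed windows one might compare across domains; a dynamic approach would choose these bounds per citation instead.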
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.