929 research outputs found
An analysis of the semantic shifts of citations
Semantic shift in natural language is a well-established phenomenon that has been
studied for many years. Similarly, the meaning of a scientific publication may change
as time goes by; in other words, the same publication may be cited in distinct contexts. To
investigate whether the meanings of citations change in different scenarios, a phenomenon
we call semantic shift in citations, we followed the same ideas used to study
semantic shifts in language. To be more specific, we combined the temporal referencing model
and the Word2Vec model to explore the semantic shifts of scientific citations in two respects:
their usage over time and their usage across different domains. By observing how citation
vectors themselves changed over time and comparing the nearest neighbors of citations, we conclude
that the semantics of scientific publications do shift in terms of cosine distance.
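The combination of temporal referencing and Word2Vec described in this abstract can be sketched as follows. The `tag_citations` helper and the toy vectors are our own illustration of the idea (tag each citation token with its time period, train one embedding model, then measure drift by cosine distance), not the authors' code.

```python
# Sketch of temporal referencing for citations (illustrative; the helper
# names and toy vectors are assumptions, not the paper's implementation).
from math import sqrt

def tag_citations(tokens, citation_ids, period):
    """Replace each citation token with a period-tagged variant,
    e.g. citation 'P123' used in 1995 becomes 'P123_1995', while
    ordinary context words are left untouched. Training a single
    Word2Vec model on the tagged corpus then yields one vector per
    citation per period."""
    return [f"{t}_{period}" if t in citation_ids else t for t in tokens]

def cosine_distance(u, v):
    """1 - cosine similarity; larger means the vectors diverge more."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# The semantic shift of a citation is the distance between its
# period-tagged vectors (toy embeddings shown here):
vec_1995 = [0.9, 0.1, 0.2]   # toy vector for 'P123_1995'
vec_2015 = [0.3, 0.8, 0.1]   # toy vector for 'P123_2015'
drift = cosine_distance(vec_1995, vec_2015)
```

In practice the vectors would come from a trained Word2Vec model (e.g. gensim's `Word2Vec`); the tagging step is what makes vectors from different periods directly comparable in one space.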
The impact of corporate governance on default risk: BERTopic literature review
This study utilizes the BERTopic methodology, a topic modelling tool that facilitates a meticulous exploration of existing literature, to comprehensively review the interplay between corporate governance and default risk. Through analysis of diverse empirical studies, it delves into understanding how corporate governance practices influence default probability. The study underscores the importance of effective governance mechanisms (board attributes, ownership structures, executive compensation, shareholder rights, and disclosure practices) in molding default probabilities. It also highlights the role of external governance mechanisms and regulatory frameworks in managing default risk. Notably, this research advocates for further investigation into emerging governance models and their integration with modern machine-learning techniques to amplify their impact.
Textual Analysis of ICALEPCS and IPAC Conference Proceedings: Revealing Research Trends, Topics, and Collaborations for Future Insights and Advanced Search
In this paper, we show a textual analysis of past ICALEPCS and IPAC
conference proceedings to gain insights into the research trends and topics
discussed in the field. We use natural language processing techniques to
extract meaningful information from the abstracts and papers of past conference
proceedings. We extract topics to visualize and identify trends, analyze their
evolution to identify emerging research directions, and highlight interesting
publications based solely on their content with an analysis of their network.
Additionally, we will provide an advanced search tool for searching the
existing papers more effectively, helping to prevent duplication and to ease
reference finding. Our analysis provides a comprehensive overview of the
research landscape in the field and helps researchers and practitioners to
better understand the state of the art and identify areas for future research.
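The trend analysis this abstract describes can be sketched minimally as counting candidate topic terms across conference years; the function name and the simple token matching below are our own stand-ins for the paper's actual NLP pipeline (which extracts topics from full abstracts and papers).

```python
# Minimal sketch of topic-trend counting over conference proceedings
# (illustrative only; the real analysis uses richer NLP topic models).
from collections import Counter

def term_trends(abstracts_by_year, topic_terms):
    """Count how often candidate topic terms appear in each year's
    abstracts; rising counts across years suggest an emerging topic."""
    trends = {}
    for year, abstracts in abstracts_by_year.items():
        tokens = (tok for text in abstracts for tok in text.lower().split())
        trends[year] = Counter(tok for tok in tokens if tok in topic_terms)
    return trends

# Toy corpus: two years of made-up proceedings abstracts.
corpus = {
    2019: ["epics control system upgrade", "timing system design"],
    2023: ["machine learning for beam control",
           "machine learning anomaly detection"],
}
trends = term_trends(corpus, {"machine", "learning", "epics"})
```

Comparing `trends[2019]` against `trends[2023]` would show "machine learning" emerging, which is the kind of signal the paper's topic-evolution analysis surfaces at scale.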
Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning
This paper presents a hierarchical classification system that automatically
categorizes a scholarly publication using its abstract into a three-tier
hierarchical label set (discipline, field, subfield) in a multi-class setting.
This system enables a holistic categorization of research activities in the
mentioned hierarchy in terms of knowledge production through articles and
impact through citations, permitting those activities to fall into multiple
categories. The classification system distinguishes 44 disciplines, 718 fields
and 1,485 subfields among 160 million abstract snippets in Microsoft Academic
Graph (version 2018-05-17). We used batch training in a modularized and
distributed fashion to accommodate interdisciplinary and interfield
classifications in single-label and multi-label settings. In total, we have
conducted 3,140 experiments in all considered models (Convolutional Neural
Networks, Recurrent Neural Networks, Transformers). The classification accuracy
is > 90% in 77.13% and 78.19% of the single-label and multi-label
classifications, respectively. We examine the advantages of our classification
by its ability to better align research texts and output with disciplines, to
adequately classify them in an automated way, and to capture the degree of
interdisciplinarity. The proposed system (a set of pre-trained models) can
serve as a backbone to an interactive system for indexing scientific
publications in the future.
Comment: Under review in QS
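The three-tier (discipline, field, subfield) labelling this abstract describes could be realised top-down as in the sketch below. The toy hierarchy and score dictionary are our own assumptions for illustration; the paper's actual system uses trained neural models (CNNs, RNNs, Transformers) over 160 million abstract snippets.

```python
# Sketch of top-down hierarchical label selection (illustrative;
# the hierarchy and scores here are toy assumptions, not the paper's).
def classify_top_down(scores, hierarchy):
    """Pick the best-scoring discipline, then the best field within
    that discipline, then the best subfield within that field.
    `scores` maps each label to a model confidence; `hierarchy`
    maps a parent label to its list of children."""
    discipline = max(hierarchy["root"], key=lambda d: scores[d])
    field = max(hierarchy[discipline], key=lambda f: scores[f])
    subfield = max(hierarchy[field], key=lambda s: scores[s])
    return discipline, field, subfield

# Toy two-discipline hierarchy:
hierarchy = {
    "root": ["natural sciences", "social sciences"],
    "natural sciences": ["physics"],
    "social sciences": ["economics"],
    "physics": ["optics", "plasma physics"],
    "economics": ["macroeconomics"],
}
scores = {"natural sciences": 0.8, "social sciences": 0.2,
          "physics": 0.9, "economics": 0.5,
          "optics": 0.3, "plasma physics": 0.7, "macroeconomics": 0.6}
label = classify_top_down(scores, hierarchy)
```

A multi-label variant, as in the paper, would keep every branch whose score clears a threshold instead of taking a single `max` per tier.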
Know an Emotion by the Company It Keeps: Word Embeddings from Reddit/Coronavirus
Social media is a crucial communication tool (e.g., with 430 million monthly active users in online forums such as Reddit) and has become an object of study for Natural Language Processing (NLP) techniques. One of these techniques (word embeddings) is based on the quotation, “You shall know a word by the company it keeps,” highlighting the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” Therefore, we aimed to train a model (W2V) for generating word associations (also known as embeddings) using a popular Coronavirus Reddit forum, to validate them against public evidence, and to apply them to the discovery of context for specific emotions previously reported as related to psychological resilience. We used the Pushshiftr, quanteda, broom, wordVectors, and superheat R packages. We collected all 374,421 posts submitted by 104,351 users to the Reddit/Coronavirus forum between January 2020 and July 2021. W2V identified 64 terms representing the context for seven positive emotions (gratitude, compassion, love, relief, hope, calm, and admiration) and 52 terms for seven negative emotions (anger, loneliness, boredom, fear, anxiety, confusion, and sadness), all from validly experienced situations. We clustered them visually, highlighting contextual similarity. Although trained on a “small” dataset, W2V can be used for context discovery to expand on concepts such as psychological resilience.
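The context-discovery step this abstract describes (finding the "company" a word keeps, i.e. its nearest embedding neighbours) can be sketched in Python, even though the authors worked with R packages. The toy vectors below are our own illustration; real ones would come from the trained W2V model.

```python
# Sketch of nearest-neighbour context discovery in an embedding space
# (toy vectors; the paper's embeddings come from a trained W2V model).
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def nearest_neighbors(word, vectors, k=3):
    """Rank every other vocabulary word by cosine similarity to `word`
    and return the k closest: the word's 'company', i.e. its context."""
    target = vectors[word]
    others = ((w, cosine(target, v)) for w, v in vectors.items() if w != word)
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]

# Toy embeddings for a handful of pandemic-forum words:
vectors = {
    "hope":     [0.9, 0.1, 0.0],
    "vaccine":  [0.8, 0.2, 0.1],
    "recovery": [0.7, 0.3, 0.0],
    "lockdown": [0.1, 0.9, 0.4],
}
company = nearest_neighbors("hope", vectors, k=2)
```

Applied to an emotion term like "hope", this is how the study surfaces the situational terms (its nearest neighbours) that form the context for that emotion.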
A Bibliometric Review of Large Language Models Research from 2017 to 2023
Large language models (LLMs) are a class of language models that have
demonstrated outstanding performance across a range of natural language
processing (NLP) tasks and have become a highly sought-after research area,
because of their ability to generate human-like language and their potential to
revolutionize science and technology. In this study, we conduct bibliometric
and discourse analyses of scholarly literature on LLMs. Synthesizing over 5,000
publications, this paper serves as a roadmap for researchers, practitioners,
and policymakers to navigate the current landscape of LLMs research. We present
the research trends from 2017 to early 2023, identifying patterns in research
paradigms and collaborations. We start with analyzing the core algorithm
developments and NLP tasks that are fundamental in LLMs research. We then
investigate the applications of LLMs in various fields and domains including
medicine, engineering, social science, and humanities. Our review also reveals
the dynamic, fast-paced evolution of LLMs research. Overall, this paper offers
valuable insights into the current state, impact, and potential of LLMs
research and its applications.
Comment: 36 pages, 9 figures, and 4 tables