929 research outputs found
An analysis of the semantic shifts of citations
Semantic shift in natural language is a well-established phenomenon that has been
studied for many years. Similarly, the meaning of a scientific publication may change
as time goes by; in other words, the same publication may be cited in distinct contexts. To
investigate whether the meanings of citations change in different scenarios, a phenomenon
we call semantic shift in citations, we followed the same ideas used to study
semantic shifts in language. To be more specific, we combined the temporal referencing model
and the Word2Vec model to explore the semantic shifts of scientific citations in two respects:
their usage over time and their usage across different domains. By observing how citation
vectors themselves changed over time and comparing the nearest neighbors of citations, we conclude
that the semantics of scientific publications do shift in terms of cosine distance.
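The combination of temporal referencing and Word2Vec described in this abstract can be sketched as follows. The `tag_citations` helper and the toy vectors are our own illustration of the idea (tag each citation token with its time period, train one embedding model, then measure drift by cosine distance), not the authors' code.

```python
# Sketch of temporal referencing for citations (illustrative; the helper
# names and toy vectors are assumptions, not the paper's implementation).
from math import sqrt

def tag_citations(tokens, citation_ids, period):
    """Replace each citation token with a period-tagged variant,
    e.g. citation 'P123' used in 1995 becomes 'P123_1995', while
    ordinary context words are left untouched. Training a single
    Word2Vec model on the tagged corpus then yields one vector per
    citation per period."""
    return [f"{t}_{period}" if t in citation_ids else t for t in tokens]

def cosine_distance(u, v):
    """1 - cosine similarity; larger means the vectors diverge more."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# The semantic shift of a citation is the distance between its
# period-tagged vectors (toy embeddings shown here):
vec_1995 = [0.9, 0.1, 0.2]   # toy vector for 'P123_1995'
vec_2015 = [0.3, 0.8, 0.1]   # toy vector for 'P123_2015'
drift = cosine_distance(vec_1995, vec_2015)
```

In practice the vectors would come from a trained Word2Vec model (e.g. gensim's `Word2Vec`); the tagging step is what makes vectors from different periods directly comparable in one space.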
The impact of corporate governance on default risk: BERTopic literature review
This study utilizes the BERTopic methodology, a topic modelling tool that facilitates a meticulous exploration of existing literature, to comprehensively review the interplay between corporate governance and default risk. Through analysis of diverse empirical studies, it delves into understanding how corporate governance practices influence default probability. The study underscores the importance of effective governance mechanisms (board attributes, ownership structures, executive compensation, shareholder rights, and disclosure practices) in molding default probabilities. It also highlights the role of external governance mechanisms and regulatory frameworks in managing default risk. Notably, this research advocates for further investigation into emerging governance models and their integration with modern machine-learning techniques to amplify their impact.
Textual Analysis of ICALEPCS and IPAC Conference Proceedings: Revealing Research Trends, Topics, and Collaborations for Future Insights and Advanced Search
In this paper, we show a textual analysis of past ICALEPCS and IPAC
conference proceedings to gain insights into the research trends and topics
discussed in the field. We use natural language processing techniques to
extract meaningful information from the abstracts and papers of past conference
proceedings. We extract topics to visualize and identify trends, analyze their
evolution to identify emerging research directions, and highlight interesting
publications based solely on their content with an analysis of their network.
Additionally, we will provide an advanced search tool for searching the
existing papers more effectively, helping to prevent duplication and to ease
reference finding. Our analysis provides a comprehensive overview of the
research landscape in the field and helps researchers and practitioners to
better understand the state of the art and identify areas for future research.
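The trend analysis this abstract describes can be sketched minimally as counting candidate topic terms across conference years; the function name and the simple token matching below are our own stand-ins for the paper's actual NLP pipeline (which extracts topics from full abstracts and papers).

```python
# Minimal sketch of topic-trend counting over conference proceedings
# (illustrative only; the real analysis uses richer NLP topic models).
from collections import Counter

def term_trends(abstracts_by_year, topic_terms):
    """Count how often candidate topic terms appear in each year's
    abstracts; rising counts across years suggest an emerging topic."""
    trends = {}
    for year, abstracts in abstracts_by_year.items():
        tokens = (tok for text in abstracts for tok in text.lower().split())
        trends[year] = Counter(tok for tok in tokens if tok in topic_terms)
    return trends

# Toy corpus: two years of made-up proceedings abstracts.
corpus = {
    2019: ["epics control system upgrade", "timing system design"],
    2023: ["machine learning for beam control",
           "machine learning anomaly detection"],
}
trends = term_trends(corpus, {"machine", "learning", "epics"})
```

Comparing `trends[2019]` against `trends[2023]` would show "machine learning" emerging, which is the kind of signal the paper's topic-evolution analysis surfaces at scale.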
Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning
This paper presents a hierarchical classification system that automatically
categorizes a scholarly publication using its abstract into a three-tier
hierarchical label set (discipline, field, subfield) in a multi-class setting.
This system enables a holistic categorization of research activities in the
mentioned hierarchy in terms of knowledge production through articles and
impact through citations, permitting those activities to fall into multiple
categories. The classification system distinguishes 44 disciplines, 718 fields
and 1,485 subfields among 160 million abstract snippets in Microsoft Academic
Graph (version 2018-05-17). We used batch training in a modularized and
distributed fashion to accommodate interdisciplinary and interfield
classifications in single-label and multi-label settings. In total, we have
conducted 3,140 experiments in all considered models (Convolutional Neural
Networks, Recurrent Neural Networks, Transformers). The classification accuracy
is > 90% in 77.13% and 78.19% of the single-label and multi-label
classifications, respectively. We examine the advantages of our classification
by its ability to better align research texts and output with disciplines, to
adequately classify them in an automated way, and to capture the degree of
interdisciplinarity. The proposed system (a set of pre-trained models) can
serve as a backbone to an interactive system for indexing scientific
publications in the future.
Comment: Under review in QS
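The three-tier (discipline, field, subfield) labelling this abstract describes could be realised top-down as in the sketch below. The toy hierarchy and score dictionary are our own assumptions for illustration; the paper's actual system uses trained neural models (CNNs, RNNs, Transformers) over 160 million abstract snippets.

```python
# Sketch of top-down hierarchical label selection (illustrative;
# the hierarchy and scores here are toy assumptions, not the paper's).
def classify_top_down(scores, hierarchy):
    """Pick the best-scoring discipline, then the best field within
    that discipline, then the best subfield within that field.
    `scores` maps each label to a model confidence; `hierarchy`
    maps a parent label to its list of children."""
    discipline = max(hierarchy["root"], key=lambda d: scores[d])
    field = max(hierarchy[discipline], key=lambda f: scores[f])
    subfield = max(hierarchy[field], key=lambda s: scores[s])
    return discipline, field, subfield

# Toy two-discipline hierarchy:
hierarchy = {
    "root": ["natural sciences", "social sciences"],
    "natural sciences": ["physics"],
    "social sciences": ["economics"],
    "physics": ["optics", "plasma physics"],
    "economics": ["macroeconomics"],
}
scores = {"natural sciences": 0.8, "social sciences": 0.2,
          "physics": 0.9, "economics": 0.5,
          "optics": 0.3, "plasma physics": 0.7, "macroeconomics": 0.6}
label = classify_top_down(scores, hierarchy)
```

A multi-label variant, as in the paper, would keep every branch whose score clears a threshold instead of taking a single `max` per tier.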
Know an Emotion by the Company It Keeps: Word Embeddings from Reddit/Coronavirus
Social media is a crucial communication tool (e.g., with 430 million monthly active users in online forums such as Reddit) and has become an object of study for Natural Language Processing (NLP) techniques. One of these techniques (word embeddings) is based on the quotation, “You shall know a word by the company it keeps,” highlighting the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” Therefore, we aimed to train a model (W2V) for generating word associations (also known as embeddings) using a popular Coronavirus Reddit forum, to validate them against public evidence, and to apply them to the discovery of context for specific emotions previously reported as related to psychological resilience. We used the Pushshiftr, quanteda, broom, wordVectors, and superheat R packages. We collected all 374,421 posts submitted by 104,351 users to the Reddit/Coronavirus forum between January 2020 and July 2021. W2V identified 64 terms representing the context for seven positive emotions (gratitude, compassion, love, relief, hope, calm, and admiration) and 52 terms for seven negative emotions (anger, loneliness, boredom, fear, anxiety, confusion, and sadness), all from validly experienced situations. We clustered them visually, highlighting contextual similarity. Although trained on a “small” dataset, W2V can be used for context discovery to expand on concepts such as psychological resilience.
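The context-discovery step this abstract describes (finding the "company" a word keeps, i.e. its nearest embedding neighbours) can be sketched in Python, even though the authors worked with R packages. The toy vectors below are our own illustration; real ones would come from the trained W2V model.

```python
# Sketch of nearest-neighbour context discovery in an embedding space
# (toy vectors; the paper's embeddings come from a trained W2V model).
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def nearest_neighbors(word, vectors, k=3):
    """Rank every other vocabulary word by cosine similarity to `word`
    and return the k closest: the word's 'company', i.e. its context."""
    target = vectors[word]
    others = ((w, cosine(target, v)) for w, v in vectors.items() if w != word)
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]

# Toy embeddings for a handful of pandemic-forum words:
vectors = {
    "hope":     [0.9, 0.1, 0.0],
    "vaccine":  [0.8, 0.2, 0.1],
    "recovery": [0.7, 0.3, 0.0],
    "lockdown": [0.1, 0.9, 0.4],
}
company = nearest_neighbors("hope", vectors, k=2)
```

Applied to an emotion term like "hope", this is how the study surfaces the situational terms (its nearest neighbours) that form the context for that emotion.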
A Bibliometric Review of Large Language Models Research from 2017 to 2023
Large language models (LLMs) are a class of language models that have
demonstrated outstanding performance across a range of natural language
processing (NLP) tasks and have become a highly sought-after research area,
because of their ability to generate human-like language and their potential to
revolutionize science and technology. In this study, we conduct bibliometric
and discourse analyses of scholarly literature on LLMs. Synthesizing over 5,000
publications, this paper serves as a roadmap for researchers, practitioners,
and policymakers to navigate the current landscape of LLMs research. We present
the research trends from 2017 to early 2023, identifying patterns in research
paradigms and collaborations. We start with analyzing the core algorithm
developments and NLP tasks that are fundamental in LLMs research. We then
investigate the applications of LLMs in various fields and domains including
medicine, engineering, social science, and humanities. Our review also reveals
the dynamic, fast-paced evolution of LLMs research. Overall, this paper offers
valuable insights into the current state, impact, and potential of LLMs
research and its applications.
Comment: 36 pages, 9 figures, and 4 tables