Linking Textual Resources to Support Information Discovery
A vast amount of information is today stored in the form of textual documents, many of which are available online. These documents come from different sources and are of different types. They include newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge, which was created by explicitly connecting information and expressing it in the form of a natural language. However, a significant amount of knowledge is not explicitly stated in a single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious due to information overload and the difficulty of formulating queries that would help us to discover information we are not aware of.
To support this exploratory process, we need to be able to navigate effectively between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach does not scale and cannot systematically assist us in discovering sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery.
This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery and designs a system in which link discovery is applied to a collection of millions of documents to improve access to public knowledge.
A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies
In-text citation analysis is one of the most frequently used methods in research evaluation. We are seeing significant growth in citation analysis through bibliometric metadata, primarily due to the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions. Due to better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics by tapping into advancements in full-text data processing techniques to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation classification, citation sentiment analysis, citation summarisation, and citation-based recommendation. This article aims to narratively review the studies on these developments. Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.
A scientometric analysis of deep learning approaches for detecting Fake News
The unregulated proliferation of counterfeit news creation and dissemination that has been
seen in recent years poses a constant threat to democracy. Fake news articles have the power to
persuade individuals, leaving them perplexed. This scientometric study examined 569 documents
from the Scopus database between 2012 and mid-2022 to look for general research trends, publication
and citation structures, authorship and collaboration patterns, bibliographic coupling, and productivity patterns in order to identify fake news using deep learning. For this study, Biblioshiny and
VOSviewer were used. The findings of this study clearly demonstrate a trend toward an increase in
publications since 2016, and this dissemination of fake news is still an issue from a global perspective.
Thematic analysis of papers reveals that research topics related to social media for surveillance and
monitoring of public attitudes and perceptions, as well as fake news, are crucial but underdeveloped,
while studies on deep fake detection, digital contents, digital forensics, and computer vision constitute
niche areas. Furthermore, the results show that China and the USA have the strongest international
collaboration, despite India writing more articles. This paper also examines the current state of the art
in deep learning techniques for fake news detection, with the goal of providing a potential roadmap
for researchers interested in undertaking research in this field.
Hidden Citations Obscure True Impact in Science
References, the mechanism scientists rely on to signal previous knowledge,
lately have turned into widely used and misused measures of scientific impact.
Yet, when a discovery becomes common knowledge, citations suffer from
obliteration by incorporation. This leads to the concept of hidden citation,
representing a clear textual credit to a discovery without a reference to the
publication embodying it. Here, we rely on unsupervised interpretable machine
learning applied to the full text of each paper to systematically identify
hidden citations. We find that for influential discoveries hidden citations
outnumber citation counts, emerging regardless of publishing venue and
discipline. We show that the prevalence of hidden citations is not driven by
citation counts, but rather by the degree of the discourse on the topic within
the text of the manuscripts, indicating that the more discussed is a discovery,
the less visible it is to standard bibliometric analysis. Hidden citations
indicate that bibliometric measures offer a limited perspective on quantifying
the true impact of a discovery, raising the need to extract knowledge from the
full text of the scientific corpus.
Study on open science: The general state of the play in Open Science principles and practices at European life sciences institutes
Open science is currently a prominent topic at all levels and is also one of the priorities of the European Research Area. Components that are commonly associated with open science are open access, open data, open methodology, open source, open peer review, open science policies and citizen science. Open science may have great potential to connect and influence the practices of researchers, funding institutions and the public. In this paper, we evaluate the level of openness based on public surveys at four European life sciences institutes.
UNDERSTANDING THE SCHOLARLY COMMUNICATION PROCESS THROUGH DIGITAL TRACES: A STUDY OF TWITTER
Through the lens of the exploratory framework of Digital Trace of Scholarly Acts (DTSA), this dissertation study explored researchers’ activities around scholarly articles on Twitter. Using a mixed-methods design, this study analyzed data collected from a large-scale survey and twenty interviews with researchers on Twitter. The Critical Incident Technique was used as part of the interview study to learn about the full stories behind researchers’ sharing of scholarly articles on Twitter. Researchers’ sentiment toward the articles they tweeted, retweeted, replied to, and liked varied with their demographics. Despite a generally positive tendency, researchers’ Twitter activities were associated with different sentiments because researchers perceived these activities differently. Variations were also found in how sharing scholarly articles on Twitter fit into researchers’ scholarly acts workflow, with no monolithic pattern. This study contributed to a better understanding of the digital traces left by researchers on Twitter by providing richer descriptions and narratives of their activities. Researchers shared scholarly articles on Twitter for a variety of motivations: networking, promoting, disseminating, commenting, communicating with intended users, acknowledgment, and saving for later reference. The findings particularly shed light on the role of Twitter in communicating research and network building. Investigating the impact of the articles on the researchers led to a better understanding of what types of articles researchers placed a higher premium on sharing on Twitter. Evidence was found to support both the normative theory and the constructivist theory – the categories of impact included connecting, informing, practice-changing, beyond research, and potential impact.
However, more than half of the shared articles examined had no impact on the researchers’ own work, indicating that Twitter metrics, even solely based on researchers’ Twitter activities, should not be used as an evaluative metric of the articles shared. Doctor of Philosophy
Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation
Accessing up-to-date and quality scientific literature is a critical preliminary step in any research activity.
Identifying relevant scholarly literature for the purposes of a given task or application is, however, a complex and time-consuming activity.
Despite the large number of tools developed over the years to support scholars in their literature surveying activity, such as Google Scholar, Microsoft Academic search, and others, the best way to access quality papers remains asking a domain expert who is actively involved in the field and knows research trends and directions.
State-of-the-art systems, in fact, either do not allow exploratory search activity, such as identifying the active research directions within a given topic, or do not offer proactive features, such as content recommendation, both of which are critical to researchers.
To overcome these limitations, we strongly advocate a paradigm shift in the development of scholarly data access tools: moving from traditional information retrieval and filtering tools towards automated agents able to make sense of the textual content of published papers and therefore monitor the state of the art.
Building such a system is, however, a complex task that implies tackling non-trivial problems in the fields of Natural Language Processing, Big Data Analysis, User Modelling, and Information Filtering.
In this work, we introduce the concept of Automatic Curator System and present its fundamental components. PhD in Computer Science (Dottorato di ricerca in Informatica). De Nart, Dari
Scaling court decisions with citation networks
To compare court decisions in a systematic way, it is typically necessary to first read these decisions and then apply legal methods to them. Measurement models that support analysts in this manual labor usually rely on judges’ voting records. Since these data are often not available, we instead propose a latent-variable model that uses the widely available references in court decisions to measure the decisions’ latent position in their common case-space. We showcase our model in the context of forum shopping and forum selling of Germany’s lower courts.