Using nanopublications as a distributed ledger of digital truth
With the increase in the volume of research publications, it is very difficult for researchers to keep abreast of all work in their area. Additionally, the claims in
classical publications are not machine-readable, making it challenging to retrieve,
integrate, and link prior work. Several semantic publishing approaches have been
proposed to address these challenges, including Research Object, Executable Paper,
Micropublications, and Nanopublications.
Nanopublications are a granular way of publishing research-based claims, their
associated provenance, and publication information (metadata of the nanopublication) in a machine-readable form. To date, over 10 million nanopublications have
been published, covering a wide range of topics, predominantly in the life sciences.
Nanopublications are immutable, decentralised/distributed, uniformly structured,
granular, and authentic. These features allow nanopublications to
be used as a Distributed Ledger of Digital Truth. Such a ledger enables detecting
conflicting claims and generating the timeline of discussion on a particular topic.
However, the inability to identify all nanopublications related to a given topic prevents existing nanopublications from forming such a ledger.
In this dissertation, we make the following contributions: (i) we identify quality
issues regarding the misuse of authorship properties and linkrot, which impact the
quality of the digital ledger, and argue that the Nanopub community needs to
develop a set of guidelines for publishing nanopublications; (ii) we provide a framework for generating a timeline of discourse over a collection of nanopublications by
retrieving and combining nanopublications on a particular topic to provide interoperability between them; (iii) we detect contradictory claims between nanopublications,
automatically highlighting the conflicts and providing explanations based on the provenance information in the nanopublications. Through these contributions, we show
that nanopublications can form a distributed ledger of digital truth, providing key
benefits such as citability, timelines of discourse, and conflict detection to users of
the ledger.
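The conflict-detection contribution above can be illustrated with a minimal sketch. Real nanopublications are RDF named graphs; the class and field names below (`Nanopub`, `provenance`, `pubinfo`, the `negated` flag) are illustrative assumptions that only mirror the three-part structure the abstract describes (assertion, provenance, publication info), not the actual data model or tooling used in the dissertation.

```python
from dataclasses import dataclass, field

@dataclass
class Nanopub:
    """Toy nanopublication: one claim plus its provenance and publication info.

    Hypothetical structure for illustration only; real nanopublications
    are RDF named graphs.
    """
    subject: str
    relation: str
    obj: str
    negated: bool = False                             # polarity of the assertion
    provenance: dict = field(default_factory=dict)    # e.g. {"derivedFrom": ...}
    pubinfo: dict = field(default_factory=dict)       # e.g. {"date": ...}

def find_conflicts(pubs):
    """Return pairs of nanopubs asserting the same triple with opposite polarity."""
    conflicts = []
    for i, a in enumerate(pubs):
        for b in pubs[i + 1:]:
            same_claim = (a.subject, a.relation, a.obj) == (b.subject, b.relation, b.obj)
            if same_claim and a.negated != b.negated:
                conflicts.append((a, b))
    return conflicts

pubs = [
    Nanopub("GeneX", "associatedWith", "DiseaseY",
            provenance={"derivedFrom": "study-1"}, pubinfo={"date": "2014"}),
    Nanopub("GeneX", "associatedWith", "DiseaseY", negated=True,
            provenance={"derivedFrom": "study-2"}, pubinfo={"date": "2016"}),
]
for a, b in find_conflicts(pubs):
    # Provenance lets the conflict be explained, and pubinfo dates let the
    # claims be placed on a timeline of discourse.
    print(f"Conflict: {a.provenance['derivedFrom']} vs {b.provenance['derivedFrom']}")
    # prints: Conflict: study-1 vs study-2
```

The same uniform structure that makes conflicts detectable also yields the timeline: sorting the conflicting nanopublications by their publication dates reconstructs the order of the discussion.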
Data integration services for biomedical applications (Serviços de integração de dados para aplicações biomédicas)
Doutoramento em Informática (MAP-i)
In recent decades, the field of biomedical science has fostered
unprecedented scientific advances. Research is stimulated by the
constant evolution of information technology, delivering novel and
diverse bioinformatics tools. Nevertheless, the proliferation of new and
disconnected solutions has resulted in massive amounts of resources
spread over heterogeneous and distributed platforms. Distinct
data types and formats are generated and stored in miscellaneous
repositories posing data interoperability challenges and delays in
discoveries. Data sharing and integrated access to these resources
are key features for successful knowledge extraction.
In this context, this thesis makes contributions towards accelerating
the semantic integration, linkage and reuse of biomedical resources.
The first contribution addresses the connection of distributed and
heterogeneous registries. The proposed methodology creates a
holistic view over the different registries, supporting semantic
data representation, integrated access and querying. The second
contribution addresses the integration of heterogeneous information
across scientific research, aiming to enable adequate data-sharing
services. The third contribution presents a modular architecture to
support the extraction and integration of textual information, enabling
the full exploitation of curated data. The last contribution lies
in providing a platform to accelerate the deployment of enhanced
semantic information systems. All the proposed solutions were
deployed and validated in the scope of rare diseases.
Affordances and limitations of algorithmic criticism
Humanities scholars currently have access to unprecedented quantities of machine-readable texts, and, at the same time, the tools and the methods with which we can analyse and visualise these texts are becoming more and more sophisticated. As has been shown in numerous studies, many of the new technical possibilities that emerge from fields such as text mining and natural language processing can have useful applications within literary research. Computational methods can help literary scholars to discover interesting trends and correlations within massive text collections, and they can enable a thoroughly systematic examination of the stylistic properties of literary works. While such computer-assisted forms of reading have proven invaluable for research in the field of literary history, relatively few studies have applied these technologies to expand or to transform the ways in which we can interpret literary texts. Based on a comparative analysis of digital scholarship and traditional scholarship, this thesis critically examines the possibilities and the limitations of a computer-based literary criticism. It argues that quantitative analyses of data about literary techniques can often reveal surprising qualities of works of literature, which can, in turn, lead to new interpretative readings.
Out of cite, out of mind: the current state of practice, policy, and technology for the citation of data
PREFACE
The growth in the capacity of the research community to collect and distribute data presents huge opportunities. It is already transforming old methods of scientific research and permitting the creation of new ones. However, the exploitation of these opportunities depends upon more than computing power, storage, and network connectivity. Among the promises of our growing universe of online digital data are the ability to integrate data into new forms of scholarly publishing to allow peer-examination and review of conclusions or analysis of experimental and observational data and the ability for subsequent researchers to make new analyses of the same data, including their combination with other data sets and uses that may have been unanticipated by the original producer or collector.
The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as other functions such as attribution of credit and establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters? The traditions and good practices for maintaining the scholarly record by proper references to a work are well established and understood in regard to journal articles and other literature, but attributing credit by bibliographic references to data is not yet so broadly implemented.
Contributions towards understanding and building sustainable science
This dissertation focuses on either understanding and detecting threats to the epistemology of science (chapters 1-6) or making practical advances to remedy epistemological threats (chapters 7-9). Chapter 1 reviews the literature on responsible conduct of research, questionable research practices, and research misconduct. Chapter 2 reanalyzes the claims of Head et al. (2015) about widespread p-hacking for robustness. Chapter 3 examines 258,050 test results across 30,710 articles from eight high impact journals to investigate the existence of a peculiar prevalence of p-values just below .05 (i.e., a bump) in the psychological literature, and a potential increase thereof over time. Chapter 4 examines evidence for false negatives in nonsignificant results throughout psychology, gender effects, and the Reproducibility Project: Psychology. Chapter 5 describes a dataset that is the result of content mining 167,318 published articles for statistical test results reported according to the standards prescribed by the American Psychological Association (APA). In Chapter 6, I test the validity of statistical methods to detect fabricated data in two studies. Chapter 7 tackles the issue of data extraction from figures in scholarly publications. In Chapter 8, I argue that "after-the-fact" research papers do not help alleviate issues of access, selective publication, and reproducibility, but actually cause some of these threats because the chronology of the research cycle is lost in a research paper. I propose to give up the academic paper in favor of a digitally native "as-you-go" alternative. In Chapter 9, I propose a technical design for this alternative.
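The "bump" of p-values just below .05 that Chapter 3 investigates can be sketched with a simple bin-count comparison. This is a generic illustration of the idea, not the dissertation's actual method or data; the bin width, the simulated p-value distributions, and the function name are all assumptions made for the example.

```python
import random

def count_around_threshold(pvalues, threshold=0.05, width=0.005):
    """Count p-values in the narrow bins just below and just above a
    significance threshold. A smooth p-value distribution puts similar
    mass in adjacent narrow bins; a surplus just below .05 is the 'bump'
    pattern associated with selective reporting or p-hacking."""
    just_below = sum(threshold - width <= p < threshold for p in pvalues)
    just_above = sum(threshold <= p < threshold + width for p in pvalues)
    return just_below, just_above

random.seed(1)
# Simulated literature: mostly uniform p-values, plus a cluster nudged to
# sit just under .05 (a crude stand-in for p-hacked results).
honest = [random.random() for _ in range(1000)]
hacked = [random.uniform(0.045, 0.0499) for _ in range(60)]

below, above = count_around_threshold(honest + hacked)
print(below, above)  # a clear surplus just below .05
```

In the honest-only sample the two bins hold roughly equal counts; the injected cluster produces the asymmetry that a bump analysis looks for.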
Bioinformatic approaches to identify genomic, proteomic and metabolomic biomarkers for the metabolic syndrome
Advances in technology have turned modern biology into a data-intensive enterprise. The advent of high-throughput technologies such as microarrays and next-generation sequencing has resulted in researchers grappling not just with huge volumes but also multiple types of data. While the generation and storage of high-quality data are an important research focus, it is increasingly recognized that translating data into actionable information and insight is a critical research challenge. To infer reliable conclusions from the data, it is often necessary to integrate large amounts of heterogeneous data with different formats and semantics. Given the breadth and volume of data involved, this goal is best achieved through automated methods and tools for data integration and workflow management. This thesis presents automated strategies that combine bioinformatics and statistical methods to identify novel biomarkers in high-throughput OMICs datasets pertaining to the metabolic syndrome and to gain mechanistic insight into the underlying biological processes. An underlying theme in this thesis is data-driven approaches that generate plausible hypotheses which are then verified experimentally.
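A recurring statistical step in biomarker screens of the kind described above is multiple-testing correction: with thousands of candidate features tested at once, raw p-values must be adjusted before any biomarker is called. The sketch below shows the standard Benjamini-Hochberg false-discovery-rate procedure; the example p-values are invented for illustration, and nothing here is specific to the thesis's own pipeline.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg FDR control: return the (sorted) indices of
    hypotheses, e.g. candidate biomarkers, that survive correction.

    A p-value at rank k (1-based, ascending) is compared against
    alpha * k / m; everything up to the largest passing rank is kept.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical per-feature p-values from a case/control comparison:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.5, 0.8, 0.9]
print(benjamini_hochberg(pvals))  # → [0, 1]
```

Note that several features significant at the uncorrected .05 level (indices 2-4) are dropped after correction, which is exactly the discipline such screens require.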
Converting Scholarly Journals to Open Access: A Review of Approaches and Experiences
This report identifies ways through which subscription-based scholarly journals have converted their publishing models to open access (OA). The major goal was to identify specific scenarios that have been used or proposed for transitioning subscription journals to OA so that these scenarios can provide options for others seeking to “flip” their journals to OA. The report is based on the published literature as well as “gray” literature such as blog posts and press releases. In addition, interviews were conducted with eight experts in scholarly publishing. The report identifies a variety of goals for converting a journal to OA. While there are altruistic goals of making scholarship more accessible, the literature review and interviews suggest that there are also many practical reasons for transitioning to an OA model. In some instances, an OA business model is simply more economically viable. Also, it is not unusual for a society or editorial board to transition to an OA business model as a means of gaining independence from the current publisher. Increasing readership, the number and quality of submissions, and impact as measured in citations are important goals for most journals that are considering flipping. Goals and their importance often differ for various regions in the world and across different disciplines. Each journal’s situation is unique and it is important for those seeking to flip a journal to carefully consider exactly what they hope to achieve, what barriers they are likely to face, and how the changes that are being implemented will further the goals intended for their journal.