4 research outputs found
Decentralized provenance-aware publishing with nanopublications
Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.
Micropublication: incentivizing community curation and placing unpublished data into the public domain
Large volumes of data generated by research laboratories, coupled with the effort and cost that curation requires, present a significant barrier to inclusion of these data in authoritative community databases. Further, many publicly funded experimental observations remain invisible to curation simply because they are never published: results often do not fit within the scope of a standard publication; trainee-generated data are forgotten when the experimenter (e.g. student, post-doc) leaves the lab; results are omitted from science narratives due to publication bias, where certain results are considered irrelevant for the publication. While authors are in the best position to curate their own data, they face a steep learning curve to ensure that appropriate referential tags, metadata, and ontologies are applied correctly to their observations, a task sometimes considered beyond the scope of their research and their numerous other responsibilities. Getting researchers to adopt a new system of data reporting and curation requires a fundamental change in behavior among all members of the research community. To solve these challenges, we have created a novel scholarly communication platform that captures data from researchers and directly delivers them to information resources via Micropublication. This platform incentivizes authors to publish their unpublished observations along with associated metadata by providing a deliberately fast and lightweight but still peer-reviewed process that results in a citable publication. Our long-term goal is to develop a data ecosystem that improves reproducibility and accountability of publicly funded research and in turn accelerates both basic and translational discovery.
Serviços de integração de dados para aplicações biomédicas (Data integration services for biomedical applications)
Doctorate in Informatics (MAP-i)
In the last decades, the field of biomedical science has fostered
unprecedented scientific advances. Research is stimulated by the
constant evolution of information technology, delivering novel and
diverse bioinformatics tools. Nevertheless, the proliferation of new and
disconnected solutions has resulted in massive amounts of resources
spread over heterogeneous and distributed platforms. Distinct
data types and formats are generated and stored in miscellaneous
repositories posing data interoperability challenges and delays in
discoveries. Data sharing and integrated access to these resources
are key features for successful knowledge extraction.
In this context, this thesis makes contributions towards accelerating
the semantic integration, linkage and reuse of biomedical resources.
The first contribution addresses the connection of distributed and
heterogeneous registries. The proposed methodology creates a
holistic view over the different registries, supporting semantic
data representation, integrated access and querying. The second
contribution addresses the integration of heterogeneous information
across scientific research, aiming to enable adequate data-sharing
services. The third contribution presents a modular architecture to
support the extraction and integration of textual information, enabling
the full exploitation of curated data. The last contribution lies
in providing a platform to accelerate the deployment of enhanced
semantic information systems. All the proposed solutions were
deployed and validated in the scope of rare diseases.
In recent decades, the field of biomedical sciences has delivered
major scientific advances, driven by the constant evolution of
information technology. The creation of diverse bioinformatics tools
and the lack of integration among new solutions have resulted in
enormous amounts of data distributed across different platforms.
Data of different types and formats are generated and stored in
various repositories, which gives rise to interoperability problems
and delays research. Information sharing and integrated access to
these resources are fundamental features for the successful
extraction of scientific knowledge.
In this context, this thesis provides contributions to accelerate the
semantic integration, linkage and reuse of biomedical data. The
first contribution addresses the interconnection of distributed and
heterogeneous registries. The proposed methodology creates a holistic
view over the different registries, supporting semantic data
representation and integrated access. The second contribution
addresses the integration of diverse data for scientific research,
with the aim of supporting interoperable services for information
sharing. The third contribution presents a modular architecture that
supports the extraction and integration of textual information,
enabling the exploitation of these data. The last contribution
consists of a web platform to accelerate the creation of semantic
information systems. All the proposed solutions were validated in
the scope of rare diseases.