
10 research outputs found

Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data

Author: Benis Nirupama
Bernabe Cesar Henrique
Cornet Ronald
Dumontier Michel
Godoy Mario Prieto
Jacobsen Annika
Kaliyaperumal Rajaram
Kool Leo J. Schultze
Lalout Nawel
Le Cornec Clemence M. A.
Moreno Pablo Alarcon
Queralt-Rosinach Nuria
Roos Marco
Swertz Morris A.
van Damme Philip
van der Velde K. Joeri
Vieira Bruna dos Santos
Wilkinson Mark D.
Zhang Shuxin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/03/2022

BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements - including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

Digital.CSIC

Dissertations of the University of Groningen

The Case of Wikidata

Author: Farda-Sarbas Mariam
Publication venue
Publication date: 01/01/2024

Since its launch in 2012, Wikidata has grown to become the largest open knowledge base (KB), containing more than 100 million data items and over 6 million registered users. Wikidata serves as the structured data backbone of Wikipedia, addressing data inconsistencies, and adhering to the motto of “serving anyone anywhere in the world,” a vision realized through the diversity of knowledge. Despite being a collaboratively contributed platform, the Wikidata community heavily relies on bots (automated accounts with batch and speedy editing rights) for a majority of edits. As Wikidata approaches its first decade, the question arises: How close is Wikidata to achieving its vision of becoming a global KB and how diverse is it in serving the global population? This dissertation investigates the current status of Wikidata’s diversity, the role of bot interventions on diversity, and how bots can be leveraged to improve diversity within the context of Wikidata. The methodologies used in this study are mapping study and content analysis, which led to the development of three datasets: 1) Wikidata Research Articles Dataset, covering the literature on Wikidata from its first decade of existence sourced from online databases to inspect its current status; 2) Wikidata Requests-for-Permissions Dataset, based on the pages requesting bot rights on the Wikidata website to explore bots from a community perspective; and 3) Wikidata Revision History Dataset, compiled from the edit history of Wikidata to investigate bot editing behavior and its impact on diversity, all of which are freely available online. The insights gained from the mapping study reveal the growing popularity of Wikidata in the research community and its various application areas, indicative of its progress toward the ultimate goal of reaching the global community.
However, there is currently no research addressing the topic of diversity in Wikidata, which could shed light on its capacity to serve a diverse global population. To address this gap, this dissertation proposes a diversity measurement concept that defines diversity in a KB context in terms of variety, balance, and disparity and is capable of assessing diversity in a KB from two main angles: user and data. The application of this concept to the domains and classes of the Wikidata Revision History Dataset exposes imbalanced content distribution across Wikidata domains, which indicates low data diversity in Wikidata domains. Further analysis discloses that bots have been active since the inception of Wikidata, and the community embraces their involvement in content editing tasks, often importing data from Wikipedia, which shows a low diversity of sources in bot edits. Bots and human users engage in similar editing tasks but exhibit distinct editing patterns. The findings of this thesis confirm that bots possess the potential to influence diversity within Wikidata by contributing substantial amounts of data to specific classes and domains, leading to an imbalance. However, this potential can also be harnessed to enhance coverage in classes with limited content and restore balance, thus improving diversity. Hence, this study proposes to enhance diversity through automation and demonstrates the practical implementation of the recommendations using a specific use case. In essence, this research enhances our understanding of diversity in relation to a KB, elucidates the influence of automation on data diversity, and sheds light on diversity improvement within a KB context through the usage of automation.

Institutional Repository of the Freie Universität Berlin

NFDI4Microbiota – national research data infrastructure for microbiota research

Author: Alexander Goesmann
Alexander Sczyrba
Alfred Pühler
Alice McHardy
Anke Becker
Barbara Götz
Dietrich Rebholz-Schuhmann
Franziska Hufsky
Jens Stoye
Jochen Blom
Justine Vandendorpe
Jörg Overmann
Konrad U. Förstner
Manja Marz
Marie-Louise Körner
Marius Dieckmann
Peer Bork
Sebastian Jünemann
Thea Van Rossum
Thomas Clavel
Thomas Gübitz
Ulisses Nunes Da Rocha
Publication venue: Pensoft Publishers
Publication date: 01/01/2023

Microbes – bacteria, archaea, unicellular eukaryotes, and viruses – play an important role in human and environmental health. Growing awareness of this fact has led to a huge increase in microbiological research and applications in a variety of fields. Driven by technological advances that allow high-throughput molecular characterization of microbial species and communities, microbiological research now offers unparalleled opportunities to address current and emerging needs. As well as helping to address global health threats such as antimicrobial resistance and viral pandemics, it also has a key role to play in areas such as agriculture, waste management, water treatment, ecosystems remediation, and the diagnosis, treatment and prevention of various diseases. Reflecting this broad potential, billions of euros have been invested in microbiota research programs worldwide. Though run independently, many of these projects are closely related. However, Germany currently has no infrastructure to connect such projects or even compare their results. Thus, the potential synergy of data and expertise is being squandered. The goal of the NFDI4Microbiota consortium is to serve and connect this broad and heterogeneous research community by elevating the availability and quality of research results through dedicated training, and by facilitating the generation, management, interpretation, sharing, and reuse of microbial data. In doing so, we will also foster interdisciplinary interactions between researchers. NFDI4Microbiota will achieve this by creating a German microbial research network through training and community-building activities, and by creating a cloud-based system that will make the storage, integration and analysis of microbial data, especially omics data, consistent, reproducible, and accessible across all areas of life sciences. 
In addition to increasing the quality of microbial research in Germany, our training program will support widespread and proper usage of these services. Through this dual emphasis on education and services, NFDI4Microbiota will ensure that microbial research in Germany is synergistic and efficient, and thus excellent. By creating a central resource for German microbial research, NFDI4Microbiota will establish a connecting hub for all NFDI consortia that work with microbiological data, including GHGA, NFDI4Biodiversity, NFDI4Agri and several others. NFDI4Microbiota will provide non-microbial specialists from these consortia with direct and easy access to the necessary expertise and infrastructure in microbial research in order to facilitate their daily work and enhance their research. The links forged through NFDI4Microbiota will not only increase the synergy between NFDI consortia, but also elevate the overall quality and relevance of microbial research in Germany.

Directory of Open Access Journals

Publications at Bielefeld University

ARPHA OAI-PMH Endpoint

ARPHA Preprints

AIUCD 2021 - Book of Extended Abstracts

Author
Publication venue
Publication date: 23/06/2021

The tenth annual conference of the Associazione per l’Informatica Umanistica e la Cultura Digitale has, in its 2021 edition, a distinctive and important title: "DH per la società: e-guaglianza, partecipazione, diritti e valori nell’era digitale" ("DH for society: e-quality, participation, rights and values in the digital era"). This volume collects the extended, peer-reviewed abstracts for the AIUCD2021 conference, held virtually in Pisa.

AMS Acta

CLARIN

Author
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 30/01/2023

The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

Directory of Open Access Books (DOAB)

CLARIN. The infrastructure for language resources

Author: Fišer Darja
Witt Andreas
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 17/10/2022

CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

Publikationsserver des Instituts für Deutsche Sprache

CLARIN

Author
Publication venue: 'Walter de Gruyter GmbH'
Publication date

OAPEN Library

Neolithic land-use in the Dutch wetlands: estimating the land-use implications of resource exploitation strategies in the Middle Swifterbant Culture (4600-3900 BCE)

Author: Dusseldorp G.L.
Out W.A.
Publication venue
Publication date: 09/09/2021

The Dutch wetlands witness the gradual adoption of Neolithic novelties by foraging societies during the Swifterbant period. Recent analyses provide new insights into the subsistence palette of Middle Swifterbant societies. Small-scale livestock herding and cultivation are in evidence at this time, but their importance is unclear. Within the framework of the PAGES Land-use at 6000BP project, we aim to translate the information on resource exploitation into information on land-use that can be incorporated into global climate modelling efforts, with attention to the importance of agriculture. A reconstruction of patterns of resource exploitation and their land-use dimensions is complicated by methodological issues in comparing the results of varied recent investigations. Analyses of organic residues in ceramics have attested to the cooking of aquatic foods, ruminant meat, porcine meat, as well as rare cases of dairy. In terms of vegetative matter, some ceramics exclusively yielded evidence of wild plants, while others preserve cereal remains. Elevated human δ15N values were interpreted as demonstrating an important aquatic component of the diet well into the 4th millennium BC. Yet recent assays on livestock remains suggest grazing on salt marshes partly accounts for the human values. Finally, renewed archaeozoological investigations have shown the early presence of domestic animals to be more limited than previously thought. We discuss the relative importance of exploited resources to produce a best-fit interpretation of changing patterns of land-use during the Middle Swifterbant phase. Our review combines recent archaeological data with wider data on anthropogenic influence on the landscape.
Combining the results of plant macroremains, information from pollen cores about vegetation development, the structure of faunal assemblages, and finds of arable fields and dairy residue, we suggest the most parsimonious interpretation is one of a limited land-use footprint of cultivation and livestock keeping in Dutch wetlands between 4600 and 3900 BCE. (NWO Vidi 276-60-004; Human Origin)

Leiden University Scholarly Publications

Taphonomy, environment or human plant exploitation strategies?: Deciphering changes in Pleistocene-Holocene plant representation at Umhlatuzana rockshelter, South Africa

Author: Dusseldorp G.L.
Esteban I.
Murungi M.
Sifogeorgakis E.
Publication venue
Publication date: 10/09/2021

The period between ~40 and 20 ka BP, encompassing the Middle Stone Age (MSA) and Later Stone Age (LSA) transition, has long been of interest because of the associated technological change. Understanding this transition in southern Africa is complicated by the paucity of archaeological sites that span this period. With its occupation sequence spanning the last ~70,000 years, Umhlatuzana Rock Shelter is one of the few sites that record this transition. Umhlatuzana thus offers a great opportunity to study past environmental dynamics from the Late Pleistocene (MIS 4) to the Late Holocene, as well as past human subsistence strategies, social organisation, and technological and symbolic innovations. Although organic remains (bones, seeds, and charcoal) are poorly preserved at the site, silica phytoliths are generally well preserved throughout the sequence. These microscopic silica particles can identify, to a reliable taxonomic level, different plant types that are no longer visible at the site because of decomposition or burning. Thus, to trace site occupation and plant resource use, and in turn reconstruct past vegetation, we applied phytolith analyses to sediment samples of the newly excavated Umhlatuzana sequence. We present results of the phytolith assemblage variability to determine change in plant use from the Pleistocene to the Holocene and discuss them in relation to taphonomic processes and human plant gathering strategies and activities. This study ultimately seeks to provide a palaeoenvironmental context for modes of occupation and will shed light on past human-environmental interactions in eastern South Africa. (NWO Vidi 276-60-004; Human Origin)

Leiden University Scholarly Publications

Ways and Capacity in Archaeological Data Management in Serbia

Author: Tapavički-Ilić Milica
Šegan Radonjić Marija
Publication venue: EAA - European Association of Archaeologists
Publication date: 01/01/2021

Over the past year, due to the COVID-19 pandemic, the entire world has witnessed inequalities across borders and societies. These include inequalities in access to archaeological resources, both physical and digital. Both creators and users of archaeological data spent a lot of time working from home, away from artefact collections and research data. However, this was the perfect moment to understand the importance of making data freely and openly available, both nationally and internationally. This is why the authors of this paper chose to select databases from various institutions responsible for the preservation and protection of cultural heritage, in order to understand their policies regarding accessibility and usage of the data they keep. This will be done by simple visits to various websites or databases. The authors intend to check the volume and content, but also the importance, of the archaeological heritage on offer. In addition, they will assess whether the heritage has been adequately classified and described, and check whether data is available in foreign languages. It needs to be seen whether it is possible to access digital objects (documents and the accompanying metadata), whether access is open to all users or requires a certain level of access rights, and what the policy is on usage, reuse, distribution, etc. It remains to be seen whether there is a public API and whether it is possible to collect data through it. If there is a public API, one needs to check whether the datasets are interoperable or messy, requiring data cleaning. After having visited a certain number of websites, the authors expect to collect enough data to draw a satisfactory conclusion about the accessibility and usage of Serbian archaeological web databases.

Repository of the Institute of Archaeology, Belgrade (RAI)