Towards FAIRification of sensitive and fragmented rare disease patient data: challenges and solutions in European reference network registries
Introduction: Rare disease patient data are typically sensitive, present in multiple registries controlled by different custodians, and non-interoperable. Making these data Findable, Accessible, Interoperable, and Reusable (FAIR) for humans and machines at source enables federated discovery and analysis across data custodians. This facilitates accurate diagnosis, optimal clinical management, and personalised treatments. In Europe, twenty-four European Reference Networks (ERNs) work on rare disease registries in different clinical domains. The process and the implementation choices for making data FAIR ("FAIRification") differ among ERN registries. For example, registries use different software systems and are subject to different legal regulations. To support the ERNs in making informed decisions and to harmonise FAIRification, the FAIRification steward team was established to work as liaisons between ERNs and researchers from the European Joint Programme on Rare Diseases.
Results: The FAIRification steward team inventoried the FAIRification challenges of the ERN registries and, together with the stakeholders involved, proposed solutions to address them. Ninety-eight FAIRification challenges from 24 ERNs' registries were collected and categorised into "training" (31), "community" (9), "modelling" (12), "implementation" (26), and "legal" (20). After curating and aggregating highly similar challenges, 41 unique FAIRification challenges remained. The two categories with the most challenges were "training" (15) and "implementation" (9), followed by "community" (7), and then "modelling" (5) and "legal" (5). To address all challenges, eleven types of solutions were proposed. Among them, the provision of guidelines and the organisation of training activities, which ranged from less-technical "coffee-rounds" to technical workshops and from informal FAIR Games to formal hackathons, resolved the "training" challenges. Obtaining implementation support from technical experts was the solution type for tackling the "implementation" challenges.
Conclusion: This work shows that a dedicated team of FAIR data stewards is an asset for harmonising the various processes of making data FAIR in a large organisation with multiple stakeholders. Additionally, multi-levelled training activities are required to accommodate the diverse needs of the ERNs. Finally, the lessons learned from the experience of the FAIRification steward team described in this paper may help to increase FAIR awareness and provide insights into FAIRification challenges and solutions of rare disease registries.
Performance assessment of ontology matching systems for FAIR data
© The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, which creates the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision.
Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes, to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55 for BioPortal and 0.66, 0.53, and 0.58 for the UMLS, respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that on average 90% of the mappings' classes belonged to top-level classes that matched.
Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem.
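The evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that an alignment is simply a set of (source class, target class) pairs, that F1 is the harmonic mean of precision and recall against a reference alignment, and that a vote-based consensus keeps every mapping proposed by at least a given number of systems. The function names, the vote threshold, and the toy class identifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def precision_recall_f1(alignment, reference):
    """Score a set of (source, target) mappings against a reference alignment."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def consensus_alignment(alignments, min_votes):
    """Keep mappings proposed by at least `min_votes` of the given systems."""
    votes = Counter(mapping for alignment in alignments for mapping in alignment)
    return {mapping for mapping, count in votes.items() if count >= min_votes}


# Toy data (hypothetical identifiers, not real ontology classes).
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
system_a = {("a1", "b1"), ("a2", "b2"), ("a5", "b5")}
system_b = {("a1", "b1"), ("a3", "b3"), ("a2", "b9")}
system_c = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a6", "b6")}

consensus = consensus_alignment([system_a, system_b, system_c], min_votes=2)

for name, alignment in [("A", system_a), ("B", system_b), ("C", system_c),
                        ("consensus", consensus)]:
    p, r, f1 = precision_recall_f1(alignment, reference)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy setting the consensus drops mappings that only one system proposes, which tends to raise precision at some cost in recall; whether that improves F1 on real ontologies depends on the data.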
Rewilding and restoring nature in a changing world
Overview of the PLoS ONE collection 'Rewilding and Restoration'
High-level integration of murine intestinal transcriptomics data highlights the importance of the complement system in mucosal homeostasis.
BACKGROUND: The mammalian intestine is a complex biological system that exhibits functional plasticity in its response to diverse stimuli to maintain homeostasis. To improve our understanding of this plasticity, we performed a high-level data integration of 14 whole-genome transcriptomics datasets from samples of intestinal mouse mucosa. We used the tool Centrality based Pathway Analysis (CePa), along with information from the Reactome database. RESULTS: The results show an integrated response of the mouse intestinal mucosa to challenges with agents introduced orally that were expected to perturb homeostasis. We observed that a common set of pathways responds to different stimuli, of which the most reactive was the Regulation of Complement Cascade pathway. Altered expression of the Regulation of Complement Cascade pathway was verified in mouse organoids challenged with different stimuli in vitro. CONCLUSIONS: Results of the integrated transcriptomics analysis and data-driven experiment suggest an important role of epithelial production of complement and host complement defence factors in the maintenance of homeostasis.
Estimating food production in an urban landscape
There is increasing interest in urban food production for reasons of food security, environmental sustainability, and social and health benefits. In developed nations urban food growing is largely informal and localised, in gardens, allotments and public spaces, but we know little about the magnitude of this production. Here we couple own-grown crop yield data with garden and allotment areal surveys and urban fruit tree occurrence to provide one of the first estimates for current and potential food production in a UK urban setting. Current production is estimated to be sufficient to supply the urban population with fruit and vegetables for about 30 days per year, while the most optimistic model results suggest that existing land cultivated for food could supply over half of the annual demand. Our findings provide a baseline for current production whilst highlighting the potential for change under the scaling up of cultivation on existing land.
Mapping and modeling the impact of climate change on recreational ecosystem services using machine learning and big data
The use of recreational ecosystem services is highly dependent on the surrounding environmental and climate conditions. Due to this dependency, future recreational opportunities provided by nature are at risk from climate change. To understand how climate change will impact recreation we need to understand current recreational patterns, but traditional data are limited and low-resolution. Fortunately, social media data present an opportunity to overcome those data limitations, and machine learning offers a tool to use that big data effectively. We use data from the social media site Flickr as a proxy for recreational visitation and a random forest to model the relationships between social, environmental, and climate factors and recreation for the peak season (summer) in California. We then use the model to project how non-urban recreation will change as the climate changes. Our model shows that current patterns are exacerbated in the future under climate change, with currently popular summer recreation areas becoming more suitable and unpopular summer recreation areas becoming less suitable for recreation. Our model results have land management implications, as recreation regions that see high visitation consequently experience impacts to surrounding ecosystems, ecosystem services, and infrastructure. This information can be used to incorporate climate change impacts into land management plans to more effectively provide sustainable nature recreation opportunities for current and future generations. Furthermore, our study demonstrates that crowdsourced data and machine learning offer opportunities to better integrate socio-ecological systems into climate impacts research and more holistically understand climate change impacts to human well-being.
Comparison of two mathematical models for correlating the organic matter removal efficiency with hydraulic retention time in a hybrid anaerobic baffled reactor treating molasses
The anaerobic digestion of molasses was modelled in a 70-L multistage anaerobic biofilm reactor, or hybrid anaerobic baffled reactor, with six compartments at an operating temperature of 26 °C. Five hydraulic retention times (6, 16, 24, 72 and 120 h) were studied at a constant influent COD concentration of 10,000 mg/L. Two different kinetic models (one based on a dispersion model with first-order kinetics for substrate consumption and the other based on a modification of the Young equation) were evaluated and compared to predict the organic matter removal efficiency or fractional conversion. The first-order kinetic constant obtained with the dispersion model was 0.28 h⁻¹, the Peclet dispersion number being 45, with a mean relative error of 2%. The model based on the Young equation predicted the behaviour of the reactor more accurately, showing deviations lower than 10% between the theoretical and experimental values of the fractional conversion, the mean relative error being 0.9% in this case. © 2011 Springer-Verlag.
The authors gratefully acknowledge the financial support of the Water Research Center of Greentech (Co., Ltd.), Shiraz and the R&D Center of Anshan Corporation.
Peer Reviewed
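To illustrate how a dispersion model with first-order kinetics relates fractional conversion to hydraulic retention time, the sketch below uses the standard closed-form solution for axial dispersion with a first-order reaction (the Wehner-Wilhelm equation). This is an assumption: the abstract does not state which form of the dispersion model the authors used, and the sketch also assumes that the reported Peclet number is the ratio uL/D. The function name is hypothetical, and the output is not meant to reproduce the paper's results.

```python
import math


def fractional_conversion(k, tau, peclet):
    """Fractional conversion for first-order kinetics with axial dispersion.

    k      -- first-order rate constant (1/h)
    tau    -- hydraulic retention time (h)
    peclet -- Peclet number, assumed here to be uL/D (dimensionless)

    Wehner-Wilhelm equation, with numerator and denominator divided by
    exp(a * peclet / 2) so that large Peclet numbers do not overflow.
    """
    a = math.sqrt(1.0 + 4.0 * k * tau / peclet)
    numerator = 4.0 * a * math.exp(0.5 * peclet * (1.0 - a))
    denominator = (1.0 + a) ** 2 - (1.0 - a) ** 2 * math.exp(-a * peclet)
    return 1.0 - numerator / denominator


# Parameter values quoted in the abstract.
K = 0.28        # first-order kinetic constant, 1/h
PECLET = 45.0   # Peclet dispersion number
HRTS = [6, 16, 24, 72, 120]  # hydraulic retention times, h

conversions = {hrt: fractional_conversion(K, hrt, PECLET) for hrt in HRTS}

for hrt, x in conversions.items():
    print(f"HRT = {hrt:>3} h -> predicted fractional conversion = {x:.4f}")
```

As a sanity check on the formula, a very large Peclet number approaches the plug-flow limit 1 - exp(-k·tau), and a very small one approaches the fully mixed limit k·tau / (1 + k·tau).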
A review of machine learning and big data applications in addressing ecosystem service research gaps
Ecosystem services are essential for human well-being, but are currently facing many natural and anthropogenic threats. Modeling and mapping ecosystem services helps us mitigate, adapt to, and manage these pressures, but overall the field faces multiple major limitations. These include: 1) data availability, 2) understanding, estimation, and reporting of uncertainties, and 3) connecting socio-ecological aspects of ecosystem services. Recent technological advancements in machine learning, coupled with the rising availability of big data, offer an opportunity to overcome these challenges. We review studies utilizing machine learning and/or big data to overcome these limitations. We collect 56 papers that exemplify the current use of machine learning and big data to address the three identified gaps in the ecosystem service field. We find that although the use of these tools in ecosystem service research is relatively new, it is growing quickly. Big data can directly address data gaps, especially as new big data resources relevant to ecosystem service mapping become available (e.g. social media data). Some properties of machine learning can also contribute to addressing data gaps in data-sparse environments. Also, many machine learning algorithms can estimate and consider uncertainty, whereas big data can significantly increase sample size, reducing uncertainties in some situations. Some big data sources, like crowdsourced data, provide direct sources of social behaviors and preferences that relate to ecosystem service demand, thus allowing researchers to connect social and biophysical aspects of ecosystem services. Machine learning algorithms provide an effective and efficient tool for handling these large nonlinear socio-ecological datasets in tandem, giving researchers the ability to more realistically model and map ecosystem services without relying on oversimplified proxies or linear algorithms. Despite these opportunities, implementation is still lacking and limitations still hinder use.
Towards sustainable palm oil production: The positive and negative impacts on ecosystem services and human wellbeing
Palm oil is an important commodity contributing to the livelihoods of many communities, the GDP of governments and the achievement of several sustainable development goals (SDGs), including no poverty, zero hunger, and decent work and economic growth. However, its cultivation and continuous expansion due to high and increasing demand has led to many negative effects and subsequent calls to make production sustainable. To this end, information is needed to understand the negative and positive impacts on both the environment and human wellbeing to respond appropriately. Sustainability in palm oil trade entails having a global supply chain based on environmentally friendly and socially acceptable production and sourcing. Much has been done in understanding and responding to impacts on the environment but not so much on social impacts, partly due to a lack of information. The direct (socio-economic) and indirect (through ecosystem services) impacts of palm oil trade were reviewed using peer-reviewed literature and the Environmental Justice Atlas (EJA). Our results show that most of the 57 case studies were conducted in Indonesia and Malaysia, where 85% of global production of palm oil occurs. The results show both negative (109) and positive (99) direct impacts on humans. Indirect impacts through ecosystem services were predominantly negative (116), as were the direct impacts. The most frequently studied direct negative impacts were conflicts (25%), housing conditions (18%) and land grabbing (16%), while the most frequently studied direct positive impacts were income generation (33%) and employment (19%). Ongoing initiatives to make the palm oil sector sustainable, such as the RSPO, are focused on the environment but need to pay more attention to (related) social impacts. To make palm oil production sustainable and to meet SDGs such as ensuring healthy lives and promoting wellbeing as well as responsible consumption and production, negative social impacts of palm oil trade need to be addressed.