21 research outputs found
Improved standardization of transcribed digital specimen data
There are more than 1.2 billion biological specimens in the world's museums and herbaria. These objects are particularly important forms of biological sample and observation. They underpin biological taxonomy but the data they contain have many other uses in the biological and environmental sciences. Nevertheless, from their conception they are almost entirely documented on paper, either as labels attached to the specimens or in catalogues linked with catalogue numbers. In order to make the best use of these data and to improve the findability of these specimens, these data must be transcribed digitally and made to conform to standards, so that these data are also interoperable and reusable. Through various digitization projects, the authors have experimented with transcription by volunteers, expert technicians, scientists, commercial transcription services and automated systems. We have also been consumers of specimen data for taxonomical, biogeographical and ecological research. In this paper, we draw from our experiences to make specific recommendations to improve transcription data. The paper is split into two sections. We first address issues related to database implementation with relevance to data transcription, namely versioning, annotation, unknown and incomplete data and issues related to language. We then focus on particular data types that are relevant to biological collection specimens, namely nomenclature, dates, geography, collector numbers and uniquely identifying people. We make recommendations to standards organizations, software developers, data scientists and transcribers to improve these data with the specific aim of improving interoperability between collection datasets.
Text Style Transfer Back-Translation
Back Translation (BT) is widely used in the field of machine translation, as
it has been proved effective for enhancing translation quality. However, BT
mainly improves the translation of inputs that share a similar style (to be
more specific, translation-like inputs), since the source side of BT data is
machine-translated. For natural inputs, BT brings only slight improvements and
sometimes even adverse effects. To address this issue, we propose Text Style
Transfer Back Translation (TST BT), which uses a style transfer model to modify
the source side of BT data. By making the style of source-side text more
natural, we aim to improve the translation of natural inputs. Our experiments
on various language pairs, including both high-resource and low-resource ones,
demonstrate that TST BT significantly improves translation performance against
popular BT benchmarks. In addition, TST BT is proved to be effective in domain
adaptation so this strategy can be regarded as a general data augmentation
method. Our training code and text style transfer model are open-sourced.
Comment: ACL 2023, 14 pages, 4 figures, 19 tables
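The data-construction step described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' released code: the function name `build_tst_bt_pairs` and the two callables `back_translate` and `style_transfer` are hypothetical stand-ins for a target-to-source translation model and a style transfer model.

```python
from typing import Callable, Iterable, List, Tuple


def build_tst_bt_pairs(
    target_sentences: Iterable[str],
    back_translate: Callable[[str], str],
    style_transfer: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) training pairs.

    back_translate: hypothetical target-to-source translation model.
    style_transfer: hypothetical model that rewrites translation-like
        text into a more natural style.
    """
    pairs = []
    for target in target_sentences:
        # Plain BT: the source side is machine-translated text.
        synthetic_source = back_translate(target)
        # TST BT: modify the source side so that it reads more naturally.
        natural_source = style_transfer(synthetic_source)
        # The original monolingual sentence stays on the target side.
        pairs.append((natural_source, target))
    return pairs


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any trained model.
    demo = build_tst_bt_pairs(
        ["Hello."],
        back_translate=lambda t: "SRC(" + t + ")",
        style_transfer=lambda s: s.lower(),
    )
    print(demo)
```

Passing the models as callables keeps the sketch independent of any particular translation toolkit; in practice the resulting pairs would be mixed with genuine parallel data before training.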
Use of European open science cloud and national e-infrastructures for the long-term storage of digitised assets from natural history collections
Digitisation of Natural History Collections (NHC) has evolved from transcription of specimen catalogues in databases to web portals providing access to data, digital images, and 3D models of specimens. These portals increase global accessibility to specimens and help preserve the physical specimens by reducing their handling. The size of the NHC requires developing high-throughput digitisation workflows, as well as research into novel acquisition systems, image standardisation, curation, preservation, and publishing. Nowadays, herbarium sheet digitisation workflows (and fast digitisation stations) can digitise up to 6,000 specimens per day. Operating those digitisation stations in parallel can increase the digitisation capacity. The high-resolution images obtained from these specimens, and their volume, require substantial bandwidth, disk space and tapes for storage of original digitised materials, as well as computational processing resources for generating derivatives, information extraction, and publishing. While large institutions have dedicated digitisation teams that manage the whole workflow from acquisition to publishing, other institutions cannot dedicate resources to support all digitisation activities, in particular long-term storage. National and European e-infrastructures can provide an alternative solution by supporting different parts of the digitisation workflows. In the context of the Innovation and consolidation for large scale digitisation of natural heritage (ICEDIG Project 2018), three different e-infrastructures providing long-term storage have been analysed through three pilot studies: EUDAT-CINES, Zenodo, and National Infrastructures
Landscape Analysis for the Specimen Data Refinery
This report reviews the current state-of-the-art applied approaches on automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems; and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens
D3.2 DiSSCo Digitisation Guides Website - Consolidating Knowledge on Collections Mobilisation
In order to support the digitisation activities of DiSSCo, we have considered how best to prepare
collections for digitisation, digitise them, curate their associated data, publish those data, and
measure the outputs of projects and programmes. We have examined options and approaches for
different types and sizes of collections, when outsourcing should be considered, and what different
project management approaches are most appropriate in this range of circumstances.
This report describes the approach we have taken to developing an online community-edited
manual, our guidelines, other relevant resources and platforms, and a set of recommendations on
how to develop this work to enhance future digitisation capacity across DiSSCo collection-holding organisations.
Novel Retinoic Acid Receptor Alpha Agonists for Treatment of Kidney Disease
Development of pharmacologic agents that protect podocytes from injury is a critical strategy for the treatment of kidney glomerular diseases. Retinoic acid reduces proteinuria and glomerulosclerosis in multiple animal models of kidney diseases. However, clinical studies are limited because of significant side effects of retinoic acid. Animal studies suggest that all trans retinoic acid (ATRA) attenuates proteinuria by protecting podocytes from injury. The physiological actions of ATRA are mediated by binding to all three isoforms of the nuclear retinoic acid receptors (RARs): RARα, RARβ, and RARγ. We have previously shown that ATRA exerts its renal protective effects mainly through the agonism of RARα. Here, we designed and synthesized a novel boron-containing derivative of the RARα-specific agonist Am580. This new derivative, BD4, binds to RARα receptor specifically and is predicted to have less toxicity based on its structure. We confirmed experimentally that BD4 binds to RARα with a higher affinity and exhibits less cellular toxicity than Am580 and ATRA. BD4 induces the expression of podocyte differentiation markers (synaptopodin, nephrin, and WT-1) in cultured podocytes. Finally, we confirmed that BD4 reduces proteinuria and improves kidney injury in HIV-1 transgenic mice, a model for HIV-associated nephropathy (HIVAN). Mice treated with BD4 did not develop any obvious toxicity or side effect. Our data suggest that BD4 is a novel RARα agonist, which could be used as a potential therapy for patients with kidney disease such as HIVAN
A Non-Iterative Method Combined with Neural Network Embedded in Physical Model to Solve the Imaging of Electromagnetic Inverse Scattering Problem
The main purpose of this paper is to solve the electromagnetic inverse scattering problem (ISP). Compared with conventional tomography technology, it considers the interaction between the internal structure of the scene and the electromagnetic wave in a more realistic manner. However, due to the nonlinearity of ISP, conventional calculation schemes usually suffer from unsatisfactory imaging quality and high computational cost. To solve these problems and improve the imaging quality, this paper presents a simple method named the diagonal matrix inversion method (DMI) to estimate the distribution of scatterer contrast (DSC), and a Generative Adversarial Network (GAN) that refines the DSC obtained by DMI and brings it closer to the real distribution of scatterer contrast. To make the distribution of scatterer contrast generated by the GAN more accurate, the forward model is embedded in the GAN. Moreover, because of the forward model, the DSC produced by the generator is not only similar to the original distribution of scatterer contrast overall, but the numerical value at each point also approximates the original
Automated Methods in Digitisation of Pinned Insects
Digitisation of natural history collections is drawing increasing attention. Digitised specimens not only facilitate the long-term preservation of biodiversity information but also make that information easier to access and share. There are more than two billion specimens in the world’s natural history collections, and pinned insect specimens make up more than half of them (Tegelberg et al. 2014, Tegelberg et al. 2017). However, it is still a challenge to digitise pinned insect specimens with current state-of-the-art systems. Imaging pinned insects is slow because they are essentially 3D objects and the associated labels are pinned under the insect specimen. During the imaging process, the labels are often removed manually, which slows down the whole process. How can we avoid handling the labels pinned under often fragile and valuable specimens in order to increase the speed of digitisation?
In our work (Saarenmaa et al. 2019) for T3.1.2 task in the ICEDIG (https://www.icedig.eu) project, we first briefly reviewed the state-of-the-art approaches on small insect digitisation. Then recent promising technological advances on imaging were presented, some of which have not yet been used for insect digitisation. It seems that one single approach will not be enough to digitise all insect collections efficiently. The approach has to be optimized based on the features of the specimens and their associated labels. To obtain a breakthrough in insect digitisation, it is necessary to utilize a combination of existing and new technologies in novel workflows. To explore the options, we identified six approaches for digitising pinned insects with the goal of minimum manipulations of labels as follows.
1. Minimal labels: Image selected individual specimens without removing labels from the pin by using two cameras. This method suits small insects with only one or a few well-spaced labels.
2. Multiple webcams: Similar to the minimal labels approach, but with multiple webcams at different positions. This has been implemented in a prototype system with 12 cameras (Hereld et al. 2017) and in the ALICE system with six DSLR cameras (Price et al. 2018).
3. Imaging of units: Similar to the multiple webcams approach, but image the entire unit (“Units” are small boxes or trays contained in drawers of collection cabinets, and are used in most major insect collections).
4. Camera in robot arm: Image the individual specimen or the unit with the camera mounted on a robot arm to capture a large number of images from different views.
5. Camera on rails: Similar to the camera in robot arm approach, but the camera is mounted on rails to capture the unit. A 3D model of the insects and/or units can be created, and then labels are extracted. This is being prototyped by the ENTODIG-3D system (Ylinampa and Saarenmaa 2019).
6. Terahertz time-gated multispectral imaging: Image the individual specimen with terahertz time-gated multispectral imaging devices.
Experiments on selected approaches 2 and 5 are in progress and the preliminary results will be presented
The Biological Effect of Small Extracellular Vesicles on Colorectal Cancer Metastasis
Colorectal cancer (CRC) is a malignancy that seriously threatens human health, and metastasis from CRC is a major cause of death and poor prognosis for patients. Studying the potential mechanisms of small extracellular vesicles (sEVs) in tumor development may provide new options for early and effective diagnosis and treatment of CRC metastasis. In this review, we systematically describe how sEVs mediate epithelial mesenchymal transition (EMT), reconfigure the tumor microenvironment (TME), modulate the immune system, and alter vascular permeability and angiogenesis to promote CRC metastasis. We also discuss the current difficulties in studying sEVs and propose new ideas