21 research outputs found

Improved standardization of transcribed digital specimen data

Author: Dillen Mathias
Groom Quentin
Hardy Helen
Phillips Sarah
Willemse Luc
Wu Zhengzhe
Publication date: 09/12/2019

There are more than 1.2 billion biological specimens in the world's museums and herbaria. These objects are particularly important forms of biological sample and observation. They underpin biological taxonomy but the data they contain have many other uses in the biological and environmental sciences. Nevertheless, from their conception they are almost entirely documented on paper, either as labels attached to the specimens or in catalogues linked with catalogue numbers. In order to make the best use of these data and to improve the findability of these specimens, these data must be transcribed digitally and made to conform to standards, so that these data are also interoperable and reusable. Through various digitization projects, the authors have experimented with transcription by volunteers, expert technicians, scientists, commercial transcription services and automated systems. We have also been consumers of specimen data for taxonomical, biogeographical and ecological research. In this paper, we draw from our experiences to make specific recommendations to improve transcription data. The paper is split into two sections. We first address issues related to database implementation with relevance to data transcription, namely versioning, annotation, unknown and incomplete data and issues related to language. We then focus on particular data types that are relevant to biological collection specimens, namely nomenclature, dates, geography, collector numbers and uniquely identifying people. We make recommendations to standards organizations, software developers, data scientists and transcribers to improve these data with the specific aim of improving interoperability between collection datasets. Peer reviewed

Natural History Museum Repository

British Library (BL) Shared Research Repository

Helsingin yliopiston digitaalinen arkisto

Text Style Transfer Back-Translation

Author: Chen Xiaoyu
Guo Jiaxin
Li Zongyao
Shang Hengchao
Wang Minghan
Wei Daimeng
Wu Zhanglin
Yang Hao
Yu Zhengzhe
Publication date: 02/06/2023

Back Translation (BT) is widely used in the field of machine translation, as it has been proved effective for enhancing translation quality. However, BT mainly improves the translation of inputs that share a similar style (to be more specific, translation-like inputs), since the source side of BT data is machine-translated. For natural inputs, BT brings only slight improvements and sometimes even adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer model to modify the source side of BT data. By making the style of source-side text more natural, we aim to improve the translation of natural inputs. Our experiments on various language pairs, including both high-resource and low-resource ones, demonstrate that TST BT significantly improves translation performance against popular BT benchmarks. In addition, TST BT is proved to be effective in domain adaptation so this strategy can be regarded as a general data augmentation method. Our training code and text style transfer model are open-sourced. Comment: ACL 2023, 14 pages, 4 figures, 19 tables

arXiv.org e-Print Archive

Use of European open science cloud and national e-infrastructures for the long-term storage of digitised assets from natural history collections

Author: Agosti Donat
Cazenave Nicholas
Dillen Mathias
Nielsen Lars H.
Nieva De La Hidalga Abraham
Wu Zhengzhe
Publication venue: Pensoft
Publication date: 01/01/2019

Digitisation of Natural History Collections (NHC) has evolved from transcription of specimen catalogues in databases to web portals providing access to data, digital images, and 3D models of specimens. These portals increase global accessibility to specimens and help preserve the physical specimens by reducing their handling. The size of the NHC requires developing high-throughput digitisation workflows, as well as research into novel acquisition systems, image standardisation, curation, preservation, and publishing. Nowadays, herbarium sheet digitisation workflows (and fast digitisation stations) can digitise up to 6,000 specimens per day. Operating those digitisation stations in parallel can increase the digitisation capacity. The high-resolution images obtained from these specimens, and their volume, require substantial bandwidth, disk space and tapes for storage of original digitised materials, as well as computational processing resources for generating derivatives, information extraction, and publishing. While large institutions have dedicated digitisation teams that manage the whole workflow from acquisition to publishing, other institutions cannot dedicate resources to support all digitisation activities, in particular long-term storage. National and European e-infrastructures can provide an alternative solution by supporting different parts of the digitisation workflows. In the context of the Innovation and consolidation for large scale digitisation of natural heritage (ICEDIG Project 2018), three different e-infrastructures providing long-term storage have been analysed through three pilot studies: EUDAT-CINES, Zenodo, and National Infrastructures

Online Research @ Cardiff

ZENODO

ARPHA OAI-PMH Endpoint

ARPHA Preprints

Landscape Analysis for the Specimen Data Refinery

Author: Bánki Olaf
Cubey Robert
Drinkwater Robyn
Englund Markus
Goble Carole
Groom Quentin
Kermorvant Christopher
Livermore Laurence
Rey Isabel
Santos Celia
Scott Ben
Walton Stephanie
Williams Alan
Wu Zhengzhe
Publication date: 01/01/2020

This report reviews the current state-of-the-art applied approaches on automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems; and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens

ZENODO

The University of Manchester - Institutional Repository

Digital.CSIC

ARPHA OAI-PMH Endpoint

ARPHA Preprints

D3.2 DiSSCo Digitisation Guides Website - Consolidating Knowledge on Collections Mobilisation

Author: Arsénio Pedro
Berger Frederik
Bogaerts Ann
Cubey Robert
De Smedt Sofie
Drinkwater Robyn
Figueira Rui
French Lisa
Hardy Helen
Haston Elspeth
Koivunen Anne
Livermore Laurence
Piirainen Esko
Smith Vincent
von Mering Sabine
Wing Peter
Wu Zhengzhe
Publication date: 01/01/2022

In order to support the digitisation activities of DiSSCo, we have considered how best to prepare collections for digitisation, digitise them, curate their associated data, publish those data, and measure the outputs of projects and programmes. We have examined options and approaches for different types and sizes of collections, when outsourcing should be considered, and what different project management approaches are most appropriate in this range of circumstances. This report describes the approach we have taken to developing an online community-edited manual, our guidelines, other relevant resources and platforms, and a set of recommendations on how to develop this work to enhance future digitisation capacity across DiSSCo collection-holding organisations.

UTL Repository

Novel Retinoic Acid Receptor Alpha Agonists for Treatment of Kidney Disease

Author: A Suzuki
AJ Collins
BC Das
Bhaskar Das
C Delescluse
C Merlet-Benichou
CM Wyatt
DM Benbrook
G Perez de Lema
GM Lucas
H de The
I Lehrke
I Torregroza
Irina Agoulnik
J Look
J Wagner
J Zhong
JC He
John Cijiang He
KA Koehler
KK Ratnam
KM Waugh
L Barisoni
M Schaier
M Sunamoto
M Takemoto
MI Dawson
MR Vaughan
P Montesinos
Peter Chuang
PT Jubinsky
Q Xu
Ruijie Liu
SJ Shankland
SY Han
T Asano
TC Lu
TL Fevig
Todd Evans
TR Evans
V D'Agati
V Moreno-Manzano
V Vuligonda
W Yang
Yibang Chen
Yifei Zhong
Yingwei Wu
Zhengzhe Li
Publication venue: Public Library of Science

Development of pharmacologic agents that protect podocytes from injury is a critical strategy for the treatment of kidney glomerular diseases. Retinoic acid reduces proteinuria and glomerulosclerosis in multiple animal models of kidney diseases. However, clinical studies are limited because of significant side effects of retinoic acid. Animal studies suggest that all-trans retinoic acid (ATRA) attenuates proteinuria by protecting podocytes from injury. The physiological actions of ATRA are mediated by binding to all three isoforms of the nuclear retinoic acid receptors (RARs): RARα, RARβ, and RARγ. We have previously shown that ATRA exerts its renal protective effects mainly through the agonism of RARα. Here, we designed and synthesized a novel boron-containing derivative of the RARα-specific agonist Am580. This new derivative, BD4, binds to RARα receptor specifically and is predicted to have less toxicity based on its structure. We confirmed experimentally that BD4 binds to RARα with a higher affinity and exhibits less cellular toxicity than Am580 and ATRA. BD4 induces the expression of podocyte differentiation markers (synaptopodin, nephrin, and WT-1) in cultured podocytes. Finally, we confirmed that BD4 reduces proteinuria and improves kidney injury in HIV-1 transgenic mice, a model for HIV-associated nephropathy (HIVAN). Mice treated with BD4 did not develop any obvious toxicity or side effects. Our data suggest that BD4 is a novel RARα agonist, which could be used as a potential therapy for patients with kidney disease such as HIVAN.

Crossref

PubMed Central

A Non-Iterative Method Combined with Neural Network Embedded in Physical Model to Solve the Imaging of Electromagnetic Inverse Scattering Problem

Author: Hongsheng Wu
Liang Guo
Xuhu Ren
Zhengzhe Li
Publication venue: MDPI AG
Publication date: 14/12/2021

The main purpose of this paper is to solve the electromagnetic inverse scattering problem (ISP). Compared with conventional tomography technology, it considers the interaction between the internal structure of the scene and the electromagnetic wave in a more realistic manner. However, due to the nonlinearity of ISP, the conventional calculation scheme usually has some problems, such as unsatisfactory imaging quality and high computational cost. To solve these problems and improve the imaging quality, this paper presents a simple method named the diagonal matrix inversion method (DMI) to estimate the distribution of scatterer contrast (DSC), and a Generative Adversarial Network (GAN) that optimizes the DSC obtained by DMI and brings it closer to the real distribution of scatterer contrast. In order to make the distribution of scatterer contrast generated by the GAN more accurate, the forward model is embedded in the GAN. Moreover, because the forward model is embedded, the DSC produced by the generator is not only similar to the original scatterer contrast in its overall numerical distribution, but its value at each point also approximates the original.

Multidisciplinary Digital Publishing Institute

A Non-Iterative Method Combined with Neural Network Embedded in Physical Model to Solve the Imaging of Electromagnetic Inverse Scattering Problem

Author: Hongsheng Wu
Liang Guo
Xuhu Ren
Zhengzhe Li
Publication venue: MDPI AG
Publication date: 01/12/2021

Directory of Open Access Journals

Automated Methods in Digitisation of Pinned Insects

Author: Kahanpää Jere
Koivunen Anne
Saarenmaa Hannu
Sihvonen Pasi
Wu Zhengzhe
Publication venue: Pensoft Publishers
Publication date: 01/01/2019

Digitisation of natural history collections draws increasing attention. The digitised specimens not only facilitate the long-term preservation of biodiversity information but also boost the easy access and sharing of information. There are more than two billion specimens in the world’s natural history collections, and pinned insect specimens make up more than half of them (Tegelberg et al. 2014, Tegelberg et al. 2017). However, it is still a challenge to digitise pinned insect specimens with current state-of-the-art systems. The slowness of imaging pinned insects is due to the fact that they are essentially 3D objects and associated labels are pinned under the insect specimen. During the imaging process, the labels are often removed manually, which slows down the whole process. How can we avoid handling the labels pinned under often fragile and valuable specimens in order to increase the speed of digitisation? In our work (Saarenmaa et al. 2019) for T3.1.2 task in the ICEDIG (https://www.icedig.eu) project, we first briefly reviewed the state-of-the-art approaches on small insect digitisation. Then recent promising technological advances on imaging were presented, some of which have not yet been used for insect digitisation. It seems that one single approach will not be enough to digitise all insect collections efficiently. The approach has to be optimized based on the features of the specimens and their associated labels. To obtain a breakthrough in insect digitisation, it is necessary to utilize a combination of existing and new technologies in novel workflows. To explore the options, we identified six approaches for digitising pinned insects with the goal of minimum manipulations of labels as follows.

1. Minimal labels: Image selected individual specimens without removing labels from the pin by using two cameras. This method suits small insects with only one or a few well-spaced labels.
2. Multiple webcams: Similar to the minimal labels approach, but with multiple webcams at different positions. This has been implemented in a prototype system with 12 cameras (Hereld et al. 2017) and in the ALICE system with six DSLR cameras (Price et al. 2018).
3. Imaging of units: Similar to the multiple webcams approach, but image the entire unit (“Units” are small boxes or trays contained in drawers of collection cabinets, and are being used in most major insect collections).
4. Camera in robot arm: Image the individual specimen or the unit with the camera mounted on a robot arm to capture a large number of images from different views.
5. Camera on rails: Similar to the camera in robot arm approach, but the camera is mounted on rails to capture the unit. A 3D model of the insects and/or units can be created, and then labels are extracted. This is being prototyped by the ENTODIG-3D system (Ylinampa and Saarenmaa 2019).
6. Terahertz time-gated multispectral imaging: Image the individual specimen with terahertz time-gated multispectral imaging devices.

Experiments on selected approaches 2 and 5 are in progress and the preliminary results will be presented.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints

The Biological Effect of Small Extracellular Vesicles on Colorectal Cancer Metastasis

Author: Defa Huang
Jiyang Wu
Tianyu Zhong
Xiaomei Yi
Xiaoxing Wang
Zhengzhe Li
Publication venue: MDPI AG
Publication date: 01/12/2022

Colorectal cancer (CRC) is a malignancy that seriously threatens human health, and metastasis from CRC is a major cause of death and poor prognosis for patients. Studying the potential mechanisms of small extracellular vesicles (sEVs) in tumor development may provide new options for early and effective diagnosis and treatment of CRC metastasis. In this review, we systematically describe how sEVs mediate epithelial mesenchymal transition (EMT), reconfigure the tumor microenvironment (TME), modulate the immune system, and alter vascular permeability and angiogenesis to promote CRC metastasis. We also discuss the current difficulties in studying sEVs and propose new ideas

Directory of Open Access Journals

PubMed Central