Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages
We propose an efficient modeling framework for cross-lingual named entity
recognition in semi-structured text data. Our approach relies on both knowledge
distillation and consistency training. The modeling framework leverages
knowledge from a large language model (XLM-RoBERTa) pre-trained on the source
language, with a student-teacher relationship (knowledge distillation). The
student model incorporates unsupervised consistency training (with KL
divergence loss) on the low-resource target language.
We use two independent datasets of SMS messages in English and Arabic, each
carrying semi-structured banking transaction information, and focus on
demonstrating knowledge transfer from English to Arabic. With access to
only 30 labeled samples, our model can generalize the recognition of merchants,
amounts, and other fields from English to Arabic. We show that our modeling
approach, while efficient, performs best overall when compared to
state-of-the-art approaches like DistilBERT pre-trained on the target language
or a supervised model directly trained on labeled data in the target language.
Our experiments show that it is enough to learn to recognize entities in
English to reach reasonable performance in a low-resource language in the
presence of a few labeled samples of semi-structured data. The proposed
framework has implications for developing multi-lingual applications,
especially in geographies where digital endeavors rely on both English and one
or more low-resource language(s), sometimes mixed with English or employed
singly.
Comment: 5 pages, 3 figures. Presented at the SIGIR 2023 Workshop on Knowledge
Discovery from Unstructured Data in Financial Services (KDF
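The two loss terms named in the abstract above (knowledge distillation and KL-divergence consistency training) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the temperature, the direction of the KL term, the perturbation applied to target-language inputs, or how the terms are weighted, so all of those are assumptions here, and the logits are made-up numbers for two tokens over three hypothetical entity tags.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_token_kl(reference_logits, other_logits, temperature=1.0):
    """Average per-token KL between two sets of token-level predictions."""
    per_token = [
        kl_divergence(softmax(r, temperature), softmax(o, temperature))
        for r, o in zip(reference_logits, other_logits)
    ]
    return sum(per_token) / len(per_token)


# Hypothetical logits: 2 tokens x 3 entity tags (values are illustrative only).
teacher_logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]
student_clean = [[3.0, 0.8, 0.2], [0.1, 2.9, 0.6]]
student_noisy = [[2.5, 1.0, 0.4], [0.3, 2.0, 1.1]]  # same input after an assumed perturbation

# Distillation: pull the student toward the teacher's softened predictions.
# The temperature of 2.0 is an assumption, not a value from the paper.
distillation_loss = mean_token_kl(teacher_logits, student_clean, temperature=2.0)

# Consistency: the student's predictions on an unlabeled input and on its
# perturbed copy should agree; no labels are needed for this term.
consistency_loss = mean_token_kl(student_clean, student_noisy)

print(f"distillation loss: {distillation_loss:.4f}")
print(f"consistency loss:  {consistency_loss:.4f}")
```

Both terms fall to zero when the two sets of predictions match exactly, which is why the consistency term can be computed on unlabeled target-language text.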
Machine-learning approach for prediction of pT3a upstaging and outcomes of localized renal cell carcinoma (UroCCR-15)
Objectives: To assess the impact of pathological upstaging from clinically localized to locally advanced pT3a on survival in patients with renal cell carcinoma (RCC), as well as the oncological safety of various surgical approaches in this setting, and to develop a machine-learning-based, contemporary, clinically relevant model for individual preoperative prediction of pT3a upstaging. Materials and Methods: Clinical data from patients treated with either partial nephrectomy (PN) or radical nephrectomy (RN) for cT1/cT2a RCC from 2000 to 2019, included in the French multi-institutional kidney cancer database UroCCR, were retrospectively analysed. Seven machine-learning algorithms were applied to the cohort after a training/testing split to develop a predictive model for upstaging to pT3a. Survival curves for disease-free survival (DFS) and overall survival (OS) rates were compared between PN and RN after G-computation for pT3a tumours. Results: A total of 4395 patients were included, among whom 667 patients (15%, 337 PN and 330 RN) had a pT3a-upstaged RCC. The UroCCR-15 predictive model presented an area under the receiver-operating characteristic curve of 0.77. Survival analysis after adjustment for confounders showed no difference in DFS or OS for PN vs RN in pT3a tumours (DFS: hazard ratio [HR] 1.08, P = 0.7; OS: HR 1.03, P > 0.9). Conclusions: Our study shows that machine-learning technology can play a useful role in the evaluation and prognosis of upstaged RCC. In the context of incidental upstaging, PN does not compromise oncological outcomes, even for large tumour sizes.
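For readers unfamiliar with the metric reported in the abstract above, the area under the receiver-operating characteristic curve can be read as the probability that a randomly chosen positive case (here, an upstaged tumour) receives a higher predicted score than a randomly chosen negative case. The sketch below computes it that way on invented toy data; the labels and scores are hypothetical and have no connection to the UroCCR cohort or to the study's model.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = upstaged, 0 = not upstaged.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC: {toy_auc:.2f}")
```

On this toy set, 7 of the 9 positive/negative pairs are ordered correctly, so the value is 7/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.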
Supplementary material 1 from: Moraes LJCL, Almeida AP, Fraga R, Rojas RR, Pirani RM, Silva AAA, de Carvalho VT, Gordo M, Werneck FP (2017) Integrative overview of the herpetofauna from Serra da Mocidade, a granitic mountain range in Northern Brazil. ZooKeys 715: 103-159. https://doi.org/10.3897/zookeys.715.20288
The Brazilian mountain ranges from the Guiana Shield highlands are largely unexplored, with an understudied herpetofauna. Here the amphibian and reptile species diversity of the remote Serra da Mocidade mountain range, located in extreme northern Brazil, is reported upon, and biogeographical affinities and taxonomic highlights are discussed. A 22-day expedition to this mountain range was undertaken during which specimens were sampled at four distinct altitudinal levels (600, 960, 1,060 and 1,365 m above sea level) using six complementary methods. Specimens were identified through an integrated approach that considered morphological, bioacoustical, and molecular analyses. Fifty-one species (23 amphibians and 28 reptiles) were found, a richness comparable to that of other mountain ranges in the region. The recorded assemblage showed a mixed compositional influence from assemblages typical of other mountain ranges and lowland forest habitats in the region. Most of the taxa occupying the Serra da Mocidade mountain range are typical of the Guiana Shield or widely distributed in the Amazon. Extensions of known distribution ranges and candidate undescribed taxa are also recorded. This is the first herpetofaunal expedition that accessed the higher altitudinal levels of this mountain range, contributing to the basic knowledge of these groups in remote areas.