MaLA-500: Massive Language Adaptation of Large Language Models
Large language models (LLMs) have advanced the state of the art in natural
language processing. However, their predominant design for English or a limited
set of languages creates a substantial gap in their effectiveness for
low-resource languages. To bridge this gap, we introduce MaLA-500, a novel
large language model designed to cover an extensive range of 534 languages. To
train MaLA-500, we employ vocabulary extension and continued pretraining on
LLaMA 2 with Glot500-c. Our intrinsic evaluation demonstrates that MaLA-500 is
better at predicting the given texts of low-resource languages than existing
multilingual LLMs. Moreover, the extrinsic evaluation of in-context learning
shows that MaLA-500 outperforms previous LLMs on SIB200 and Taxi1500 by a
significant margin, i.e., 11.68% and 4.82% macro-average accuracy across
languages. We release MaLA-500 at https://huggingface.co/MaLA-L
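
As a brief illustration of the metric named above, the following sketch contrasts macro-average accuracy (the mean of per-language accuracies) with micro-average accuracy (pooled over all examples). The function names and the per-language counts are hypothetical, chosen only for illustration; they are not taken from the MaLA-500 evaluation.

```python
def macro_average_accuracy(per_language):
    """Mean of per-language accuracies; every language counts equally."""
    accuracies = [correct / total for correct, total in per_language.values()]
    return sum(accuracies) / len(accuracies)


def micro_average_accuracy(per_language):
    """Accuracy pooled over all examples; large test sets dominate."""
    correct = sum(c for c, _ in per_language.values())
    total = sum(t for _, t in per_language.values())
    return correct / total


# Hypothetical (correct, total) counts for two languages, for illustration only.
counts = {"lang_a": (90, 100), "lang_b": (5, 10)}

macro = macro_average_accuracy(counts)
micro = micro_average_accuracy(counts)
print(f"macro={macro:.4f} micro={micro:.4f}")
```

Macro-averaging keeps a language with a small test set from being swamped by languages with larger ones, which is presumably why it suits an evaluation spanning many low-resource languages.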
Electrophysiological and molecular genetic evidence for sympatrically occurring cryptic species in African weakly electric fishes (Teleostei : Mormyridae : Campylomormyrus)
For two sympatric species of African weakly electric fish, Campylomormyrus tamandua and Campylomormyrus numenius, we monitored ontogenetic differentiation in electric organ discharge (EOD) and established a molecular phylogeny, based on 2222 bp from cytochrome b, the S7 ribosomal protein gene, and four flanking regions of unlinked microsatellite loci. In C. tamandua, there is one common EOD type, regardless of age and sex, whereas in C. numenius we were able to identify three different male adult EOD waveform types, which emerged from a single common EOD observed in juveniles. Two of these EOD types formed well-supported clades in our phylogenetic analysis. In an independent line of evidence, we were able to affirm the classification into three groups by microsatellite data. The correct assignment and the high pairwise FST values support our hypothesis that these groups are reproductively isolated. We propose that in C. numenius there are cryptic species, hidden behind similar and, at least as juveniles, identical morphs. (c) 2005 Elsevier Inc. All rights reserved.
Adaptive radiation in African weakly electric fish (Teleostei : Mormyridae : Campylomormyrus): a combined molecular and morphological approach
We combined multiple molecular markers and geometric morphometrics to revise the current taxonomy and to build a phylogenetic hypothesis for the African weakly electric fish genus Campylomormyrus. Genetic data (2039 bp DNA sequence of mitochondrial cytochrome b and nuclear S7 genes) on 106 specimens support the existence of at least six species occurring in sympatry. We were able to further confirm these species by microsatellite analysis at 16 unlinked nuclear loci and landmark-based morphometrics. We assigned them to nominal taxa by comparisons to type specimens of all Campylomormyrus species recognized so far. Additionally, we showed that the shape of the elongated trunk-like snout is the major source of morphological differentiation among them. This finding suggests that the radiation of this speciose genus might have been driven by adaptation to different food sources
Cross-Language Question Re-Ranking
We study how to find relevant questions in community forums when the language
of the new questions is different from that of the existing questions in the
forum. In particular, we explore the Arabic-English language pair. We compare a
kernel-based system with a feed-forward neural network in a scenario where a
large parallel corpus is available for training a machine translation system,
bilingual dictionaries, and cross-language word embeddings. We observe that
both approaches degrade the performance of the system when working on the
translated text, especially the kernel-based system, which depends heavily on a
syntactic kernel. We address this issue using a cross-language tree kernel,
which compares the original Arabic tree to the English trees of the related
questions. We show that this kernel almost closes the performance gap with
respect to the monolingual system. On the neural network side, we use the
parallel corpus to train cross-language embeddings, which we then use to
represent the Arabic input and the English related questions in the same space.
The results also improve to close to those of the monolingual neural network.
Overall, the kernel system shows a better performance compared to the neural
network in all cases.
Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches;
Question Retrieval; Kernel-based Methods; Neural Networks; Distributed
Representation
Complex Network Approach for Recurrence Analysis of Time Series
We propose a novel approach for analysing time series using complex network
theory. We identify the recurrence matrix calculated from time series with the
adjacency matrix of a complex network, and apply measures for the
characterisation of complex networks to this recurrence matrix. By using the
logistic map, we illustrate the potentials of these complex network measures
for detecting dynamical transitions. Finally we apply the proposed approach to
a marine palaeo-climate record and identify subtle changes of the climate
regime.
Comment: 23 pages, 7 figures
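
The idea in this abstract can be sketched in a few lines of Python: build a recurrence matrix from a logistic-map time series, read it as the adjacency matrix of a network, and compute a network measure on it. This is a minimal sketch under stated assumptions, not the authors' implementation. The threshold `eps`, the series length, the use of the scalar series without embedding, and the choice of global clustering (transitivity) as the measure are all illustrative assumptions.

```python
def logistic_series(r, x0=0.4, n=300, discard=200):
    """Iterate x -> r*x*(1-x); return n values after dropping a transient."""
    x, out = x0, []
    for i in range(n + discard):
        x = r * x * (1.0 - x)
        if i >= discard:
            out.append(x)
    return out


def recurrence_adjacency(series, eps):
    """Recurrence matrix read as an adjacency matrix.

    Nodes are time points; i and j are linked when |x_i - x_j| <= eps.
    The diagonal is set to zero so the network has no self-loops.
    """
    n = len(series)
    return [[1 if i != j and abs(series[i] - series[j]) <= eps else 0
             for j in range(n)] for i in range(n)]


def global_clustering(adj):
    """Transitivity: fraction of connected triples that close into triangles."""
    n = len(adj)
    closed = triples = 0
    for i in range(n):
        nb = [j for j in range(n) if adj[i][j]]
        k = len(nb)
        triples += k * (k - 1) // 2
        for a in range(k):
            for b in range(a + 1, k):
                closed += adj[nb[a]][nb[b]]
    return closed / triples if triples else 0.0


eps = 0.02  # assumed recurrence threshold
adj_periodic = recurrence_adjacency(logistic_series(3.5), eps)  # period-4 regime
adj_chaotic = recurrence_adjacency(logistic_series(4.0), eps)   # chaotic regime

c_periodic = global_clustering(adj_periodic)
c_chaotic = global_clustering(adj_chaotic)
print(f"clustering: periodic={c_periodic:.3f} chaotic={c_chaotic:.3f}")
```

In the periodic regime the network splits into fully connected groups, one per phase of the cycle, so the clustering is maximal. In the chaotic regime it drops, which shows how such a measure can flag a dynamical transition as the control parameter `r` varies.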
The first Automatic Translation Memory Cleaning Shared Task
This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017, available online: https://doi.org/10.1007/s10590-016-9183-x
The accepted version of the publication may differ from the final published version.
This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow-up to the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other targeting professional translators. While the researcher-oriented survey aimed at gathering the participants' opinions on the shared task, the translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys
Enhanced sensitivity to higher ozone in a pathogen-resistant tobacco cultivar
Investigations of the effects of elevated ozone (O3) on the virus–plant system were conducted to better inform virus pathogen management strategies. One tobacco cultivar susceptible to Potato virus Y petiole necrosis strain (PVYN) infection (Nicotiana tabacum L. cv. Yongding) and one resistant cultivar (Nicotiana tabacum L. cv. Vam) were grown in open-top chambers under ambient and elevated O3 concentrations. Above-ground biomass, foliage chlorophyll, nitrogen and total non-structural carbohydrates (TNCs), soluble protein, total amino acid (TAA) and nicotine content, and peroxidase (POD) activity were measured to estimate the effects of elevated O3 on the impact of PVYN in the two cultivars. Results showed that under ambient O3, the resistant cultivar possessed greater biomass and a lower C/N ratio after infection than the susceptible cultivar; however, under elevated O3, the resistant cultivar lost its biomass advantage but maintained a lower C/N ratio. Variation of foliar POD activity could be explained as a resistance cost, which was significantly correlated with biomass and C/N ratio of the tobacco cultivar. Chlorophyll content remained steady in the resistant cultivar but decreased significantly in the susceptible cultivar when stressors were applied. Foliar soluble protein and free amino acid content, which were related to resistance cost changes, are also discussed. This study indicated that a virus-resistant tobacco cultivar showed increased sensitivity to elevated O3 compared to a virus-sensitive cultivar