725 research outputs found

    MaLA-500: Massive Language Adaptation of Large Language Models

    Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we employ vocabulary extension and continued pretraining on LLaMA 2 with Glot500-c. Our intrinsic evaluation demonstrates that MaLA-500 is better at predicting the given texts of low-resource languages than existing multilingual LLMs. Moreover, the extrinsic evaluation of in-context learning shows that MaLA-500 outperforms previous LLMs on SIB200 and Taxi1500 by a significant margin, i.e., 11.68% and 4.82% macro-average accuracy across languages. We release MaLA-500 at https://huggingface.co/MaLA-L
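
As a brief illustration of the macro-average accuracy mentioned in this abstract: accuracy is computed separately for each language and the per-language values are then averaged with equal weight, so a language with few test items counts as much as one with many. The sketch below is a minimal example under stated assumptions; the function name, the record layout, and the toy data are all hypothetical and are not taken from the MaLA-500 evaluation.

```python
from collections import defaultdict


def macro_average_accuracy(records):
    """Return (macro_accuracy, per_language_accuracy).

    `records` is an iterable of (language, is_correct) pairs.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, is_correct in records:
        total[language] += 1
        correct[language] += int(bool(is_correct))
    if not total:
        raise ValueError("no records given")
    # Accuracy within each language first ...
    per_language = {lang: correct[lang] / total[lang] for lang in total}
    # ... then an unweighted mean over languages.
    macro = sum(per_language.values()) / len(per_language)
    return macro, per_language


# Made-up toy data: four items for "lang_a" (all correct),
# two items for "lang_b" (one correct).
toy_records = [
    ("lang_a", True), ("lang_a", True), ("lang_a", True), ("lang_a", True),
    ("lang_b", True), ("lang_b", False),
]
macro, per_language = macro_average_accuracy(toy_records)
micro = sum(int(ok) for _, ok in toy_records) / len(toy_records)
print(per_language, macro, micro)
```

On the toy data the macro average is lower than the pooled (micro) accuracy because the smaller language, which has the lower accuracy, receives the same weight as the larger one.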

    Electrophysiological and molecular genetic evidence for sympatrically occuring cryptic species in African weakly electric fishes (Teleostei : Mormyridae : Campylomormyrus)

    For two sympatric species of African weakly electric fish, Campylomormyrus tamandua and Campylomormyrus numenius, we monitored ontogenetic differentiation in electric organ discharge (EOD) and established a molecular phylogeny, based on 2222 bp from cytochrome b, the S7 ribosomal protein gene, and four flanking regions of unlinked microsatellite loci. In C. tamandua, there is one common EOD type, regardless of age and sex, whereas in C. numenius we were able to identify three different male adult EOD waveform types, which emerged from a single common EOD observed in juveniles. Two of these EOD types formed well-supported clades in our phylogenetic analysis. In an independent line of evidence, we were able to affirm the classification into three groups by microsatellite data. The correct assignment and the high pairwise FST values support our hypothesis that these groups are reproductively isolated. We propose that in C. numenius there are cryptic species, hidden behind similar and, at least as juveniles, identical morphs. (c) 2005 Elsevier Inc. All rights reserved.

    Adaptive radiation in African weakly electric fish (Teleostei : Mormyridae : Campylomormyrus): a combined molecular and morphological approach

    We combined multiple molecular markers and geometric morphometrics to revise the current taxonomy and to build a phylogenetic hypothesis for the African weakly electric fish genus Campylomormyrus. Genetic data (2039 bp DNA sequence of mitochondrial cytochrome b and nuclear S7 genes) on 106 specimens support the existence of at least six species occurring in sympatry. We were able to further confirm these species by microsatellite analysis at 16 unlinked nuclear loci and landmark-based morphometrics. We assigned them to nominal taxa by comparisons to type specimens of all Campylomormyrus species recognized so far. Additionally, we showed that the shape of the elongated trunk-like snout is the major source of morphological differentiation among them. This finding suggests that the radiation of this speciose genus might have been driven by adaptation to different food sources.

    Cross-Language Question Re-Ranking

    We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve, coming close to those of the monolingual neural network. Overall, the kernel system performs better than the neural network in all cases. Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representation

    Complex Network Approach for Recurrence Analysis of Time Series

    We propose a novel approach for analysing time series using complex network theory. We identify the recurrence matrix calculated from time series with the adjacency matrix of a complex network, and apply measures for the characterisation of complex networks to this recurrence matrix. By using the logistic map, we illustrate the potentials of these complex network measures for detecting dynamical transitions. Finally, we apply the proposed approach to a marine palaeo-climate record and identify subtle changes of the climate regime. Comment: 23 pages, 7 figures
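
The idea in this abstract, reading a recurrence matrix as the adjacency matrix of a network, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a scalar logistic-map series without embedding, an arbitrarily chosen threshold `eps`, and two generic network measures (edge density and transitivity) picked for illustration. All function names and parameter values are hypothetical.

```python
import numpy as np


def logistic_map(r, x0=0.4, n=500, discard=200):
    """Iterate x -> r * x * (1 - x) and return n values after a transient."""
    x = x0
    values = []
    for i in range(n + discard):
        x = r * x * (1.0 - x)
        if i >= discard:
            values.append(x)
    return np.array(values)


def recurrence_network(series, eps):
    """Adjacency matrix: states i and j are linked if |x_i - x_j| <= eps.

    The diagonal of the recurrence matrix is set to zero so that the
    network has no self-loops.
    """
    dist = np.abs(series[:, None] - series[None, :])
    adj = (dist <= eps).astype(int)
    np.fill_diagonal(adj, 0)
    return adj


def edge_density(adj):
    """Fraction of possible links that are present."""
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))


def transitivity(adj):
    """Global transitivity: 3 * triangles / connected triples."""
    a = adj.astype(float)
    closed = np.trace(a @ a @ a)          # equals 6 * number of triangles
    deg = a.sum(axis=1)
    triples = (deg * (deg - 1.0)).sum()   # equals 2 * number of connected triples
    return closed / triples if triples > 0 else 0.0


eps = 0.02  # assumed threshold, chosen only for this illustration
for r in (3.5, 4.0):  # 3.5: period-4 regime, 4.0: chaotic regime
    adj = recurrence_network(logistic_map(r), eps)
    print(r, edge_density(adj), transitivity(adj))
```

In the periodic regime the states fall into a few tight groups, so the network splits into fully connected components, whereas the chaotic regime yields a more spread-out structure. A change in such measures as the control parameter varies is the kind of signal the abstract describes for detecting dynamical transitions.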

    The first Automatic Translation Memory Cleaning Shared Task

    This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017, available online: https://doi.org/10.1007/s10590-016-9183-x The accepted version of the publication may differ from the final published version. This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow-up to the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other one targeting professional translators. While the researchers-oriented survey aimed at gathering information about the opinion of participants on the shared task, the translators-oriented survey aimed to better understand what constitutes a good TM unit and inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys.

    Enhanced sensitivity to higher ozone in a pathogen-resistant tobacco cultivar

    Investigations of the effects of elevated ozone (O3) on the virus–plant system were conducted to better inform virus pathogen management strategies. One tobacco cultivar susceptible to infection by Potato virus Y petiole necrosis strain (PVYN) (Nicotiana tabacum L. cv. Yongding) and one resistant cultivar (Nicotiana tabacum L. cv. Vam) were grown in open-top chambers under ambient and elevated O3 concentrations. Above-ground biomass, foliage chlorophyll, nitrogen and total non-structural carbohydrate (TNCs), soluble protein, total amino acid (TAA) and nicotine content, and peroxidase (POD) activity were measured to estimate the effects of elevated O3 on the impact of PVYN in the two cultivars. Results showed that under ambient O3, the resistant cultivar possessed greater biomass and a lower C/N ratio after infection than the susceptible cultivar; however, under elevated O3, the resistant cultivar lost its biomass advantage but maintained a lower C/N ratio. Variation of foliar POD activity could be explained as a resistance cost which was significantly correlated with biomass and C/N ratio of the tobacco cultivar. Chlorophyll content remained steady in the resistant cultivar but decreased significantly in the susceptible cultivar when stressors were applied. Foliar soluble protein and free amino acid content, which were related to resistance cost changes, are also discussed. This study indicated that a virus-resistant tobacco cultivar showed increased sensitivity to elevated O3 compared to a virus-sensitive cultivar.