Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets
Masader (Alyafeai et al., 2021) created a metadata structure to be used for
cataloguing Arabic NLP datasets. However, developing an easy way to explore
such a catalogue is a challenging task. To give users and researchers the best
experience when exploring the catalogue, several design and user experience
challenges must be resolved. Furthermore, user interactions with the
website may provide an easy way to improve the catalogue. In this paper,
we introduce Masader Plus, a web interface for users to browse Masader. We
demonstrate data exploration, filtration, and a simple API that allows users to
examine datasets from the backend. Masader Plus is available at
https://arbml.github.io/masader, and a video recording explaining the interface
can be found at https://www.youtube.com/watch?v=SEtdlSeqchk.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks
based on a few demonstrations or natural language instructions. While these
capabilities have led to widespread adoption, most LLMs are developed by
resource-rich organizations and are frequently kept from the public. As a step
towards democratizing this powerful technology, we present BLOOM, a
176B-parameter open-access language model designed and built thanks to a
collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer
language model that was trained on the ROOTS corpus, a dataset comprising
hundreds of sources in 46 natural and 13 programming languages (59 in total).
We find that BLOOM achieves competitive performance on a wide variety of
benchmarks, with stronger results after undergoing multitask prompted
finetuning. To facilitate future research and applications using LLMs, we
publicly release our models and code under the Responsible AI License