4 research outputs found

    BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

    Full text link
    The pre-training of large language models usually requires massive amounts of resources, both in terms of computation and data. Frequently used web sources such as Common Crawl might contain enough noise to make this pre-training sub-optimal. In this work, we experiment with different sampling methods from the Spanish version of mC4, and present a novel data-centric technique which we name perplexity sampling\textit{perplexity sampling} that enables the pre-training of language models in roughly half the amount of steps and using one fifth of the data. The resulting models are comparable to the current state-of-the-art, and even achieve better results for certain tasks. Our work is proof of the versatility of Transformers, and paves the way for small teams to train their models on a limited budget. Our models are available at this \href\href{https://huggingface.co/bertin-project}{URL}.Comment: Published at Procesamiento del Lenguaje Natura

    Geoeconomic variations in epidemiology, ventilation management, and outcomes in invasively ventilated intensive care unit patients without acute respiratory distress syndrome: a pooled analysis of four observational studies

    Get PDF
    Background: Geoeconomic variations in epidemiology, the practice of ventilation, and outcome in invasively ventilated intensive care unit (ICU) patients without acute respiratory distress syndrome (ARDS) remain unexplored. In this analysis we aim to address these gaps using individual patient data of four large observational studies. Methods: In this pooled analysis we harmonised individual patient data from the ERICC, LUNG SAFE, PRoVENT, and PRoVENT-iMiC prospective observational studies, which were conducted from June, 2011, to December, 2018, in 534 ICUs in 54 countries. We used the 2016 World Bank classification to define two geoeconomic regions: middle-income countries (MICs) and high-income countries (HICs). ARDS was defined according to the Berlin criteria. Descriptive statistics were used to compare patients in MICs versus HICs. The primary outcome was the use of low tidal volume ventilation (LTVV) for the first 3 days of mechanical ventilation. Secondary outcomes were key ventilation parameters (tidal volume size, positive end-expiratory pressure, fraction of inspired oxygen, peak pressure, plateau pressure, driving pressure, and respiratory rate), patient characteristics, the risk for and actual development of acute respiratory distress syndrome after the first day of ventilation, duration of ventilation, ICU length of stay, and ICU mortality. Findings: Of the 7608 patients included in the original studies, this analysis included 3852 patients without ARDS, of whom 2345 were from MICs and 1507 were from HICs. Patients in MICs were younger, shorter and with a slightly lower body-mass index, more often had diabetes and active cancer, but less often chronic obstructive pulmonary disease and heart failure than patients from HICs. Sequential organ failure assessment scores were similar in MICs and HICs. Use of LTVV in MICs and HICs was comparable (42\ub74% vs 44\ub72%; absolute difference \u20131\ub769 [\u20139\ub758 to 6\ub711] p=0\ub767; data available in 3174 [82%] of 3852 patients). The median applied positive end expiratory pressure was lower in MICs than in HICs (5 [IQR 5\u20138] vs 6 [5\u20138] cm H2O; p=0\ub70011). ICU mortality was higher in MICs than in HICs (30\ub75% vs 19\ub79%; p=0\ub70004; adjusted effect 16\ub741% [95% CI 9\ub752\u201323\ub752]; p<0\ub70001) and was inversely associated with gross domestic product (adjusted odds ratio for a US$10 000 increase per capita 0\ub780 [95% CI 0\ub775\u20130\ub786]; p<0\ub70001). Interpretation: Despite similar disease severity and ventilation management, ICU mortality in patients without ARDS is higher in MICs than in HICs, with a strong association with country-level economic status. Funding: No funding

    Neotropical freshwater fisheries : A dataset of occurrence and abundance of freshwater fishes in the Neotropics

    No full text
    The Neotropical region hosts 4225 freshwater fish species, ranking first among the world's most diverse regions for freshwater fishes. Our NEOTROPICAL FRESHWATER FISHES data set is the first to produce a large-scale Neotropical freshwater fish inventory, covering the entire Neotropical region from Mexico and the Caribbean in the north to the southern limits in Argentina, Paraguay, Chile, and Uruguay. We compiled 185,787 distribution records, with unique georeferenced coordinates, for the 4225 species, represented by occurrence and abundance data. The number of species for the most numerous orders are as follows: Characiformes (1289), Siluriformes (1384), Cichliformes (354), Cyprinodontiformes (245), and Gymnotiformes (135). The most recorded species was the characid Astyanax fasciatus (4696 records). We registered 116,802 distribution records for native species, compared to 1802 distribution records for nonnative species. The main aim of the NEOTROPICAL FRESHWATER FISHES data set was to make these occurrence and abundance data accessible for international researchers to develop ecological and macroecological studies, from local to regional scales, with focal fish species, families, or orders. We anticipate that the NEOTROPICAL FRESHWATER FISHES data set will be valuable for studies on a wide range of ecological processes, such as trophic cascades, fishery pressure, the effects of habitat loss and fragmentation, and the impacts of species invasion and climate change. There are no copyright restrictions on the data, and please cite this data paper when using the data in publications

    Geoeconomic variations in epidemiology, ventilation management, and outcomes in invasively ventilated intensive care unit patients without acute respiratory distress syndrome: a pooled analysis of four observational studies

    No full text
    Background: Geoeconomic variations in epidemiology, the practice of ventilation, and outcome in invasively ventilated intensive care unit (ICU) patients without acute respiratory distress syndrome (ARDS) remain unexplored. In this analysis we aim to address these gaps using individual patient data of four large observational studies. Methods: In this pooled analysis we harmonised individual patient data from the ERICC, LUNG SAFE, PRoVENT, and PRoVENT-iMiC prospective observational studies, which were conducted from June, 2011, to December, 2018, in 534 ICUs in 54 countries. We used the 2016 World Bank classification to define two geoeconomic regions: middle-income countries (MICs) and high-income countries (HICs). ARDS was defined according to the Berlin criteria. Descriptive statistics were used to compare patients in MICs versus HICs. The primary outcome was the use of low tidal volume ventilation (LTVV) for the first 3 days of mechanical ventilation. Secondary outcomes were key ventilation parameters (tidal volume size, positive end-expiratory pressure, fraction of inspired oxygen, peak pressure, plateau pressure, driving pressure, and respiratory rate), patient characteristics, the risk for and actual development of acute respiratory distress syndrome after the first day of ventilation, duration of ventilation, ICU length of stay, and ICU mortality. Findings: Of the 7608 patients included in the original studies, this analysis included 3852 patients without ARDS, of whom 2345 were from MICs and 1507 were from HICs. Patients in MICs were younger, shorter and with a slightly lower body-mass index, more often had diabetes and active cancer, but less often chronic obstructive pulmonary disease and heart failure than patients from HICs. Sequential organ failure assessment scores were similar in MICs and HICs. Use of LTVV in MICs and HICs was comparable (42·4% vs 44·2%; absolute difference -1·69 [-9·58 to 6·11] p=0·67; data available in 3174 [82%] of 3852 patients). The median applied positive end expiratory pressure was lower in MICs than in HICs (5 [IQR 5-8] vs 6 [5-8] cm H2O; p=0·0011). ICU mortality was higher in MICs than in HICs (30·5% vs 19·9%; p=0·0004; adjusted effect 16·41% [95% CI 9·52-23·52]; p<0·0001) and was inversely associated with gross domestic product (adjusted odds ratio for a US$10 000 increase per capita 0·80 [95% CI 0·75-0·86]; p<0·0001). Interpretation: Despite similar disease severity and ventilation management, ICU mortality in patients without ARDS is higher in MICs than in HICs, with a strong association with country-level economic status
    corecore