10 research outputs found

    Sabiá: Portuguese Large Language Models

    As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models already extensively trained on diverse corpora. More specifically, we further pretrain GPT-J and LLaMA models on Portuguese texts using 3% or less of their original pretraining budget. Few-shot evaluations on Poeta, a suite of 14 Portuguese datasets, reveal that our models outperform English-centric and multilingual counterparts by a significant margin. Our best model, Sabiá-65B, performs on par with GPT-3.5-turbo. By evaluating on datasets originally conceived in the target language as well as translated ones, we study the contributions of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model's knowledge about a domain or culture. Our results indicate that the majority of the benefits stem from the domain-specific knowledge acquired through monolingual pretraining.

    BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams

    One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation. However, although Portuguese is the fifth most spoken language worldwide, few such evaluations have been conducted in it. This is mainly due to the lack of high-quality datasets available to the community for carrying out evaluations in Portuguese. To address this gap, we introduce the Brazilian Leading Universities Entrance eXams (BLUEX), a dataset of entrance exams from the two leading universities in Brazil: UNICAMP and USP. The dataset includes annotated metadata for evaluating the performance of NLP models on a variety of subjects. Furthermore, BLUEX includes a collection of recently administered exams that are unlikely to be included in the training data of many popular LMs as of 2023. The dataset is also annotated to indicate the position of images in each question, providing a valuable resource for advancing the state of the art in multimodal language understanding and reasoning. We describe the creation and characteristics of BLUEX and establish a benchmark through experiments with state-of-the-art LMs, demonstrating its potential for advancing the state of the art in natural language understanding and reasoning in Portuguese. The data and relevant code can be found at https://github.com/Portuguese-Benchmark-Datasets/BLUE

    BIODIGESTOR PARA O GÁS DO LIXO ORGÂNICO: Biodigester for organic waste gas

    This article presents an alternative for reusing the gas produced by waste: building a biodigester, with the aim of preserving the environment through renewable energy. The energy produced by this system, obtained from the decomposition of organic waste, is biogas, a mixture of gases such as methane (CH4) and carbon dioxide (CO2). The experiment assesses whether methane can be used as an alternative fuel for a domestic stove.

    Cardiomiopatia de Takotsubo: uma breve revisão sistemática: Takotsubo Cardiomiopathy: a brief systematic review

    Takotsubo cardiomyopathy is a new cardiomyopathy that was first reported in 2001. The disease is defined by reversible left ventricular dysfunction and usually presents as an acute coronary syndrome. This study aimed to discuss the main characteristics of Takotsubo cardiomyopathy. To that end, a systematic literature review was conducted using the Scielo, Medline and Lilacs databases, selecting studies published in the last 5 years. From the analysis and interpretation of the source data, it was possible to conclude that Takotsubo cardiomyopathy is triggered by physical stress and is seen as a complication of other non-cardiac diseases, generally occurring in postmenopausal women over 70 years of age. It presents with sudden onset of chest pain and dyspnea, after a stressful emotional event that precedes the onset of symptoms. Its main consequences are cardiogenic shock, left ventricular outflow tract obstruction, left ventricular wall thrombus, ventricular arrhythmias, ventricular wall rupture and cardiac arrest, with some cases of sudden death recorded.

    Diagnóstico diferencial da Síndrome de Takotsubo e infarto agudo do miocárdio: uma revisão sistemática: Differential diagnosis of Takotsubo Syndrome and acute myocardial infarction: a systematic review

    Takotsubo cardiomyopathy (CTT) and acute myocardial infarction (AMI) share a similar clinical presentation and risk of death, although one of the most important differences is the absence of obstructive coronary disease in Takotsubo cardiomyopathy. This study aims to analyze the available literature on the differential diagnosis between patients with CTT and patients with AMI. To that end, a systematic review was carried out using PubMed and Medline as databases. From the analysis of the studies and the interpretation of their main findings, it was concluded that, for patients with CTT, other conditions and comorbidities, rather than only dyslipidemia and/or other established risk factors, are responsible for a risk of death comparable to that of AMI. However, the conclusions of this study have several limitations.

    Optimization of Skid Trails and Log Yards on the Amazon Forest

    Research highlights: We used Dijkstra's algorithm (DA) to define the optimal allocation of log yards so as to minimize total skid-trail distance in the Amazon Forest. DA minimized trail distances and the associated transportation costs, yielding an even smaller value when the current planning was disregarded, which suggests a reduction in deleterious environmental externalities. Background and objectives: We sought to determine whether DA can optimize distances and intrinsic costs in the management of Amazonian forests. The objective was to minimize skid-trail distances by optimally allocating yards using DA and to compare four scenarios of forest harvest planning in the Brazilian Amazon. Materials and methods: Tree census data from Gênesis-Salém Farm, state of Pará, Brazil, were used. The yards and roads located by Grupo Arboris (scenario 1) were compared with three alternative scenarios in terms of total skid distance, trail and road densities, and skidding costs for three successive harvests. The alternative scenarios were to keep the number of yards within work units (WU) and place them at the edge of existing roads (scenario 2); keep the number of yards within each WU (scenario 3); and place 23 yards, disregarding the current planning (scenario 4). Results: Total skid-trail distance, number of trees beyond the optimal extraction distance, and densities of skid trails and roads were smaller in scenarios 2, 3, and 4 than under the current yard allocation (scenario 1). Scenario 4, with fewer restrictions, reduced skid-trail distances by 23%. Harvest costs decreased from scenario 1 to 4 in all three harvest cycles. Conclusions: DA allowed an optimized distribution of yards and skid trails and generated efficient results for harvest planning. This reinforces the importance of optimized planning, which helps reduce costs and environmental impact while keeping efficiency high.
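
    The idea of choosing a yard location by shortest-path distance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the graph, the node names (`Y1`, `Y2`, `A`, `B`, `C`), the edge weights, and the helper `best_yard` are all hypothetical, and the sketch selects a single yard rather than the multi-yard allocation the abstract describes.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def best_yard(graph, candidate_yards, trees):
    """Pick the candidate yard with the smallest total skid distance to all trees."""
    best, best_total = None, float("inf")
    for yard in candidate_yards:
        dist = dijkstra(graph, yard)
        total = sum(dist.get(tree, float("inf")) for tree in trees)
        if total < best_total:
            best, best_total = yard, total
    return best, best_total

# Hypothetical toy terrain: undirected edges with skid distances as weights.
graph = {
    "Y1": [("A", 2.0), ("B", 5.0)],
    "Y2": [("B", 1.0), ("C", 2.0)],
    "A": [("Y1", 2.0), ("B", 2.0)],
    "B": [("Y1", 5.0), ("Y2", 1.0), ("A", 2.0), ("C", 4.0)],
    "C": [("Y2", 2.0), ("B", 4.0)],
}

yard, total = best_yard(graph, ["Y1", "Y2"], ["A", "B", "C"])
print(yard, total)
```

    In this toy graph the total distance from `Y1` to the three trees is 13.0 and from `Y2` it is 6.0, so `Y2` is selected. A real planning problem would add constraints such as work-unit boundaries and road access, which this sketch ignores.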

    NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics

    Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The most numerous species in terms of records are from Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information on invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions. Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us on how they are using the data.

    Brazilian Flora 2020: Leveraging the power of a collaborative scientific network

    The shortage of reliable primary taxonomic data limits the description of biological taxa and the understanding of biodiversity patterns and processes, complicating biogeographical, ecological, and evolutionary studies. This deficit creates a significant taxonomic impediment to biodiversity research and conservation planning. The taxonomic impediment and the biodiversity crisis are widely recognized, highlighting the urgent need for reliable taxonomic data. Over the past decade, numerous countries worldwide have devoted considerable effort to Target 1 of the Global Strategy for Plant Conservation (GSPC), which called for the preparation of a working list of all known plant species by 2010 and an online world Flora by 2020. Brazil is a megadiverse country, home to more of the world's known plant species than any other country. Despite that, Flora Brasiliensis, concluded in 1906, was the last comprehensive treatment of the Brazilian flora. The lack of accurate estimates of the number of species of algae, fungi, and plants occurring in Brazil contributes to the prevailing taxonomic impediment and delays progress towards the GSPC targets. Over the past 12 years, a legion of taxonomists motivated to meet Target 1 of the GSPC worked together to gather and integrate knowledge on the algal, plant, and fungal diversity of Brazil. Overall, a team of about 980 taxonomists joined efforts in a highly collaborative project that used cybertaxonomy to prepare an updated Flora of Brazil, showing the power of scientific collaboration to reach ambitious goals. This paper presents an overview of the Brazilian Flora 2020 and provides taxonomic and spatial updates on the algae, fungi, and plants found in one of the world's most biodiverse countries. We further identify collection gaps and summarize future goals that extend beyond 2020. Our results show that Brazil is home to 46,975 native species of algae, fungi, and plants, of which 19,669 are endemic to the country. The data compiled to date suggest that the Atlantic Rainforest might be the most diverse Brazilian domain for all plant groups except gymnosperms, which are most diverse in the Amazon. However, scientific knowledge of Brazilian diversity is still unequally distributed, with the Atlantic Rainforest and the Cerrado being the most intensively sampled and studied biomes in the country. In times of “scientific reductionism”, with botanical and mycological sciences suffering pervasive depreciation in recent decades, the first online Flora of Brazil 2020 significantly enhanced the quality and quantity of taxonomic data available for algae, fungi, and plants from Brazil. This project also made all the information freely available online, providing a firm foundation for future research and for the management, conservation, and sustainable use of the Brazilian funga and flora.