NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval
This paper reports on a study of cross-lingual information retrieval (CLIR)
using the mT5-XXL reranker on the NeuCLIR track of TREC 2022. Perhaps the
biggest contribution of this study is the finding that, despite being
fine-tuned only on query-document pairs in the same language, the mT5 model
proved viable for CLIR tasks, in which queries and documents are in different
languages, even when first-stage retrieval performance was suboptimal. The
results of the study show outstanding performance across all tasks and
languages, yielding a high number of winning positions. Finally, this study
provides valuable insights into the use of mT5 for CLIR tasks and highlights
its potential as a viable solution. For reproduction, refer to
https://github.com/unicamp-dl/NeuCLIR22-mT
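As a rough illustration of how such a reranker scores cross-lingual pairs, the sketch below applies the monoT5-style scoring convention (probability of "yes" versus "no" as the first generated token) with a publicly available multilingual checkpoint; the model name, prompt format, and example passages are assumptions for illustration, not details taken from the paper or its repository.

```python
# Hedged sketch of monoT5-style reranking with a multilingual T5 checkpoint.
# The checkpoint below is an assumed small stand-in for the paper's mT5-XXL.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "unicamp-dl/mt5-base-mmarco-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()

def rerank(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Score each candidate document for the query and sort by estimated relevance."""
    yes_id = tokenizer.encode("yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("no", add_special_tokens=False)[0]
    scored = []
    for doc in docs:
        prompt = f"Query: {query} Document: {doc} Relevant:"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        # One decoding step: relevance is the probability of "yes" vs. "no"
        # as the first generated token (the monoT5 scoring convention).
        decoder_input = torch.full((1, 1), model.config.decoder_start_token_id)
        with torch.no_grad():
            logits = model(**inputs, decoder_input_ids=decoder_input).logits[0, 0]
        prob_yes = torch.softmax(logits[[no_id, yes_id]], dim=0)[1].item()
        scored.append((prob_yes, doc))
    return sorted(scored, reverse=True)

# An English query scored against Portuguese passages: the cross-lingual setting
# described above, with no translation step in between.
print(rerank("what causes rain", [
    "A chuva forma-se quando o vapor de água condensa na atmosfera.",
    "O futebol é o esporte mais popular do Brasil.",
]))
```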
In Defense of Cross-Encoders for Zero-Shot Retrieval
Bi-encoders and cross-encoders are widely used in many state-of-the-art
retrieval pipelines. In this work we study the generalization ability of these
two types of architectures across a wide range of parameter counts in both
in-domain and out-of-domain scenarios. We find that the number of parameters and early
query-document interactions of cross-encoders play a significant role in the
generalization ability of retrieval models. Our experiments show that
increasing model size results in marginal gains on in-domain test sets, but
much larger gains in new domains never seen during fine-tuning. Furthermore, we
show that cross-encoders largely outperform bi-encoders of similar size in
several tasks. In the BEIR benchmark, our largest cross-encoder surpasses a
state-of-the-art bi-encoder by more than 4 average points. Finally, we show
that using bi-encoders as first-stage retrievers provides no gains in
comparison to a simpler retriever such as BM25 on out-of-domain tasks. The code
is available at
https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git
Comment: arXiv admin note: substantial text overlap with arXiv:2206.0287
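To make the architectural contrast concrete, the hedged sketch below scores the same query-document pairs with a bi-encoder (independent embeddings compared by dot product) and a cross-encoder (joint encoding with early query-document interaction); the checkpoints are common public examples, not the models evaluated in this work.

```python
# Illustrative contrast between bi-encoder and cross-encoder scoring.
# Both checkpoints are assumed public stand-ins, not the paper's models.
from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "symptoms of vitamin d deficiency"
docs = [
    "Fatigue and bone pain are common signs of low vitamin D.",
    "Vitamin C is abundant in citrus fruits.",
]

# Bi-encoder: encode query and documents independently, then compare embeddings.
bi_encoder = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
q_emb = bi_encoder.encode(query, convert_to_tensor=True)
d_emb = bi_encoder.encode(docs, convert_to_tensor=True)
bi_scores = util.dot_score(q_emb, d_emb)[0]

# Cross-encoder: feed each (query, document) pair through one transformer,
# so query and document interact at every attention layer.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
ce_scores = cross_encoder.predict([(query, d) for d in docs])

print("bi-encoder scores:   ", bi_scores.tolist())
print("cross-encoder scores:", ce_scores.tolist())
```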
InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval
Recently, InPars introduced a method to efficiently use large language models
(LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced
to generate relevant queries for documents. These synthetic query-document
pairs can then be used to train a retriever. However, InPars and, more
recently, Promptagator, rely on proprietary LLMs such as GPT-3 and FLAN to
generate such datasets. In this work we introduce InPars-v2, a dataset
generator that uses open-source LLMs and existing powerful rerankers to select
synthetic query-document pairs for training. A simple BM25 retrieval pipeline
followed by a monoT5 reranker finetuned on InPars-v2 data achieves new
state-of-the-art results on the BEIR benchmark. To allow researchers to further
improve our method, we open source the code, synthetic data, and finetuned
models: https://github.com/zetaalphavector/inPars/tree/master/tp
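The sketch below illustrates the general InPars-style generation step under stated assumptions: a few-shot prompt induces an open-source LLM to write a query for a document, after which (as in InPars-v2) a reranker would score the resulting pairs and keep only the highest-scoring ones for training; the stand-in model, prompt, and documents are illustrative only.

```python
# Hedged sketch of few-shot synthetic query generation with an open-source LLM.
# The model, prompt, and selection step are assumptions for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

FEW_SHOT = (
    "Document: The Amazon rainforest produces about 20% of Earth's oxygen.\n"
    "Relevant query: how much oxygen does the amazon rainforest produce\n\n"
)

def generate_query(document: str) -> str:
    """Generate one synthetic query for a document via few-shot prompting."""
    prompt = FEW_SHOT + f"Document: {document}\nRelevant query:"
    out = generator(prompt, max_new_tokens=32, do_sample=False)[0]["generated_text"]
    # Keep only the first line the model produces after the prompt.
    return out[len(prompt):].strip().split("\n")[0]

doc = "Caffeine is a central nervous system stimulant found in coffee and tea."
query = generate_query(doc)
# In InPars-v2, a strong reranker would now score (query, doc) and the pair
# would be kept for training only if the score clears a selection cutoff.
print(query)
```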
No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval
Recent work has shown that small distilled language models are strong
competitors to models that are orders of magnitude larger and slower in a wide
range of information retrieval tasks. Due to latency constraints, this has made
distilled and dense models the go-to choice for deployment in real-world
retrieval applications. In this work, we question this practice by showing that
the number of parameters and early query-document interaction play a
significant role in the generalization ability of retrieval models. Our
experiments show that increasing model size results in marginal gains on
in-domain test sets, but much larger gains in new domains never seen during
fine-tuning. Furthermore, we show that rerankers largely outperform dense retrievers
of similar size in several tasks. Our largest reranker reaches the state of the
art in 12 of the 18 datasets of the Benchmark-IR (BEIR) and surpasses the
previous state of the art by 3 average points. Finally, we confirm that
in-domain effectiveness is not a good indicator of zero-shot effectiveness.
Code is available at
https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git
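For context, a minimal sketch of the two-stage setup this line of work evaluates: a cheap lexical first stage (BM25) produces candidates, and a reranker with full query-document interaction reorders them; the toy corpus and checkpoint below are assumptions, not the paper's benchmark data or models.

```python
# Minimal two-stage retrieval sketch: BM25 candidates, then cross-encoder reranking.
# Corpus, query, and checkpoint are illustrative stand-ins only.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "Sleep deprivation impairs memory consolidation and attention.",
    "The Amazon river has the largest discharge volume of any river.",
    "Chronic lack of sleep is linked to reduced hippocampal function.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "effects of sleep deprivation on memory"
scores = bm25.get_scores(query.lower().split())
# First stage: keep the top-2 BM25 candidates.
top = sorted(range(len(corpus)), key=lambda i: -scores[i])[:2]
candidates = [corpus[i] for i in top]

# Second stage: rerank the candidate pool; it is the reranker's size and early
# query-document interaction that drive the zero-shot gains described above.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
ce_scores = reranker.predict([(query, c) for c in candidates])
best = max(zip(ce_scores, candidates))
print(best[1])
```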
Catálogo Taxonômico da Fauna do Brasil: setting the baseline knowledge on the animal diversity in Brazil
The limited temporal completeness and taxonomic accuracy of species lists, made available in a traditional manner in scientific publications, have always represented a problem. These lists are invariably limited to a few taxonomic groups and do not represent up-to-date knowledge of all species and classifications. In this context, the Brazilian megadiverse fauna is no exception, and the Catálogo Taxonômico da Fauna do Brasil (CTFB) (http://fauna.jbrj.gov.br/), made public in 2015, represents a database on biodiversity anchored in a list of valid and expertly recognized scientific names of animals in Brazil. The CTFB is updated in near real time by a team of more than 800 specialists. By January 1, 2024, the CTFB compiled 133,691 nominal species, of which 125,138 were considered valid. Most of the valid species were arthropods (82.3%, with more than 102,000 species) and chordates (7.69%, with over 11,000 species). These taxa were followed by a cluster composed of Mollusca (3,567 species), Platyhelminthes (2,292 species), Annelida (1,833 species), and Nematoda (1,447 species). All remaining groups had fewer than 1,000 species reported in Brazil, with Cnidaria (831 species), Porifera (628 species), Rotifera (606 species), and Bryozoa (520 species) representing those with more than 500 species. Analysis of the CTFB database can facilitate and direct efforts towards the discovery of new species in Brazil, but it is also fundamental in providing the best available list of valid nominal species to users, including those in science, health, conservation efforts, and any initiative involving animals. The importance of the CTFB is evidenced by the elevated number of citations in the scientific literature in diverse areas of biology, law, anthropology, education, forensic science, and veterinary science, among others.