12 research outputs found

La logique algorithmique confrontée à l'organisation de l'administration publique française [Algorithmic logic confronted with the organization of the French public administration]

Author: Pistilli Giada
Publication date: 01/01/2021

This article shows how the algorithmic logic of a conversational agent can support the organization of knowledge within a French public administration body, specifically a local authority. Through field research, I seek to show that public administration adopts this technology in two different ways: one that adds complexity and one that simplifies

PhilPapers

What lies behind AGI: ethical concerns related to LLMs

Author: Pistilli Giada
Publication date: 01/01/2022

This paper opens the philosophical debate around the notion of Artificial General Intelligence (AGI) and its application in Large Language Models (LLMs). Through the lens of moral philosophy, the paper raises questions about these AI systems' capabilities and goals, the treatment of humans behind them, and the risk of perpetuating a monoculture through language

PhilPapers

Revues Scientifiques Marocaines

Stronger Together: on the Articulation of Ethical Charters, Legal Tools, and Technical Documentation in ML

Author: Ferrandis Carlos Munoz
Jernite Yacine
Mitchell Margaret
Pistilli Giada
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/05/2023

The growing need for accountability of the people behind AI systems can be addressed by leveraging processes in three fields of study: ethics, law, and computer science. While these fields are often considered in isolation, they rely on complementary notions in their interpretation and implementation. In this work, we detail this interdependence and motivate the necessary role of collaborative governance tools in shaping a positive evolution of AI. We first contrast notions of compliance in the ethical, legal, and technical fields; we outline both their differences and where they complement each other, with a particular focus on the roles of ethical charters, licenses, and technical documentation in these interactions. We then focus on the role of values in articulating the synergies between the fields and outline specific mechanisms of interaction between them in practice. We identify how these mechanisms have played out in several open governance fora: an open collaborative workshop, a responsible licensing initiative, and a proposed regulatory framework. By leveraging complementary notions of compliance in these three domains, we can create a more comprehensive framework for governing AI systems that jointly takes into account their technical capabilities, their impact on society, and how technical specifications can inform relevant regulations. Our analysis thus underlines the necessity of joint consideration of the ethical, legal, and technical in AI ethics frameworks to be used on a larger scale to govern AI systems and how the thinking in each of these areas can inform the others

arXiv.org e-Print Archive

Debating AI in Archaeology: applications, implications, and ethical considerations

Author: Bransden Alex
Pistilli Giada
Shenfield Alex
Tenzer Martina
Publication venue: Council for British Archaeology
Publication date: 29/02/2024

Artificial Intelligence (AI) is not a recent development. However, with increasing computational capabilities, AI has developed into Natural Language Processing and Machine Learning, technologies particularly good at detecting correlations and patterns, and categorising, predicting, or extracting information. Within archaeology, AI can process big data accumulated over decades of research and deposited in archives. By combining these capabilities, AI offers new insights and exciting opportunities to create knowledge from archaeological archives for contemporary and future research. However, the ethical implications and human costs are not yet fully understood. Therefore, we question whether AI in archaeology is a blessing or a curse

Sheffield Hallam University Research Archive

BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model

Author: Akiki Christopher
Gallé Matthias
Ilić Suzana
Jernite Yacine
Mieskes Margot
Pistilli Giada
Wolf Thomas
Publication date: 09/12/2022

The BigScience Workshop was a value-driven initiative that spanned one and a half years of interdisciplinary research and culminated in the creation of ROOTS, a 1.6TB multilingual dataset that was used to train BLOOM, one of the largest multilingual language models to date. In addition to the technical outcomes and artifacts, the workshop fostered multidisciplinary collaborations around large models, datasets, and their analysis. This in turn led to a wide range of research publications spanning topics from ethics to law, data governance, modeling choices and distributed training. This paper focuses on the collaborative research aspects of BigScience and takes a step back to look at the challenges of large-scale participatory research, with respect to participant diversity and the tasks required to successfully carry out such a project. Our main goal is to share the lessons we learned from this experience, what we could have done better and what we did well. We show how the impact of such a social approach to scientific research goes well beyond the technical artifacts that were the basis of its inception. Comment: Presented at the 2022 NeurIPS Workshop on Broadening Research Collaborations in M

arXiv.org e-Print Archive

The Ghost in the Machine has an American accent: value conflict in GPT-3

Author: Bertulfo Donald Jay
Dias Duran Leslye Denisse
Johnson Rebecca
Kalpokiene Julija
Menedez-Gonzalez Natalia
Panai Enrico
Pistilli Giada

The alignment problem in the context of large language models must consider the plurality of human values in our world. Whilst there are many resonant and overlapping values amongst the world’s cultures, there are also many conflicting, yet equally valid, values. It is important to observe which cultural values a model exhibits, particularly when there is a value conflict between input prompts and generated outputs. We discuss how the co-creation of language and cultural value impacts large language models (LLMs). We explore the constitution of the training data for GPT-3 and compare that to the world’s language and internet access demographics, as well as to reported statistical profiles of dominant values in some nation-states. We stress-tested GPT-3 with a range of value-rich texts representing several languages and nations, including some with values orthogonal to dominant US public opinion as reported by the World Values Survey. We observed when values embedded in the input text were mutated in the generated outputs and noted when these conflicting values were more aligned with reported dominant US values. Our discussion of these results uses a moral value pluralism (MVP) lens to better understand these value mutations. Finally, we provide recommendations for how our work may contribute to other current work in the field

PhilPapers

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Author: Abdollahi Arezoo
Abdulmumin Idris
Abrar Nafis
Adelani David Ifeoluwa
Aghagol Arash
Aji Alham Fikri
Ajibade Benjamin
Akiki Christopher
Akinlolu Martha
Al-shaibani Maged S.
Albanie Samuel
Alfassy Amit
Alizadeh Samira
Allal Loubna Ben
Almubarak Khalid
Altay Gabriel
Alyafeai Zaid
Ammanamanchi Pawan Sasanka
Amuok Priscilla
An Ran
Antverg Omer
Bach Stephen H.
Bajaj Yash Shailesh
Bamberger Zachary
Bari M Saiful
Barth Fabio
Baruwa Ahmed
Bawden Rachel
Baylor Emi
Bayrak Giyaseddin
Behroozi Bahareh
Beilharz Benjamin
Bekman Stas
Belinkov Yonatan
Belkada Younes
Bello Imane
Beltagy Iz
Ben-David Srulik
Benyamina Hamza
Bers Tali
Bharati Sushil
Bhattacharjee Joydeep
Bhattacharya Indrani
Biderman Stella
Bogdanov Eli
Bommasani Rishi
Bose Shamik
Bourfoune Hatim
Bras Mathilde
Brito Caio
Broad Nicholas Michio
Brody Shaked
Bulchandani Lokesh
Burns Gully
Burynok Mykola
Cahyawijaya Samuel
Callahan Alison
Canalli Rodrigo
Carpuat Marine
Casper Jared
Castagné Roman
Castillo Maria A
Chaffin Antoine
Chandrasekhar Ramya
Chang Jonathan
Chen Kimbo
Cheng Newton
Cheveleva Anastasia
Chhablani Gunjan
Chim Jenny
Chung Hyung Won
Clinciu Miruna
Clive Jordan
Coavoux Maximin
Colombo Pierre
Contractor Danish
Cornette Pierre
Cullan Michael
Dahlberg Nathan
Danchev Valentin
Dash Ishani
Datta Debajyoti
David Davis
de Bykhovetz Madeleine Hahn
de Gibert Ona
de la Rosa Javier
De Toni Francesco
De Wolf Michiel
del Moral Albert Villanova
Deshmukh Shlok S
Dettmers Tim
Dey Manan
Dodge Jesse
Dupont Gérard
Dutra Livia
Eisenberg Renata
Elbadri Maraim
Elkott Nour
Elsahar Hady
Emezue Chris
Espejel Omar
Fahmy Nour
Fan Angela
Faranak Amy
Feizpour Amir
Ferrandis Carlos Muñoz
Fevry Thibault
Forde Jessica Zosa
Fourrier Clémentine
Freidank Moritz
Fries Jason Alan
Frohberg Jörg
Fuhrimann Florian
Fung Pascale
Gallé Matthias
Gandhi Sanchit
Gao Leo
Garda Samuele
Garrette Dan
Gehrmann Sebastian
Gerchick Marissa
Ghaleb Mustafa
Ghauri Muhammed
Gigant Théo
Giorgi John
Gokaslan Aaron
Golde Jonas
Gonzalez-Dios Itziar
Grandury María
HajiHosseini Azadeh
Haller Patrick
Hao Ryan
Harliman Rheza
Hazan Liam
Heinzerling Benjamin
Henderson Peter
Hesslow Daniel
Hevia Anthony
Huang Max
Ilić Suzana
Jain Chirag
Jauhar Mohammad A.
Jernite Yacine
Jiang Mike Tian-Jian
Johnson Isaac
Jones Hessie
Kainuma Tomoya
Kalo Jan-Christoph
Kang Jihyun
Kang Myungsun
Kasai Jungo
Kashyap Abhinav Ramesh
Kasner Zdeněk
Kassner Nora
Kawamura Ken
Khamis Nurulaqilla
Khan Ammar
Kiblawi Sid
Kiela Douwe
Kim Ethan
Kim Najoung
Kim Taewoon
Klamm Christopher
Kromann Rasmus
Kruszewski Germán
Kumar Srishti
Kusa Wojciech
Labrak Yanis
Lacroix Rémi
Laippala Veronika
Lansky David
Laud Tanmay
Launay Julien
Laurençon Hugo
Lavallée Pierre François
Le Thanh
Le Trieu
Lee Wilson Y.
Leong Colin
Lepercq Violette
Levkovizh Efrat
Lhoest Quentin
Li Conglong
Ligozat Anne-Laure
Limisiewicz Tomasz
Liu Lu
Liu Minna
Lo Kyle
Longpre Shayne
Lovering Charles
Luccioni Alexandra Sasha
López Roberto Luis
Manica Matteo
Manjavacas Enrique
Martin Robert
Masoud Maraim
McKenna Michael
McMillan-Major Angelina
Mielke Sabrina J.
Mieskes Margot
Mihaljcic Mina
Mikhailov Vladislav
Miranda-Escalada Antonio
Mirkin Shachar
Mirza Fatima
Mishra Mayank
Mishra Shubhanshu
Mitchell Margaret
Molano Daniel
Mou Chenghao
Muellner Nikolaus
Muennighoff Niklas
Muhammad Shamsuddeen Hassan
Muñoz Manuel Romero
Nagel Sebastian
Narayanan Deepak
Natan Eyal Bar
Nayak Nihal
Neeraj Trishala
Nejadgholi Isar
Nezhurina Marianna
Nguyen Duong A.
Nguyen Huu
Nguyen Olivier
Nguyen Zach
Nikoulina Vassilina
Nikpoor Somaieh
Nitzav Ariel Kreisberg
Novikova Jekaterina
Névéol Aurélie
Ononiwu Frankline
Osei Salomey
Ott Simon
Oyebade Tobi
Ozoani Ezinwanne
Pai Suhas
Pais Shani
Palasciano Alfredo
Pandey Harshit
Passmore Jesse
Patil Suraj
Patry Nicolas
Pavlick Ellie
Periñán Daniel León
Pestana Amanda
Peyrounette Myriam
Phan Long
Phang Jason
Pistilli Giada
Ponferrada Eduardo González
Posada Jose David
Prabhu Vrinda
Press Ofir
Protasov Vitaly
Pruksachatkun Yada
Pyysalo Sampo
Pàmies Marc
Qiu Mike
Radev Dragomir
Raffel Colin
Raja Arun
Rajani Nazneen
Rajbhandari Samyam
Rasley Jeff
Raunak Vikas
Reiter Ehud
Requena Stéphane
Rezanejad Habib
Ribeiro Rui
Rieser Verena
Roberts Adam
Rogers Anna
Roy Sourav
Rozen Jos
Rueda Alice
Rush Alexander M.
Ruwase Olatunji
Ryabinin Max
Sagot Benoît
Salesky Elizabeth
Samagaio Mairon
Samuel Olanrewaju
Samwald Matthias
Sang-aroonsiri Sinee
Sanh Victor
Sanseviero Omar
Santilli Andrea
Santos Ana
Sanz Julio Bonis
Saulnier Lucile
Saxena Bharat
Scao Teven Le
Schick Timo
Schoelkopf Hailey
Schweter Stefan
Scialom Thomas
Sedenko Irina
Seelam Natasha
Seltzer Josh
Serikov Oleg
Sharma Abheesht
Sharma Shanya
Shavrina Tatiana
Shen Sheng
Shinzato Luisa
Shoeybi Mohammad
Shubber Sarmad
Shukla Anima
Si Chenglei
Silberberg Stanislav
Simhi Adi
Singh Amanpreet
Singh Ayush
Singh Mayank
Sivaraman Karthik Rangasai
Smith Shaden
Solaiman Irene
Soroa Aitor
Stiegler Arnaud
Strobelt Hendrik
Su Rosaline
Su Ruisi
Suarez Pedro Ortiz
Subramani Nishant
Subramonian Arjun
Sun Zhiqing
Sutawika Lintang
Szczechla Eliza
Sänger Mario
Tae Jaesung
Takeuchi Maiko
Taktasheva Ekaterina
Talat Zeerak
Tammour Aycha
Tan Edward
Tan Samson
Tan Zhe
Tang Xiangru
Tanguy Ludovic
Tazi Nouamane
Taşar Davut Emre
Teehan Ryan
Thakker Urmish
Thrush Tristan
Tobing Joseph
Tojarieh Hadar
Torrent Tiago Timponi
Tow Jonathan
Tran Hieu
Tunuguntla Deepak
Unldreaj Antigona
Uri Yallow
van der Wal Oskar
van Strien Daniel
Venkatraman Yash
Viguier Sylvain
Villegas Paulo
Voloshina Ekaterina
von Platen Patrick
Von Werra Leandro
Vrabec Helena U.
Vu Minh Chien
Wang Bo
Wang Han
Wang Silas
Wang Thomas
Weber Leon
Webson Albert
Weinberg Michael
Winata Genta Indra
Wolf Thomas
Workshop BigScience
Xie Zhongli
Xu Canwen
Xu Chuxin
Xu Yifan
Xu Yingxin
Xu Yu
Yang Yoyo
Ye Zifan
Yong Zheng-Xin
Yu Dian
Yu Ian
Yun Tian
Yvon François
Zhang Minjia
Zhang Rui
Zhang Ruochen
Zhou Chenxi
Zhu Jian
Zink Sydney
Šaško Mario
Publication date: 10/12/2022

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License

arXiv.org e-Print Archive

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Author: Adelani David
Akiki Christopher
Almubarak Khalid
Alyafeai Zaid
Ben Allal Loubna
Biderman Stella
Bose Shamik
Chien Vu Minh
Chim Jenny
Colombo Pierre
de la Rosa Javier
de Toni Francesco
Dey Manan
Dupont Gérard
Frohberg Jörg
Gokaslan Aaron
Gonzalez-Dios Itziar
González Ponferrada Eduardo
Ilić Suzana
Jernite Yacine
Laurençon Hugo
Le Scao Teven
Lepercq Violette
Lhoest Quentin
Lo Kyle
Longpre Shayne
Luccioni Sasha
Masoud Maraim
Mcmillan-Major Angelina
Mitchell Margaret
Mou Chenghao
Nagel Sebastian
Nguyen Huu
Nguyen Olivier
Nikpoor Somaieh
Ortiz Suarez Pedro
Pai Suhas
Phan Long
Pistilli Giada
Rogers Anna
Romero Muñoz Manuel
Saulnier Lucile
Soroa Aitor
Thrush Tristan
Tran Hieu
van Strien Daniel
Villanova del Moral Albert
Villegas Paulo
von Werra Leandro
Wang Thomas
Weber Leon
Yu Ian
Zhu Jian
Šaško Mario
Publication venue: HAL CCSD
Publication date: 28/11/2022

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model. We further release a large initial subset of the corpus and analyses thereof, and hope to empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus

HAL-CentraleSupelec

HAL-Rennes 1