12 research outputs found

    La logique algorithmique confrontée à l'organisation de l'administration publique française

    Get PDF
    Cet article montre comment la logique algorithmique d’un agent conversationnel peut aider l’organisation des connaissances au sein d’une organisation de l’administration publique française, notamment une collectivité territoriale. Par le bias d’une recherche sur le terrain, je cherche à montrer comment il existe deux différentes adoptions de la technologie de la part de l’administration publique : une complexifiante et une simplifiante

    What lies behind AGI: ethical concerns related to LLMs

    Get PDF
    This paper opens the philosophical debate around the notion of Artificial General Intelligence (AGI) and its application in Large Language Models (LLMs). Through the lens of moral philosophy, the paper raises questions about these AI systems' capabilities and goals, the treatment of humans behind them, and the risk of perpetuating a monoculture through language

    Stronger Together: on the Articulation of Ethical Charters, Legal Tools, and Technical Documentation in ML

    Full text link
    The growing need for accountability of the people behind AI systems can be addressed by leveraging processes in three fields of study: ethics, law, and computer science. While these fields are often considered in isolation, they rely on complementary notions in their interpretation and implementation. In this work, we detail this interdependence and motivate the necessary role of collaborative governance tools in shaping a positive evolution of AI. We first contrast notions of compliance in the ethical, legal, and technical fields; we outline both their differences and where they complement each other, with a particular focus on the roles of ethical charters, licenses, and technical documentation in these interactions. We then focus on the role of values in articulating the synergies between the fields and outline specific mechanisms of interaction between them in practice. We identify how these mechanisms have played out in several open governance fora: an open collaborative workshop, a responsible licensing initiative, and a proposed regulatory framework. By leveraging complementary notions of compliance in these three domains, we can create a more comprehensive framework for governing AI systems that jointly takes into account their technical capabilities, their impact on society, and how technical specifications can inform relevant regulations. Our analysis thus underlines the necessity of joint consideration of the ethical, legal, and technical in AI ethics frameworks to be used on a larger scale to govern AI systems and how the thinking in each of these areas can inform the others

    Debating AI in Archaeology: applications, implications, and ethical considerations

    Get PDF
    Artificial Intelligence (AI) is not a recent development. However, with increasing computational capabilities, AI has developed into Natural Language Processing and Machine Learning, technologies particularly good at detecting correlations and patterns, and categorising, predicting, or extracting information. Within archaeology, AI can process big data accumulated over decades of research and deposited in archives. By combining these capabilities, AI offers new insights and exciting opportunities to create knowledge from archaeological archives for contemporary and future research. However, the ethical implications and human costs are not yet fully understood. Therefore, we question whether AI in archaeology is a blessing or a curse

    BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model

    Full text link
    The BigScience Workshop was a value-driven initiative that spanned one and half years of interdisciplinary research and culminated in the creation of ROOTS, a 1.6TB multilingual dataset that was used to train BLOOM, one of the largest multilingual language models to date. In addition to the technical outcomes and artifacts, the workshop fostered multidisciplinary collaborations around large models, datasets, and their analysis. This in turn led to a wide range of research publications spanning topics from ethics to law, data governance, modeling choices and distributed training. This paper focuses on the collaborative research aspects of BigScience and takes a step back to look at the challenges of large-scale participatory research, with respect to participant diversity and the tasks required to successfully carry out such a project. Our main goal is to share the lessons we learned from this experience, what we could have done better and what we did well. We show how the impact of such a social approach to scientific research goes well beyond the technical artifacts that were the basis of its inception.Comment: Presented at the 2022 NeurIPS Workshop on Broadening Research Collaborations in M

    The Ghost in the Machine has an American accent: value conflict in GPT-3

    Get PDF
    The alignment problem in the context of large language models must consider the plurality of human values in our world. Whilst there are many resonant and overlapping values amongst the world’s cultures, there are also many conflicting, yet equally valid, values. It is important to observe which cultural values a model exhibits, particularly when there is a value conflict between input prompts and generated outputs. We discuss how the co- creation of language and cultural value impacts large language models (LLMs). We explore the constitution of the training data for GPT-3 and compare that to the world’s language and internet access demographics, as well as to reported statistical profiles of dominant values in some Nation-states. We stress tested GPT-3 with a range of value-rich texts representing several languages and nations; including some with values orthogonal to dominant US public opinion as reported by the World Values Survey. We observed when values embedded in the input text were mutated in the generated outputs and noted when these conflicting values were more aligned with reported dominant US values. Our discussion of these results uses a moral value pluralism (MVP) lens to better understand these value mutations. Finally, we provide recommendations for how our work may contribute to other current work in the field

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Full text link
    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License

    The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

    No full text
    International audienceAs language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model. We further release a large initial subset of the corpus and analyses thereof, and hope to empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus
    corecore