512 research outputs found

    Unsupervised classification to improve the quality of a bird song recording dataset

    Open audio databases such as Xeno-Canto are widely used to build datasets to explore bird song repertoire or to train models for automatic bird sound classification by deep learning algorithms. However, such databases suffer from the fact that bird sounds are weakly labelled: a species name is attributed to each audio recording without timestamps that provide the temporal localization of the bird song of interest. Manual annotations can solve this issue, but they are time-consuming, expert-dependent, and do not scale to large datasets. Another solution consists of using a labelling function that automatically segments audio recordings before assigning a label to each segmented audio sample. Although labelling functions were introduced to expedite strong label assignment, their classification performance remains mostly unknown. To address this issue and reduce label noise (wrong label assignment) in large bird song datasets, we introduce a novel data-centric labelling function composed of three successive steps: 1) time-frequency sound unit segmentation, 2) feature computation for each sound unit, and 3) classification of each sound unit as bird song or noise with either an unsupervised DBSCAN algorithm or the supervised BirdNET neural network. The labelling function was optimized, validated, and tested on the songs of 44 West-Palearctic common bird species. We first showed that the segmentation of bird songs alone aggregated from 10% to 83% of label noise depending on the species. We also demonstrated that our labelling function was able to significantly reduce the initial label noise present in the dataset by up to a factor of three. Finally, we discuss different opportunities to design suitable labelling functions to build high-quality animal vocalization datasets with minimum expert annotation effort.
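
    The third step of the labelling function described above (unsupervised classification of sound units) can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (duration and peak frequency), the `eps` and `min_samples` values, and the rule "units in a dense cluster are kept as song, outliers are treated as noise" are all assumptions made for illustration, and the small pure-Python DBSCAN stands in for a library implementation.

```python
from math import dist

NOISE = -1  # label given to points that belong to no dense cluster


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical features per sound unit: (duration in s, peak frequency in kHz).
units = [(0.20, 4.1), (0.22, 4.0), (0.21, 4.2), (0.19, 3.9),
         (1.50, 0.3), (0.05, 9.0)]

cluster_ids = dbscan(units, eps=0.5, min_samples=3)

# Assumed decision rule: clustered units are candidate song, outliers are noise.
decisions = ["song" if c != NOISE else "noise" for c in cluster_ids]
print(decisions)
```

    In this toy example the four similar units form one dense cluster and the two isolated units are flagged as noise. In practice the features would need scaling before a single `eps` is meaningful across dimensions.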

    Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need

    Self-Supervised Learning (SSL) has emerged as the solution of choice to learn transferable representations from unlabeled data. However, SSL requires building samples that are known to be semantically akin, i.e., positive views. Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies, e.g., applying known data augmentations to the same input. In this work, we generalize and formalize this principle through Positive Active Learning (PAL), where an oracle queries semantic relationships between samples. PAL achieves three main objectives. First, it unveils a theoretically grounded learning framework beyond SSL that can be extended to tackle supervised and semi-supervised learning depending on the employed oracle. Second, it provides a consistent algorithm to embed a priori knowledge, e.g., some observed labels, into any SSL loss without any change in the training pipeline. Third, it provides a proper active learning framework yielding low-cost solutions to annotate datasets, arguably bridging the gap between theory and practice of active learning that is based on simple-to-answer-by-non-experts queries of semantic relationships between inputs. Comment: 8 main pages, 20 total, 10 figures.
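
    The idea that the choice of oracle determines the learning regime can be sketched as follows. This is an illustrative reading of the abstract, not the paper's code: the names `similarity_graph`, `ssl_oracle`, and `supervised_oracle`, and the toy data, are hypothetical. Each oracle answers a yes/no query about whether two samples are semantically related, and the answers are collected into a binary similarity graph that a pairwise SSL loss could consume.

```python
def similarity_graph(n, oracle):
    """Query the oracle for every pair and return an n x n 0/1 matrix."""
    return [[1 if oracle(i, j) else 0 for j in range(n)] for i in range(n)]


# Six augmented views; source_of_view[k] is the index of the original input.
source_of_view = [0, 0, 1, 1, 2, 2]

# Labels of the underlying inputs, assumed known only in the supervised case.
label_of_view = ["cat", "cat", "cat", "cat", "dog", "dog"]


def ssl_oracle(i, j):
    # SSL-style oracle: positive iff both views come from the same input.
    return source_of_view[i] == source_of_view[j]


def supervised_oracle(i, j):
    # Supervised-style oracle: positive iff both views share a class label.
    return label_of_view[i] == label_of_view[j]


ssl_graph = similarity_graph(6, ssl_oracle)
sup_graph = similarity_graph(6, supervised_oracle)

# Views 0 and 2 come from different inputs but share the label "cat":
print(ssl_graph[0][2], sup_graph[0][2])
```

    The training loop would stay the same in both cases; only the graph handed to the loss changes, which is the sense in which prior knowledge is embedded without altering the pipeline.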

    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic, at the crossroads of different research traditions which deal with natural language as a container of knowledge and with methods to extract and manage knowledge that is linguistically represented, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.

    Multi-annotator Deep Learning: A Probabilistic Framework for Classification

    Solving complex classification tasks using deep neural networks typically requires large amounts of annotated data. However, corresponding class labels are noisy when provided by error-prone annotators, e.g., crowd workers. Training standard deep neural networks leads to subpar performance in such multi-annotator supervised learning settings. We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL). A ground truth and an annotator performance model are jointly trained in an end-to-end learning approach. The ground truth model learns to predict instances' true class labels, while the annotator performance model infers probabilistic estimates of annotators' performances. A modular network architecture enables us to make varying assumptions regarding annotators' performances, e.g., an optional class or instance dependency. Further, we learn annotator embeddings to estimate annotators' densities within a latent space as proxies of their potentially correlated annotations. Together with a weighted loss function, we improve the learning from correlated annotation patterns. In a comprehensive evaluation, we examine three research questions about multi-annotator supervised learning. Our findings indicate MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
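
    One common way to couple a ground truth model with an annotator performance model is to marginalize over the unknown true class. The abstract does not spell out the exact formulation, so the sketch below is an assumption: `annotation_probs` and the confusion matrices are hypothetical, with `confusion[y][z]` standing for the probability that an annotator reports class `z` when the true class is `y`.

```python
def annotation_probs(class_probs, confusion):
    """Return p(annotation = z) = sum over y of p(y) * confusion[y][z]."""
    n_classes = len(class_probs)
    n_answers = len(confusion[0])
    return [
        sum(class_probs[y] * confusion[y][z] for y in range(n_classes))
        for z in range(n_answers)
    ]


# Hypothetical ground truth model output for one instance (two classes).
class_probs = [0.9, 0.1]

# Hypothetical class-dependent annotator models.
reliable = [[0.95, 0.05],
            [0.10, 0.90]]
spammer = [[0.5, 0.5],   # answers at random regardless of the true class
           [0.5, 0.5]]

print(annotation_probs(class_probs, reliable))  # approximately [0.865, 0.135]
print(annotation_probs(class_probs, spammer))   # [0.5, 0.5]
```

    Under this assumed coupling, a spamming annotator's answers carry no information about the true class, so fitting observed annotations puts no pressure on the ground truth model through that annotator.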

    Racism Pays: How Racial Exploitation Gets Innovation Off the Ground

    Recent work on the history of capitalism documents the key role that racial exploitation played in the launch of the global cotton economy and the construction of the transcontinental railroad. But racial exploitation is not a thing of the past. Drawing on three case studies, this Paper argues that some of our most celebrated innovations in the digital economy have gotten off the ground by racially exploiting workers of color, paying them less than the marginal revenue product of their labor for their essential contributions. Innovators like Apple and Uber have been able to racially exploit workers of color because they have monopsony power to do so. Workers of color have far fewer outside options than white workers, owing to intentional and structural discrimination against workers on the basis of their race. In the emerging digital economy, racial exploitation has paid off by giving innovators a workforce that is cheap, easy to scale, flexible, and productive, the kind of workforce that is especially useful in digital markets, where a first-mover advantage often translates to winner-take-all. This Paper argues that these workers should be paid the marginal revenue product of their labor, and it proposes a number of potential ways to do so: by increasing worker compensation or worker power. More generally, I argue that we should value the essential contributions of workers of color and immigrant workers who make innovation possible.

    A Comprehensive Study Of Bills Of Materials For Software Systems

    Software Bills of Materials (SBOMs) have emerged as tools to facilitate the management of software dependencies, vulnerabilities, licenses, and the supply chain. Significant effort has been devoted to increasing SBOM awareness and developing SBOM formats and tools. Despite this effort, recent studies have shown that SBOMs are still an early-stage technology that is not yet adequately adopted in practice, mainly due to limited SBOM tooling and lack of industry consensus on SBOM content, tool usage, and practical benefits. Expanding on previous research, this paper reports a comprehensive study that first investigates the current challenges stakeholders encounter when creating and using SBOMs. The study surveyed 138 practitioners belonging to five groups of stakeholders (practitioners familiar with SBOMs, members of critical open source projects, AI/ML practitioners, experts of cyber-physical systems, and legal professionals), using differentiated questionnaires. We interviewed eight survey respondents to gather further insights about their experience. We identified fourteen major challenges facing the creation and use of SBOMs, including those related to the material included in SBOMs, deficiencies in SBOM tools, SBOM maintenance and verification, and domain-specific challenges. We propose and discuss six actionable solutions to the identified challenges and present the major avenues for future research and development. We hope these solutions can be adopted by the community to improve SBOM formats, tools, and adoption, and thus enable the full potential of SBOMs.

    Numerous faces of feminism and their meaning in contemporary South Korea through the lens of the country’s young adults

    Beginning with contextualizing and historicizing South Korean feminism, this work aims to provide an inclusive perspective that recognizes the numerous social, political, and economic conditions that have influenced the feminist resurgence in contemporary South Korea, but also the increasing backlash it has recently been receiving there. Further, the research continues by introducing the diverse forms of contemporary feminist activism in the country, focusing the examination on several of the most influential and publicized ones. In addition to the wide range of previously conducted studies on the subject, several secondary quantitative data sources, such as surveys, are used to thoroughly demonstrate the currently prevailing trends and phenomena. Finally, relying primarily on qualitative data in the form of seven semi-structured interviews with young South Koreans, the research explores the meanings, personal experiences, and attitudinal orientations, as well as, ultimately, the feminist identity or lack thereof, as expressed by this group of interviewees. The study concludes that the major impediment to the endorsement of the feminist identity of its participants seems to be the negative cultural stereotypes attributed to the country’s feminism, which either make individuals reluctant to adopt such a label or lead them to reject this ideology completely as harmful and deviant. The difference between the two orientations can potentially be associated with the experiences of gender discrimination of both the participants themselves and of people in their social environment, as well as their perception of the nature of the existing inequalities as either a structural, systemic, and collective social problem or as highly individual, unshared incidents.

    Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)

    This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), held in Tampere, Finland, on 21–22 September 2023.