15 research outputs found

    A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

    Full text link
    Crowdsourcing is a popular method used to estimate ground-truth labels by collecting noisy labels from workers. In this work, we are motivated by crowdsourcing applications where each worker can exhibit two levels of accuracy depending on a task's type. Applying algorithms designed for the traditional Dawid-Skene model to such a scenario results in performance which is limited by the hard tasks. Therefore, we first extend the model to allow worker accuracy to vary depending on a task's unknown type. Then we propose a spectral method to partition tasks by type. After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the truth values. We theoretically prove that when crowdsourced data contain tasks with varying levels of difficulty, our algorithm infers the true labels with higher accuracy than any Dawid-Skene algorithm. Experiments show that our method is effective in practical applications

    Matrix Estimation for Individual Fairness

    Full text link
    In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. We show that pre-processing data using ME can improve an algorithm's IF without sacrificing performance. Specifically, we show that using a popular ME method known as singular value thresholding (SVT) to pre-process the data provides a strong IF guarantee under appropriate conditions. We then show that, under analogous conditions, SVT pre-processing also yields estimates that are consistent and approximately minimax optimal. As such, the ME pre-processing step does not, under the stated conditions, increase the prediction error of the base algorithm, i.e., does not impose a fairness-performance trade-off. We verify these results on synthetic and real data.Comment: 23 pages, 3 figures, ICML 202

    Big Data in Bioeconomy

    Get PDF
    This edited open access book presents the comprehensive outcome of The European DataBio Project, which examined new data-driven methods to shape a bioeconomy. These methods are used to develop new and sustainable ways to use forest, farm and fishery resources. As a European initiative, the goal is to use these new findings to support decision-makers and producers – meaning farmers, land and forest owners and fishermen. With their 27 pilot projects from 17 countries, the authors examine important sectors and highlight examples where modern data-driven methods were used to increase sustainability. How can farmers, foresters or fishermen use these insights in their daily lives? The authors answer this and other questions for our readers. The first four parts of this book give an overview of the big data technologies relevant for optimal raw material gathering. The next three parts put these technologies into perspective, by showing useable applications from farming, forestry and fishery. The final part of this book gives a summary and a view on the future. With its broad outlook and variety of topics, this book is an enrichment for students and scientists in bioeconomy, biodiversity and renewable resources

    The Nexus Between Security Sector Governance/Reform and Sustainable Development Goal-16

    Get PDF
    This Security Sector Reform (SSR) Paper offers a universal and analytical perspective on the linkages between Security Sector Governance (SSG)/SSR (SSG/R) and Sustainable Development Goal-16 (SDG-16), focusing on conflict and post-conflict settings as well as transitional and consolidated democracies. Against the background of development and security literatures traditionally maintaining separate and compartmentalized presence in both academic and policymaking circles, it maintains that the contemporary security- and development-related challenges are inextricably linked, requiring effective measures with an accurate understanding of the nature of these challenges. In that sense, SDG-16 is surely a good step in the right direction. After comparing and contrasting SSG/R and SDG-16, this SSR Paper argues that human security lies at the heart of the nexus between the 2030 Agenda of the United Nations (UN) and SSG/R. To do so, it first provides a brief overview of the scholarly and policymaking literature on the development-security nexus to set the background for the adoption of The Agenda 2030. Next, it reviews the literature on SSG/R and SDGs, and how each concept evolved over time. It then identifies the puzzle this study seeks to address by comparing and contrasting SSG/R with SDG-16. After making a case that human security lies at the heart of the nexus between the UN’s 2030 Agenda and SSG/R, this book analyses the strengths and weaknesses of human security as a bridge between SSG/R and SDG-16 and makes policy recommendations on how SSG/R, bolstered by human security, may help achieve better results on the SDG-16 targets. It specifically emphasizes the importance of transparency, oversight, and accountability on the one hand, and participative approach and local ownership on the other. It concludes by arguing that a simultaneous emphasis on security and development is sorely needed for addressing the issues under the purview of SDG-16

    Intelligent Data Analytics using Deep Learning for Data Science

    Get PDF
    Nowadays, data science stimulates the interest of academics and practitioners because it can assist in the extraction of significant insights from massive amounts of data. From the years 2018 through 2025, the Global Datasphere is expected to rise from 33 Zettabytes to 175 Zettabytes, according to the International Data Corporation. This dissertation proposes an intelligent data analytics framework that uses deep learning to tackle several difficulties when implementing a data science application. These difficulties include dealing with high inter-class similarity, the availability and quality of hand-labeled data, and designing a feasible approach for modeling significant correlations in features gathered from various data sources. The proposed intelligent data analytics framework employs a novel strategy for improving data representation learning by incorporating supplemental data from various sources and structures. First, the research presents a multi-source fusion approach that utilizes confident learning techniques to improve the data quality from many noisy sources. Meta-learning methods based on advanced techniques such as the mixture of experts and differential evolution combine the predictive capacity of individual learners with a gating mechanism, ensuring that only the most trustworthy features or predictions are integrated to train the model. Then, a Multi-Level Convolutional Fusion is presented to train a model on the correspondence between local-global deep feature interactions to identify easily confused samples of different classes. The convolutional fusion is further enhanced with the power of Graph Transformers, aggregating the relevant neighboring features in graph-based input data structures and achieving state-of-the-art performance on a large-scale building damage dataset. Finally, weakly-supervised strategies, noise regularization, and label propagation are proposed to train a model on sparse input labeled data, ensuring the model\u27s robustness to errors and supporting the automatic expansion of the training set. The suggested approaches outperformed competing strategies in effectively training a model on a large-scale dataset of 500k photos, with just about 7% of the images annotated by a human. The proposed framework\u27s capabilities have benefited various data science applications, including fluid dynamics, geometric morphometrics, building damage classification from satellite pictures, disaster scene description, and storm-surge visualization

    Considering stakeholders’ preferences for scheduling slots in capacity constrained airports

    Get PDF
    Airport slot scheduling has attracted the attention of researchers as a capacity management tool at congested airports. Recent research work has employed multi-objective approaches for scheduling slots at coordinated airports. However, the central question on how to select a commonly accepted airport schedule remains. The various participating stakeholders may have multiple and sometimes conflicting objectives stemming from their decision-making needs. This complex decision environment renders the identification of a commonly accepted solution rather difficult. In this presentation, we propose a multi-criteria decision-making technique that incorporates the priorities and preferences of the stakeholders in order to determine the best compromise solution
    corecore