
    Design problems in crowdsourcing: improving the quality of crowd-based data collection

    Text, images, and other types of information objects can be described in many ways. Having detailed metadata and various people's interpretations of an object helps provide better access and use. While collecting novel descriptions is challenging, crowdsourcing presents new opportunities to do so. Large-scale human contributions open the door to latent information, subjective judgments, and other encodings of data that are otherwise difficult to infer algorithmically. However, such contributions are also subject to variance from the inconsistencies of human interpretation. This dissertation studies the problem of variance in crowdsourcing and investigates how it can be controlled both through post-collection modeling and through better collection-time design decisions. Crowd-contributed data is affected by many inconsistencies that automated processes do not share: differences in attention, interpretation, skill, and engagement. The tasks we ask of humans are also inherently more abstract and harder to agree on. In particular, qualitative or judgment-based tasks may be subjective, affected by contributor opinions and tastes. Approaches to understanding contribution variance and improving data quality are studied in three spaces. First, post-collection modeling is pursued as a way of improving crowdsourced data quality, asking whether factors including time, experience, and agreement with others provide indicators of contribution quality. Second, collection-time design problems are studied, comparing design manipulations for a controlled set of tasks. Since crowdsourcing is born of an interaction, not all data corrections are posterior: how the data is collected also matters. Finally, designing for subjective contexts is studied. Crowds are well positioned to teach us how information can be adapted to different person-specific needs, but treating subjective tasks like other tasks results in unnecessary error. The primary contribution of this work is an understanding of crowd data quality improvements from non-adversarial perspectives, that is, focusing on sources of variance or error beyond poor contributors. This includes the following findings:
    1. Collection interface design has a vital influence on the quality of collected data, and better guiding contributors can improve crowdsourced contribution quality without greatly raising the cost of collection or impeding other quality control strategies.
    2. Different interpretations of instructions threaten reliability and accuracy in crowdsourcing. This source of problems affects even trustworthy, attentive contributors. However, contributor quality can be inferred very early in an interaction for possible interventions.
    3. Certain design choices improve the quality of contributions in tasks that call for them. Anchoring reduces contributor-specific error, training affirms or corrects contributors' understanding of the task, and performance feedback can motivate middling contributors to exercise more care. Particularly notable for its simplicity, an intervention that foregrounds instructions in a window that must be explicitly dismissed greatly improves contribution quality.
    4. Paid crowdsourcing, often used for tasks with an assumed ground truth, can also be applied in subjective contexts. It is promising for on-demand personalization, such as recommendation without prior data for training.
    5. Two approaches are found to improve the quality of tasks for subjective crowdsourcing. Matching contributors to a target person based on similarity is good for long-term interactions or for bootstrapping multi-target systems. Alternatively, explicitly asking contributors to make sense of a target person and customize work for them is especially good for tasks with broad decision spaces and is more enjoyable to perform.
    The findings in this dissertation contribute to the crowdsourcing research space and provide practical improvements to crowd collection best practices.
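
    A minimal sketch of the kind of post-collection modeling described above, using agreement with co-annotators as a quality signal and weighting votes accordingly; the weighting scheme, function names, and toy data are illustrative assumptions rather than the dissertation's actual models.

    from collections import defaultdict

    # labels[(item, contributor)] = label that contributor gave for that item
    def agreement_weights(labels):
        """Score each contributor by how often they agree with co-annotators."""
        per_item = defaultdict(dict)
        for (item, contributor), label in labels.items():
            per_item[item][contributor] = label
        agree, total = defaultdict(int), defaultdict(int)
        for votes in per_item.values():
            for c, lab in votes.items():
                for other, other_lab in votes.items():
                    if other == c:
                        continue
                    total[c] += 1
                    agree[c] += int(lab == other_lab)
        return {c: agree[c] / total[c] for c in total}

    def weighted_majority(labels, weights):
        """Aggregate one label per item, weighting each vote by its contributor's score."""
        scores = defaultdict(lambda: defaultdict(float))
        for (item, contributor), label in labels.items():
            scores[item][label] += weights.get(contributor, 0.5)
        return {item: max(tallies, key=tallies.get) for item, tallies in scores.items()}

    votes = {("img1", "a"): "cat", ("img1", "b"): "cat", ("img1", "c"): "dog",
             ("img2", "a"): "dog", ("img2", "b"): "dog", ("img2", "c"): "dog"}
    print(weighted_majority(votes, agreement_weights(votes)))  # {'img1': 'cat', 'img2': 'dog'}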

    Unmonitorability of Artificial Intelligence

    Artificially Intelligent (AI) systems have ushered in a transformative era across various domains, yet their inherent traits of unpredictability, unexplainability, and uncontrollability have given rise to concerns surrounding AI safety. This paper aims to demonstrate the infeasibility of accurately monitoring advanced AI systems to predict the emergence of certain capabilities prior to their manifestation. Through an analysis of the intricacies of AI systems, the boundaries of human comprehension, and the elusive nature of emergent behaviors, we argue for the impossibility of reliably foreseeing some capabilities. By investigating these impossibility results, we shed light on their potential implications for AI safety research and propose possible strategies to overcome these limitations.

    Unsolved Problems in ML Safety

    Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions. Comment: Position Paper.

    Leveraging Mixed Expertise in Crowdsourcing.

    Crowdsourcing systems promise to leverage the "wisdom of crowds" to help solve many kinds of problems that are difficult to solve using only computers. Although a crowd of people inherently represents a diversity of skill levels, knowledge, and opinions, crowdsourcing system designers typically view this diversity as noise and effectively cancel it out by aggregating responses. However, we believe that by embracing crowd workers' diverse expertise levels, system designers can better leverage that knowledge to increase the wisdom of crowds. In this thesis, we propose solutions to a limitation of current crowdsourcing approaches: not accounting for a range of expertise levels in the crowd. The current body of work in crowdsourcing does not systematically examine this, suggesting that researchers may not believe the benefits of using mixed expertise warrant the complexities of supporting it. This thesis presents two systems, Escalier and Kurator, to show that leveraging mixed expertise is a worthwhile endeavor because it materially benefits system performance, at scale, for various types of problems. We also demonstrate an effective technique, called expertise layering, to incorporate mixed expertise into crowdsourcing systems. Finally, we show that leveraging mixed expertise enables researchers to use crowdsourcing to address new types of problems. PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133307/1/afdavid_1.pd
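
    The abstract names expertise layering but does not spell out its mechanics, so the following is only a hypothetical two-layer flow (lower-expertise workers generate candidates, experts review them); the Worker class, layered_pipeline, and the toy callbacks are assumptions for illustration, not the design of Escalier or Kurator.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Worker:
        name: str
        expertise: int  # e.g. 0 = novice, 1 = intermediate, 2 = expert

    def layered_pipeline(task: str, workers: List[Worker],
                         generate: Callable, review: Callable):
        """Route a task through expertise layers: novices draft, experts select."""
        novices = [w for w in workers if w.expertise < 2]
        experts = [w for w in workers if w.expertise >= 2]
        candidates = [generate(task, w) for w in novices]   # layer 1: drafts
        choices = [review(candidates, w) for w in experts]  # layer 2: expert picks
        return max(set(choices), key=choices.count)         # majority of expert picks

    workers = [Worker("n1", 0), Worker("n2", 1), Worker("e1", 2), Worker("e2", 2)]
    gen = lambda task, w: f"{task}: draft by {w.name}"
    rev = lambda cands, w: sorted(cands)[0]   # stand-in for expert judgment
    print(layered_pipeline("caption image", workers, gen, rev))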

    Characterizing and Diagnosing Architectural Degeneration of Software Systems from Defect Perspective

    The architecture of a software system is known to degrade as the system evolves over time due to change upon change, a phenomenon termed architectural degeneration. Previous research has focused largely on structural deviations of an architecture from its baseline. However, another angle from which to observe architectural degeneration is software defects, especially those that are architecturally related. This angle has not been scientifically explored until now. Here, we ask two questions: (1) What do defects indicate about architectural degeneration? and (2) How can architectural degeneration be diagnosed from the defect perspective? To answer question (1), we conducted an exploratory case study analyzing defect data over six releases of a large legacy system (approximately 20 million source lines of code and over 20 years old). The relevant defects are those that span multiple components in the system, called multiple-component defects (MCDs). This case study found that MCDs require more changes to fix and are more persistent across development phases and releases than other types of defects. To answer question (2), we developed an approach called Diagnosing Architectural Degeneration (DAD) from the defect perspective and validated it in a second, confirmatory case study involving three releases of a commercial system (over 1.5 million source lines of code and over 13 years old). This case study found that components of the system tend to have a persistent impact on architectural degeneration over releases; in particular, the impact of a few components is substantially greater than that of the others. These results are new, and they add to the current knowledge on architectural degeneration. The key conclusions are: (i) analysis of MCDs is a viable approach to characterizing architectural degeneration; and (ii) a method such as DAD can be developed for diagnosing architectural degeneration.
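
    A small sketch of the defect-perspective analysis described above: flag defects whose fixes touch more than one architectural component (MCDs) and count how often each component is involved. The prefix-based file-to-component mapping and the toy defect records are assumptions, not the study's actual tooling.

    from collections import Counter

    def component_of(path, mapping):
        """Map a changed file to its architectural component (toy prefix match)."""
        for prefix, comp in mapping.items():
            if path.startswith(prefix):
                return comp
        return "unknown"

    def multi_component_defects(defect_files, mapping):
        """Return defects whose fixes span more than one component (MCDs)."""
        mcds = {}
        for defect, files in defect_files.items():
            comps = {component_of(f, mapping) for f in files}
            if len(comps) > 1:
                mcds[defect] = comps
        return mcds

    mapping = {"net/": "networking", "ui/": "interface", "db/": "storage"}
    fixes = {"D1": ["net/socket.c", "db/cache.c"],              # spans two components
             "D2": ["ui/menu.c"],                               # single component
             "D3": ["net/tcp.c", "ui/status.c", "db/log.c"]}
    mcds = multi_component_defects(fixes, mapping)
    involvement = Counter(c for comps in mcds.values() for c in comps)
    print(mcds)         # D1 and D3 are MCDs
    print(involvement)  # how often each component participates in MCDs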

    Convolutional neural networks for the segmentation of small rodent brain MRI

    Image segmentation is a common step in the analysis of preclinical brain MRI, often performed manually. This is a time-consuming procedure subject to inter- and intra-rater variability. A possible alternative is automated, registration-based segmentation, which suffers from a bias due to the limited capacity of registration to adapt to pathological conditions such as Traumatic Brain Injury (TBI). In this work, a novel method is developed for the segmentation of small rodent brain MRI based on Convolutional Neural Networks (CNNs). The experiments presented here show how CNNs provide a fast, robust, and accurate alternative to both manual and registration-based methods. This is demonstrated by accurately segmenting three large datasets of MRI scans of healthy and Huntington disease model mice, as well as TBI rats. MU-Net and MU-Net-R, the CNNs presented here, achieve human-level accuracy while eliminating intra-rater variability, alleviating the biases of registration-based segmentation, and requiring an inference time of less than one second per scan. Using these segmentation masks, I designed a geometric construction to extract 39 parameters describing the position and orientation of the hippocampus, and later used them to classify epileptic vs. non-epileptic rats with a balanced accuracy of 0.80, five months after TBI. This clinically transferable geometric approach detects subjects at high risk of post-traumatic epilepsy, paving the way towards subject stratification for antiepileptogenesis studies.
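
    To make the CNN approach concrete, here is a deliberately tiny encoder-decoder with a single skip connection in PyTorch; it only illustrates the general U-Net-style idea and is far smaller and simpler than the MU-Net and MU-Net-R models described in the thesis.

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        """Two-level encoder-decoder producing per-pixel class logits."""
        def __init__(self, in_ch=1, n_classes=2, base=16):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
            self.pool = nn.MaxPool2d(2)
            self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
            self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
            self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
            self.head = nn.Conv2d(base, n_classes, 1)

        def forward(self, x):
            s1 = self.enc1(x)                    # full-resolution features
            s2 = self.enc2(self.pool(s1))        # half-resolution features
            up = self.up(s2)                     # upsample back to full resolution
            merged = torch.cat([up, s1], dim=1)  # skip connection
            return self.head(self.dec(merged))

    x = torch.randn(1, 1, 128, 128)   # one single-channel slice
    logits = TinyUNet()(x)            # shape (1, 2, 128, 128)
    mask = logits.argmax(dim=1)       # predicted segmentation mask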

    Deep learning of the dynamics of complex systems with its applications to biochemical molecules

    Recent advancements in deep learning have revolutionized method development in several scientific fields and beyond. One central application is the extraction of equilibrium structures and long-timescale kinetics from molecular dynamics simulations, i.e. the well-known sampling problem. Previous state-of-the-art methods employed a multi-step, handcrafted data processing pipeline resulting in Markov state models (MSMs), which can be understood as an approximation of the underlying Koopman operator. However, this approach demands choosing a set of features characterizing the molecular structure, methods and their parameters for dimension reduction to collective variables and clustering, and estimation strategies for MSMs throughout the processing pipeline. As this requires specific expertise, the approach is ultimately inaccessible to a broader community. In this thesis, we apply deep learning techniques to approximate the Koopman operator in an end-to-end learning framework by employing the variational approach for Markov processes (VAMP). Thereby, the framework bypasses the multi-step process and automates the pipeline while yielding a model similar to a coarse-grained MSM. We further transfer advanced techniques from the MSM field to the deep learning framework, making it possible to (i) include experimental evidence in the model estimation, (ii) enforce reversibility, and (iii) perform coarse-graining. At this stage, post-analysis tools from MSMs can be borrowed to estimate the rates of relevant rare events. Finally, we extend this approach to decompose a system into its (almost) independent subsystems and simultaneously estimate dynamical models for each of them, making the method much more data-efficient and enabling applications to larger proteins. Although our results focus solely on protein dynamics, application to climate, weather, and ocean-current data is an intriguing possibility with the potential to yield new insights and improve predictive power in these fields.
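
    The VAMP framework mentioned above scores a featurization by how well it captures the slow dynamics. Below is a minimal NumPy sketch of the VAMP-2 score computed from time-lagged feature matrices; the regularization constant, the helper names, and the random-walk toy trajectory are assumptions, and the neural network that would produce the features is omitted.

    import numpy as np

    def vamp2_score(X0, Xt, eps=1e-6):
        """VAMP-2 score of features at time t (X0) and t + tau (Xt); rows are frames."""
        X0 = X0 - X0.mean(axis=0)
        Xt = Xt - Xt.mean(axis=0)
        n = X0.shape[0]
        C00, C0t, Ctt = X0.T @ X0 / n, X0.T @ Xt / n, Xt.T @ Xt / n

        def inv_sqrt(C):
            # symmetric inverse square root with a small ridge for stability
            w, V = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
            return V @ np.diag(w ** -0.5) @ V.T

        K = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)   # whitened Koopman estimate
        # +1 accounts for the constant singular function removed by mean-centering
        return 1.0 + np.sum(np.linalg.svd(K, compute_uv=False) ** 2)

    rng = np.random.default_rng(0)
    traj = np.cumsum(rng.normal(size=(5000, 2)), axis=0)   # slowly mixing toy trajectory
    tau = 10
    print(vamp2_score(traj[:-tau], traj[tau:]))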