    Aligning Robot Behaviors with Human Intents by Exposing Learned Behaviors and Resolving Misspecifications

    Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example

    Post-hoc explanation methods are gaining popularity for interpreting, understanding, and debugging neural networks. Most analyses using such methods explain decisions in response to inputs drawn from the test set. However, the test set may have few examples that trigger some model behaviors, such as high-confidence failures or ambiguous classifications. To address these challenges, we introduce a flexible model inspection framework: Bayes-TrEx. Given a data distribution, Bayes-TrEx finds in-distribution examples with a specified prediction confidence. We demonstrate several use cases of Bayes-TrEx, including revealing highly confident (mis)classifications, visualizing class boundaries via ambiguous examples, understanding novel-class extrapolation behavior, and exposing neural network overconfidence. We use Bayes-TrEx to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and we show that this framework enables more flexible holistic model analysis than just inspecting the test set. Code is available at https://github.com/serenabooth/Bayes-TrEx. Comment: Accepted at AAAI 202

    Quality-Diversity Generative Sampling for Learning with Synthetic Data

    Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, despite the data coming from a biased generator. QDGS is a model-agnostic framework that uses prompt guidance to optimize a quality objective across measures of diversity for synthetically generated data, without fine-tuning the generative model. Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. Code available at: https://github.com/Cylumn/qd-generative-sampling. Comment: Accepted at AAAI 2024; 7 pages main, 12 pages total, 9 figure

    Models of human preference for learning reward functions

    The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling human preferences instead as informed by each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward function equivalent to the reward function that generated those preferences, and we prove that the previous partial return model lacks this identifiability property in multiple contexts. We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting. Additionally, we find that our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned. Overall, this work establishes that the choice of preference model is impactful, and our proposed regret preference model provides an improvement upon a core assumption of recent research. We have open sourced our experimental code, the human preferences dataset we gathered, and our training and preference elicitation interfaces for gathering such a dataset. Comment: 16 pages (40 pages with references and appendix), 23 figure
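
The contrast between the two preference models in the abstract above can be illustrated with a small sketch. It is not taken from the paper's code. The logistic (Bradley-Terry style) choice rule, the deterministic-transition form of regret, and the names `segment_regret`, `preference_prob`, `v_start`, and `v_end` are all assumptions made for illustration; only "partial return is the sum of rewards along a segment" comes from the abstract.

```python
import math


def partial_return(rewards):
    """Sum of rewards along a segment (as defined in the abstract)."""
    return sum(rewards)


def segment_regret(rewards, v_start, v_end):
    """Assumed regret for deterministic transitions (illustrative only).

    v_start and v_end are hypothetical optimal state values at the first
    and last state of the segment. Regret is how far the segment falls
    short of acting optimally from its start state.
    """
    return v_start - (sum(rewards) + v_end)


def preference_prob(score_a, score_b):
    """Assumed logistic choice rule: probability that A is preferred to B,
    where a higher score is better."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


# Segment A pays a cost now but ends in a good state (optimal behavior).
seg_a = {"rewards": [-1.0, -1.0], "v_start": 0.0, "v_end": 2.0}
# Segment B collects more reward but ends in a bad state.
seg_b = {"rewards": [0.0, 0.0], "v_start": 0.0, "v_end": -5.0}

# Partial-return model: compares summed rewards only.
p_partial = preference_prob(
    partial_return(seg_a["rewards"]), partial_return(seg_b["rewards"])
)

# Regret model: lower regret is better, so scores are negated regrets.
p_regret = preference_prob(-segment_regret(**seg_a), -segment_regret(**seg_b))

print(f"P(A preferred | partial return) = {p_partial:.3f}")
print(f"P(A preferred | regret)         = {p_regret:.3f}")
```

Under these assumed numbers, the partial-return model favors segment B because it ignores where each segment ends, while the regret model favors segment A because A matches optimal behavior from its start state.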

    The Role of the Reducible Dopant in Solid Electrolyte-Lithium Metal Interfaces

    Garnet solid electrolytes, of the form Li7La3Zr2O12 (LLZO), remain an enticing prospect for solid-state batteries owing to their chemical and electrochemical stability in contact with metallic lithium. Dopants, often employed to stabilize the fast ion conducting cubic garnet phase, typically have no effect on the chemical stability of LLZO in contact with Li metal but have been found recently to impact the properties of the Li/garnet interface. For dopants more “reducible” than Zr (e.g., Nb and Ti), contradictory reports of either raised or reduced Li/garnet interfacial resistances have been attributed to the dopant. Here, we investigate the Li/LLZO interface in W-doped Li7La3Zr2O12 (LLZWO) to determine the influence of a “reducible” dopant on the electrochemical properties of the Li/garnet interface. Single-phase LLZWO is synthesized by a new sol–gel approach and densified by spark plasma sintering. Interrogating the resulting Li/LLZWO interface/interphase by impedance, muon spin relaxation and X-ray absorption spectroscopies uncovers the significant impact of surface lithiation on electrochemical performance. Upon initial contact, an interfacial reaction occurs between LLZWO and Li metal, leading to the reduction of surface W6+ centers and an initial reduction of the Li/garnet interfacial resistance. Propagation of this surface reaction, driven by the high mobility of Li+ ions through the grain surfaces, thickens the resistive interphases throughout the material and impedes Li+ ion transport between the grains. The resulting high resistance accumulating in the system impedes cycling at high current densities. These insights shed light on the nature of lithiated interfaces in garnet solid electrolytes containing a reducible dopant where high Li+ ion mobility and the reducible nature of the dopant can significantly affect electrochemical performance.

    Rest-frame UV line emission from the intergalactic medium at 2<z<5

    Rest-frame UV emission lines offer the possibility to directly image the gas around high-redshift galaxies with upcoming optical instruments. We use a suite of large, hydrodynamical simulations to predict the nature and detectability of emission lines from the intergalactic medium at 2<z<5. The brightest emission comes from HI Ly-alpha and the strongest metal line, CIII, is about an order of magnitude fainter, although HI Ly-alpha may be fainter if the gas is self-shielded to the UV background or if dust is important. The highest surface brightness regions for CIV, SiIII, SiIV and OVI are fainter than CIII by factors of a few. The NV and NeVIII lines, as well as HeII H-alpha, are substantially weaker but their maximum surface brightnesses still exceed 100 photon/cm^2/s/sr at z=2 (for 2" pixels). Lower ionisation lines arise in denser and colder gas that produces clumpier emission. The brightest HI Ly-alpha emission arises in highly overdense gas, but the highest surface brightness emission from high-ionisation metal lines traces a wider range of overdensities. Bright metal-line emission traces gas with temperatures close to the peak of the corresponding emissivity curve. While HI Ly-alpha, HeII H-alpha, CIII, SiIII, and SiIV are excellent probes of cold accretion flows and the colder parts of outflows, CIV, NV, OVI, and NeVIII are powerful tracers of the diffuse WHIM and galactic winds. A comparison of results from simulations with varying physical prescriptions demonstrates that the predictions for the brighter metal-line emission are robust to within factors of a few. Several emission lines from the high-redshift IGM will become detectable in the near future, possibly starting with the Cosmic Web Imager on Palomar. MUSE and the Keck Cosmic Web Imager have the potential to revolutionise studies of the interactions between high-redshift galaxies and their environment. (Abridged) Comment: 21 pages, 17 figures. Accepted for publication by MNRA

    Anion-polarisation-directed short-range-order in antiperovskite Li2FeSO

    Short-range ordering in cation-disordered cathodes can have a significant effect on their electrochemical properties. Here, we characterise the cation short-range order in the antiperovskite cathode material Li2FeSO, using density functional theory, Monte Carlo simulations, and synchrotron X-ray pair-distribution-function data. We predict partial short-range cation-ordering, characterised by favourable OLi4Fe2 oxygen coordination with a preference for polar cis-OLi4Fe2 over non-polar trans-OLi4Fe2 configurations. This preference for polar cation configurations produces long-range disorder, in agreement with experimental data. The predicted short-range-order preference contrasts with that for a simple point-charge model, which instead predicts preferential trans-OLi4Fe2 oxygen coordination and corresponding long-range crystallographic order. The absence of long-range order in Li2FeSO can therefore be attributed to the relative stability of cis-OLi4Fe2 and other non-OLi4Fe2 oxygen-coordination motifs. We show that this effect is associated with the polarisation of oxide and sulfide anions in polar coordination environments, which stabilises these polar short-range cation orderings. We propose similar anion-polarisation-directed short-range-ordering may be present in other heterocationic materials that contain cations with different formal charges. Our analysis also illustrates the limitations of using simple point-charge models to predict the structure of cation-disordered materials, where other factors, such as anion polarisation, may play a critical role in directing both short- and long-range structural correlations.

    IEEE P7001: A proposed standard on transparency

    This paper describes IEEE P7001, a new draft standard on transparency of autonomous systems. In the paper, we outline the development and structure of the draft standard. We present the rationale for transparency as a measurable, testable property. We outline five stakeholder groups: users, the general public and bystanders, safety certification agencies, incident/accident investigators and lawyers/expert witnesses, and explain the thinking behind the normative definitions of “levels” of transparency for each stakeholder group in P7001. The paper illustrates the application of P7001 through worked examples of both specification and assessment of fictional autonomous systems.