1,657 research outputs found

    Stubborn Lexical Bias in Data and Models

    In NLP, recent work has seen increased focus on spurious correlations between various features and labels in training data, and how these influence model behavior. However, the presence and effect of such correlations are typically examined feature by feature. We investigate the cumulative impact on a model of many such intersecting features. Using a new statistical method, we examine whether such spurious patterns in data appear in models trained on the data. We select two tasks -- natural language inference and duplicate-question detection -- for which any unigram feature on its own should ideally be uninformative, which gives us a large pool of automatically extracted features with which to experiment. The large size of this pool allows us to investigate the intersection of features spuriously associated with (potentially different) labels. We then apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations, and examine how doing so affects models trained on the reweighted data. Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models, including worsened bias for slightly more complex features (bigrams). We close with discussion about the implications of our results on what it means to "debias" training data, and how issues of data quality can affect model bias. Comment: ACL Findings 202
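    To make the idea of an "uninformative unigram" concrete, the sketch below scores how far each unigram's label distribution departs from chance using an ordinary one-proportion z-statistic. This is an illustration only, not the statistical method the abstract refers to; the function name, whitespace tokenization, and the uniform label prior (1 / num_labels) are all assumptions.

```python
import math
from collections import Counter


def unigram_label_z_scores(examples, num_labels):
    """Return {(token, label): z} for every observed token/label pair.

    examples: iterable of (text, label) pairs.
    Assumption: under "no lexical bias", each label is equally likely
    given any single token, so the null proportion is 1 / num_labels.
    """
    p0 = 1.0 / num_labels
    token_counts = Counter()   # number of examples containing the token
    pair_counts = Counter()    # number of those examples with a given label
    for text, label in examples:
        # Count each token once per example (naive whitespace tokenization).
        for token in set(text.lower().split()):
            token_counts[token] += 1
            pair_counts[(token, label)] += 1

    scores = {}
    for (token, label), count in pair_counts.items():
        n = token_counts[token]
        p_hat = count / n
        # One-proportion z-statistic against the uniform null.
        scores[(token, label)] = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return scores


if __name__ == "__main__":
    toy = [
        ("not a cat", "contradiction"),
        ("not here", "contradiction"),
        ("a dog", "entailment"),
    ]
    for pair, z in sorted(unigram_label_z_scores(toy, 3).items()):
        print(pair, round(z, 3))
```

    A large positive score flags a token that co-occurs with one label far more often than chance would predict; with thousands of such tokens, many can fire on the same example, which is the intersecting-feature situation the abstract describes.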

    Diode Laser-Induced Fluorescence of Xenon Ion Velocity Distributions

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/77011/1/AIAA-2005-4406-547.pd

    Retrofitting Word Vectors to Semantic Lexicons

    Vector space word representations are learned from distributional information of words in large corpora. Although such statistics are semantically informative, they disregard the valuable information that is contained in semantic lexicons such as WordNet, FrameNet, and the Paraphrase Database. This paper proposes a method for refining vector space representations using relational information from semantic lexicons by encouraging linked words to have similar vector representations, and it makes no assumptions about how the input vectors were constructed. Evaluated on a battery of standard lexical semantic evaluation tasks in several languages, we obtain substantial improvements starting with a variety of word vector models. Our refinement method outperforms prior techniques for incorporating semantic lexicons into the word vector training algorithms. Comment: Proceedings of NAACL 201
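    One simple way to "encourage linked words to have similar vector representations" is to repeatedly replace each vector with a weighted average of its original value and its lexicon neighbours' current values. The sketch below does that as a post-processing step on existing vectors. It is a minimal illustration under stated assumptions, not necessarily the paper's exact procedure: the function name, the weights `alpha` and `beta`, and the iteration count are hypothetical choices.

```python
def retrofit(vectors, lexicon, iterations=10, alpha=1.0, beta=1.0):
    """Pull each word vector toward the vectors of its lexicon neighbours.

    vectors: dict mapping word -> list of floats (pre-trained embeddings,
             built by any method; they are treated as a black box).
    lexicon: dict mapping word -> list of related words.
    alpha:   weight keeping a word near its original vector (assumed value).
    beta:    weight pulling a word toward each neighbour (assumed value).
    Returns a new dict; the input vectors are left unmodified.
    """
    new_vectors = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            if word not in vectors:
                continue
            # Only neighbours that actually have a vector can contribute.
            linked = [n for n in neighbours if n in new_vectors and n != word]
            if not linked:
                continue
            total = [alpha * x for x in vectors[word]]
            for n in linked:
                for k, x in enumerate(new_vectors[n]):
                    total[k] += beta * x
            norm = alpha + beta * len(linked)
            new_vectors[word] = [x / norm for x in total]
    return new_vectors


if __name__ == "__main__":
    vecs = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "car": [5.0, 5.0]}
    links = {"happy": ["glad"], "glad": ["happy"]}
    for word, vec in retrofit(vecs, links).items():
        print(word, [round(x, 3) for x in vec])
```

    Words that appear in no lexicon entry keep their original vectors, so the step only moves the part of the vocabulary that the lexicon covers.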

    Insights into the structure and self-assembly of organic-semiconductor/quantum-dot blends

    Controlling the dispersibility of crystalline inorganic quantum dots (QDs) within organic-QD nanocomposite films is critical for a wide range of optoelectronic devices. A promising way to control nanoscale structure in these nanocomposites is via the use of appropriate organic ligands on the QDs, which help to compatibilize them with the organic host, both electronically and structurally. Here, using combined small-angle X-ray and neutron scattering, the authors demonstrate and quantify the incorporation of such a compatibilizing, electronically active, organic semiconductor ligand species into the native oleic acid ligand envelope of lead sulphide QDs, and how this ligand loading may be easily controlled. Furthermore, in situ grazing incidence wide/small angle X-ray scattering demonstrates how QD ligand surface chemistry has a pronounced effect on the self-assembly of the nanocomposite film in terms of both small-molecule crystallization and QD dispersion versus ordering/aggregation. The approach demonstrated here shows the important role which the degree of incorporation of an active ligand, closely related in chemical structure to the host small-molecule organic matrix, plays in both the self-assembly of the QD and small-molecule components and in determining the final optoelectronic properties of the system.

    pyveg: A Python package for analysing the time evolution of patterned vegetation using Google Earth Engine

    Periodic vegetation patterns (PVP) arise from the interplay between forces that drive the growth and mortality of plants. Inter-plant competition for resources, in particular water, can lead to the formation of PVP. Arid and semi-arid ecosystems may be under threat due to changing precipitation dynamics driven by macroscopic changes in climate. These regions display some notable examples of PVP, for example the “tiger bush” patterns found in West Africa. The morphology of the periodic pattern has been suggested to be linked to the resilience of the ecosystem (Mander et al., 2017; Trichon et al., 2018). Using remote sensing techniques, vegetation patterns in these regions can be studied, and an analysis of the resilience of the ecosystem can be performed. The pyveg package implements functionality to download and process data from Google Earth Engine (GEE), and to subsequently perform a resilience analysis on the acquired data. PVP images are quantified using network centrality metrics. The results of the analysis can be used to search for typical early warning signals of an ecological collapse (Dakos et al., 2008). Google Earth Engine Editor scripts are also provided to help researchers discover locations of ecosystems which may be in decline. pyveg is being developed as part of a research project looking for evidence of early warning signals of ecosystem collapse using remote sensing data. pyveg allows such research to be carried out at scale, and hence can be an important tool in understanding changing arid and semi-arid ecosystem dynamics. An evolving list of PVP locations, obtained through both literature and manual searches, is included in the package at pyveg/coordinates.py. The structure of the package is outlined in Figure 1, and is discussed in more detail in the following sections.