15 research outputs found

    Reusing Data and Metadata to Create New Metadata Through Machine-Learning & Other Programmatic Methods

    Recent improvements in natural language processing (NLP) enable metadata to be created programmatically from reused original metadata or even the dataset itself. Transfer learning applied to NLP has greatly improved performance and reduced training data requirements. In this talk, we'll compare machine-generated metadata to human-generated metadata and discuss characteristics of metadata and data archives that affect suitability for machine-learning reuse of metadata. Whereas human-generated metadata is often populated once, populated from the perspective of the data supplier, populated by many individuals with different words for the same thing, and limited in length, machine-generated metadata can be updated any number of times, generated from the perspective of any user, constrained to a standardized set of terms that can evolve over time, and be any length required. Machine-learning-generated metadata offers benefits but also brings additional needs in terms of version control, process transparency, human-computer interaction, and IT requirements. As a successful example, we'll discuss how a dataset of abstracts and associated human-tagged keywords from a standardized list of several thousand keywords was used to create a machine-learning model that predicted keyword metadata for open-source code projects on code.nasa.gov. We'll also discuss a less successful example from data.nasa.gov to show how data archive architecture and characteristics of initial metadata can be strong controls on how easy it is to leverage programmatic methods to reuse metadata to create additional metadata.

    Determining Research Priorities for Astronomy Using Machine Learning

    We summarize the first exploratory investigation into whether Machine Learning techniques can augment science strategic planning. We find that an approach based on Latent Dirichlet Allocation using abstracts drawn from high-impact astronomy journals may provide a leading indicator of future interest in a research topic. We show two topic metrics that correlate well with the high-priority research areas identified by the 2010 National Academies’ Astronomy and Astrophysics Decadal Survey. One metric is based on a sum of the fractional contribution to each topic by all scientific papers (“counts”) while the other is the Compound Annual Growth Rate of counts. These same metrics also show the same degree of correlation with the whitepapers submitted to the same Decadal Survey. Our results suggest that the Decadal Survey may under-emphasize fast-growing research. A preliminary version of our work was presented by Thronson et al.
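
The two topic metrics named in this abstract can be sketched in a few lines. This is only an illustration: it assumes the per-paper topic fractions have already been produced by a topic model such as Latent Dirichlet Allocation, and it uses the standard Compound Annual Growth Rate formula, since the abstract does not state the exact time window or normalization the authors used. The function names `topic_counts` and `cagr` are hypothetical.

```python
from typing import List


def topic_counts(doc_topic_fractions: List[List[float]]) -> List[float]:
    """Sum each topic's fractional contribution over all papers ("counts").

    doc_topic_fractions[d][k] is the fraction of paper d assigned to topic k
    (assumed to come from a fitted topic model; each row sums to about 1).
    """
    n_topics = len(doc_topic_fractions[0])
    return [sum(doc[k] for doc in doc_topic_fractions) for k in range(n_topics)]


def cagr(first: float, last: float, n_years: int) -> float:
    """Standard Compound Annual Growth Rate between two yearly values."""
    if first <= 0 or n_years <= 0:
        raise ValueError("first value and number of years must be positive")
    return (last / first) ** (1.0 / n_years) - 1.0


# Toy input: two papers, two topics (made-up fractions, not data from the paper).
papers = [[0.5, 0.5], [0.25, 0.75]]
counts = topic_counts(papers)

# Growth of a topic's counts from 100 to 121 over two years.
growth = cagr(100.0, 121.0, 2)
```

Computing `topic_counts` separately for each publication year and passing the first and last yearly values to `cagr` would give a growth figure per topic under these assumptions.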

    Towards Adaptive Metadata Enrichment

    Identifying data sources within a given paper remains a crucial, yet labor-intensive task. We have built a system that identifies papers that use SOHO (Solar and Heliospheric Observatory) data. Due to the generality of our approach, the system can be easily adapted to other data sources and to metadata more generally.

    Autoregressive Conditional Neural Processes

    Conditional neural processes (CNPs; Garnelo et al., 2018a) are attractive meta-learning models which produce well-calibrated predictions and are trainable via a simple maximum likelihood procedure. Although CNPs have many advantages, they are unable to model dependencies in their predictions. Various works propose solutions to this, but these come at the cost of either requiring approximate inference or being limited to Gaussian predictions. In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. Instead of making predictions independently for every target point, we autoregressively define a joint predictive distribution using the chain rule of probability, taking inspiration from the neural autoregressive density estimator (NADE) literature. We show that this simple procedure allows factorised Gaussian CNPs to model highly dependent, non-Gaussian predictive distributions. Perhaps surprisingly, in an extensive range of tasks with synthetic and real data, we show that CNPs in autoregressive (AR) mode not only significantly outperform non-AR CNPs, but are also competitive with more sophisticated models that are significantly more computationally expensive and challenging to train. This performance is remarkable given that AR CNPs are not trained to model joint dependencies. Our work provides an example of how ideas from neural distribution estimation can benefit neural processes, and motivates research into the AR deployment of other neural process models.
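
The autoregressive deployment described in this abstract can be illustrated with a small sketch: each target is predicted one at a time, a value is sampled, and the sampled pair is appended to the context before the next prediction, which is the chain rule applied to the joint predictive. The code below is not the authors' implementation. `toy_predict` is a hypothetical stand-in for a trained CNP that returns a Gaussian mean and standard deviation for a single target, and `ar_sample` is an assumed name for the sampling loop.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, y) pair
Predictor = Callable[[List[Point], float], Tuple[float, float]]


def toy_predict(context: List[Point], x_target: float) -> Tuple[float, float]:
    """Hypothetical stand-in for a trained CNP: returns (mean, std) for one target.

    The mean is an inverse-distance-weighted average of context outputs and
    the std grows with the distance to the nearest context input.
    """
    if not context:
        return 0.0, 1.0
    weights = [1.0 / (1e-6 + abs(x_target - x)) for x, _ in context]
    total = sum(weights)
    mean = sum(w * y for w, (_, y) in zip(weights, context)) / total
    nearest = min(abs(x_target - x) for x, _ in context)
    return mean, 0.1 + min(nearest, 1.0)


def ar_sample(predict: Predictor, context: List[Point],
              x_targets: List[float], rng: random.Random) -> List[float]:
    """Draw one joint sample by feeding each sampled output back as context."""
    extended = list(context)  # copy so the caller's context is not modified
    samples = []
    for x in x_targets:
        mean, std = predict(extended, x)   # marginal prediction for this target
        y = rng.gauss(mean, std)           # sample from the Gaussian marginal
        samples.append(y)
        extended.append((x, y))            # condition later targets on the sample
    return samples


observed = [(0.0, 1.0), (1.0, 2.0)]
joint_draw = ar_sample(toy_predict, observed, [0.5, 0.6, 2.0], random.Random(0))
```

Because every later prediction is conditioned on earlier samples, the draws are dependent even though each individual prediction is a factorised Gaussian.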

    Patch Testing to Methyldibromoglutaronitrile/Phenoxyethanol: North American Contact Dermatitis Group Experience, 1994-2018

    Background: Methyldibromoglutaronitrile/phenoxyethanol (MDBGN/PE) is a broad-spectrum preservative mixture used in consumer and industrial products. Objectives: The aims of the study were (1) to characterize the prevalence and clinical relevance of patch test reactions to MDBGN/PE and the epidemiology of positive patients and (2) to determine the frequency of concomitant reactions of MDBGN/PE and its components. Methods: This study used a retrospective analysis of cross-sectional data compiled by the North American Contact Dermatitis Group from 1994 to 2018. Results: Of 55,477 tested patients, 2674 (4.8%) had positive patch test reactions to MDBGN/PE (1.0%-2.5% petrolatum [pet]); most were + (63.3%) or ++ (22.3%). Clinical relevance was considered definite in 3.0% and probable in 19.3% of reactions. Common dermatitis sites included the hands (26.4%), scattered/generalized distribution (24.7%), and the face (18.3%). Patients with a positive reaction to MDBGN/PE and/or MDBGN and/or PE were significantly more likely to be male and older than 40 years and/or to have hand dermatitis (P ≤ 0.0033). Positivity to MDBGN/PE 2.0% pet decreased significantly over time (from 6.0% in 1998-2000 to 2.5% in 2017-2018, P < 0.0001). Personal care products were the most common exposure source (53.2%). Conclusions: Over time, positivity to MDBGN/PE 2.0% pet decreased significantly from 6.0% (in 1998-2000) to 2.5% (in 2017-2018). The high proportion of weak (63.3%) reactions underscores the need for careful interpretation of patch test sites. Important demographic associations included male sex and age older than 40 years.