8,474 research outputs found

    Artificial intelligence exceeds humans in epidemiological job coding

    BACKGROUND: Work circumstances can substantially negatively impact health. To explore this, large occupational cohorts of free-text job descriptions are manually coded and linked to exposure. Although several automatic coding tools have been developed, accurate exposure assessment is only feasible with human intervention. METHODS: We developed OPERAS, a customizable decision support system for epidemiological job coding. Using 812,522 entries, we developed and tested classification models for the Professions et Catégories Socioprofessionnelles (PCS)2003, Nomenclature d'Activités Française (NAF)2008, International Standard Classifications of Occupation (ISCO)-88, and ISCO-68. Each code comes with an estimated correctness measure to identify instances potentially requiring expert review. Here, OPERAS' decision support enables an increase in efficiency and accuracy of the coding process through code suggestions. Using the Formaldehyde, Silica, ALOHA, and DOM job-exposure matrices, we assessed the classification models' exposure assessment accuracy. RESULTS: We show that, using expert-coded job descriptions as gold standard, OPERAS realized a 0.66-0.84, 0.62-0.81, 0.60-0.79, and 0.57-0.78 inter-coder reliability (in Cohen's Kappa) on the first, second, third, and fourth coding levels, respectively. These exceed the respective inter-coder reliability of expert coders ranging 0.59-0.76, 0.56-0.71, 0.46-0.63, 0.40-0.56 on the same levels, enabling a 75.0-98.4% exposure assessment accuracy and an estimated 19.7-55.7% minimum workload reduction. CONCLUSIONS: OPERAS secures a high degree of accuracy in occupational classification and exposure assessment of free-text job descriptions, substantially reducing workload. As such, OPERAS significantly outperforms both expert coders and other current coding tools. This enables large-scale, efficient, and effective exposure assessment securing healthy work conditions
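The agreement figures above are Cohen's kappa values reported per coding level. As a minimal sketch of how such a figure can be computed, the snippet below compares two lists of codes after truncating them to a given level. It assumes hierarchical codes in which the first k characters identify level k (as in four-digit ISCO-88 codes). The function names and the sample codes are illustrative and do not come from OPERAS.

```python
from collections import Counter


def truncate(codes, level):
    """Keep the first `level` characters of each code (assumed hierarchy)."""
    return [code[:level] for code in codes]


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement under independence, from each coder's marginal counts.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical four-digit codes from an expert and from a model.
expert = ["2310", "2310", "5122", "5122", "7212", "7212"]
model = ["2310", "2310", "5122", "7212", "7212", "7212"]

kappa_level_4 = cohens_kappa(expert, model)
kappa_level_1 = cohens_kappa(truncate(expert, 1), truncate(model, 1))
print(round(kappa_level_4, 2), round(kappa_level_1, 2))
```

In this toy data, five of six items agree (observed agreement 5/6) and the expected chance agreement is 1/3, so kappa is 0.75 at both levels. With real data, kappa typically falls as the level gets more detailed, which matches the pattern in the reported ranges.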

    WP 2018-392

    Due to advances in computing power and the increase in coverage of longitudinal datasets in the Health and Retirement Study (HRS) that provide information about detailed occupations, demand has increased among researchers for improved occupation and industry data. The detailed data are currently hard to use because they were coded at different times, and the codeframes are, therefore, not consistent over time. Additionally, the HRS gathers new occupation and industry information from respondents every two years, and coding of new data at each wave is costly and time-consuming. In this project, we tested the NIOSH Industry and Occupation Computerized Coding System (NIOCCS) to see if it could improve processes for coding data from the HRS. We tested results from NIOCCS against results from a human coder for multiple datasets. NIOCCS does reasonably well compared to coding results from a highly trained, professional occupation and industry coder, with kappa inter-rater reliability on detailed codes of just under 70 percent and agreement rates on broader codes of around 80 percent; however, code rates for NIOCCS for the datasets tested ranged from 60 percent to 72 percent, whereas a professional coder's code rates for those same datasets ranged from 95 percent to 100 percent. In its current form, we find that NIOCCS is a tool that might be best used to reduce the number of cases human coders must code, either in coding historical data to a consistent codeframe or in coding data from future HRS waves. However, it is not yet ready to fully replace human coders.
    U.S. Social Security Administration, Award number RRC08098401-10, R-UM18-06
    https://deepblue.lib.umich.edu/bitstream/2027.42/148129/1/wp392.pd

    Occupation-Based Measures: an Overview and Discussion

    Occupational information is among the most versatile categories of information about a person available in quantitative data. The goal of this paper is to provide an overview of occupation-based measures in different topic areas. These include not only measures for analyzing social stratification, such as prestige scales, socioeconomic indices, and class schemes, but also measures of workplace tasks, occupation-specific health risks, gender segregation, and occupational closure. Moreover, as the quality of such data depends on the quality of the underlying occupational information, we also provide an overview of how to collect occupational information in surveys, how to code this information, and how occupational classifications are commonly used. By doing so, we hope to increase researchers' awareness of the potential of occupation-based analyses, as well as their knowledge of how to properly handle such measures in empirical analyses.

    Occupational Trends in New Zealand: 1991-2001

    This research provides insight into the occupational evolution of the New Zealand labour market. Our presentation looks at three different areas, and the research paper is divided accordingly. The paper begins with an analysis of the conceptual basis of occupational classifications used in New Zealand, because the classification system determines the quality and amount of occupational employment information that can be used for historical trends. The NZSCO99 is a skills-based classification system; therefore, the paper examines the strengths and limitations of the way that the NZSCO99 uses skills information. The paper then explains how the research team constructed a time series of occupational employment using data from the 1991, 1996 and 2001 Census of Population and Dwellings. The paper concludes with some initial results from an analysis of trends in the occupational structure of the New Zealand labour market between 1991 and 2001 using this Census data. This section comprises key explanatory figures and charts of longitudinal trends.

    Three Methods for Occupation Coding Based on Statistical Learning

    Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
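The hybrid idea described above (reuse the code of a known duplicate, otherwise fall back to a learned model) can be sketched as follows. This is not the authors' implementation. It assumes that a "duplicate" means an answer with the same set of lowercased word unigrams as a training answer, and it uses a plain nearest-neighbour lookup on character trigrams as a stand-in for the statistical learning algorithm. The class name, helper names, and sample codes are all hypothetical.

```python
from collections import Counter, defaultdict


def word_ngrams(text, n=1):
    """Order-insensitive set of lowercased word n-grams (the duplicate key)."""
    tokens = text.lower().split()
    return frozenset(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n=3):
    """Set of character n-grams of the padded, lowercased answer."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class HybridCoder:
    """Duplicate lookup first, nearest-neighbour fallback second (a sketch)."""

    def __init__(self, training):
        self.training = list(training)
        if not self.training:
            raise ValueError("training data must not be empty")
        # Map each n-gram key to the codes previously assigned to it.
        self.by_key = defaultdict(Counter)
        for text, code in self.training:
            self.by_key[word_ngrams(text)][code] += 1

    def code(self, text):
        key = word_ngrams(text)
        if key in self.by_key:
            # Duplicate found: return the most frequent code for this key.
            return self.by_key[key].most_common(1)[0][0], "duplicate"
        # No duplicate: stand-in learner picks the most similar training answer.
        grams = char_ngrams(text)
        best = max(self.training, key=lambda pair: jaccard(grams, char_ngrams(pair[0])))
        return best[1], "fallback"


# Hypothetical training answers with illustrative four-digit codes.
coder = HybridCoder([
    ("primary school teacher", "2331"),
    ("Teacher primary school", "2331"),
    ("truck driver", "8324"),
])
print(coder.code("school teacher PRIMARY"))
print(coder.code("lorry driver"))
```

The first query is treated as a duplicate even though its word order and casing differ from every training answer, which an exact string match would miss. The second query has no duplicate and is coded by the fallback.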

    Machine learning classification of entrepreneurs in British historical census data

    This paper presents a binary classification of entrepreneurs in British historical data based on the recent availability of big data from the I-CeM dataset. The main task of the paper is to attribute an employment status to individuals who did not fully report entrepreneur status in earlier censuses (1851-1881). The paper assesses the accuracy of different classifiers and machine learning algorithms, including Deep Learning, for this classification problem. We first adopt a ground-truth dataset from the later censuses to train a Logistic Regression (which is standard in the literature for this kind of binary classification) to distinguish entrepreneurs from non-entrepreneurs (i.e. workers). Our initial accuracy for this baseline method is 0.74. We compare the Logistic Regression with ten optimized machine learning algorithms: Nearest Neighbors, Linear and Radial Support Vector Machine, Gaussian Process, Decision Tree, Random Forest, Neural Network, AdaBoost, Naive Bayes, and Quadratic Discriminant Analysis. Boosting and ensemble methods give the best results: AdaBoost achieves an accuracy of 0.95. Deep Learning, as a standalone category of algorithms, further improves accuracy to 0.96 without using the rich text data that characterizes the OccString feature, a string of up to 500 characters with the full occupational statement of each individual collected in the earlier censuses. Finally, and now using this OccString feature, we implement both shallow learning (a bag-of-words algorithm) and Deep Learning (a Recurrent Neural Network with a Long Short-Term Memory layer). These methods all achieve accuracies above 0.99, with the Deep Learning Recurrent Neural Network as the best model at an accuracy of 0.9978. The results show that standard algorithms for classification can be outperformed by machine learning algorithms. This confirms the value of extending the techniques traditionally used in the literature for this type of classification problem.
    ESRC, Leverhulme Trust, Isaac Newton Trus

    Inferring Mechanisms for Global Constitutional Progress

    Constitutions help define domestic political orders, but are known to be influenced by two international mechanisms: one that reflects global temporal trends in legal development, and another that reflects international network dynamics such as shared colonial history. We introduce the provision space: the growing set of all legal provisions existing in the world's constitutions over time. Through this we uncover a third mechanism influencing constitutional change: hierarchical dependencies between legal provisions, under which the adoption of essential, fundamental provisions precedes more advanced provisions. This third mechanism appears to play an especially important role in the emergence of new political rights, and may therefore provide a useful roadmap for advocates of those rights. We further characterise each legal provision in terms of the strength of these mechanisms.

    European Values Study (EVS) 2017: Methodological Guidelines

    The EVS 2017 Methodological Guidelines comprise the recommendations and standards designed for the lifecycle phases of the EVS wave 2017 and agreed with the participating countries.

    Opportunities and challenges in new survey data collection methods using apps and images.

    Surveys are well established as an effective way of collecting social science data. However, they may lack the detail, or not measure the concepts, necessary to answer a wide array of social science questions. Supplementing survey data with data from other sources offers opportunities to overcome this. The use of mobile technologies offers many such new opportunities for data collection. It may be possible to collect new types of data, or to collect existing data types in new and innovative ways. Alongside these new opportunities, there are new challenges. These can be unique to mobile data collection, or they can be existing data collection challenges that are altered by using mobile devices to collect the data. The data used come from the Understanding Society Spending Study One, a study that uses an app for mobile devices to collect data about household spending. Participants were asked to report their spending by submitting a photo of a receipt, entering information about a purchase manually, or reporting that they had not spent anything that day. Each substantive chapter offers a piece of research exploring a different challenge posed by this particular research context. Chapter one explores the challenge presented by respondent burden in the context of mobile data collection. Chapter two considers the challenge of device effects. Chapter three examines the challenge of coding large volumes of organic data. The thesis concludes by reflecting on how the lessons learnt throughout might inform survey practice moving forward. Whilst this research focuses on one particular application, it is hoped that it serves as a microcosm that contributes to the discussion of the wider opportunities and challenges faced by survey research as a field.