    How valid can data fusion be?

    "Data fusion techniques typically aim to achieve a complete data file from different sources which do not contain the same units. Traditionally, this is done on the basis of variables common to all files. It is well known that those approaches establish conditional independence of the specific variables given the common variables, although they may be conditionally dependent in reality. We discuss the objectives of data fusion in the light of their feasibility and distinguish four levels of validity that a fusion technique may achieve. For a rather general situation, we derive the feasible set of correlation matrices for the variables not jointly observed and suggest a new quality index for data fusion. Finally, we present a suitable and effcient multiple imputation procedure to make use of auxiliary information and to overcome the conditional independence assumption." (Author's abstract, IAB-Doku) ((en))Datenfusion, Datenaufbereitung, Datenqualität, Korrelation, Validität, angewandte Statistik, mathematische Statistik, Imputationsverfahren

    Beat the heap: An imputation strategy for valid inferences from rounded income data

    Questions on income in surveys are prone to two sources of error that can cause bias if not addressed adequately at the analysis stage. On the one hand, income is considered sensitive information, and response rates on income questions generally tend to be lower than response rates for other, non-sensitive questions. On the other hand, respondents usually do not remember their exact income and thus tend to provide a rounded estimate. The negative effects of item nonresponse are well studied, and most statistical agencies have developed sophisticated imputation methods to correct for this potential source of bias. However, to our knowledge the effects of rounding are hardly ever considered in practice, despite the fact that several studies have found strong evidence that most respondents round their reported income values. In this paper we illustrate the substantial impact that rounding can have on important measures derived from the income variable, such as the poverty rate. To obtain unbiased estimates, we propose a two-stage imputation strategy that estimates the posterior probability of rounding given the observed income values at the first stage and re-imputes the observed income values given the rounding probabilities at the second stage. A simulation study shows that the proposed imputation model can help overcome the possible negative effects of rounding. We also present results based on the household income variable from the German panel study 'Labour Market and Social Security' (PASS).
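
To see why rounding matters for a threshold-based measure, the following minimal simulation heaps a share of synthetic incomes onto round values and compares the resulting poverty rates. It illustrates only the rounding problem, not the two-stage imputation strategy itself. Everything in it is an assumption made for illustration: the log-normal income distribution, the rounding bases, the rounding probability, and the poverty line at 60% of the median.

```python
import random
import statistics


def poverty_rate(incomes, threshold_share=0.6):
    """Share of incomes strictly below threshold_share times the median (assumed poverty line)."""
    threshold = threshold_share * statistics.median(incomes)
    return sum(1 for y in incomes if y < threshold) / len(incomes)


def heap(income, rng, bases=(100, 500, 1000), p_round=0.7):
    """With probability p_round, round income to the nearest multiple of a randomly chosen base."""
    if rng.random() >= p_round:
        return income  # respondent reports the exact value
    base = rng.choice(bases)
    return base * round(income / base)


rng = random.Random(42)
# Synthetic "true" incomes; the distribution parameters are hypothetical.
true_incomes = [rng.lognormvariate(7.5, 0.6) for _ in range(10_000)]
reported = [heap(y, rng) for y in true_incomes]

print(f"poverty rate, true incomes:     {poverty_rate(true_incomes):.4f}")
print(f"poverty rate, reported incomes: {poverty_rate(reported):.4f}")
```

Because both the median and the incomes near the poverty line can shift when values are heaped, the two rates generally differ; the size and direction of the gap depend on the assumed rounding behaviour.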

    MI Double Feature: Multiple Imputation to Address Nonresponse and Rounding Errors in Income Questions

    Obtaining reliable income information in surveys is difficult for two reasons. On the one hand, many survey respondents consider income to be sensitive information and thus are reluctant to answer questions regarding their income. If the survey participants who do not provide information on their income are systematically different from the respondents, and there is ample research indicating that they are, results based only on the observed income values will be misleading. On the other hand, respondents tend to round their income. This second source of error in particular is usually ignored when analyzing income information. In a recent paper, Drechsler and Kiesl (2014) illustrated that inferences based on the collected information can be biased if the rounding is ignored and suggested a multiple imputation strategy to account for the rounding in reported income. In this paper we extend their approach to also address the nonresponse problem. We illustrate the approach using the household income variable from the German panel study "Labor Market and Social Security".
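
The abstract does not say how results from the imputed datasets are pooled. As a minimal sketch, assuming the standard combining rules for multiple imputation (Rubin's rules), the point estimate is the mean of the per-dataset estimates and the total variance adds the within-imputation variance to an inflated between-imputation variance. The function name and the example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's combining rules.

    Returns (pooled_estimate, total_variance). Requires m >= 2.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    total_variance = u_bar + (1 + 1 / m) * b
    return q_bar, total_variance


# Hypothetical mean-income estimates from m = 3 imputed datasets.
estimate, variance = pool_rubin([2010.0, 1985.0, 2040.0], [400.0, 420.0, 390.0])
print(f"pooled estimate: {estimate:.1f}, total variance: {variance:.1f}")
```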

    Revision of the IAB Job Vacancy Survey: Background, Method and Results (Revision der IAB-Stellenerhebung: Hintergründe, Methode und Ergebnisse)

    The German Job Vacancy Survey of the Institute for Employment Research (IAB) delivers quarterly representative data on the number and structure of vacancies in Germany. Such data cannot be derived from other sources and are therefore unique. The survey includes registered and non-registered vacancies. In the course of extensive tests and reviews, a new extrapolation procedure has been developed. As a result, the aggregate vacancy supply is revised downwards. The research report is organised as follows: First, an overview of the aims and content of the German Job Vacancy Survey is given. Second, the individual steps in the development of the new extrapolation procedure are described, starting from the procedure used previously. Third, the new method is presented, and it is shown that its application significantly improves the quality of the survey results. Along with the new extrapolation procedure, a revised time series for the aggregate vacancy supply dating back to 2000 is given. However, figures before 2010 and from 2010 onwards cannot be directly compared. The research report presents the results for both time periods and compares the new and old extrapolation methods for each.

    Codebook and documentation of the panel study 'Labour Market and Social Security' (PASS) : Volume I: Introduction and overview. Wave 2 (2007/2008)

    "The panel study 'Labour Market and Social Security' (PASS), established by the Institute for Employment Research (IAB), is a new dataset for labour market, welfare state and poverty research in Germany, creating a new empirical basis for the scientific community and for policy advice. This Datenreport provides an overview of the second survey wave, for which 12,487 individuals were interviewed in 8,429 households between December 2007 and July 2008. 10,114 individuals and 7,342 households were interviewed for the second time in the context of PASS. The spectrum of questions and the design of PASS are intended to close gaps in the existing stock of data. PASS has three main characteristics that extend analysis potential beyond that of the Federal Employment Agency's administrative data: 1. The panel takes the household context into account - including the situation before and after receipt of Unemployment Benefit II. 2. The panel is complete in that it covers all groups of persons and all employment biographies, not only people in dependent employment, unemployed people and those in need of assistance. The dataset also provides information on the status during phases of economic inactivity, self-employment or employment as civil servants. 3. The panel collects additional or significantly more detailed data on relevant characteristics such as attitudes, employment potential or job-search behaviour." (Author's abstract, IAB-Doku) ((en)) Additional Information Questionnaires of the second wave. Here you can find the German version. Further information about the panel study "Labour Market and Social Security".IAB-Haushaltspanel, Datengewinnung, Erhebungsmethode, Stichprobe, Panel - Methode, Datenaufbereitung

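    Codebook and documentation of the panel study 'Labour Market and Social Security' (PASS): Wave 2 (2007/2008), German edition (Codebuch und Dokumentation des 'Panel Arbeitsmarkt und soziale Sicherung' (PASS) : Welle 2 (2007/2008))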

    "The panel study 'Labour Market and Social Security' (PASS), established by the Institute for Employment Research (IAB), is a new dataset for labour market, welfare state and poverty research in Germany, creating a new empirical basis for the scientific community and for policy advice. This "Datenreport" written in German provides an overview of the second survey wave, for which 12,487 individuals were interviewed in 8,429 households between December 2007 and July 2008. 10,114 individuals and 7,342 households were interviewed for the second time in the context of PASS. The spectrum of questions and the design of PASS are intended to close gaps in the existing stock of data. PASS has three main characteristics that extend analysis potential beyond that of the Federal Employment Agency's administrative data: 1. The panel takes the household context into account - including the situation before and after receipt of Unemployment Benefit II. 2. The panel is complete in that it covers all groups of persons and all employment biographies, not only people in dependent employment, unemployed people and those in need of assistance. The dataset also provides information on the status during phases of economic inactivity, self-employment or employment as civil servants. 3. The panel collects additional or significantly more detailed data on relevant characteristics such as attitudes, employment potential or job-search behaviour." (Author's abstract, IAB-Doku) ((en)) The english version of this "Datenreport" you can find here: http://fdz.iab.de/187/section.aspx/Publikation/k100607a04 Additional Information Hier finden Sie Band I des Datenreports: Einführung und Überblick Hier finden Sie Band II: Codebuch Haushaltsdatensatz Hier finden Sie Band III: Codebuch Personendatensatz Hier finden Sie Band IV: Codebuch Spelldaten, Registerdaten und Gewichte Fragebögen der 2. Welle Hier finden Sie die englische Version des Datenreports. 
Weitere Informationen zum Panel "Arbeitsmarkt und Soziale Sicherung".IAB-Haushaltspanel, Datengewinnung, Erhebungsmethode, Stichprobe, Panel - Methode, Datenaufbereitung

    An artificial intelligence algorithm is highly accurate for detecting endoscopic features of eosinophilic esophagitis

    The endoscopic features associated with eosinophilic esophagitis (EoE) may be missed during routine endoscopy. We aimed to develop and evaluate an Artificial Intelligence (AI) algorithm for detecting and quantifying the endoscopic features of EoE in white light images, supplemented by the EoE Endoscopic Reference Score (EREFS). An AI algorithm (AI-EoE) was constructed and trained to differentiate between EoE and normal esophagus using endoscopic white light images extracted from the database of the University Hospital Augsburg. In addition to binary classification, a second algorithm was trained with specific auxiliary branches for each EREFS feature (AI-EoE-EREFS). The AI algorithms were evaluated on an external data set from the University of North Carolina, Chapel Hill (UNC), and compared with the performance of human endoscopists with varying levels of experience. The overall sensitivity, specificity, and accuracy of AI-EoE were 0.93 for all measures, while the AUC was 0.986. With additional auxiliary branches for the EREFS categories, the AI algorithm (AI-EoE-EREFS) performance improved to 0.96, 0.94, 0.95, and 0.992 for sensitivity, specificity, accuracy, and AUC, respectively. AI-EoE and AI-EoE-EREFS performed significantly better than endoscopy beginners and senior fellows on the same set of images. An AI algorithm can be trained to detect and quantify endoscopic features of EoE with excellent performance scores. The addition of the EREFS criteria improved the performance of the AI algorithm, which performed significantly better than endoscopists with a lower or medium experience level.
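
For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity, and accuracy follow from the four cells of a binary confusion matrix. The counts are invented purely for illustration and are not the study's data; the function name is hypothetical, and AUC is omitted because it requires the per-image scores rather than the counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of EoE images correctly flagged
        "specificity": tn / (tn + fp),  # share of normal images correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts for 100 EoE and 100 normal images (not from the study).
metrics = classification_metrics(tp=93, fp=7, tn=93, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```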