    From raw numbers to robust evidence: finding fact, avoiding fiction

    Data are key to empirical research. But data by themselves are not yet information. Raw numbers need to be transformed into measurements and, finally, into robust evidence, which can be used to help design evidence-based policies. In this thesis, three different steps in this transformation are examined: (i) collecting good-quality data; (ii) quantifying concepts; and (iii) accounting for the imperfections in quantified concepts to obtain robust evidence. Different challenges are encountered at every step. This thesis focuses on household survey data from developing countries collected by universities, NGOs or (inter)national institutions with the explicit objective of 'enhancing the evidence base'. Household surveys are still the most important source of information in developing countries, where administrative data are often incomplete and where 'big data', such as data from mobile phones, are still in their infancy. This is unlikely to change in the near future. Monitoring the implementation of the Sustainable Development Goals is likely to increase the demand for household surveys even further. More awareness about the process of transforming raw numbers from household surveys into robust evidence is therefore indispensable. The first critical step towards robust evidence is collecting high-quality data, since using 'wrong numbers' will lead to 'wrong results'. It is often argued that the lack of data in developing countries impedes the design of sensible policies. Perhaps even more critical, however, are data of poor quality that are used to design policies or to support far-reaching reforms. The first case study in this thesis illustrates that this is indeed a real threat. Different datasets that purport to measure the impact of large-scale and controversial agricultural reforms on yields in Rwanda provide very different results.
However, only the most positive estimates have been incorporated into the international data management system of the FAO, amplifying the risk that these numbers will be accepted as the 'truth' and possibly used for policy design elsewhere. The second step in the transformation of raw numbers into robust evidence requires quantifying theoretical concepts. The difficulty here is that these concepts are often not directly observable. Household surveys, for instance, are frequently designed to measure the concepts of poverty or food security. Since these concepts cannot be observed directly, they require the development of measurement instruments. These measurement instruments are based on a set of rules that define how observable household characteristics should be translated into the unobservable concept. The development of such measurement instruments is challenging and involves making many different assumptions. Moreover, one can always question whether the final measurement instrument measures the concept it is intended to measure, and under what circumstances it measures the concept precisely and accurately. Addressing these questions in the social sciences is notoriously difficult because of the lack of gold standards or benchmarks against which a newly developed measurement instrument can be assessed. Moreover, the validity of measurement instruments should ideally be tested in many different contexts. In practice, however, social scientists work outside of a laboratory and cannot manipulate the context in which they operate. In this thesis, the challenge of quantifying concepts is illustrated by evaluating the validity of four measurement instruments: GPS to measure the directly observable concept of land area, and three poverty and food insecurity indicators, which quantify unobservable concepts. The evaluation of GPS measurement of land area is straightforward as it can be assessed against the gold standard of compass-and-rope measurement.
The evaluation of food security and poverty indicators requires more creativity since gold standards are unavailable. The three case studies of poverty and food security indicators are used to illustrate three different aspects of validity: cross-sectional validity, inter-temporal validity and internal validity. The first indicator, the Progress out of Poverty Index (PPI) in Rwanda, is benchmarked against expenditure data. It turns out that this indicator is cross-sectionally valid, that is, it consistently distinguishes poor from non-poor households. The second indicator, the Household Food Insecurity Access Scale (HFIAS), is benchmarked against total agricultural production. This indicator is cross-sectionally valid, but its inter-temporal validity is questionable. While total food production decreased over a period of five years, the HFIAS pointed towards an improved food security situation over the same period. This implies that the indicator cannot be used to monitor the evolution of food security over time. The third food security indicator, the Household Dietary Diversity Score (HDDS), is not assessed against an external benchmark. Instead, its internal validity is evaluated using Rasch models. In other words, it is analyzed whether the different food groups included in the HDDS measure a single underlying concept. They do not, raising the question of what the HDDS actually measures. Even with good-quality data and excellent measurement instruments, concepts may still be imprecisely or inaccurately measured. Hence, the third and final step of the transformation of raw numbers into robust evidence consists of accounting for these imperfections when establishing (causal) relations between two (or more) imperfectly measured concepts. To illustrate the relevance of accounting for measurement error, it is shown that imprecise measurement of the harvest at plot level can generate a spurious negative correlation between productivity and plot size.
This has implications for the stylized fact of the inverse productivity-size relationship. The transformation of raw numbers into robust evidence is a long journey with several steps along the way, all of which are decisive for the final outcome. At every step, new challenges need to be tackled. This requires skilful interventions by researchers and an open discussion about the minimum set of assumptions needed to overcome the challenges. These steps also hold some implications for the interpretation of the final outcome of the journey: robust evidence. A first policy implication is that the academic community should pay more attention to the issue of data quality. The compulsory publication of the data alongside journal articles would be an important first step in this process. In addition, studying systematic measurement error can help to limit bias in empirical work and to improve survey design. A second implication has to do with the development of measurement instruments, in particular poverty and food security indicators. There is definitely a demand for indicators that can quickly estimate the prevalence of poverty and food insecurity at a regional level in order to monitor development programmes, target the most vulnerable households and design policies. Yet, with so many indicators in existence, choosing the one that is most useful for the purpose at hand is complicated, since every indicator has its own strengths and weaknesses. More validation exercises of existing indicators could help to clarify the circumstances under which a particular indicator works and/or is useful. An important advantage of these 'validity exercises' is that researchers will remain keenly aware of the shortcomings of a particular indicator, which are likely to be context-specific. Given the existence of so many indicators, one can argue that the validation of existing indicators should be prioritized over the development of yet more indicators.
Finally, we should remain aware that the principal driver for funding the collection and interpretation of raw numbers is the call for more 'evidence-based policy'. The main, and perhaps unexpected, lesson of this thesis is that 'quantitative evidence' should not be considered the gold standard for the design of evidence-based policies. Quantitative evidence is man-made and needs to be complemented by other sets of evidence when designing policies. Researchers should be at the forefront of weighing the quality of different evidence bases and of attempting to synthesize them.
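
The measurement-error mechanism the thesis uses as its illustration can be sketched with a small simulation. All numbers below are invented for illustration, not taken from the thesis: the point is only that a size-dependent (non-classical) reporting error in harvests is enough to manufacture an inverse productivity-size relationship where none exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# True plot sizes (ha) and a constant true yield: by construction there
# is NO real productivity-size relationship in this simulated world.
area = rng.lognormal(mean=0.0, sigma=0.7, size=n)
true_harvest = 2.0 * area  # 2 t/ha everywhere

# Hypothetical non-classical measurement error: harvests are over-reported
# on small plots and under-reported on large ones (the 0.3 is invented).
report_factor = np.exp(-0.3 * np.log(area) + rng.normal(0.0, 0.2, n))
reported_harvest = true_harvest * report_factor

measured_yield = reported_harvest / area
slope, _ = np.polyfit(np.log(area), np.log(measured_yield), 1)
print(f"log-log slope: {slope:.2f}")  # negative despite constant true yield
```

With classical error (a `report_factor` independent of plot size) the expected slope is zero; it is the size-dependence of the error that creates the spurious inverse relationship.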

    Labour costs and the decision to hire the first employee


    How effective are hiring subsidies to reduce long-term unemployment among prime-aged jobseekers? Evidence from Belgium

    Hiring subsidies are widely used to create (stable) employment for the long-term unemployed. This paper exploits the abolition of a hiring subsidy targeted at long-term unemployed jobseekers over 45 years of age in Belgium to evaluate its effectiveness in the short and medium run. Based on a triple-difference methodology, the hiring subsidy is shown to increase the job-finding rate by 13%, without any evidence of spill-over effects. This effect is driven by a positive effect on individuals with at least a bachelor’s degree. However, the hiring subsidy mainly created short-lived, temporary employment: eligible jobseekers were no more likely than ineligible jobseekers to find employment that lasted at least twelve consecutive months.
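
The triple-difference logic behind such an evaluation can be sketched with simulated data. The cell structure and all effect sizes below are invented (the paper's actual estimate is a 13% rise in the job-finding rate); the sketch only shows how the third difference strips out group-specific and period-specific trends.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000  # simulated jobseekers per cell

# Hypothetical triple-difference (DDD) set-up: the three dimensions are
# age (over/under 45), unemployment duration (long-term or not) and
# period (before/after the abolition). Only long-term unemployed
# jobseekers over 45 were eligible, and only before the abolition.
def job_finding_rate(over45, longterm, after, subsidy_effect=0.03):
    base = 0.20 - 0.05 * over45 - 0.08 * longterm + 0.01 * after
    eligible = over45 and longterm and not after
    p = base + (subsidy_effect if eligible else 0.0)
    return rng.binomial(1, p, size=N).mean()

cells = {(a, l, t): job_finding_rate(a, l, t)
         for a in (0, 1) for l in (0, 1) for t in (0, 1)}

def did(longterm):
    """Before-after change for over-45s minus the same change for under-45s."""
    return ((cells[1, longterm, 1] - cells[1, longterm, 0])
            - (cells[0, longterm, 1] - cells[0, longterm, 0]))

# DDD: the differential change for the group that lost eligibility. The
# age-specific and duration-specific trends cancel out by construction.
ddd = did(1) - did(0)
print(f"estimated effect of the abolition: {ddd:.3f}")  # close to -0.03
```

A negative estimate here corresponds to abolishing the subsidy; read with the opposite sign, it recovers the (invented) subsidy effect built into the simulation.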

    mspire: mass spectrometry proteomics in Ruby

    Summary: Mass spectrometry-based proteomics stands to gain from additional analysis of its data, but its large, complex datasets make demands on speed and memory usage that require special consideration from scripting languages. The software library ‘mspire’—developed in the Ruby programming language—offers quick and memory-efficient readers for standard XML proteomics formats, converters for intermediate file types in typical proteomics spectral-identification workflows (including the Bioworks .srf format), and modules for the calculation of peptide false identification rates.

    The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection

    The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on the Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).

    Statistical profiling of unemployed jobseekers: the increasing availability of big data allows for the profiling of unemployed jobseekers via statistical models

    Statistical models can help public employment services to identify factors associated with long-term unemployment and to identify groups at risk. Statistical profiling models will probably become even more prominent as new machine learning techniques, in combination with the increasing availability of big data, improve their predictive power. However, a policy maker cannot just define an outcome variable at the start of the project and walk away: a continuous dialogue between data analysts, policy makers and caseworkers is very important. Indeed, throughout the process, normative decisions must be made: profiling practices misclassify many individuals. They can reinforce but also prevent existing patterns of discrimination.
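
A minimal profiling model can be sketched as follows. Everything here is invented for illustration (the features, coefficients and data are simulated, not those of any real employment service); the sketch shows the point the abstract stresses: even a well-fitted risk model leaves a normative choice of threshold, and every threshold misclassifies some individuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical jobseeker characteristics (illustrative names only).
age = rng.uniform(20, 60, n)
education_years = rng.integers(8, 19, n).astype(float)
prior_spells = rng.poisson(1.5, n).astype(float)

# Simulated 'ground truth': probability of staying unemployed > 12 months.
logit = -1.5 + 0.04 * (age - 40) - 0.15 * (education_years - 12) + 0.3 * prior_spells
long_term = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Fit a logistic profiling model by plain gradient descent, so the sketch
# needs no external ML library. Features are standardised first.
feats = np.column_stack([age, education_years, prior_spells])
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
X = np.column_stack([np.ones(n), feats])
w = np.zeros(X.shape[1])
for _ in range(2_000):
    p_hat = 1 / (1 + np.exp(-X @ w))
    w -= 1.0 * (X.T @ (p_hat - long_term)) / n

p_hat = 1 / (1 + np.exp(-X @ w))

# The normative step: choosing a risk threshold trades false positives
# against false negatives; no statistic makes this choice for the
# policy maker.
for threshold in (0.3, 0.5):
    flagged = p_hat >= threshold
    false_pos_rate = np.mean(flagged & (long_term == 0))
    false_neg_rate = np.mean(~flagged & (long_term == 1))
    print(threshold, round(false_pos_rate, 3), round(false_neg_rate, 3))
```

Lowering the threshold flags more jobseekers for intensive support (more false positives, fewer false negatives); raising it does the opposite. Which error is worse is a policy question, not a statistical one.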

    MAPU 2.0: high-accuracy proteomes mapped to genomes

    The MAPU 2.0 database contains proteomes of organelles, tissues and cell types measured by mass spectrometry (MS)-based proteomics. In contrast to other databases, it is meant to contain a limited number of experiments and only those with very high-resolution and -accuracy data. MAPU 2.0 displays the proteomes of organelles, tissues and body fluids or, conversely, displays the occurrence of proteins of interest in all these proteomes. The new release addresses MS-specific problems including ambiguous peptide-to-protein assignments, and it provides insight into general functional features on the protein level ranging from gene ontology classification to comprehensive SwissProt annotation. Moreover, the derived proteomic data are used to annotate the genomes using the Distributed Annotation Service (DAS) via EnsEMBL services. MAPU 2.0 is a model for a database specifically designed for high-accuracy proteomics and a member of the ProteomExchange Consortium. It is available online at http://www.mapuproteome.com.

    The PeptideAtlas project

    The completion of the sequencing of the human genome and the concurrent, rapid development of high-throughput proteomic methods have resulted in an increasing need for automated approaches to archive proteomic data in a repository that enables the exchange of data among researchers and also accurate integration with genomic data. PeptideAtlas (http://www.peptideatlas.org/) addresses these needs by identifying peptides by tandem mass spectrometry (MS/MS), statistically validating those identifications and then mapping identified sequences to the genomes of eukaryotic organisms. A meaningful comparison of data across different experiments generated by different groups using different types of instruments is enabled by the implementation of a uniform analytic process. This uniform statistical validation ensures a consistent and high-quality set of peptide and protein identifications. The raw data from many diverse proteomic experiments are made available in the associated PeptideAtlas repository in several formats. Here we present a summary of our process and details about the Human, Drosophila and Yeast PeptideAtlas builds.
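
The mapping step can be illustrated with a toy version of the problem: locating an identified peptide in all six reading frames of a DNA sequence. This is a deliberate simplification of what PeptideAtlas actually does (which involves splicing, genome-scale indexing and statistical validation); the sequence and peptide below are invented.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def translate(seq, frame):
    """Translate one reading frame of a DNA string into amino acids."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def six_frame_hits(dna, peptide):
    """Return (strand, frame) pairs in which the peptide occurs."""
    revcomp = dna[::-1].translate(str.maketrans("ACGT", "TGCA"))
    return [(strand, frame)
            for strand, seq in (("+", dna), ("-", revcomp))
            for frame in range(3)
            if peptide in translate(seq, frame)]

# Toy example: the peptide 'MK' (codons ATG AAA) sits in frame 2 of the
# forward strand of this invented sequence.
print(six_frame_hits("GGATGAAATAG", "MK"))  # → [('+', 2)]
```

A production pipeline would of course search against indexed genome builds and handle ambiguous placements; the sketch only shows why a peptide identification, unlike a gene annotation, must be searched for in every frame and on both strands.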