95 research outputs found

    Protecting micro-data by micro-aggregation: the experience in Eurostat


    Impact of base dataset design on few-shot image classification

    The quality and generality of deep image features are crucially determined by the data they have been trained on, but little is known about this often overlooked effect. In this paper, we systematically study the effect of variations in the training data by evaluating deep features trained on different image sets in a few-shot classification setting. The experimental protocol we define allows us to explore key practical questions. What is the influence of the similarity between base and test classes? Given a fixed annotation budget, what is the optimal trade-off between the number of images per class and the number of classes? Given a fixed dataset, can features be improved by splitting or combining different classes? Should simple or diverse classes be annotated? In a wide range of experiments, we provide clear answers to these questions on the miniImageNet, ImageNet and CUB-200 benchmarks. We also show how the base dataset design can improve performance in few-shot classification more drastically than replacing a simple baseline with an advanced state-of-the-art algorithm. Comment: 23 pages, 11 figures, to appear in ECCV 202

    The inspection of soil-disinfection equipment in Belgium.

    In Belgium, the mandatory inspection of field and orchard sprayers started in 1995. At that time, inspection protocols were available only for those two types of sprayers. From 2008 on, two new inspection protocols were developed: one for greenhouse sprayers and one for soil-disinfection machines. Those inspection protocols were added to the Belgian legislation and have been implemented since 2011. The inspection protocol for greenhouse sprayers was mainly based on the two existing protocols (field and orchard sprayers), as the working principle of those machines is similar. Soil-disinfection machines used on Belgian territory needed another approach because their pressurising and application techniques differ from those of classical spraying machines. Soil-disinfection machines use a closed tank containing the vaporous disinfectant. The tank is pressurised by a compressor or a diving cylinder. On the injector side of those machines there are different possibilities. Some use a manifold with restrictor plates or a small tap per injector, others use narrow tubes towards the injectors, and sometimes nozzles are used. As a result, there are no standard inspection methods available for those types of machines. Neither a standard spray pattern measurement nor separate pressure and nozzle testing is possible on most of those machines. In addition, some important safety aspects need special attention because of the hazardous products used. The Belgian inspection protocol was almost completely developed in-house and makes it possible to inspect soil-disinfection machines in an accurate, safe and economical way.

    Wireless flow-sensor to inspect spray rate controllers

    In Belgium, the mandatory inspection of sprayers started in 1996, and the 8th inspection cycle (2017-2018-2019) is currently running. The inspection of sprayers is performed by official mobile teams governed by two inspection authorities, and the management is done by the Federal Ministry for Consumer Protection, Public Health and the Environment (FAVV). In the Flemish region the inspection is delegated to the Institute for Agricultural and Fisheries Research (ILVO). In the past decade the number of field crop sprayers equipped with a spray rate controller increased significantly. In the first inspection cycle (1996-1998), only 4.58% of the field crop sprayers in Flanders were equipped with a spray rate controller. By the 7th inspection cycle (2014-2016), this percentage had increased to 26.92%. As the original inspection method for spray rate controllers had some shortcomings and was time consuming, ILVO developed a simple and reliable method to test rate controllers on field crop and orchard sprayers.

    RefConcile – automated online reconciliation of bibliographic references

    Comprehensive bibliographies often rely on community contributions. In such a setting, de-duplication is mandatory for the bibliography to be useful. Ideally, it works online, i.e., during the addition of new references, so the bibliography remains duplicate-free at all times. While de-duplication is well researched, generic approaches do not achieve the result quality required for automated reconciliation. To overcome this problem, we propose a new duplicate detection and reconciliation technique called RefConcile. Aimed specifically at bibliographic references, it uses dedicated blocking and matching techniques tailored to this type of data. Our evaluation, based on a large real-world collection of bibliographic references, shows that RefConcile scales well and that it detects and reconciles duplicates highly accurately.
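
    The abstract does not describe RefConcile's blocking and matching techniques, so the following is only a generic sketch of the blocking-then-matching idea, not the authors' method. The blocking key (year plus first title word), the similarity threshold, and the toy records are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def blocking_key(ref):
    """Hypothetical blocking key: publication year plus first title word."""
    words = normalize(ref["title"]).split()
    return (ref["year"], words[0] if words else "")


def find_duplicates(refs, threshold=0.9):
    """Return index pairs whose normalized titles are similar within a block."""
    blocks = defaultdict(list)
    for i, ref in enumerate(refs):
        blocks[blocking_key(ref)].append(i)

    pairs = []
    for members in blocks.values():
        # Blocking: only references sharing a key are compared pairwise.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                ratio = SequenceMatcher(
                    None, normalize(refs[i]["title"]), normalize(refs[j]["title"])
                ).ratio()
                if ratio >= threshold:
                    pairs.append((i, j))
    return pairs


# Toy records invented for this example.
refs = [
    {"title": "A Survey of Duplicate Detection", "year": 2001},
    {"title": "A survey of duplicate-detection.", "year": 2001},
    {"title": "A Study of Soil Sampling", "year": 2001},
]
print(find_duplicates(refs))
```

    Blocking keeps the number of comparisons manageable as the collection grows, at the cost of missing duplicates that land in different blocks; a real system would need a key that tolerates errors in the fields it uses.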

    De-identifying a public use microdata file from the Canadian national discharge abstract database

    Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of these data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results: Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression. Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
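
    To make the re-identification threshold concrete, here is a minimal sketch under a common simplifying assumption that the abstract does not confirm: the risk for a record is taken to be 1 divided by the number of records sharing its quasi-identifier values. Under that assumption, a threshold of 0.05 corresponds to groups of at least 20 records. The field names and toy data are hypothetical, and this is not the suppression algorithm developed in the study.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, threshold=0.05):
    """Return indices of records whose assumed risk (1 / group size) exceeds threshold."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    group_sizes = Counter(keys)
    return [i for i, key in enumerate(keys) if 1 / group_sizes[key] > threshold]


# Toy data: 20 records share one combination, 1 record is unique.
records = [{"age_group": "60-69", "region": "A"} for _ in range(20)]
records.append({"age_group": "90+", "region": "B"})

print(risky_records(records, ["age_group", "region"]))
```

    Records flagged this way would be candidates for suppression or generalization; the group of 20 sits exactly at the 0.05 threshold and is not flagged.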

    Single cell dissection of plasma cell heterogeneity in symptomatic and asymptomatic myeloma

    Multiple myeloma, a plasma cell malignancy, is the second most common blood cancer. Despite extensive research, disease heterogeneity is poorly characterized, hampering efforts for early diagnosis and improved treatments. Here, we apply single cell RNA sequencing to study the heterogeneity of 40 individuals along the multiple myeloma progression spectrum, including 11 healthy controls, demonstrating high interindividual variability that can be explained by expression of known multiple myeloma drivers and additional putative factors. We identify extensive subclonal structures for 10 of 29 individuals with multiple myeloma. In asymptomatic individuals with early disease and in those with minimal residual disease post-treatment, we detect rare tumor plasma cells with molecular characteristics similar to those of active myeloma, with possible implications for personalized therapies. Single cell analysis of rare circulating tumor cells allows for accurate liquid biopsy and detection of malignant plasma cells, which reflect bone marrow disease. Our work establishes single cell RNA sequencing for dissecting blood malignancies and devising detailed molecular characterization of tumor cells in symptomatic and asymptomatic patients.

    Unsupervised morphological segmentation of tissue compartments in histopathological images

    Algorithmic segmentation of histologically relevant regions of tissues in digitized histopathological images is a critical step towards computer-assisted diagnosis and analysis. For example, automatic identification of epithelial and stromal tissues in images is important for spatial localisation and guidance in the analysis and characterisation of the tumour micro-environment. Current segmentation approaches are based on supervised methods, which require extensive training data from high quality, manually annotated images. Such data are often difficult and costly to obtain. This paper presents an alternative data-independent framework based on unsupervised segmentation of oropharyngeal cancer tissue micro-arrays (TMAs). An automated segmentation algorithm based on mathematical morphology is first applied to light microscopy images stained with haematoxylin and eosin. This partitions the image into multiple binary ‘virtual-cells’, each enclosing a potential ‘nucleus’ (dark basins in the haematoxylin absorbance image). Colour and morphology measurements obtained from these virtual-cells as well as their enclosed nuclei are input into an advanced unsupervised learning model for the identification of epithelium and stromal tissues. Here we exploit two Consensus Clustering (CC) algorithms for the unsupervised recognition of tissue compartments, which consider the consensual opinion of a group of individual clustering algorithms. Unlike most unsupervised segmentation analyses, which depend on a single clustering method, the CC learning models allow for more robust and stable detection of tissue regions. The performance of the proposed framework has been evaluated on fifty-five hand-annotated images of oropharyngeal tissues. Qualitative and quantitative results of the proposed segmentation algorithm compare favourably with eight popular tissue segmentation strategies. Furthermore, the unsupervised results obtained here outperform those obtained with individual clustering algorithms.
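
    The abstract does not say which two Consensus Clustering algorithms are used. As a rough illustration of the general idea of pooling several clusterings, the sketch below uses a co-association approach, which is an assumption here: count how often each pair of items is grouped together, then link pairs that are grouped together in more than half of the input clusterings. The function names, the 0.5 agreement level, and the toy labelings are hypothetical.

```python
def co_association(labelings):
    """Fraction of labelings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
        for i in range(n)
    ]


def consensus_clusters(labelings, agreement=0.5):
    """Label connected components of the graph linking pairs above the agreement level."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal over items that co-occur often enough.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > agreement:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy clusterings of five items; cluster ids are arbitrary per clustering.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
print(consensus_clusters(labelings))
```

    Because only co-membership is counted, the input clusterings do not need matching cluster ids, which is what lets the outputs of different algorithms be combined.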