
    Automatic error localisation for categorical, continuous and integer data

    Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic error localisation based on the Fellegi-Holt paradigm, which says that the data should be made to satisfy all edits by changing the smallest possible number of fields. Adopting this paradigm leads to a mathematical optimisation problem. We propose an algorithm for solving this optimisation problem for a mix of categorical, continuous and integer-valued data, as well as a heuristic procedure based on the exact algorithm. We evaluate the performance of this heuristic procedure on five realistic data sets involving only integer-valued variables.
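
    To make the Fellegi-Holt criterion concrete, the sketch below localises errors in a toy record by trying ever-larger sets of fields until some assignment of new values satisfies every edit. The field names, edit rules, and domains are invented for illustration, and brute-force search stands in for the article's exact algorithm, which scales far better.

```python
# A minimal brute-force sketch of Fellegi-Holt error localisation,
# assuming a record with small discrete domains; the edit rules and
# field names below are illustrative, not taken from the article.
from itertools import combinations, product

# Each edit is a predicate that must hold for a consistent record.
edits = [
    lambda r: r["turnover"] == r["costs"] + r["profit"],
    lambda r: r["costs"] >= 0,
    lambda r: r["profit"] <= r["turnover"],
]

# Candidate values per field (tiny domains keep the search tractable).
domains = {
    "turnover": range(0, 21),
    "costs": range(0, 21),
    "profit": range(-5, 21),
}

def localise(record):
    """Return a smallest set of fields whose values can be changed
    so that every edit is satisfied (the Fellegi-Holt criterion)."""
    fields = list(record)
    for k in range(len(fields) + 1):          # fewest fields first
        for subset in combinations(fields, k):
            for values in product(*(domains[f] for f in subset)):
                trial = dict(record, **dict(zip(subset, values)))
                if all(edit(trial) for edit in edits):
                    return subset
    return None

# Inconsistent record: turnover != costs + profit.
print(localise({"turnover": 10, "costs": 7, "profit": 5}))  # ('turnover',)
```

    For realistic data the search space explodes combinatorially, which is precisely why dedicated exact algorithms and heuristics such as those proposed in the article are needed.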

    Integer Affine Transformations of Parametric Z-polytopes and Applications to Loop Nest Optimization

    The polyhedral model is a well-known compiler optimization framework for the analysis and transformation of affine loop nests. We present a new method for a difficult geometric operation raised by this model: the integer affine transformation of parametric Z-polytopes. The result of such a transformation is given by a worst-case exponential union of Z-polytopes. We also propose a polynomial-time algorithm (for fixed dimension) for counting points in arbitrary unions of a fixed number of parametric Z-polytopes. We implemented these algorithms and compared them to other existing algorithms on a set of applications to loop nest analysis and optimization.
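
    As a concrete, if naive, illustration of the objects involved, the sketch below enumerates the integer points of a small non-parametric Z-polytope, i.e. a polytope intersected with an affine lattice, and maps them through an integer affine transformation. Enumeration over a bounding box stands in for the paper's symbolic algorithms; all matrices and bounds here are invented.

```python
# A tiny brute-force sketch of a Z-polytope (polytope intersected with
# a lattice) and its image under an integer affine map.  Enumeration
# replaces the paper's symbolic algorithms and only works for small,
# fixed 2-D bounds; all numbers are illustrative.
import numpy as np

def zpolytope_points(A, b, L, offset, box):
    """Integer points p with A @ p <= b that lie on the lattice
    {L @ k + offset | k integer}, searched inside the given 2-D box."""
    pts = []
    (x_lo, x_hi), (y_lo, y_hi) = box
    for x in range(x_lo, x_hi + 1):
        for y in range(y_lo, y_hi + 1):
            p = np.array([x, y])
            if not np.all(A @ p <= b):
                continue
            # Lattice membership: L @ k = p - offset for integer k.
            k = np.linalg.solve(L, p - offset)
            if np.allclose(k, np.round(k)):
                pts.append(p)
    return pts

# Z-polytope: x, y >= 0 and x + y <= 6, on the lattice of even points.
A = np.array([[-1, 0], [0, -1], [1, 1]])
b = np.array([0, 0, 6])
L = np.array([[2, 0], [0, 2]])
offset = np.array([0, 0])
points = zpolytope_points(A, b, L, offset, box=((0, 6), (0, 6)))

# Integer affine transformation T(p) = M @ p + t.
M = np.array([[1, 1], [0, 1]])
t = np.array([1, 0])
image = [M @ p + t for p in points]
print(len(points), "points; image:", [tuple(int(c) for c in q) for q in image])
```

    Note that the image of a Z-polytope under such a map need not be a single Z-polytope, which is exactly the source of the worst-case exponential unions the paper deals with.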

    Types for DSP Assembler Programs


    Processing of Erroneous and Unsafe Data

    Statistical offices have to overcome many problems before they can publish reliable data. Two of these problems are examined in this thesis. The first is the occurrence of errors in the collected data. Because of these errors, publication figures cannot be based directly on the collected data; before publication, the errors have to be localised and corrected. In this thesis we focus on the localisation of errors in a mix of categorical and numerical data. The problem is formulated as a mathematical optimisation problem, several new algorithms for solving it are proposed, and computational results of the most promising algorithms are compared. The second problem examined in this thesis is the occurrence of unsafe data, i.e. data that would reveal too much sensitive information about individual respondents. Before publication, such unsafe data need to be protected. In the thesis we examine various aspects of the protection of unsafe data.

    Statistical offices must overcome numerous problems before they can publish the results of their surveys. This thesis addresses two of these problems. The first is that the collected data may be erroneous. Because errors may be present, the data must first be checked and, where necessary, corrected before any results are published. The thesis pays particular attention to locating the erroneous values. By assuming that as few errors as possible have been made, locating the erroneous values can be formulated as a mathematical optimisation problem, and the thesis develops a number of methods to solve this complex problem efficiently. The second problem examined in the thesis is that no data may be published that would harm the privacy of individual respondents or of small groups of respondents. To protect the data of individual respondents or small groups, protective measures, such as withholding certain information from publication, must be taken. The thesis discusses the mathematical problems involved in protecting sensitive data and describes solutions for several of them, such as computing the information loss caused by protecting sensitive data and minimising the amount of information withheld from publication.
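
    The thesis's first problem is illustrated by the error-localisation sketch earlier in this list. For the second, the sketch below shows one standard disclosure-control step: flagging unsafe magnitude-table cells with the (n, k)-dominance rule, under which a cell is unsafe when its n largest contributions exceed k percent of the cell total. This is a common textbook rule rather than one taken from the thesis, and the table data are invented.

```python
# A minimal sketch of primary suppression in statistical disclosure
# control, using the standard (n, k)-dominance rule.  The thresholds,
# cell layout, and contributions below are invented for illustration.

def unsafe(contributions, n=2, k=85):
    """(n, k)-dominance rule: True if the n largest contributions
    account for more than k percent of the cell total."""
    total = sum(contributions)
    if total == 0:
        return False
    top = sum(sorted(contributions, reverse=True)[:n])
    return 100 * top / total > k

# One cell per table position, each holding respondent contributions.
table = {
    ("north", "retail"): [40, 35, 5, 3],        # top two ~90% -> unsafe
    ("north", "services"): [20, 18, 17, 15, 14],
    ("south", "retail"): [90, 2, 1],            # one dominant respondent
}

suppressed = {cell for cell, c in table.items() if unsafe(c)}
print("primary suppressions:", suppressed)
```

    Primary suppression alone is not enough in practice: further (secondary) cells must usually be withheld so the suppressed values cannot be recomputed from row and column totals, which is where the information-loss minimisation problems mentioned above arise.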

    Experiences with Constraint-based Array Dependence Analysis

    Array data dependence analysis provides important information for the optimization of scientific programs. Array dependence testing can be viewed as constraint analysis, although general-purpose constraint manipulation algorithms have traditionally been thought too slow for dependence analysis. We have explored the use of exact constraint analysis, based on Fourier's method, for array data dependence analysis, and have found that these techniques can be used without a great impact on total compile time. Furthermore, the use of general-purpose algorithms has allowed us to address problems beyond traditional dependence analysis. In this paper, we summarize some of the constraint manipulation techniques we use for dependence analysis and discuss some of the reasons for our performance results.
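
    The sketch below illustrates the core of Fourier's method (Fourier-Motzkin elimination): variables are eliminated one by one by combining inequalities of opposite sign, and an infeasible constant row proves the constraint system, and hence the dependence, impossible. This rational-arithmetic version omits the integer-exactness refinements a real dependence tester needs, and the loop example is invented.

```python
# A small Fourier-Motzkin elimination sketch for dependence testing.
# Each inequality is (coeffs, rhs), meaning coeffs . x <= rhs.
from fractions import Fraction

def eliminate(ineqs, v):
    """Eliminate variable v, returning an equivalent system without v."""
    pos, neg, rest = [], [], []
    for coeffs, rhs in ineqs:
        (pos if coeffs[v] > 0 else neg if coeffs[v] < 0 else rest).append(
            (coeffs, rhs))
    out = list(rest)
    for pc, pr in pos:
        for nc, nr in neg:
            # Scale so v's coefficients cancel, then add the rows.
            a, b = pc[v], -nc[v]
            coeffs = [b * x + a * y for x, y in zip(pc, nc)]
            out.append((coeffs, b * pr + a * nr))
    return out

def feasible(ineqs, nvars):
    for v in range(nvars):
        ineqs = eliminate(ineqs, v)
    # Only constant rows remain: each says 0 <= rhs.
    return all(rhs >= 0 for _, rhs in ineqs)

# Dependence test for:  for i in 0..5: a[i] = a[i+10] + 1
# A write a[i] and a read a[j+10] touch the same element only if
# i = j + 10 with 0 <= i, j <= 5.  Variables are (i, j).
F = Fraction
system = [
    ([F(1), F(-1)], F(10)), ([F(-1), F(1)], F(-10)),  # i - j = 10
    ([F(-1), F(0)], F(0)), ([F(1), F(0)], F(5)),      # 0 <= i <= 5
    ([F(0), F(-1)], F(0)), ([F(0), F(1)], F(5)),      # 0 <= j <= 5
]
print("dependence possible:", feasible(system, 2))    # -> False
```

    Over the rationals this test is exact for infeasibility, but a feasible rational system does not guarantee an integer solution, which is why practical testers layer integer reasoning on top of the elimination step.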