27 research outputs found

Efficiently Mining Maximal Diverse Frequent Itemsets

Author: B Mallick
B Vo
D Burdick
DI Lin
G Pyun
H Ryang
Jiawei Han
K Gouda
K Srikumar
M Kumara Swamy
N Pasquier
RJ Bayardo Jr
SK Tanbeer
T Hu
Y Yan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Crossref

VBN

The SAIL Databank: building a national architecture for e-health research and evaluation

Author: B Schneier
C Safran
C Skinner
Caroline J Brooks
David V Ford
Department of Health
Gareth John
Ginevra Brown
Jean-Philippe Verplancke
Ken Leake
Kerina H Jones
MJ Elliot
N Black
Owen Bodger
P Boyd
R Lyons
RC-W Wong
RJ Bayardo
Ronan A Lyons
SE Rodgers
Simon Thompson
Tony Couch
UK Clinical Research Collaboration (UKCRC)
UKCRC
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Cronfa at Swansea University

A Polly Cracker system based on Satisfiability

Author: A Odlyzko
A Shamir
B Selman
D Bayer
D Hofheinz
J-C Faugère
L Ly Van
M Davis
M Fellows
N Koblitz
O Dubois
R Impagliazzo
R Steinwandt
RJ Bayardo Jr.
S Cocco
SA Cook
T Becker
T Okamoto
W Geiselmann
Publication venue: HAL CCSD
Publication date: 01/01/2003

This paper presents a public-key cryptosystem based on a subclass of the well-known satisfiability problem from propositional logic, namely the doubly-balanced 3-SAT problem. We first describe the construction of an instance of our system starting from such a 3-SAT formula. We then discuss security in two ways: by exploring the best methods to date for solving this particular problem, and by studying algorithms for solving systems of multivariate polynomial equations in this particular setting. Our investigations show that both types of method fail to break our instances. We end the paper with some complexity considerations and implementation results.

Crossref

INRIA a CCSD electronic archive server

De-identifying a public use microdata file from the Canadian national discharge abstract database

Author: A Dale
A de Waal
A Gionis
A Hundepool
A Machanavajjhala
A Meyerson
A Narayanan
Agency for Healthcare Research and Quality
B Hore
B Yolles
B-C Chen
BCM Fung
C Hogue
C Mackie
C Marsh
C Skinner
Canada Statistics
Canadian Institute for Health Information
CE Shannon
CK Liew
D Altman
D Defays
D Hutchon
D Lafky
David Paton
DB Rubin
Department of Health and Human Services
E Boyko
Federal Court (Canada)
Fida Dankar
G Aggarwal
G Duncan
G Loukides
G Sande
G Sullivan
GD Smith
GR Heer
Gunes Koru
H Kargupta
J Castro
J Domingo-Ferrer
J Jimenez
J Schoenman
J Xu
JJ Kim
JP Gouweleeuw
K Abraham
K Benitez
K El Emam
K LeFevre
Khaled El Emam
L Alexander
L Sweeney
L Willenborg
LA Alexander
LH Cox
M Barbaro
M Templ
ME Nergiz
National Committee on Vital and Health Statistics
P Doyle
P Kooiman
P Nanopoulos
P Samarati
R Bayardo
R Gopal
RA Dandekar
RJ Bayardo
RJA Little
S Fienberg
S Hansell
S Ochoa
Statistics Canada
T de Waal
T Delamothe
T Hedrick
T Zeller Jr
V Ciriani
V Iyengar
V Torra
VS Iyengar
W Lowrance
W Winkler
WE Winkler
X Xiao
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records.
Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy.
Results: Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression.
Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
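
The threshold idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes a simple risk model in which a record's re-identification probability is 1 divided by the number of records sharing its quasi-identifier values, and it only flags records above the threshold as candidates for suppression. The function name `risky_records` and the field names `age_group` and `region` are hypothetical.

```python
from collections import Counter

def risky_records(records, quasi_identifiers, threshold=0.05):
    """Return indices of records whose assumed risk (1 / class size) exceeds threshold."""
    def key(record):
        # Records with the same quasi-identifier values fall in one class.
        return tuple(record[q] for q in quasi_identifiers)

    class_sizes = Counter(key(r) for r in records)
    return [i for i, r in enumerate(records)
            if 1.0 / class_sizes[key(r)] > threshold]

# Toy data (hypothetical): 20 indistinguishable records plus one unique record.
records = [{"age_group": "40-49", "region": "A"} for _ in range(20)]
records.append({"age_group": "90+", "region": "B"})

flagged = risky_records(records, ["age_group", "region"])
```

Under this assumed model, a threshold of 0.05 corresponds to requiring at least 20 records per class, so only the unique record is flagged here.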

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Efficient and effective pruning strategies for health data de-identification

Author: B Davey
B Fung
B Malin
BCM Fung
CC Aggarwal
DE Willard
F Kohlmayer
F Prasser
Fabian Prasser
FK Dankar
Florian Kohlmayer
G Poulis
J Domingo-Ferrer
J Goldberger
J Soria-Comas
K Babu
K El Emam
K LeFevre
KE Emam
Klaus A. Kuhn
L Mattner
L Sweeney
LH Cox
M Maass
M Nergiz
N Li
P Bose
P Samarati
R Lautenschläger
RJ Bayardo
V Iyengar
W Xia
Z Wan
Publication venue: Springer Science and Business Media LLC

Crossref

Algorithms For Computing Association Rules Using A Partial-Support Tree

Author: A Savasere
F Coenen
H Toivonen
M Houtsma
MJ Zaki
R Agrawal
RJ Bayardo
Publication venue: Springer
Publication date: 01/01/2000
Field of study

This paper presents new algorithms for the extraction of association rules from binary databases. Most existing methods operate by generating many "candidate" sets, representing combinations of attributes which may be associated, and then testing the database to establish the degree of association. This may involve multiple database passes, and is also likely to encounter problems when dealing with "dense" data due to the increase in the number of sets under consideration. Our method uses a single pass of the database to perform a partial computation of support for all sets encountered in the database, storing this in the form of a set enumeration tree. We describe algorithms for generating this tree and for using it to generate association rules.
KEYWORDS: Association Rules, Partial Support, Set Enumeration
1 INTRODUCTION
Modern businesses have the capacity to store huge amounts of data regarding all aspects of their operations. Deriving association rules [1] from this dat..
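
The single-pass idea in the abstract can be sketched roughly in Python. This is an illustration under stated assumptions, not the paper's algorithm: a flat dictionary stands in for the set enumeration tree, and the total support of an item set is assumed to be the sum of the partial counts of every stored set that contains it. The names `partial_supports` and `total_support` and the toy rows are hypothetical.

```python
from collections import Counter

def partial_supports(transactions):
    """One pass over the data: count how often each exact item set occurs as a row."""
    return Counter(frozenset(t) for t in transactions)

def total_support(itemset, partial):
    """Sum the partial counts of all stored sets that contain `itemset`."""
    target = frozenset(itemset)
    return sum(count for row, count in partial.items() if target <= row)

# Toy binary database (hypothetical): each row is the set of attributes present.
rows = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c"},
    {"b", "c"},
    {"a"},
]

partial = partial_supports(rows)
support_ab = total_support({"a", "b"}, partial)

# Confidence of the rule a -> b is support(a, b) / support(a).
confidence_a_to_b = support_ab / total_support({"a"}, partial)
```

A tree keyed on a fixed attribute order would avoid scanning every stored set for each query; the flat dictionary is used here only to keep the sketch short.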

CiteSeerX

Crossref