Stabilized Benders methods for large-scale combinatorial optimization, with application to data privacy
The Cell Suppression Problem (CSP) is a challenging Mixed-Integer Linear Problem arising in statistical tabular data protection. Medium-sized instances of CSP involve thousands of binary variables and millions of continuous variables and constraints. However, CSP has the typical structure that allows application of the renowned Benders' decomposition method: once the 'complicating' binary variables are fixed, the problem decomposes into a large set of linear subproblems on the 'easy' continuous ones. This makes it possible to project away the easy variables, reducing to a master problem in the complicating ones, where the value functions of the subproblems are approximated with the standard cutting-plane approach. Hence, Benders' decomposition suffers from the same drawbacks as the cutting-plane method, i.e., oscillation and slow convergence, compounded by the fact that the master problem is combinatorial. To overcome this drawback we present a stabilized Benders decomposition whose master is restricted to a neighborhood of successful candidates by local branching constraints, which are dynamically adjusted, and even dropped, during the iterations. Our experiments with randomly generated and real-world CSP instances with up to 3600 binary variables, 90M continuous variables and 15M inequality constraints show that our approach is competitive with both the current state-of-the-art (cutting-plane-based) code for cell suppression and the Benders implementation in CPLEX 12.7. In some instances, stabilized Benders is able to provide a very good solution in less than one minute, while the other approaches were not able to find any feasible solution in one hour.
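The master/subproblem alternation the abstract describes can be sketched on a toy instance. Everything below is illustrative, not the paper's method: the single binary variable, the closed-form LP subproblem, and the brute-force master are hypothetical stand-ins for the large MILP masters and LP subproblems solved in the paper.

```python
# Toy Benders decomposition: min over y in {0,1} of 3*y + phi(y),
# where phi(y) = min{ x : x >= 5 - 4*y, x >= 0 } = max(0, 5 - 4*y).
# Hypothetical instance; real CSP subproblems are large LPs.

def solve_subproblem(y):
    # The LP subproblem is solved in closed form here; it returns its value
    # and a valid optimality cut phi >= a + b*y (from the dual solution).
    val = max(0.0, 5.0 - 4.0 * y)
    if val > 0:
        cut = (5.0, -4.0)    # phi >= 5 - 4*y, tight at this y
    else:
        cut = (0.0, 0.0)     # phi >= 0
    return val, cut

def benders(max_iters=10):
    cuts = [(0.0, 0.0)]      # start with the trivial cut phi >= 0
    best = (float("inf"), None)
    for _ in range(max_iters):
        # Master: brute-force the binary variable (combinatorial in general).
        lb, y = min(
            (3.0 * y + max(a + b * y for a, b in cuts), y) for y in (0, 1)
        )
        phi, cut = solve_subproblem(y)
        ub = 3.0 * y + phi
        if ub < best[0]:
            best = (ub, y)
        if ub - lb < 1e-9:   # converged: the cut model is exact at y
            break
        cuts.append(cut)
    return best

print(benders())  # optimal is y=1 with cost 3 + max(0, 5 - 4) = 4
```

The stabilization in the paper would additionally restrict the master to a local-branching neighborhood of the incumbent; that step is omitted here for brevity.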
A genetic approach to statistical disclosure control
Statistical disclosure control is the collective name for a range of tools used by data providers such as government departments to protect the confidentiality of individuals or organizations. When the published tables contain magnitude data such as turnover or health statistics, the preferred method is to suppress the values of certain cells. Assigning a cost to the information lost by suppressing any given cell creates the cell suppression problem: finding the minimum-cost solution which meets the confidentiality constraints. Solving this problem simultaneously for all of the sensitive cells in a table is NP-hard and impractical for medium to large sized tables. In this paper, we describe the development of a heuristic tool for this problem which hybridizes linear programming (to solve a relaxed version for a single sensitive cell) with a genetic algorithm (to seek an order for considering the sensitive cells which minimizes the final cost). Considering a range of real-world and representative artificial datasets, we show that the method is able to provide relatively low-cost solutions for far larger tables than the optimal approach can tackle. We show that our genetic approach is able to significantly improve on the initial solutions provided by existing heuristics for cell ordering, and outperforms local search. This approach is then extended and applied to large statistical tables with over 200,000 cells. © 2012 IEEE
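The order-based genetic search described above can be sketched minimally. The per-order cost function below is a toy surrogate (in the paper each cell's marginal cost comes from solving an LP relaxation), and the cell costs, operators, and parameters are all illustrative assumptions.

```python
import random

# Sketch of an order-based GA: a permutation of sensitive cells is evolved
# to minimise total suppression cost. The cost of an order is a toy
# surrogate here, not the paper's LP-derived cost.

CELL_COST = [9, 2, 7, 4, 1]   # hypothetical per-cell information-loss costs

def order_cost(order):
    # Toy surrogate: cells protected earlier tend to shield later ones, so
    # the marginal cost of a cell shrinks with its position in the order.
    return sum(CELL_COST[c] / (pos + 1) for pos, c in enumerate(order))

def mutate(order, rng):
    # Swap mutation: exchange two positions in the permutation.
    i, j = rng.sample(range(len(order)), 2)
    child = list(order)
    child[i], child[j] = child[j], child[i]
    return child

def ga(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [rng.sample(range(len(CELL_COST)), len(CELL_COST))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=order_cost)
        elite = pop[: pop_size // 2]              # truncation selection
        pop = elite + [mutate(rng.choice(elite), rng) for _ in elite]
    return min(pop, key=order_cost)

best = ga()
print(best, round(order_cost(best), 3))
```

A real implementation would also use an order-preserving crossover and evaluate each candidate order by running the LP-based single-cell heuristic cell by cell.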
Statistical disclosure control: Applications in healthcare
Statistical disclosure control is a progressive subject which offers techniques with which tables of data intended for public release can be protected from the threat of disclosure. In this sense disclosure will usually mean information on an individual subject being revealed by the release of a table. The techniques used centre around detecting potential disclosure in a table and then removing this disclosure by somehow adjusting the original table. This thesis has been produced in conjunction with Information and Services Division (Scotland) (ISD) and therefore will concentrate on the applications of statistical disclosure control in the field of healthcare, with particular reference to the problems encountered by ISD. The thesis predominantly aims to give an overview of current statistical disclosure control techniques. It will investigate how these techniques would work in the ISD scenario and will ultimately aim to provide ISD with advice on how they should proceed in any future update of their statistical disclosure control policy. Chapter 1 introduces statistical disclosure and investigates some of the legal and social issues associated with the field. It also provides information on the techniques which are used by other organisations worldwide. Further, there is an introduction to both the ISD scenario and a leading computing package in the area, Tau-Argus. Chapter 2 gives an overview of the techniques currently used in statistical disclosure control. This overview includes technical justification for the techniques along with the advantages and disadvantages associated with using each technique. Chapter 3 provides a decision rule approach to the selection of disclosure control techniques described in Chapter 2, and much of Chapter 3 revolves around a description of the implications derived from the choices made.
Chapter 4 presents the results from an application of statistical disclosure control techniques to a real ISD data set concerned with diabetes in children in Scotland. The results include a quantification of the information lost in the table when the disclosure control technique is applied. The investigation concentrated on two- and three-dimensional tables and the analysis was carried out using the Tau-Argus computing package. Chapter 5 concludes by providing a summary of the main findings of the thesis and providing recommendations based on these findings. There is also a discussion of potential further study which may be useful to ISD as they attempt to update their statistical disclosure control policy.
Resource Sharing for Multi-Tenant NoSQL Data Stores in the Cloud
Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015. Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on and propose solutions to two cases: independent data on a local file system, and shared data on a parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing performance degradation of one tenant by another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads are introduced due to the NoSQL store's unawareness of the PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL store, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability that embedded KVSs are not designed for. Results show the proposed system outperforms Cassandra and Voldemort under several different workloads.
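The workload-aware reservation idea can be illustrated with a minimal planner: each tenant declares an estimated demand (in the dissertation this would come from the offline performance model), and under contention the planner scales reservations down proportionally to fit capacity. The function name, units, and numbers below are hypothetical, not the dissertation's API.

```python
# Minimal sketch of workload-aware resource reservation: reserve each
# tenant's declared demand when capacity allows; otherwise scale all
# reservations proportionally so their sum equals capacity.

def plan_reservations(demands, capacity):
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)          # no contention: grant demands as-is
    scale = capacity / total          # contention: proportional scale-down
    return {tenant: d * scale for tenant, d in demands.items()}

# Hypothetical demands (e.g. IOPS) against a node with capacity 400.
print(plan_reservations({"tenant_a": 600, "tenant_b": 200}, 400))
```

A production planner would additionally distinguish resource types (CPU, disk, network) and replan as the performance model's demand estimates change.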
Tracing an Invasion Paradox across Scales: Patterns and Tests for the Effects of the Introduced Predatory Grouper, Roi (Cephalopholis argus) in Hawaiʻi.
Ph.D. Thesis. University of Hawaiʻi at Mānoa, 2017
Applications of complex adaptive systems approaches to coastal systems
This thesis investigates the application of complex adaptive systems approaches (e.g. Artificial Neural Networks and Evolutionary Computation) to the study of coastal hydrodynamic and morphodynamic behaviour. Traditionally, nearshore morphological coastal system studies have developed an understanding of those physical processes occurring on both short temporal and small spatial scales with a large degree of success. The associated approaches and concepts used to study the coastal system at these scales have primarily been linear in nature. However, when these approaches are extended to investigating larger temporal and spatial scales, which are commensurate with the aims of coastal management, results have had less success. The lack of success in developing an understanding of large scale coastal behaviour is to a large extent attributable to the complex behaviour associated with the coastal system. This complexity arises as a result of both the stochastic and chaotic nature of the coastal system, which allows small scale system understanding to be acquired but prevents the larger scale behaviour from being predicted effectively.
This thesis presents four hydro-morphodynamic case studies to demonstrate the utility of complex adaptive systems approaches for studying coastal systems. The first two demonstrate the application of Artificial Neural Networks, whilst the latter two illustrate the application of Evolutionary Computation. Case Study #1 considers the nature of the discrepancy between the observed location of wave breaking patterns over submerged sandbars and the actual sandbar locations. Artificial Neural Networks were able to quantitatively correct the observed locations to produce reliable estimates of the actual sandbar locations. Case Study #2 considers the development of an approach for the discrimination of shoreline location in video images for the production of intertidal maps of the nearshore region. In this case the system modelled by the Artificial Neural Network is the discrimination carried out by the eye in delineating a shoreline feature between regions of sand and water. The Artificial Neural Network approach was shown to robustly recognise a range of shoreline features at a variety of beaches and hydrodynamic settings. Case Study #3 was the only purely hydrodynamic study considered in the thesis. It investigated the use of Evolutionary Computation to provide a means of developing a parametric description of directional wave spectra in both reflective and non-reflective conditions. It is shown to provide a unifying approach which produces results that surpassed those achieved by traditional analysis approaches, even though this may not strictly have been considered a fully complex system. Case Study #4 is the most ambitious application and addresses the need for data reduction as a precursor when trying to study large scale morphodynamic data sets. It utilises Evolutionary Computation approaches to extract the significant morphodynamic variability evidenced in both directly and remotely sampled nearshore morphologies. Significant data reduction is achieved whilst retaining up to 90% of the original variability in the data sets.
These case studies clearly demonstrate the ability of complex adaptive systems approaches to be successfully applied to coastal system studies. This success has been shown to equal and sometimes surpass the results that may be obtained by traditional approaches. The strong performance of complex adaptive systems approaches is closely linked to the level of complexity or non-linearity of the system being studied. Based on a qualitative evaluation, Evolutionary Computation was shown to demonstrate an advantage over Artificial Neural Networks in terms of the level of new insights which may be obtained. However, utility also needs to consider general ease of applicability and ease of implementation of the study approach. In this sense, Artificial Neural Networks demonstrate more utility for the study of coastal systems. The qualitative assessment approach used to evaluate the case studies in this thesis may be used as a guide for choosing the appropriateness of either Artificial Neural Networks or Evolutionary Computation for future coastal system studies.
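The regression idea behind the sandbar-location correction in Case Study #1 can be illustrated in miniature: learn a mapping from observed wave-breaking position to actual sandbar position. A single linear neuron trained by gradient descent stands in for the thesis's Artificial Neural Networks, and the training data below are synthetic (actual = 0.8 * observed + 5), so every number here is an assumption.

```python
# Minimal sketch: one linear neuron trained by stochastic gradient descent
# to map observed wave-breaking positions to actual sandbar positions.

def train(data, lr=0.001, epochs=3000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y          # prediction error on this sample
            w -= lr * err * x              # gradient step on the weight
            b -= lr * err                  # gradient step on the bias
    return w, b

# Synthetic data standing in for (observed, actual) position pairs.
data = [(x, 0.8 * x + 5.0) for x in range(20)]
w, b = train(data)
print(round(w, 2), round(b, 2))  # recovers roughly (0.8, 5.0)
```

A real network for this task would use a nonlinear hidden layer and be trained on field observations rather than a noiseless synthetic line.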
Developments in predictive displays for discrete and continuous tasks
The plan of the thesis is as follows: The introductory chapters
review the literature pertaining to human prediction and predictive
control models (Chapter 1), and to engineering aspects of predictive
displays (Chapter 2). Chapter 3 describes a fundamental study of predictive
display parameters in a laboratory scheduling task. Chapter 4
attempts to verify these findings using test data from an actual job shop
scheduling problem. Chapter 5 branches into the area of continuous
control with a pilot study of predictive displays in a laboratory
simulated continuous stirred-tank chemical reactor. Chapter 6 uses the
experience gained in the pilot study as the basis for a comprehensive study
of predictive display parameters in a further laboratory study of a
simplified dual-meter monitoring and control task, and Chapter 7 attempts
to test the optimal design in a part-simulated semi-batch chemical reactor
using real plant and experienced operators in an industrial setting. The
results of the experimental programme are summarized for convenience in
Chapter 8. Chapter 9 draws together the threads from the various experiments
and discusses the findings in terms of a general hierarchical model
of an operator's control and monitoring behaviour. Finally, Chapter 10
presents conclusions and recommendations from the programme of research,
together with suggestions for further work.