2,002 research outputs found

    Stabilized Benders methods for large-scale combinatorial optimization, with application to data privacy

    The Cell Suppression Problem (CSP) is a challenging Mixed-Integer Linear Problem arising in statistical tabular data protection. Medium-sized instances of CSP involve thousands of binary variables and millions of continuous variables and constraints. However, CSP has the typical structure that allows application of the renowned Benders’ decomposition method: once the “complicating” binary variables are fixed, the problem decomposes into a large set of linear subproblems on the “easy” continuous ones. This allows the easy variables to be projected away, reducing the problem to a master problem in the complicating ones where the value functions of the subproblems are approximated with the standard cutting-plane approach. Hence, Benders’ decomposition suffers from the same drawbacks as the cutting-plane method, i.e., oscillation and slow convergence, compounded by the fact that the master problem is combinatorial. To overcome these drawbacks we present a stabilized Benders decomposition whose master is restricted to a neighborhood of successful candidates by local branching constraints, which are dynamically adjusted, and even dropped, during the iterations. Our experiments with randomly generated and real-world CSP instances with up to 3600 binary variables, 90M continuous variables and 15M inequality constraints show that our approach is competitive with both the current state-of-the-art (cutting-plane-based) code for cell suppression, and the Benders implementation in CPLEX 12.7. In some instances, stabilized Benders is able to quickly provide a very good solution in less than one minute, while the other approaches were not able to find any feasible solution in one hour.

    A genetic approach to statistical disclosure control

    Statistical disclosure control is the collective name for a range of tools used by data providers such as government departments to protect the confidentiality of individuals or organizations. When the published tables contain magnitude data such as turnover or health statistics, the preferred method is to suppress the values of certain cells. Assigning a cost to the information lost by suppressing any given cell creates the cell suppression problem. This consists of finding the minimum cost solution which meets the confidentiality constraints. Solving this problem simultaneously for all of the sensitive cells in a table is NP-hard and not possible for medium to large sized tables. In this paper, we describe the development of a heuristic tool for this problem which hybridizes linear programming (to solve a relaxed version for a single sensitive cell) with a genetic algorithm (to seek an order for considering the sensitive cells which minimizes the final cost). Considering a range of real-world and representative artificial datasets, we show that the method is able to provide relatively low cost solutions for far larger tables than is possible for the optimal approach to tackle. We show that our genetic approach is able to significantly improve on the initial solutions provided by existing heuristics for cell ordering, and outperforms local search. This approach is then extended and applied to large statistical tables with over 200000 cells. © 2012 IEEE

    Statistical disclosure control: Applications in healthcare

    Statistical disclosure control is a progressive subject which offers techniques with which tables of data intended for public release can be protected from the threat of disclosure. In this sense disclosure will usually mean information on an individual subject being revealed by the release of a table. The techniques used centre around detecting potential disclosure in a table and then removing this disclosure by somehow adjusting the original table. This thesis has been produced in conjunction with Information and Services Division (Scotland) (ISD) and therefore will concentrate on the applications of statistical disclosure control in the field of healthcare with particular reference to the problems encountered by ISD. The thesis predominately aims to give an overview of current statistical disclosure control techniques. It will investigate how these techniques would work in the ISD scenario and will ultimately aim to provide ISD with advice on how they should proceed in any future update of their statistical disclosure control policy. Chapter 1 introduces statistical disclosure and investigates some of the legal and social issues associated with the field. It also provides information on the techniques which are used by other organisations worldwide. Further there is an introduction to both the ISD scenario and a leading computing package in the area, Tau-Argus. Chapter 2 gives an overview of the techniques currently used in statistical disclosure control. This overview includes technical justification for the techniques along with the advantages and disadvantages associated with using each technique. Chapter 3 provides a decision rule approach to the selection of disclosure control techniques described in Chapter 2 and much of Chapter 3 revolves around a description of the implications derived from the choices made. 
Chapter 4 presents the results from an application of statistical disclosure control techniques to a real ISD data set concerned with diabetes in children in Scotland. The results include a quantification of the information lost in the table when the disclosure control technique is applied. The investigation concentrated on two- and three-dimensional tables and the analysis was carried out using the Tau-Argus computing package. Chapter 5 concludes by providing a summary of the main findings of the thesis and providing recommendations based on these findings. There is also a discussion of potential further study which may be useful to ISD as they attempt to update their statistical disclosure control policy.

    Resource Sharing for Multi-Tenant Nosql Data Store in Cloud

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015. Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on and propose solutions to two cases: independent data-local file system, and shared data-parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing performance degradation for one tenant by another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads are introduced because the NoSQL store is unaware of the PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL store, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, giving a capability that embedded KVSs are not designed for. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads.

    Tracing an Invasion Paradox across Scales: Patterns and Tests for the Effects of the Introduced Predatory Grouper, Roi (Cephalopholis argus) in Hawai‘i.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

    Applications of complex adaptive systems approaches to coastal systems

    This thesis investigates the application of complex adaptive systems approaches (e.g. Artificial Neural Networks and Evolutionary Computation) to the study of coastal hydrodynamic and morphodynamic behaviour. Traditionally, nearshore morphological coastal system studies have developed an understanding of those physical processes occurring on both short temporal, and small spatial scales with a large degree of success. The associated approaches and concepts used to study the coastal system at these scales have primarily been linear in nature. However, when these approaches to studying the coastal system are extended to investigating larger temporal and spatial scales, which are commensurate with the aims of coastal management, results have had less success. The lack of success in developing an understanding of large scale coastal behaviour is to a large extent attributable to the complex behaviour associated with the coastal system. This complexity arises as a result of both the stochastic and chaotic nature of the coastal system. This allows small scale system understanding to be acquired but prevents the larger scale behaviour to be predicted effectively. This thesis presents four hydro-morphodynamic case studies to demonstrate the utility of complex adaptive system approaches for studying coastal systems. The first two demonstrate the application of Artificial Neural Networks, whilst the latter two illustrate the application of Evolutionary Computation. Case Study #1 considers the nature of the discrepancy between the observed location of wave breaking patterns over submerged sandbars and the actual sandbar locations. Artificial Neural Networks were able to quantitatively correct the observed locations to produce reliable estimates of the actual sand bar locations. Case Study #2 considers the development of an approach for the discrimination of shoreline location in video images for the production of intertidal maps of the nearshore region. In this case the system modelled by the Artificial Neural Network is the nature of the discrimination model carried out by the eye in delineating a shoreline feature between regions of sand and water. The Artificial Neural Network approach was shown to robustly recognise a range of shoreline features at a variety of beaches and hydrodynamic settings. Case Study #3 was the only purely hydrodynamic study considered in the thesis. It investigated the use of Evolutionary Computation to provide means of developing a parametric description of directional wave spectra in both reflective and nonreflective conditions. It is shown to provide a unifying approach which produces results which surpassed those achieved by traditional analysis approaches even though this may not strictly have been considered as a fully complex system. Case Study #4 is the most ambitious application and addresses the need for data reduction as a precursor when trying to study large scale morphodynamic data sets. It utilises Evolutionary Computation approaches to extract the significant morphodynamic variability evidenced in both directly and remotely sampled nearshore morphologies. Significant data reduction is achieved whilst retaining up to 90% of the original variability in the data sets. These case studies clearly demonstrate the ability of complex adaptive systems to be successfully applied to coastal system studies. This success has been shown to equal and sometimes surpass the results that may be obtained by traditional approaches. The strong performance of Complex Adaptive System approaches is closely linked to the level of complexity or non-linearity of the system being studied. Based on a qualitative evaluation, Evolutionary Computation was shown to demonstrate an advantage over Artificial Neural Networks in terms of the level of new insights which may be obtained. However, utility also needs to consider general ease of applicability and ease of implementation of the study approach. In this sense, Artificial Neural Networks demonstrate more utility for the study of coastal systems. The qualitative assessment approach used to evaluate the case studies in this thesis, may be used as a guide for choosing the appropriateness of either Artificial Neural Networks or Evolutionary Computation for future coastal system studies.

    Developments in predictive displays for discrete and continuous tasks

    The plan of the thesis is as follows: The introductory chapters review the literature pertaining to human prediction and predictive control models (Chapter 1), and to engineering aspects of predictive displays (Chapter 2). Chapter 3 describes a fundamental study of predictive display parameters in a laboratory scheduling task; Chapter 4 attempts to verify these findings using test data from an actual job shop scheduling problem. Chapter 5 branches into the area of continuous control with a pilot study of predictive displays in a laboratory simulated continuous stirred-tank chemical reactor. Chapter 6 uses the experience gained in the pilot study as the basis for a comprehensive study of predictive display parameters in a further laboratory study of a simplified dual-meter monitoring and control task, and Chapter 7 attempts to test the optimal design in a part-simulated semi-batch chemical reactor using real plant and experienced operators in an industrial setting. The results of the experimental programme are summarized for convenience in Chapter 8. Chapter 9 draws together the threads from the various experiments and discusses the findings in terms of a general hierarchical model of an operator's control and monitoring behaviour. Finally, Chapter 10 presents conclusions and recommendations from the programme of research, together with suggestions for further work.
