8,871 research outputs found

    Scather: programming with multi-party computation and MapReduce

    Full text link
    We present a prototype of a distributed computational infrastructure, an associated high level programming language, and an underlying formal framework that allow multiple parties to leverage their own cloud-based computational resources (capable of supporting MapReduce [27] operations) in concert with multi-party computation (MPC) to execute statistical analysis algorithms that have privacy-preserving properties. Our architecture allows a data analyst unfamiliar with MPC to: (1) author an analysis algorithm that is agnostic with regard to data privacy policies, (2) to use an automated process to derive algorithm implementation variants that have different privacy and performance properties, and (3) to compile those implementation variants so that they can be deployed on an infrastructures that allows computations to take place locally within each participant’s MapReduce cluster as well as across all the participants’ clusters using an MPC protocol. We describe implementation details of the architecture, discuss and demonstrate how the formal framework enables the exploration of tradeoffs between the efficiency and privacy properties of an analysis algorithm, and present two example applications that illustrate how such an infrastructure can be utilized in practice.This work was supported in part by NSF Grants: #1430145, #1414119, #1347522, and #1012798

    Obvious: a meta-toolkit to encapsulate information visualization toolkits. One toolkit to bind them all

    Get PDF
    This article describes “Obvious”: a meta-toolkit that abstracts and encapsulates information visualization toolkits implemented in the Java language. It intends to unify their use and postpone the choice of which concrete toolkit(s) to use later-on in the development of visual analytics applications. We also report on the lessons we have learned when wrapping popular toolkits with Obvious, namely Prefuse, the InfoVis Toolkit, partly Improvise, JUNG and other data management libraries. We show several examples on the uses of Obvious, how the different toolkits can be combined, for instance sharing their data models. We also show how Weka and RapidMiner, two popular machine-learning toolkits, have been wrapped with Obvious and can be used directly with all the other wrapped toolkits. We expect Obvious to start a co-evolution process: Obvious is meant to evolve when more components of Information Visualization systems will become consensual. It is also designed to help information visualization systems adhere to the best practices to provide a higher level of interoperability and leverage the domain of visual analytics

    Physical Representation-based Predicate Optimization for a Visual Analytics Database

    Full text link
    Querying the content of images, video, and other non-textual data sources requires expensive content extraction methods. Modern extraction techniques are based on deep convolutional neural networks (CNNs) and can classify objects within images with astounding accuracy. Unfortunately, these methods are slow: processing a single image can take about 10 milliseconds on modern GPU-based hardware. As massive video libraries become ubiquitous, running a content-based query over millions of video frames is prohibitive. One promising approach to reduce the runtime cost of queries of visual content is to use a hierarchical model, such as a cascade, where simple cases are handled by an inexpensive classifier. Prior work has sought to design cascades that optimize the computational cost of inference by, for example, using smaller CNNs. However, we observe that there are critical factors besides the inference time that dramatically impact the overall query time. Notably, by treating the physical representation of the input image as part of our query optimization---that is, by including image transforms, such as resolution scaling or color-depth reduction, within the cascade---we can optimize data handling costs and enable drastically more efficient classifier cascades. In this paper, we propose Tahoma, which generates and evaluates many potential classifier cascades that jointly optimize the CNN architecture and input data representation. Our experiments on a subset of ImageNet show that Tahoma's input transformations speed up cascades by up to 35 times. We also find up to a 98x speedup over the ResNet50 classifier with no loss in accuracy, and a 280x speedup if some accuracy is sacrificed.Comment: Camera-ready version of the paper submitted to ICDE 2019, In Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE 2019

    Robustness - a challenge also for the 21st century: A review of robustness phenomena in technical, biological and social systems as well as robust approaches in engineering, computer science, operations research and decision aiding

    Get PDF
    Notions on robustness exist in many facets. They come from different disciplines and reflect different worldviews. Consequently, they contradict each other very often, which makes the term less applicable in a general context. Robustness approaches are often limited to specific problems for which they have been developed. This means, notions and definitions might reveal to be wrong if put into another domain of validity, i.e. context. A definition might be correct in a specific context but need not hold in another. Therefore, in order to be able to speak of robustness we need to specify the domain of validity, i.e. system, property and uncertainty of interest. As proofed by Ho et al. in an optimization context with finite and discrete domains, without prior knowledge about the problem there exists no solution what so ever which is more robust than any other. Similar to the results of the No Free Lunch Theorems of Optimization (NLFTs) we have to exploit the problem structure in order to make a solution more robust. This optimization problem is directly linked to a robustness/fragility tradeoff which has been observed in many contexts, e.g. 'robust, yet fragile' property of HOT (Highly Optimized Tolerance) systems. Another issue is that robustness is tightly bounded to other phenomena like complexity for which themselves exist no clear definition or theoretical framework. Consequently, this review rather tries to find common aspects within many different approaches and phenomena than to build a general theorem for robustness, which anyhow might not exist because complex phenomena often need to be described from a pluralistic view to address as many aspects of a phenomenon as possible. First, many different robustness problems have been reviewed from many different disciplines. Second, different common aspects will be discussed, in particular the relationship of functional and structural properties. This paper argues that robustness phenomena are also a challenge for the 21st century. It is a useful quality of a model or system in terms of the 'maintenance of some desired system characteristics despite fluctuations in the behaviour of its component parts or its environment' (s. [Carlson and Doyle, 2002], p. 2). We define robustness phenomena as solution with balanced tradeoffs and robust design principles and robustness measures as means to balance tradeoffs. --
    • …
    corecore