101 research outputs found

    Adaptive Huber Regression

    Get PDF
    Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded (1+δ)(1+\delta)-th moment for any δ>0\delta > 0. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when δ≥1\delta \geq 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0<δ<10<\delta< 1. Furthermore, this transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive.Comment: final versio

    Language-Based Differential Privacy with Accuracy Estimations and Sensitivity Analyses

    Get PDF
    This thesis focuses on the development of programming frameworks to enforce, by construction, desirable properties of software systems. Particularly, we are interested in enforcing differential privacy -- a mathematical notion of data privacy -- while statically reasoning about the accuracy of computations, along with deriving the sensitivity of arbitrary functions to further strengthen the expressiveness of these systems. To this end, we first introduce DPella, a programming framework for differentially-private queries that allows reasoning about the privacy and accuracy of data analyses. DPella provides a novel component that statically tracks the accuracy of different queries. This component leverages taint analysis to infer statistical independence of the different noises that were added to ensure the privacy of the overall computation. As a result, DPella allows analysts to implement privacy-preserving queries and adjust the privacy parameters to meet accuracy targets or vice-versa.In the context of differentially-private systems, the sensitivity of a function determines the amount of noise needed to achieve a desired level of privacy. However, establishing the sensitivity of arbitrary functions is non-trivial. Consequently, systems such as DPella provided a limited set of functions -- whose sensitivity is known -- to apply over sensitive data, thus hindering the expressiveness of the language. To overcome this limitation, we propose a new approach to derive proofs of sensitivity in programming languages with support for polymorphism. Our approach enriches base types with information about the metric relation between values and applies parametricity to derive proof of a function\u27s sensitivity. These ideas are formalized in a sound calculus and implemented as a Haskell library called Spar, enabling programmers to prove the sensitivity of their functions through type-checking alone.Overall, this thesis contributes to the development of expressive programming frameworks for data analysis with privacy and accuracy guarantees. The proposed approaches are feasible and effective, as demonstrated through the implementation of DPella and Spar

    Enabling fuel system component placement on the basis of geometric and certification requirements within a semantic knowledge-based engineering framework

    Get PDF
    The aerospace industry is in a constant state of evolution, encountering fresh challenges amid technological advancements and shifting economic landscapes. To maintain a competitive edge, aerospace companies must continually foster innovation and adaptability. In response to a more competitive market and heightened customer expectations, the aerospace sector is directing its investments toward pioneering technologies capable of curbing project development costs. The early phases of aircraft development hold paramount significance, exerting great influence on overall project outcomes, and thus stand to gain immensely from novel processes and design methodologies. The accurate and comprehensive definition of aircraft geometry constitutes a critical requirement for a multitude of analyses, profoundly impacting the final aircraft design. Unfortunately, these detailed descriptions are often elusive during the initial project stages. Among the array of methodologies available, Knowledge-Based Engineering (KBE) emerges as an interesting approach for streamlining product development timelines and cost structures by capitalizing on pre-existing knowledge drawn from analogous projects. Harnessing DLR’s innovative KBE framework, known as Codex, a set of geometric verification rules have been crafted to bolster early-stage fuel system design. Codex, facilitated by semantic-web technologies, empowers the development of domain-specific languages and tools and seamlessly integrates them into the framework. In conjunction with a suite of geometry creation rules aimed at enriching the geometric information accessible to designers during the initial phases, the GeoVerification tool equips engineers with valuable data concerning the correctness of the fuel system’s geometry. In addition, a series of airworthiness requirements and guidelines have been implemented, demonstrating the potential for automating the usage of this knowledge. An important feature, seamlessly integrated into the tool, resides in its capacity to evaluate diverse scenarios of uncontained engine burst failures (UERFs) and provide remedial design actions. Efforts have culminated in the development and integration of 25 verification rules within the GeoVerification Tool. To illustrate the tool's capabilities, a fuel system architecture for a twin-engine small-medium range aircraft was designed. Leveraging existing geometric data pertaining to the aircraft's structure, detailed geometric characterizations of the fuel tanks were achieved. Moreover, spar-mounted and bottom-mounted fuel boost pumps, supply lines, fuel valves, and a simplified crossfeed subsystem were designed. Over 500 requirements' checks were executed in under four minutes using a standard personal computer for the fuel system architecture. By embracing a parametric design approach and leveraging the developed geometry creation rules, including additional rules aimed at supporting component placement automation, geometric incompatibilities were minimized. Furthermore, the tool's results, stored within the knowledge graph, facilitate the efficient management of design anomalies and the provision of corrective actions

    Structured arrows : a type-based framework for structured parallelism

    Get PDF
    This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These language and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by doing the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy the correctness and predictability requirements that we require. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.“This work was supported by the University of St Andrews (School of Computer Science); by the EU FP7 grant “ParaPhrase:Parallel Patterns Adaptive Heterogeneous Multicore Systems” (n. 288570); by the EU H2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation Science and Technology); and by EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1)” -- Acknowledgement

    Methods and tools for the integration of formal verification in domain-specific languages

    Get PDF
    Domain specific Modeling Languages (DSMLs) are increasingly used at the early phases in the development of complex systems, in particular, for safety critical systems. The goal is to be able to reason early in the development on these models and, in particular, to fulfill verification and validation activities (V and V). A widely used technique is the exhaustive behavioral model verification using model-checking by providing a translational semantics to build a formal model from DSML conforming models in order to reuse powerful tools available for this formal domain. Defining a translational semantics, expressing formal properties to be assessed and analysing such verification results require such an expertise in formal methods that it restricts their adoption and may discourage the designers. It is thus necessary to build for each DSML, a toolchain which hides formal aspects for DSML end-users. The goal of this thesis consists in easing the development of such verification toolchains. Our contribution includes 1) expressing behavioral properties in the DSML level by relying on TOCL (Temporal Object Constraint Language), a temporal extension of OCL; 2) An automated transformation of these properties on formal properties while reusing the key elements of the translational semantics; 3) the feedback of verification results thanks to a higher-order transformation and a language which defines mappings between DSML and formal levels; 4) the associated process implementation. Our approach was validated by the experimentation on a subset of the development process modeling language SPEM, and on Ladder Diagram language used to specify programmable logic controllers (PLCs), and by the integration of a formal intermediate language (FIACRE) in the verification toolchain. This last point allows to reduce the semantic gap between DSMLs and formal domains

    Flexible Modellerweiterung und Optimierung von Erdbebensimulationen

    Get PDF
    Simulations of realistic earthquake scenarios require scalable software and extensive supercomputing resources. With increasing fidelity in simulations, advanced rheological and source models need to be incorporated. I introduce a domain-specific language in order to handle the model flexibility in combination with the high efficiency requirements. The contributions in this thesis enabled the to date largest and longest dynamic rupture simulation of the 2004 Sumatra earthquake.Realistische Erdbebensimulationen benötigen skalierbare Software und beträchtliche Rechenressourcen. Mit zunehmender Genauigkeit der Simulationen müssen fortschrittliche rheologische und Quellmodelle integriert werden. Ich führe eine domänenspezifische Sprache ein, um die Modelflexibilität in Kombination mit den hohen Effizienzanforderungen zu beherrschen. Die Beiträge in dieser Arbeit haben die bisher größte und längste dynamische Bruchsimulation des Sumatra-Erdbebens von 2004 ermöglicht

    Principles of Security and Trust

    Get PDF
    This open access book constitutes the proceedings of the 8th International Conference on Principles of Security and Trust, POST 2019, which took place in Prague, Czech Republic, in April 2019, held as part of the European Joint Conference on Theory and Practice of Software, ETAPS 2019. The 10 papers presented in this volume were carefully reviewed and selected from 27 submissions. They deal with theoretical and foundational aspects of security and trust, including on new theoretical results, practical applications of existing foundational ideas, and innovative approaches stimulated by pressing practical problems

    G-CSC Report 2010

    Get PDF
    The present report gives a short summary of the research of the Goethe Center for Scientific Computing (G-CSC) of the Goethe University Frankfurt. G-CSC aims at developing and applying methods and tools for modelling and numerical simulation of problems from empirical science and technology. In particular, fast solvers for partial differential equations (i.e. pde) such as robust, parallel, and adaptive multigrid methods and numerical methods for stochastic differential equations are developed. These methods are highly adanvced and allow to solve complex problems.. The G-CSC is organised in departments and interdisciplinary research groups. Departments are localised directly at the G-CSC, while the task of interdisciplinary research groups is to bridge disciplines and to bring scientists form different departments together. Currently, G-CSC consists of the department Simulation and Modelling and the interdisciplinary research group Computational Finance
    • …
    corecore