52 research outputs found

    Cryptanalysis of the McEliece Cryptosystem on GPGPUs

    Get PDF
    The linear-code-based McEliece cryptosystem is a promising candidate for a so-called post-quantum public-key cryptosystem because it has thus far resisted quantum cryptanalysis, but to be considered secure, the cryptosystem must resist other attacks as well. In 2011, Bernstein et al. introduced the Ball Collision Decoding (BCD) attack on McEliece, a significant improvement in asymptotic complexity over the previous best known attack. We implement this attack on GPUs, whose parallel architecture is well suited to the matrix operations used in the attack and substantially reduces the run time in practice. Our implementation executes the attack more than twice as fast as the reference implementation and could be used for a practical attack on the original McEliece parameters.
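
    The BCD attack refines information-set decoding. As a rough illustration of the underlying idea only (a plain Prange-style decoder, not the ball-collision variant, and with no GPU parallelism), the Python sketch below repeatedly guesses a column set that should carry the whole error, solves a linear system over GF(2), and keeps the result only if its weight is small enough. All function names and parameters are illustrative, not the paper's code.

        import numpy as np

        def gf2_solve(A, b):
            """Return one x with A x = b over GF(2), or None if inconsistent."""
            A = (A % 2).astype(np.uint8)
            b = (b % 2).astype(np.uint8)
            rows, cols = A.shape
            pivots, row = [], 0
            for col in range(cols):
                if row == rows:
                    break
                hits = np.nonzero(A[row:, col])[0]
                if hits.size == 0:
                    continue
                i = row + hits[0]
                A[[row, i]], b[[row, i]] = A[[i, row]], b[[i, row]]
                for j in range(rows):
                    if j != row and A[j, col]:
                        A[j] ^= A[row]
                        b[j] ^= b[row]
                pivots.append(col)
                row += 1
            if b[row:].any():                       # inconsistent system
                return None
            x = np.zeros(cols, dtype=np.uint8)
            x[pivots] = b[:len(pivots)]
            return x

        def prange_isd(H, s, w, max_iters=100000, seed=None):
            """Find e with H e = s (mod 2) and weight(e) <= w by random guessing."""
            rng = np.random.default_rng(seed)
            r, n = H.shape
            for _ in range(max_iters):
                support = rng.choice(n, size=r, replace=False)   # guessed error support
                x = gf2_solve(H[:, support], s)
                if x is not None and x.sum() <= w:
                    e = np.zeros(n, dtype=np.uint8)
                    e[support] = x
                    return e
            return None

    A returned e can be checked with (H @ e) % 2 == s; the matrix work inside this loop is the kind of operation the paper maps onto GPUs, though for the more elaborate BCD search.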

    Parallelization and visual analysis of multidimensional fields: Application to ozone production, destruction, and transport in three dimensions

    Get PDF
    Atmospheric modeling is a grand challenge problem for several reasons, including its inordinate computational requirements and the large amounts of data it generates while also consuming very large data sets derived from measurement instruments such as satellites. In addition, atmospheric models are typically run several times, on new data sets or to reprocess existing data sets, to investigate or reinvestigate specific chemical or physical processes occurring in the Earth's atmosphere, to understand model fidelity with respect to observational data, or simply to experiment with specific model parameters or components.

    Steering in computational science: mesoscale modelling and simulation

    Full text link
    This paper outlines the benefits of computational steering for high performance computing applications. Lattice-Boltzmann mesoscale fluid simulations of binary and ternary amphiphilic fluids in two and three dimensions are used to illustrate the substantial improvements which computational steering offers in terms of resource efficiency and time to discover new physics. We discuss details of our current steering implementations and describe their future outlook with the advent of computational grids. Comment: 40 pages, 11 figures. Accepted for publication in Contemporary Physics.

    A Survey of Microarchitectural Timing Attacks and Countermeasures on Contemporary Hardware

    Get PDF
    Microarchitectural timing channels expose hidden hardware states through timing. We survey recent attacks that exploit microarchitectural features in shared hardware, especially as they are relevant for cloud computing. We classify types of attacks according to a taxonomy of the shared resources leveraged for such attacks. Moreover, we take a detailed look at attacks used against shared caches. We survey existing countermeasures. We finally discuss trends in attacks, challenges to combating them, and future directions, especially with respect to hardware support.

    Toward least-privilege isolation for software

    Get PDF
    Hackers leverage software vulnerabilities to disclose, tamper with, or destroy sensitive data. To protect sensitive data, programmers can adhere to the principle of least privilege, which entails giving software only the minimal privilege it needs to operate and ensures that sensitive data is available to software components strictly on a need-to-know basis. Unfortunately, applying this principle in practice is difficult, as current operating systems tend to provide coarse-grained mechanisms for limiting privilege. Thus, most applications today run with greater-than-necessary privileges. We propose sthreads, a set of operating system primitives that allows fine-grained isolation of software to approximate the least-privilege ideal. sthreads enforce a default-deny model, where software components have no privileges by default, so all privileges must be explicitly granted by the programmer. Experience introducing sthreads into previously monolithic applications, thus partitioning them, reveals that enumerating privileges for sthreads is difficult in practice. To ease the introduction of sthreads into existing code, we include Crowbar, a tool that can be used to learn the privileges required by a compartment. We show that only a few changes are necessary to existing code in order to partition applications with sthreads, and that Crowbar can guide the programmer through these changes. We show that applying sthreads to applications successfully narrows the attack surface by reducing the amount of code that can access sensitive data. Finally, we show that applications using sthreads pay only a small performance overhead. We applied sthreads to a range of applications, most notably an SSL web server, where we show that sthreads are powerful enough to protect sensitive data even against a strong adversary that can act as a man-in-the-middle in the network and also exploit most code in the web server; a threat model not addressed to date.
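
    A minimal Python sketch of the default-deny idea follows; the class and privilege names are hypothetical and only mimic explicitly granted, per-compartment privileges, not the actual sthreads or Crowbar interfaces.

        class PrivilegeError(Exception):
            """Raised when a compartment uses a privilege it was never granted."""

        class Compartment:
            """Default-deny compartment: it starts with no privileges at all."""
            def __init__(self, name):
                self.name = name
                self.granted = set()

            def grant(self, privilege):
                self.granted.add(privilege)        # explicit grant by the programmer

            def require(self, privilege):
                if privilege not in self.granted:
                    raise PrivilegeError(f"{self.name}: '{privilege}' not granted")

        # A network-facing component may read sockets but never the private key.
        frontend = Compartment("ssl-frontend")
        frontend.grant("net:recv")
        frontend.require("net:recv")               # allowed
        frontend.require("key:read")               # raises PrivilegeError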

    Distributed data structure for factored operating systems

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 151-158). Future computer architectures will likely exhibit increased parallelism through the addition of more processor cores. Architectural trends such as exponentially increasing parallelism and the possible lack of scalable shared memory motivate the reevaluation of operating system design. This thesis work takes place in the context of factored operating systems, which leverage distributed-system ideas to increase the scalability of multicore processor operating systems. fos, a Factored Operating System, explores a new design point for operating systems in which traditional low-level operating system services are fine-grain parallelized while internally using only explicit message passing for communication. fos factors an operating system first by system service and then further parallelizes inside each system service by splitting the service into a fleet of server processes which communicate via messaging. Constructing parallel low-level operating system services which internally use only messaging is challenging because shared resources must be partitioned across servers and the services must provide scalable performance when met with uneven demand. To ease the construction of parallel fos system services, this thesis develops the dPool distributed data structure. The dPool data structure provides concurrent access to an unordered collection of elements by server processes within a fos fleet. Internal to a single dPool instance, all communication between different portions of a dPool is done via messaging. This thesis uses the dPool data structure within the parallel fos Physical Memory Allocation fleet and demonstrates that it is possible to use a dPool to manage shared state in a factored operating system's physical page allocator. This thesis begins by presenting the design of the prototype fos operating system. In the context of fos system service fleets, it describes the dPool data structure, its design, different implementations, and interfaces. The dPool data structure is shown to achieve scalability across even and uneven micro-benchmark workloads. This thesis shows that common parallel and distributed programming techniques apply to the creation of dPool and that background threads within a dPool can increase performance. Finally, this thesis evaluates different dPool implementations and demonstrates that intelligently pushing elements between dPool parts can increase scalability. By David Wentzlaff. Ph.D.
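
    The shape of such an interface can be sketched in a few lines of Python; the sketch below only illustrates an unordered, sharded element pool with rebalancing between shards (a plain method call stands in for a fos message), and is not the thesis's dPool implementation.

        from collections import deque

        class Shard:
            """One portion of the pool, owned by a single server process."""
            def __init__(self):
                self.elements = deque()

            def add(self, elem):
                self.elements.append(elem)

            def take(self):
                return self.elements.popleft() if self.elements else None

        class DPool:
            """Unordered collection of elements partitioned across shards."""
            def __init__(self, num_shards):
                self.shards = [Shard() for _ in range(num_shards)]

            def add(self, shard_id, elem):
                self.shards[shard_id].add(elem)

            def get(self, shard_id):
                elem = self.shards[shard_id].take()
                if elem is not None:
                    return elem
                # Local shard empty: pull an element from a peer shard instead.
                for peer in self.shards:
                    elem = peer.take()
                    if elem is not None:
                        return elem
                return None

        # e.g. a toy physical page allocator: free page numbers are the elements
        pool = DPool(num_shards=4)
        for page in range(16):
            pool.add(page % 4, page)
        print(pool.get(2))   # 2: served from the local shard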

    Software Transactional Memory Building Blocks

    Get PDF
    Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occasionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. It allows programmers to only declare which pieces of a program synchronize, without requiring them to actually implement synchronization and tune its performance, which in turn typically makes TM easier to use than other abstractions such as locks. I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. They host several novel algorithms and optimizations for TM implementations, both for current hardware and for future hardware extensions for TM, and are being used in or have influenced commercial TM implementations such as the TM support in GCC.
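
    As an illustration of the kind of building block involved, the Python sketch below implements a toy optimistic STM with versioned variables and commit-time validation under a single global lock; it is deliberately simplified (it validates only at commit and is not opacity-safe) and does not reflect the dissertation's algorithms or the GCC TM runtime.

        import threading

        class TVar:
            """Transactional variable: a value plus a version counter."""
            def __init__(self, value):
                self.value = value
                self.version = 0

        class Transaction:
            """Buffer writes, record read versions, validate them at commit."""
            _commit_lock = threading.Lock()

            def __init__(self):
                self.reads = {}    # TVar -> version observed at first read
                self.writes = {}   # TVar -> buffered new value

            def read(self, tvar):
                if tvar in self.writes:
                    return self.writes[tvar]
                self.reads.setdefault(tvar, tvar.version)
                return tvar.value

            def write(self, tvar, value):
                self.writes[tvar] = value

            def commit(self):
                with Transaction._commit_lock:
                    if any(t.version != v for t, v in self.reads.items()):
                        return False               # a read variable changed: abort
                    for t, value in self.writes.items():
                        t.value = value
                        t.version += 1
                    return True

        def atomic(body):
            """Re-run body in a fresh transaction until its commit succeeds."""
            while True:
                tx = Transaction()
                result = body(tx)
                if tx.commit():
                    return result

        # Move 10 units between two accounts without any user-visible locking.
        a, b = TVar(100), TVar(0)
        atomic(lambda tx: (tx.write(a, tx.read(a) - 10), tx.write(b, tx.read(b) + 10)))
        print(a.value, b.value)   # 90 10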

    Automated Analysis of Weak Memory Models

    Get PDF
    Software verification is considered a hard computational problem, subject to the state explosion problem. Concurrent software verification raises the complexity of the problem to a power determined by the number of possible interleavings of the states of the system. Moreover, the architecture of a modern shared-memory multi-core processor and the optimisations performed by a compiler can cause program behaviour that is unexpected from the point of view of traditional concurrency. The guarantees that an execution environment can provide to a programmer are formalised in its Weak Memory Model (WMM). Over the last decade, weak memory models were defined for multiple hardware architectures and programming languages. This opens new challenges in software verification with respect to a weak memory model. Most existing tools that perform memory model-aware software analysis examine behaviours of the program against a single memory model. The first tool that analyses the portability of a concurrent program from one platform to another is Porthos [PFH+17a], released in April 2017. Porthos can verify that a program is portable from the source platform S to the target platform T by checking that the program has no extra states under T. For that, it performs an SMT-based bounded reachability analysis by encoding the constraints of the program and the two memory models M_S and M_T into a single SMT formula. Although the approach has proven to be efficient, the tool accepts as input only a small C-like toy language. This thesis aims to rework Porthos by extending its input language so that it is able to process real-world C programs. However, the current implementation of Porthos is hard to extend, which raises the need to redesign its whole architecture in order to increase robustness, transparency, efficiency, and extensibility. The result of this work is PorthosC, a framework for SMT-based memory model-aware analysis.
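
    The flavour of such an encoding can be shown with a small Z3 sketch (requires the z3-solver package): the classic store-buffering litmus test is encoded as integer positions in a single total order, and the forbidden outcome r1 = r2 = 0 becomes reachable only once the write-to-read program order is relaxed. This is a hand-rolled toy encoding, not the formula Porthos or PorthosC actually builds.

        from z3 import And, If, Int, Solver, sat   # pip install z3-solver

        def reachable(outcome, relax_write_read_order):
            """Bounded reachability for the store-buffering litmus test.

            Events: a = W(x,1), b = R(y) on thread 1; c = W(y,1), d = R(x) on
            thread 2.  Each event gets an integer position in one total order.
            """
            a, b, c, d = Int("a"), Int("b"), Int("c"), Int("d")
            s = Solver()
            if not relax_write_read_order:
                # SC keeps full program order; a TSO-like model lets a read
                # overtake an earlier write of the same thread.
                s.add(a < b, c < d)
            r1 = If(c < b, 1, 0)   # R(y) returns 1 iff W(y,1) is ordered before it
            r2 = If(a < d, 1, 0)   # R(x) returns 1 iff W(x,1) is ordered before it
            s.add(And(r1 == outcome[0], r2 == outcome[1]))
            return s.check() == sat

        print(reachable((0, 0), relax_write_read_order=False))   # False: forbidden
        print(reachable((0, 0), relax_write_read_order=True))    # True: now reachable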

    Domain architecture: a design framework for system development and integration

    Get PDF
    The ever-growing complexity of software systems has revealed many shortcomings in existing software engineering practices and has raised interest in architecture-driven software development. A system's architecture provides a model of the system that suppresses implementation detail, allowing the architects to concentrate on the analysis and decisions that are most critical to structuring the system to satisfy its requirements. Recently, the interests of researchers and practitioners have shifted from individual system architectures to architectures for classes of software systems, which provide more general, reusable solutions to the issues of overall system organization, interoperability, and allocation of services to system components. These generic architectures, such as product line architectures and domain architectures, promote reuse and interoperability, and create a basis for cost-effective construction of high-quality systems. Our focus in this dissertation is on domain architectures as a means of development and integration of large-scale, domain-specific business software systems. Business imperatives, including flexibility, productivity, quality, and the ability to adapt to changes, have fostered demands for flexible, coherent, and enterprise-wide integrated business systems. The components of such systems, developed separately or purchased off the shelf, need to cohesively form an overall computational environment for the business. The inevitable complexity of such integrated solutions and the highly demanding process of their construction, management, and evolution support require new software engineering methodologies and tools. Domain architectures, prescribing the organization of software systems in a business domain, hold a promise to serve as a foundation on which such integrated business systems can be effectively constructed. To meet the above expectations, software architectures must be properly defined, represented, and applied, which requires suitable methodologies as well as process and tool support. Despite research efforts, however, state-of-the-art methods and tools for architecture-based system development do not yet meet the practical needs of system developers. The primary focus of this dissertation is on developing methods and tools to support domain architecture engineering and on leveraging architectures to achieve improved system development and integration in the presence of increased complexity. In particular, the thesis explores issues related to the following three aspects of software technology: system complexity and software architectures as tools to alleviate complexity; domain architectures as frameworks for construction of large-scale, flexible, enterprise-wide software systems; and architectural models and representation techniques as a basis for "good" design. The thesis presents an architectural taxonomy to help categorize and better understand architectural efforts. Furthermore, it clarifies the purpose of domain architectures and characterizes them in detail. To support the definition and application of domain architectures we have developed a method for domain architecture engineering and representation: GARM-ASPECT. GARM, the Generic Architecture Reference Model underlying the method, is a system of modeling abstractions, relations, and recommendations for building representations of reference software architectures. The model's focus on reference and domain architectures determines its main distinguishing features: multiple views of architectural elements, a separate rule system to express constraints on architecture element types, and annotations such as "libraries" of patterns and "logs" of guidelines. ASPECT is an architecture description language based on GARM. It provides a normalized vocabulary for representing the skeleton of an architecture, its structural view, and establishes a framework for capturing architectural constraints. It also allows extensions of the structural view with auxiliary information, such as behavior or quality specifications. In this respect, ASPECT provides facilities for establishing relationships among different specifications and gluing them together within an overall architectural description. This design allows flexibility and adaptability of the methodology to the specifics of a domain or a family of systems. ASPECT supports the representation of reference architectures as well as individual system architectures. The practical applicability of this method has been tested through a case study in an industrial setting. The approach to architecture engineering and representation presented in this dissertation is pragmatic and oriented towards software practitioners. GARM-ASPECT, as well as the taxonomy of architectures, are of use to architects, system planners, and system engineers. Beyond these practical contributions, this thesis also creates a more solid basis for exploring the applicability of architectural abstractions, the practicality of representation approaches, and the changes required to the development process in order to achieve the benefits of an architecture-driven software technology.

    Parametric micro-level performance models for parallel computing and parallel implementation of hydrostatic MM5

    Get PDF
    This dissertation presents parametric micro-level performance models and a parallel implementation of the hydrostatic version of MM5. Parametric micro-level (PM) performance models are introduced to address the important issue of how to realistically model parallel performance. These models can be used to predict execution times and identify performance bottlenecks. The accurate prediction and analysis of execution times is achieved by incorporating precise details of interprocessor communication, memory operations, auxiliary instructions, and the effects of communication and computation schedules. The parameters provide the flexibility to study various algorithmic and architectural issues. The development and verification process, the parameters, and the scope of applicability of these models are discussed. A coherent view of performance is obtained from the execution profiles generated by PM models. The models are targeted at a large class of numerical algorithms commonly implemented on both SIMD and MIMD machines. Specific models are presented for matrix multiplication, LU decomposition, and FFT on a 2-D processor array with distributed memory. A case study includes a comparison of parallel machines and parallel algorithms. In the comparison of parallel machines, PM models are used to analyze execution times so as to relate performance to the architectural attributes of a machine. In the comparison of parallel algorithms, PM models are used to study the performance of two LU decomposition algorithms: non-blocked and blocked. The two algorithms are compared to identify the tradeoffs between them. This analysis is useful to determine an optimum block size for the blocked algorithm. The case study is done on MasPar MP-1 and MP-2 machines. The dissertation also describes the parallel implementation of the hydrostatic version of MM5 (the fifth-generation Mesoscale Model), which has been widely used for climate studies. The model was parallelized in a machine-independent manner using the Runtime System Library (RSL), a runtime library for handling message passing and index transformation. The dissertation discusses validation of the parallel implementation of MM5 using field data and presents performance results. The parallel model was tested on the IBM SP1, a distributed-memory parallel computer.
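
    The idea of a parametric execution-time model can be illustrated with a short Python function; the machine constants and the Cannon-style communication pattern below are assumptions made purely for the example and are not the dissertation's calibrated PM models.

        def predicted_matmul_time(n, p, t_flop, t_startup, t_word):
            """Predicted time for C = A * B on a p x p processor grid.

            Hypothetical parameters: t_flop is the time per floating-point
            operation, t_startup the per-message latency, t_word the per-word
            transfer time; each processor holds one (n/p) x (n/p) block.
            """
            block = n // p
            compute = 2 * n * block * block * t_flop           # local multiply-adds
            # Two block transfers per step over p steps (Cannon-style shifts).
            comm = p * 2 * (t_startup + block * block * t_word)
            return compute + comm

        # 1024 x 1024 matrices on an 8 x 8 grid with made-up machine constants
        print(predicted_matmul_time(1024, 8, t_flop=1e-8, t_startup=5e-5, t_word=2e-7))

    Varying one parameter at a time in such a formula is what lets a model of this kind attribute a bottleneck to computation, message latency, or bandwidth.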