88 research outputs found
Recommended from our members
Higher-Order Calculations in Quantum Chromodynamics
In this thesis, several techniques and advances in higher-order Quantum Chromodynamics (QCD) calculations are presented. There is a particular focus on 2-loop 5-point massless QCD amplitudes, which are currently at the frontier of higher-order QCD calculations.
Firstly, we study the Brodsky-Lepage-Mackenzie/Principle of Maximum Conformality (BLM/PMC) method for setting the renormalisation scale, μ_R, in higher-order QCD calculations. We identify three ambiguities in the BLM/PMC procedure and study their numerical impact using the example of the total cross-section for top-pair production at Next-to-Next-to-Leading Order (NNLO) in QCD. The numerical impact of these ambiguities on the BLM/PMC prediction for the cross-section is found to be comparable to the impact of the choice of μ_R in the conventional scale-setting approach.
Secondly, we introduce a novel strategy for solving integration-by-parts (IBP) identities, which are widely used in the computation of multi-loop QCD amplitudes. We implement the strategy in an efficient C++ program and hence solve the IBP identities needed for the computation of any planar 2-loop 5-point massless amplitude in QCD. We also derive representative results for the most complicated non-planar family of integrals.
Thirdly, we present an automated computational framework to reduce 2-loop 5-point massless amplitudes to a basis of pentagon functions. It uses finite-field evaluation and interpolation techniques, as well as the aforementioned analytical IBP results. We use this to calculate the leading-colour 2-loop QCD amplitude for qq̄→γγγ and then compute the NNLO QCD corrections to 3-photon production at the LHC. This is the first NNLO QCD calculation for a 2→3 process. We compare our predictions with the available 8 TeV measurements from the ATLAS collaboration and we find that the inclusion of the NNLO corrections eliminates the existing significant discrepancy with respect to NLO QCD predictions, paving the way for precision phenomenology in this process
Language and compiler support for stream programs
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 153-166).Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. Stream programs can be naturally represented as a graph of independent actors that communicate explicitly over data channels. In this work we focus on programs where the input and output rates of actors are known at compile time, enabling aggressive transformations by the compiler; this model is known as synchronous dataflow. We develop a new programming language, StreamIt, that empowers both programmers and compiler writers to leverage the unique properties of the streaming domain. StreamIt offers several new abstractions, including hierarchical single-input single-output streams, composable primitives for data reordering, and a mechanism called teleport messaging that enables precise event handling in a distributed environment. We demonstrate the feasibility of developing applications in StreamIt via a detailed characterization of our 34,000-line benchmark suite, which spans from MPEG-2 encoding/decoding to GMTI radar processing. We also present a novel dynamic analysis for migrating legacy C programs into a streaming representation. The central premise of stream programming is that it enables the compiler to perform powerful optimizations. We support this premise by presenting a suite of new transformations. We describe the first translation of stream programs into the compressed domain, enabling programs written for uncompressed data formats to automatically operate directly on compressed data formats (based on LZ77). This technique offers a median speedup of 15x on common video editing operations.(cont.) We also review other optimizations developed in the StreamIt group, including automatic parallelization (offering an 11x mean speedup on the 16-core Raw machine), optimization of linear computations (offering a 5.5x average speedup on a Pentium 4), and cache-aware scheduling (offering a 3.5x mean speedup on a StrongARM 1100). While these transformations are beyond the reach of compilers for traditional languages such as C, they become tractable given the abundant parallelism and regular communication patterns exposed by the stream programming model.by William Thies.Ph.D
A computing origami: Optimized code generation for emerging parallel platforms
Ph.DDOCTOR OF PHILOSOPH
Program Analysis Based Approaches to Ensure Security and Safety of Emerging Software Platforms
Our smartphones, homes, hospitals, and automobiles are being enhanced with software that provide an unprecedentedly rich set of functionalities, which has created an enormous market for the development of software that run on almost every personal computing devices in a person's daily life, including security- and safety-critical ones. However, the software development support provided by the emerging platforms also raises security risks by allowing untrusted third-party code, which can potentially be buggy, vulnerable or even malicious to control user's device. Moreover, as the Internet-of-Things (IoT) technology is gaining vast adoptions by a wide range of industries, and is penetrating every aspects of people's life, safety risks brought by the open software development support of the emerging IoT platform (e.g., smart home) could bring more severe threat to the well-being of customers than what security vulnerabilities in mobile apps have done to a cell phone user.
To address this challenge posed on the software security in emerging domains, my dissertation focuses on the flaws, vulnerabilities and malice in the software developed for platforms in these domains. Specifically, we demonstrate that systematic program analyses of software (1) Lead to an understanding of design and implementation flaws across different platforms that can be leveraged in miscellaneous attacks or causing safety problems; (2) Lead to the development of security mechanisms that limit the potential for these threats.We contribute static and dynamic program analysis techniques for three modern platforms in emerging domains -- smartphone, smart home, and autonomous vehicle. Our app analysis reveals various different vulnerabilities and design flaws on these platforms, and we propose (1) static analysis tool OPAnalyzer to automates the discovery of problems by searching for vulnerable code patterns; (2) dynamic testing tool AutoFuzzer to efficiently produce and capture domain specific issues that are previously undefined; and (3) propose new access control mechanism ContexIoT to strengthen the platform's immunity to the vulnerability and malice in third-party software.
Concretely, we first study a vulnerability family caused by the open ports on mobile devices, which allows remote exploitation due to insufficient protection. We devise a tool called OPAnalyzer to perform the first systematic study of open port usage and their security implications on mobile platform, which effectively identify and characterize vulnerable open port usage at scale in popular Android apps. We further identify the lack of context-based access control as a main enabler for such attacks, and begin to seek for defense solution to strengthen the system security. We study the popular smart home platform, and find the existing access control mechanisms to be coarse-grand, insufficient, and undemanding. Taking lessons from previous permission systems, we propose the ContexIoT approach, a context-based permission system for IoT platform that supports third-party app development, which protects the user from vulnerability and malice in these apps through fine-grained identification of context. Finally, we design dynamic fuzzing tool, AutoFuzzer for the testing of self-driving functionalities, which demand very high code quality using improved testing practice combining the state-of-the-art fuzzing techniques with vehicular domain knowledge, and discover problems that lead to crashes in safety-critical software on emerging autonomous vehicle platform.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/145845/1/jackjia_1.pd
Semantic-Preserving Transformations for Stream Program Orchestration on Multicore Architectures
Because the demand for high performance with big data processing and distributed computing is increasing, the stream programming paradigm has been revisited for its abundance of parallelism in virtue of independent actors that communicate via data channels. The synchronous data-flow (SDF) programming model is frequently adopted with stream programming languages for its convenience to express stream programs as a set of nodes connected by data channels. Static data-rates of SDF programming model enable program transformations that greatly improve the performance of SDF programs on multicore architectures. The major application domain is for SDF programs are digital signal processing, audio, video, graphics kernels, networking, and security. This thesis makes the following three contributions that improve the performance of SDF programs: First, a new intermediate representation (IR) called LaminarIR is introduced. LaminarIR replaces FIFO queues with direct memory accesses to reduce the data communication overhead and explicates data dependencies between producer and consumer nodes. We provide transformations and their formal semantics to convert conventional, FIFO-queue based program representations to LaminarIR. Second, a compiler framework to perform sound and semantics-preserving program transformations from FIFO semantics to LaminarIR. We employ static program analysis to resolve token positions in FIFO queues and replace them by direct memory accesses. Third, a communication-cost-aware program orchestration method to establish a foundation of LaminarIR parallelization on multicore architectures. The LaminarIR framework, which consists of the aforementioned contributions together with the benchmarks that we used with the experimental evaluation, has been open-sourced to advocate further research on improving the performance of stream programming languages
Recommended from our members
Model-based resource management for fine-grained services
The emergence of DevOps has changed the way modern distributed software systems are developed. Architectures decomposed in fine-grained services, such as microservices or function-as-a-service (FaaS), are now widespread across many organizations. From a resource management perspective, although the systems built with such architectures have many benefits, there are still research challenges that need further attention. In this study, we have focused on three such challenges, each concerning a specific system resource: compute, memory, or storage. Firstly, we focus on scaling the capacity of microservices at runtime. Here, the challenge is to design an autoscaler that can decide between vertical and horizontal scaling options to distribute the CPU capacity. Secondly, we focus on estimating the required capacity of an on-premises FaaS platform such that the service level agreements (SLAs) for function response times are satisfied. The challenge here is to address the cold start dilemma, i.e., that a cold start delays a function response but reduces the memory consumption. Thus, we must find a limit of cold starts such that the memory-consumption remains in-check while satisfying the SLAs. Finally, we focus on the storage management for distributed tracing targeted at microservices. The volume of such traces generated in a data center can be in the scale of tens of terabytes per day, but only a small fraction of these traces is useful for troubleshooting. The objective then is to sample only the useful traces. The key to addressing all these challenges is first, modeling the dynamics concerning the resources and subsequently, leveraging the model in a resource controller. To address the first challenge, we have developed an autoscaler ATOM that leverages layered queueing network (LQN) models to take its scaling decisions. Our experiment, with a real-life application, shows that ATOM produces 30-37% better results than the baseline autoscalers. For the second challenge, we have developed COCOA, a cold start aware capacity planner. COCOA utilizes M/M/k setup and LQN models to assess the cold start scenario and estimate the required capacity. We show with simulation that COCOA can reduce over-provisioning by over 70% compared to the availability aware approaches. Finally, addressing the third challenge, we propose SampleHST, a trace sampler that works under a storage budget constraint. SampleHST relies on either bag of words or graph-based models to represent a trace and groups similar traces using online clustering to perform sampling. We have evaluated the performance of SampleHST using data from both literature and production, which shows it produces 1.2x to 19x better results than the state-of-the-art.Open Acces
Streamroller : A Unified Compilation and Synthesis System for Streaming Applications.
The growing complexity of applications has increased the need for higher processing power. In the embedded domain, the convergence of audio, video, and networking on a handheld device has prompted the need for low cost, low power,and high performance implementations of these applications in the form of custom
hardware. In a more mainstream domain like gaming consoles, the move towards more realism in physics simulations and graphics has forced the industry towards multicore systems. Many of the applications in these domains are streaming in nature. The key challenge is to get efficient implementations of custom hardware from these applications and map these applications efficiently onto multicore architectures.
This dissertation presents a unified methodology, referred to as Streamroller, that can be applied for the problem of scheduling stream programs to multicore architectures and to the problem of automatic synthesis of
custom hardware for stream applications. Firstly, a method called stream-graph modulo scheduling is presented, which maps stream programs effectively onto a multicore architecture. Many aspects of a real system, like
limited memory and explicit DMAs are modeled in the scheduler. The scheduler is evaluated for a set of stream programs on IBM's Cell processor.
Secondly, an automated high-level synthesis system for creating custom hardware for stream applications is presented. The template for the custom hardware is a pipeline of accelerators. The synthesis involves designing loop accelerators for individual kernels, instantiating buffers to store data passed between kernels, and linking these building blocks to form a pipeline. A unique aspect of this system is the use of multifunction accelerators, which improves cost by
efficiently sharing hardware between multiple kernels.
Finally, a method to improve the integer linear program formulations used in the schedulers that exploits symmetry in the solution space is
presented. Symmetry-breaking constraints are added to the formulation, and the performance of the solver is evaluated.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61662/1/kvman_1.pd
Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions
The adoption of hardware accelerators, such as Field-Programmable Gate Arrays,
into general-purpose computation pipelines continues to rise, driven by recent
trends in data collection and analysis as well as pressure from challenging
physical design constraints in hardware. The architectural designs of many of
these accelerators stand in stark contrast to the traditional von Neumann model
of CPUs. Consequently, existing programming languages, maintenance tools, and
techniques are not directly applicable to these devices, meaning that additional
architectural knowledge is required for effective programming and configuration.
Current programming models and techniques are akin to assembly-level programming
on a CPU, thus placing significant burden on developers tasked with using these
architectures. Because programming is currently performed at such low levels of
abstraction, the software development process is tedious and challenging and
hinders the adoption of hardware accelerators.
This dissertation explores the thesis that theoretical finite automata provide a
suitable abstraction for bridging the gap between high-level programming models
and maintenance tools familiar to developers and the low-level hardware
representations that enable high-performance execution on hardware accelerators.
We adopt a principled hardware/software co-design methodology to develop a
programming model providing the key properties that we observe are necessary for success,
namely performance and scalability, ease of use, expressive power, and legacy
support.
First, we develop a framework that allows developers to port existing, legacy
code to run on hardware accelerators by leveraging automata learning algorithms
in a novel composition with software verification, string solvers, and
high-performance automata architectures. Next, we design a domain-specific
programming language to aid programmers writing pattern-searching algorithms and
develop compilation algorithms to produce finite automata, which supports
efficient execution on a wide variety of processing architectures. Then, we
develop an interactive debugger for our new language, which allows developers to
accurately identify the locations of bugs in software while maintaining support
for high-throughput data processing. Finally, we develop two new
automata-derived accelerator architectures to support additional applications,
including the detection of security attacks and the parsing of recursive and
tree-structured data. Using empirical studies, logical reasoning, and
statistical analyses, we demonstrate that our prototype artifacts scale to
real-world applications, maintain manageable overheads, and support developers'
use of hardware accelerators. Collectively, the research efforts detailed in
this dissertation help ease the adoption and use of hardware accelerators for
data analysis applications, while supporting high-performance computation.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd
- …