3,259 research outputs found

    Leading Effect of CP Violation with Four Generations

    In the Standard Model with a fourth generation of quarks, we study the relation between the Jarlskog invariants and the triangle areas in the 4-by-4 CKM matrix. To identify the leading effects that may probe CP violation in processes involving quarks, we invoke small mass and small angle expansions, and show that these leading effects are enhanced considerably compared to the three-generation case by the large masses of the fourth-generation quarks. We discuss the leading effect in several cases, in particular the possibility of large CP violation in $b \to s$ processes, which echoes the heightened recent interest driven by experimental hints. Comment: 12 pages, no figures
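
    For orientation, here is a hedged sketch (conventional definitions, not taken from the paper) of the three-generation relation that the abstract generalizes: in the standard CKM parameterization a single Jarlskog invariant J measures CP violation, and every unitarity triangle encloses the same area, |J|/2.

    % Conventional three-generation CKM relations (assumed for illustration,
    % not quoted from the paper): all rephasing-invariant quartets share one
    % imaginary part J, and each unitarity triangle, e.g.
    % V_{ud}V_{ub}^* + V_{cd}V_{cb}^* + V_{td}V_{tb}^* = 0, has the same area.
    \[
      J = \mathrm{Im}\!\left( V_{us}\, V_{cb}\, V_{ub}^{*}\, V_{cs}^{*} \right),
      \qquad
      \text{Area of each unitarity triangle} = \frac{|J|}{2}.
    \]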

    Efficient Memory Management for GPU-based Deep Learning Systems

    GPUs (graphics processing units) are used for many data-intensive applications, and deep learning systems are among the most important GPU workloads today. As deep learning applications adopt deeper and larger models to achieve higher accuracy, memory management becomes an important research topic for deep learning systems, given that GPU memory is limited. Many approaches have been proposed to address this issue, e.g., model compression and memory swapping, but they either degrade model accuracy or require substantial manual intervention. In this paper, we propose two orthogonal approaches that reduce the memory cost from the system perspective. Our approaches are transparent to the models and thus do not affect model accuracy. Both exploit the iterative nature of deep learning training to derive the lifetime and read/write order of all variables. With the lifetime semantics, we implement a memory pool with minimal fragmentation. Because the underlying optimization problem is NP-complete, we propose a heuristic algorithm that reduces memory usage by up to 13.3% compared with NVIDIA's default memory pool, with equal time complexity. With the read/write semantics, variables that are not in use can be swapped out from GPU to CPU to reduce the memory footprint. We propose multiple swapping strategies that automatically decide which variable to swap and when to swap it out (and back in), reducing the memory cost by up to 34.2% without communication overhead.
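
    To illustrate the lifetime-based memory-pool idea described above, the sketch below (a minimal, hypothetical planner under assumed interfaces, not the authors' implementation) assigns offsets greedily given each variable's lifetime and size. Variables whose lifetimes never overlap may share the same offset; since exact minimization is NP-complete, a first-fit heuristic is used.

    # Hypothetical sketch of lifetime-based memory-pool planning, in the spirit
    # of the approach described in the abstract (not the paper's actual code).
    # Each variable has a lifetime [first_use, last_use] and a size in bytes;
    # variables whose lifetimes overlap must receive non-overlapping offsets,
    # and the pool size is the peak offset reached.

    from typing import Dict, List, Tuple

    def plan_offsets(variables: List[Tuple[str, int, int, int]]) -> Tuple[Dict[str, int], int]:
        """variables: (name, first_use, last_use, size). Returns (offsets, pool size).

        Greedy first-fit by decreasing size; exact minimization is NP-complete,
        so this is only a heuristic, as the abstract notes.
        """
        placed: List[Tuple[int, int, int, int]] = []  # (offset, size, first, last)
        offsets: Dict[str, int] = {}
        pool_size = 0
        for name, first, last, size in sorted(variables, key=lambda v: -v[3]):
            candidate = 0
            while True:
                # Look for a placed block overlapping both in time and in address.
                conflict = next(
                    (o + s for o, s, f, l in placed
                     if not (last < f or l < first)                           # lifetimes overlap
                     and not (candidate + size <= o or o + s <= candidate)),  # addresses overlap
                    None,
                )
                if conflict is None:
                    break
                candidate = conflict  # retry just past the conflicting block
            placed.append((candidate, size, first, last))
            offsets[name] = candidate
            pool_size = max(pool_size, candidate + size)
        return offsets, pool_size

    # Example: 'a' and 'c' never coexist in time, so they can share offset 0.
    offsets, size = plan_offsets([("a", 0, 2, 4), ("b", 1, 4, 2), ("c", 3, 5, 4)])
    print(offsets, size)  # {'a': 0, 'c': 0, 'b': 4} and a pool of 6 bytes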


    As-Built and Post-treated Microstructures of an Electron Beam Melting (EBM) Produced Nickel-Based Superalloy

    The microstructures of an electron beam melted (EBM) nickel-based superalloy (Alloy 718) were comprehensively investigated in as-built and post-treated conditions, with particular focus on the contour (outer periphery) and hatch (core) regions of the build, examined individually. The hatch region exhibited columnar grains with a strong 〈001〉 texture in the build direction, while the contour region had a mix of columnar and equiaxed grains with no preferred crystallographic texture. Both regions exhibited nearly identical hardness and carbide content; however, the contour region showed a higher number density of fine carbides than the hatch. The as-built material was subjected to two distinct post-treatments: (1) hot isostatic pressing (HIP) and (2) HIP plus heat treatment (HIP + HT), the latter carried out as a single cycle inside the HIP vessel. Both post-treatments reduced the defect content in the hatch and contour regions by nearly an order of magnitude. HIP + HT led to grain coarsening in the contour but did not alter the microstructure in the hatch region. Different factors that may be responsible for grain growth, such as grain size, grain orientation, grain boundary curvature and secondary phase particles, are discussed. The differences in carbide sizes between the hatch and contour regions appeared to decrease after post-treatment. After HIP + HT, the hatch and contour regions exhibited similar hardness, higher than that of the as-built material.