Leading Effect of CP Violation with Four Generations
In the Standard Model with a fourth generation of quarks, we study the
relation between the Jarlskog invariants and the triangle areas in the 4-by-4
CKM matrix. To identify the leading effects that may probe CP violation in processes involving quarks, we invoke small-mass and small-angle expansions, and show that these leading effects are considerably enhanced relative to the three-generation case by the large masses of the fourth-generation quarks. We
discuss the leading effect in several cases, in particular the possibility of
large CP violation in such processes, which echoes the heightened recent interest prompted by experimental hints.
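For context, the three-generation baseline that this abstract generalizes is standard: the single Jarlskog invariant J fixes the common area of all six unitarity triangles of the 3-by-3 CKM matrix. A reference form of that relation, in standard notation (quoted as background, not from the paper itself):

```latex
% Standard three-generation relation between the Jarlskog invariant J
% and the unitarity triangles: one rephasing invariant fixes them all.
\mathrm{Im}\!\left( V_{\alpha i} V_{\beta j} V_{\alpha j}^{*} V_{\beta i}^{*} \right)
  = J \sum_{\gamma,k} \epsilon_{\alpha\beta\gamma}\,\epsilon_{ijk},
\qquad
\mathrm{Area}(\triangle) = \frac{|J|}{2}.
```

With four generations there are multiple independent invariants rather than a single J, which is why the paper must relate a family of invariants to a family of triangle areas.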
Efficient Memory Management for GPU-based Deep Learning Systems
GPUs (graphics processing units) are used for many data-intensive applications, and deep learning systems are among their most important consumers today. As deep learning applications adopt deeper and larger models to achieve higher accuracy, memory management becomes an important research topic for deep learning systems, given that GPU memory is limited. Many approaches have been proposed to address this issue, e.g., model compression and memory swapping; however, they either degrade model accuracy or require substantial manual intervention. In this paper, we propose two orthogonal approaches that reduce the memory cost from the system perspective. Both are transparent to the models and thus do not affect model accuracy. They exploit the iterative nature of deep learning training to derive the lifetime and read/write order of all variables. With the lifetime semantics, we implement a memory pool with minimal fragmentation. However, the underlying optimization problem is NP-complete, so we propose a heuristic algorithm that reduces memory usage by up to 13.3% compared with NVIDIA's default memory pool, at equal time complexity. With the read/write semantics, variables that are not in use can be swapped out from GPU to CPU to reduce the memory footprint. We propose multiple swapping strategies that automatically decide which variable to swap and when to swap it out (and back in), reducing the memory cost by up to 34.2% without communication overhead.
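The lifetime-based pool can be illustrated with a small sketch. The offset-assignment problem the abstract calls NP-complete is the classic dynamic storage allocation problem; the first-fit, largest-first heuristic below is a hypothetical stand-in for the paper's actual algorithm, assuming each variable's size and lifetime [alloc_step, free_step) have been profiled from one training iteration.

```python
from typing import Dict, List, Tuple

def assign_offsets(vars_info: List[Tuple[str, int, int, int]]) -> Dict[str, int]:
    """vars_info entries: (name, size, alloc_step, free_step).

    Variables whose lifetimes overlap get disjoint address ranges;
    variables with disjoint lifetimes may share the same offsets.
    """
    placed = []    # (offset, size, alloc_step, free_step)
    offsets = {}
    # Heuristic ordering: place larger variables first.
    for name, size, a, f in sorted(vars_info, key=lambda v: -v[1]):
        # Address ranges already occupied during this variable's lifetime.
        busy = sorted((off, off + sz) for off, sz, a2, f2 in placed
                      if a < f2 and a2 < f)
        offset = 0
        for lo, hi in busy:              # first-fit scan over the gaps
            if offset + size <= lo:
                break
            offset = max(offset, hi)
        offsets[name] = offset
        placed.append((offset, size, a, f))
    return offsets

# Example: the two activations have disjoint lifetimes, so they share
# one region; a naive allocator would need 8192 bytes, this pool 6144.
layout = assign_offsets([
    ("weights", 4096, 0, 10),
    ("act1",    2048, 1, 3),
    ("act2",    2048, 4, 6),
])
print(layout)   # {'weights': 0, 'act1': 4096, 'act2': 4096}
```

Reusing addresses across non-overlapping lifetimes is exactly what the lifetime semantics buys over a pool that only sees allocate/free calls in arrival order.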
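The swapping side can be sketched the same way. The gap-based policy below, with its min_gap and prefetch_lead parameters, is an illustrative assumption rather than the paper's actual strategies: it evicts a tensor only when the idle interval between two consecutive accesses is long enough to hide the transfers.

```python
from collections import defaultdict

def plan_swaps(accesses, min_gap=4, prefetch_lead=2):
    """accesses: ordered (step, var) events from one profiled iteration.

    Returns (swap_out, swap_in) command lists as (step, var) pairs.
    """
    steps = defaultdict(list)
    for step, var in accesses:
        steps[var].append(step)
    swap_out, swap_in = [], []
    for var, ts in steps.items():
        for prev, nxt in zip(ts, ts[1:]):
            if nxt - prev >= min_gap:                 # gap hides both copies
                swap_out.append((prev + 1, var))      # evict right after use
                swap_in.append((nxt - prefetch_lead, var))  # prefetch early
    return sorted(swap_out), sorted(swap_in)

# Forward activations written early and read again only in the backward
# pass are the natural candidates for eviction.
trace = [(0, "act1"), (1, "act2"), (2, "act3"),
         (8, "act3"), (9, "act2"), (10, "act1")]
outs, ins = plan_swaps(trace)
print(outs)  # [(1, 'act1'), (2, 'act2'), (3, 'act3')]
print(ins)   # [(6, 'act3'), (7, 'act2'), (8, 'act1')]
```

In a real system these commands would be issued on a separate copy stream so that transfers overlap with compute kernels, which is what allows swapping to avoid adding communication overhead to the iteration time.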
As-Built and Post-treated Microstructures of an Electron Beam Melting (EBM) Produced Nickel-Based Superalloy
The microstructures of an electron beam melted (EBM) nickel-based superalloy (Alloy 718) were comprehensively investigated in as-built and post-treated conditions, with separate attention to the contour (outer periphery) and hatch (core) regions of the build. The hatch region exhibited columnar grains with strong 〈001〉 texture in the build direction, while the contour region had a mix of columnar and equiaxed grains with no preferred crystallographic texture. Both regions exhibited nearly identical hardness and carbide content; however, the contour region showed a higher number density of fine carbides than the hatch. The as-built material was subjected to two distinct post-treatments: (1) hot isostatic pressing (HIP) and (2) HIP plus heat treatment (HIP + HT), the latter carried out as a single cycle inside the HIP vessel. Both post-treatments resulted in nearly an order of magnitude decrease in defect content in both the hatch and contour regions. HIP + HT led to grain coarsening in the contour but did not alter the microstructure in the hatch region. Different factors that may be responsible for grain growth, such as grain size, grain orientation, grain boundary curvature and secondary phase particles, are discussed. The differences in carbide sizes between the hatch and contour regions appeared to decrease after post-treatment. After HIP + HT, both the hatch and contour regions exhibited similarly higher hardness than the as-built material.