swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture
The proliferation of deep learning frameworks and hardware platforms
demands an efficient compiler that can hide the diversity in both software
and hardware in order to provide application portability. Among the existing
deep learning compilers, TVM is well known for its efficiency in code
generation and optimization across diverse hardware devices. Meanwhile,
the Sunway many-core processor presents itself as a competitive candidate for
its attractive computational power in both scientific and deep learning
applications. This paper combines the trends in these two directions.
Specifically, we propose swTVM, which extends the original TVM to support
ahead-of-time compilation for architectures that require cross-compilation, such as
Sunway. In addition, we leverage architectural features during
compilation, such as core groups for massive parallelism, DMA for high-bandwidth
memory transfer, and local device memory for data locality, in order to generate
efficient code for deep learning applications on Sunway. The experimental
results show the ability of swTVM to automatically generate code for various
deep neural network models on Sunway. The code that swTVM automatically
generates for AlexNet and VGG-19 achieves average speedups of 6.71x and 2.45x
over hand-optimized OpenACC implementations on convolution and fully
connected layers, respectively. This work is the first attempt from the compiler
perspective to bridge the gap between deep learning and high-performance
architectures, with productivity and efficiency in mind. We would
like to open-source the implementation so that more people can embrace the
power of deep learning compilers and the Sunway many-core processor.
Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape
Synthesizing novel 3D models that resemble the input example has long been
pursued by researchers and artists in computer graphics. In this paper, we
present Sin3DM, a diffusion model that learns the internal patch distribution
from a single 3D textured shape and generates high-quality variations with fine
geometry and texture details. Training a diffusion model directly in 3D would
incur large memory and computational costs. Therefore, we first compress the
input into a lower-dimensional latent space and then train a diffusion model on
it. Specifically, we encode the input 3D textured shape into triplane feature
maps that represent the signed distance and texture fields of the input. The
denoising network of our diffusion model has a limited receptive field to avoid
overfitting, and uses triplane-aware 2D convolution blocks to improve the
result quality. Aside from randomly generating new samples, our model also
facilitates applications such as retargeting, outpainting and local editing.
Through extensive qualitative and quantitative evaluation, we show that our
model can generate 3D shapes of various types with better quality than prior
methods.
Comment: Project page: https://Sin3DM.github.io, Code:
https://github.com/Sin3DM/Sin3D
Coherence memory and amnesia in a mode-locked laser
Self-organization of temporal modes in mode-locked lasers usually starts from
quantum noise. In this process, incoherent spontaneous emission is steered into
coherent ultrashort pulses by dissipation and nonlinearity. In this work, we
investigated self-organization dynamics in a mode-locked Mamyshev oscillator
starting from coherent pulse seeds as opposed to quantum noise. We observed
that the coherence of the seed can be remembered or forgotten depending on the
initial population inversion. Excessive nonlinearity in the coherence amnesia
regime can destroy the seed coherence, causing the oscillator to undergo a
chaotic transition lasting hundreds of round trips before regaining coherence.
Conversely, the oscillator converges in only a few round trips in the
coherence memory regime. A heterodyne technique was developed to record the
fast-varying optical phase and characterize these two regimes. Dissipative
soliton molecules were synthesized from external pulse pair seeds via the
coherence memory pathway. In this case, a plateau in the generated pulse
spacing that is independent of the seed pulse spacing, i.e., amnesia of the seed
spacing, was observed for closely spaced seed pulse pairs. Moreover, we show that pulse
seeds can be used for laser reconfiguration and pulse pattern control. Our work
paves the way to controlling transient pulse dynamics and steady pulse forms on
demand in mode-locked lasers.
Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling
High-performance, multi-core processors are the key to accelerating workloads
in several application domains. To continue to scale performance at the limit
of Moore's Law and Dennard scaling, software and hardware designers have turned
to dynamic solutions that adapt to the needs of applications in a transparent,
automatic way. For example, modern hardware improves its performance and power
efficiency by changing the hardware configuration, like the frequency and
voltage of cores, according to a number of parameters such as the technology
used, the workload running, etc. With this level of dynamism, it is essential
to simulate next-generation multi-core processors in a way that can both
respond to system changes and accurately determine system performance metrics.
Currently, no sampled simulation platform can achieve these goals of dynamic,
fast, and accurate simulation of multi-threaded workloads.
In this work, we propose a solution that allows for fast, accurate simulation
in the presence of both hardware and software dynamism. To accomplish this
goal, we present Pac-Sim, a novel methodology for fast, accurate
sampled simulation that requires no upfront analysis of the workload.
With our proposed methodology, it is now possible to simulate long-running
dynamically scheduled multi-threaded programs with significant simulation
speedups even in the presence of dynamic hardware events. We evaluate Pac-Sim
using the multi-threaded SPEC CPU2017, NPB, and PARSEC benchmarks with both
static and dynamic thread scheduling. The experimental results show that
Pac-Sim achieves low average sampling errors of 1.63% and 3.81% for
statically and dynamically scheduled benchmarks, respectively. Pac-Sim also
demonstrates significant simulation speedups, as high as 523.5x
(210.3x on average), for the train input set of SPEC CPU2017.
Comment: 14 pages, 14 figures
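The sampling-error figures above can be read with a small sketch of generic sampled simulation. This is an illustration under stated assumptions, not Pac-Sim's actual methodology: the function names, the per-region inputs, and the numbers are all hypothetical. It assumes total cycles are extrapolated from the CPI measured on a detailed sample of each region, and that sampling error is the relative difference from a full detailed simulation.

```python
def extrapolate_cycles(regions):
    """Estimate total cycles from sampled regions.

    `regions` is a list of (instruction_count, sampled_cpi) pairs, where
    sampled_cpi is the cycles-per-instruction measured on the small part of
    the region that was simulated in detail (hypothetical input format).
    """
    return sum(instructions * cpi for instructions, cpi in regions)


def relative_error(estimate, reference):
    """Relative difference between a sampled estimate and a full-simulation reference."""
    return abs(estimate - reference) / reference


# Hypothetical numbers, for illustration only.
regions = [(1_000_000, 1.2), (3_000_000, 0.8)]
estimated_cycles = extrapolate_cycles(regions)
reference_cycles = 3_550_000  # would come from a full detailed simulation
print(f"sampling error: {relative_error(estimated_cycles, reference_cycles):.2%}")
```

A simulation speedup, in the same spirit, is the wall-clock time of the full detailed simulation divided by that of the sampled run.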
Bench surgery with autotransplantation for bilateral Wilms tumor—A feasible technique for renal sinus invasion
Purpose: Bilateral Wilms tumor (BWT) with renal sinus invasion is extremely difficult to manage surgically. This study presents an alternative strategy for tumor removal that preserves the renal parenchyma.
Materials and methods: In total, 9 cases of synchronous BWT were admitted to our hospital between May 2016 and Aug 2020. We retrospectively reviewed the clinical data, surgical technique, and functional and oncological outcomes of these cases.
Results: The 9 cases included 3 males and 6 females, with a median age of 12 months at surgery (range 7–40). A total of 14 kidney units had renal sinus invasion (77.8%), whereas multifocal neoplasms were observed in 7 units (38.9%). The local stage distribution revealed 1 kidney with stage I, 10 kidneys with stage II, and 7 kidneys with stage III. Nephron-sparing surgery was performed on 15 kidney units (83.3%), among which 13 (72.2%) underwent bench surgery with autotransplantation (BS-AT), whereas 2 (11.1%) were subjected to tumor enucleation in vivo. Urinary leakage was the most prevalent postoperative complication. We observed negative margins. During a mean follow-up of 28.4 months, 2 patients (22.2%) died, of sepsis and renal failure, respectively, whereas the other 7 (77.8%) survived without recurrence. Survivors had an estimated glomerular filtration rate of 81 ± 15.4 ml/(min × 1.73 m²). The endpoint renal volume of the 9 renal units that received BS-AT increased significantly (P = 0.02).
Conclusions: In summary, the surgical management of bilateral Wilms tumor requires a meticulous operative approach and technique. In addition, BS-AT provides a viable alternative to nephron-sparing surgery for BWT patients with renal sinus invasion.
Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications
Modern optimizing compilers are able to exploit memory access or computation
patterns to generate vectorized code. However, such patterns in irregular
applications are unknown until runtime because they depend on the input. Thus,
neither the compiler's static optimization nor profile-guided optimization based on
specific inputs can predict the patterns for arbitrary inputs, which leads
to suboptimal code generation. To address this challenge, we develop
Intelligent-Unroll, a framework to automatically optimize irregular
applications with vectorization. Intelligent-Unroll allows users to describe
the computation task using a code seed, with the memory access and
computation patterns represented in a feature table and an
information-code tree, and generates highly efficient code.
Furthermore, Intelligent-Unroll employs several novel optimization techniques
to optimize reduction operations and gather/scatter instructions. We evaluate
Intelligent-Unroll with sparse matrix-vector multiplication (SpMV) and graph
applications. Experimental results show that Intelligent-Unroll is able to
generate more efficient vectorized code than state-of-the-art
implementations.
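To make the input dependence concrete, here is a minimal sketch of a textbook SpMV over the CSR format. It is not code from Intelligent-Unroll, and the function name and example matrix are illustrative assumptions. The inner loop's trip count and the addresses read from `x` both come from the matrix data, so neither is known at compile time.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # The loop bounds come from row_ptr, so the trip count varies per row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect (gather) access whose
            # addresses are only known once the matrix is loaded.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
```

When `col_idx` turns out to hold regular runs (for example, consecutive columns), the gather could be replaced by a contiguous vector load. That is the kind of pattern that is only visible once the input is available.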
Impact of Vehicular Countdown Signals on Driving Psychologies and Behaviors: Taking China as an Example
Countdown signal control is a relatively new control mode that informs drivers in advance of the time remaining to pass through an intersection or the time they must wait for other drivers and pedestrians. At present, few countries apply vehicular countdown signals. In China, however, some cities have used vehicular countdown signals for years, though it is unclear how, and how much, such signals influence driving psychologies and behaviors compared with non-countdown signal controls. The present work aims to clarify the impact of vehicular countdown signals on driving psychologies and behaviors at the cognitive level. A questionnaire with 32 questions about driving psychologies and behaviors was designed, and an online survey was conducted. A total of 1051 valid questionnaires were received. The survey data were analyzed, and the main results indicate that most of the surveyed drivers prefer countdown signal controls and think that such controls can improve not only traffic safety but also traffic operational efficiency. The surveyed drivers also think that countdown signal controls have an impact on driving psychologies and behaviors, and the survey results show that the driving behaviors of the female drivers surveyed are not conservative under the clear conditions of green countdown signal control. Further studies and methods concerning the effects of countdown signals on driving psychologies and behaviors are discussed.
Document type: Article