142 research outputs found

    swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

    The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor presents itself as a competitive candidate thanks to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures requiring cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves average speedups of 6.71x and 2.45x over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architecture, particularly with productivity and efficiency in mind. We would like to open-source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.

    Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape

    Synthesizing novel 3D models that resemble the input example has long been pursued by researchers and artists in computer graphics. In this paper, we present Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape and generates high-quality variations with fine geometry and texture details. Training a diffusion model directly in 3D would incur large memory and computational costs. Therefore, we first compress the input into a lower-dimensional latent space and then train a diffusion model on it. Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input. The denoising network of our diffusion model has a limited receptive field to avoid overfitting, and uses triplane-aware 2D convolution blocks to improve the result quality. Aside from randomly generating new samples, our model also facilitates applications such as retargeting, outpainting and local editing. Through extensive qualitative and quantitative evaluation, we show that our model can generate 3D shapes of various types with better quality than prior methods. Comment: Project page: https://Sin3DM.github.io, Code: https://github.com/Sin3DM/Sin3D

    Coherence memory and amnesia in a mode-locked laser

    Self-organization of temporal modes in mode-locked lasers usually starts from quantum noise. In this process, incoherent spontaneous emission is steered into coherent ultrashort pulses by dissipation and nonlinearity. In this work, we investigated self-organization dynamics in a mode-locked Mamyshev oscillator starting from coherent pulse seeds as opposed to quantum noise. We observed that the coherence of the seed can be remembered or forgotten depending on the initial inverse population. The excessive nonlinearity in the coherence amnesia regime can destroy the seed coherence, causing the oscillator to undergo a chaotic transition lasting hundreds of round trips before regaining coherence. Conversely, the oscillator converges in only a few round trips in the coherence memory regime. A heterodyne technique was developed to record the fast-varying optical phase and characterize these two regimes. Dissipative soliton molecules were synthesized from external pulse pair seeds via the coherence memory pathway. In this case, a plateau in which the generated pulse spacing is independent of the seed pulse spacing, i.e., amnesia of the seed spacing, was observed for closely spaced seed pulse pairs. Moreover, we show that pulse seeds can be used for laser reconfiguration and pulse pattern control. Our work paves the way to controlling transient pulse dynamics and steady pulse forms on demand in mode-locked lasers.

    Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling

    High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore's Law and Dennard scaling, software and hardware designers have turned to dynamic solutions that adapt to the needs of applications in a transparent, automatic way. For example, modern hardware improves its performance and power efficiency by changing the hardware configuration, like the frequency and voltage of cores, according to a number of parameters such as the technology used, the workload running, etc. With this level of dynamism, it is essential to simulate next-generation multi-core processors in a way that can both respond to system changes and accurately determine system performance metrics. Currently, no sampled simulation platform can achieve these goals of dynamic, fast, and accurate simulation of multi-threaded workloads. In this work, we propose a solution that allows for fast, accurate simulation in the presence of both hardware and software dynamism. To accomplish this goal, we present Pac-Sim, a novel sampled simulation methodology that is fast and accurate and requires no upfront analysis of the workload. With our proposed methodology, it is now possible to simulate long-running, dynamically scheduled multi-threaded programs with significant simulation speedups even in the presence of dynamic hardware events. We evaluate Pac-Sim using the multi-threaded SPEC CPU2017, NPB, and PARSEC benchmarks with both static and dynamic thread scheduling. The experimental results show that Pac-Sim achieves a very low sampling error of 1.63% and 3.81% on average for statically and dynamically scheduled benchmarks, respectively. Pac-Sim also demonstrates significant simulation speedups as high as 523.5× (210.3× on average) for the train input set of SPEC CPU2017. Comment: 14 pages, 14 figures

    Bench surgery with autotransplantation for bilateral Wilms tumor—A feasible technique for renal sinus invasion

    Purpose: Bilateral Wilms tumor (BWT) with renal sinus invasion requires extremely difficult surgical care. This study presents an alternative strategy for tumor removal that also preserves the renal parenchyma. Materials and methods: In total, 9 cases of synchronous BWT were admitted to our hospital between May 2016 and Aug 2020. We retrospectively reviewed the clinical data, surgical technique, and functional and oncological outcomes of these cases. Results: The 9 cases included 3 males and 6 females, with a median age of 12 months at surgery (range 7–40). A total of 14 kidney units had renal sinus invasion (77.8%), whereas multifocal neoplasms were observed in 7 units (38.9%). The local stage distribution revealed 1 kidney with stage I, 10 kidneys with stage II, and 7 kidneys with stage III. Nephron-sparing surgery was performed on 15 kidney units (83.3%), among which 13 (72.2%) underwent bench surgery with autotransplantation (BS-AT), whereas 2 (11.1%) were subjected to tumor enucleation in vivo. Urinary leakage was the most prevalent postoperative complication. We observed negative margins. During the mean follow-up of 28.4 months, 2 patients (22.2%) died, one of sepsis and one of renal failure, whereas the other 7 (77.8%) survived without recurrence. Survivors had an estimated glomerular filtration rate of 81 ± 15.4 ml/(min × 1.73 m²). The endpoint renal volume of 9 renal units receiving BS-AT significantly increased (P = 0.02). Conclusions: In summary, the surgical management of bilateral Wilms tumor requires a meticulous operative approach and technique. Moreover, BS-AT provides a viable alternative to nephron-sparing surgery for BWT patients with renal sinus invasion.

    Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications

    Modern optimizing compilers are able to exploit memory access or computation patterns to generate vectorized code. However, such patterns in irregular applications are unknown until runtime because they depend on the input. Thus, neither the compiler's static optimization nor profile-guided optimization based on specific inputs can predict the patterns for arbitrary inputs, which leads to suboptimal code generation. To address this challenge, we develop Intelligent-Unroll, a framework to automatically optimize irregular applications with vectorization. Intelligent-Unroll allows users to describe the computation task using a code seed, with the memory access and computation patterns represented in a feature table and an information-code tree, and generates highly efficient code. Furthermore, Intelligent-Unroll employs several novel optimization techniques to optimize reduction operations and gather/scatter instructions. We evaluate Intelligent-Unroll with sparse matrix-vector multiplication (SpMV) and graph applications. Experimental results show that Intelligent-Unroll is able to generate more efficient vectorized code than the state-of-the-art implementations.
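
    As an illustration of why SpMV counts as an irregular application, the following is a minimal sketch of a scalar SpMV kernel over a matrix stored in compressed sparse row (CSR) form. It is not taken from Intelligent-Unroll; the function name `spmv_csr` and the CSR array names are assumptions chosen for this example. The indirect read `x[col_idx[k]]` is a gather whose addresses depend on the input matrix, and the inner loop is a reduction whose length varies per row, which is the kind of input-dependent pattern the abstract refers to.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and numeric values.
    (Illustrative names; not the Intelligent-Unroll interface.)
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Trip count depends on the number of nonzeros in this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Gather: the index into x is only known at runtime.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# The 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
result = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

    Because `col_idx` and the row lengths are data, a compiler cannot tell ahead of time which rows touch contiguous entries of `x` or share the same length.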

    Impact of Vehicular Countdown Signals on Driving Psychologies and Behaviors: Taking China as an Example

    Countdown signal control is a relatively new control mode that can inform a driver in advance about the remaining time to pass through intersections or the time needed to wait for other drivers and pedestrians. At present, few countries apply vehicular countdown signals. However, in China, some cities have applied vehicular countdown signals for years, though it is unclear how, and to what extent, such signals influence driving psychologies and behaviors compared with non-countdown signal controls. The present work aims to clarify the impact of vehicular countdown signals on driving psychologies and behaviors at the cognitive level. A questionnaire survey with 32 questions about driving psychologies and behaviors was designed, and an online survey was conducted. A total of 1051 valid questionnaires were received. The survey data were analyzed, and the main results indicate that most of the surveyed drivers prefer countdown signal controls and think that such controls can improve not only traffic safety but also traffic operational efficiency. The surveyed drivers also think that countdown signal controls have an impact on driving psychologies and behaviors, and the survey results show that the driving behaviors of the female drivers surveyed are not conservative under the clear conditions of green countdown signal control. Further studies and methods concerning the effects of countdown signals on driving psychologies and behaviors are discussed. Document type: Article