Multiprocessor Out-of-Core FFTs with Distributed Memory and Parallel Disks
This paper extends an earlier out-of-core Fast Fourier Transform (FFT) method for a uniprocessor with the Parallel Disk Model (PDM) to use multiple processors. Four out-of-core multiprocessor methods are examined. Operationally, these methods differ in the size of the mini-butterfly computed in memory and in how the data are organized on the disks and in the distributed memory of the multiprocessor. The methods also perform differing amounts of I/O and communication. Two of them have the remarkable property that even though they are computing the FFT on a multiprocessor, all interprocessor communication occurs outside the mini-butterfly computations. Performance results on a small workstation cluster indicate that, except for unusual combinations of problem size and memory size, the methods that do not perform interprocessor communication during the mini-butterfly computations require approximately 86% of the time of those that do. Moreover, the faster methods are much easier to implement.
Determining an Out-of-Core FFT Decomposition Strategy for Parallel Disks by Dynamic Programming
We present an out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber. Our algorithm uses a recursive divide-and-conquer strategy, and each stage in the recursion presents several possibilities for how to split the problem into subproblems. We give a recurrence for the algorithm's I/O complexity on the Parallel Disk Model and show how to use dynamic programming to determine optimal splits at each recursive stage. The algorithm to determine the optimal splits takes only Theta(lg^2 N) time for an N-point FFT, and it is practical. The out-of-core FFT algorithm itself takes considerably longer.
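To illustrate why choosing the splits by dynamic programming can take Theta(lg^2 N) time, the following is a minimal sketch and not the paper's recurrence. It assumes N = 2**n records and a memory of 2**m records, counts cost in passes over the data, and uses a made-up `toy_permute_cost`; the names `plan_splits`, `permute_cost`, and `toy_permute_cost` are hypothetical. The table has n = lg N entries and each entry tries fewer than n splits.

```python
def plan_splits(n, m, permute_cost):
    """Tabulate the cheapest split for every exponent j = 1..n.

    Assumed cost model (hypothetical, not taken from the paper):
      * a 2**j-point FFT with j <= m fits in memory and costs one pass;
      * otherwise it is split into exponents k and j - k, and costs the
        two sub-costs plus permute_cost(j, k) for data rearrangement.
    Returns a dict mapping j to (cost, chosen k), with k = None for
    in-memory sizes. Requires m >= 1 so that the base case exists.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    best = {}
    for j in range(1, n + 1):
        if j <= m:
            best[j] = (1.0, None)
            continue
        # Try every split j = k + (j - k); fewer than n candidates per j,
        # so the whole table is filled in O(n**2) = O(lg^2 N) steps.
        best[j] = min(
            (best[k][0] + best[j - k][0] + permute_cost(j, k), k)
            for k in range(1, j)
        )
    return best


def toy_permute_cost(j, k):
    # Placeholder assumption: every split costs one extra pass.
    return 1.0


table = plan_splits(6, 2, toy_permute_cost)
```

With a realistic `permute_cost` derived from an I/O model, the same loop structure would apply; only the cost function changes.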
Wave modes of collective vortex gyration in dipolar-coupled-dot-array magnonic crystals
Lattice vibration modes are collective excitations in periodic arrays of atoms or molecules. These modes determine novel transport properties in solid crystals. Analogously, in periodic arrangements of magnetic vortex-state disks, collective vortex motions have been predicted. Here, we experimentally observe wave modes of collective vortex gyration in one-dimensional (1D) periodic arrays of magnetic disks using time-resolved scanning transmission x-ray microscopy. The observed modes are interpreted based on micromagnetic simulation and numerical calculation of coupled Thiele equations. Dispersion of the modes is found to be strongly affected by both vortex polarization and chirality ordering, as revealed by the explicit analytical form of 1D infinite arrays. A thorough understanding thereof is fundamental both for lattice vibrations and vortex dynamics, which we demonstrate for 1D magnonic crystals. Such magnetic disk arrays with vortex-state ordering, referred to as magnetic metastructure, offer potential implementation into information processing devices.
Optimizing the Dimensional Method for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs
We present an improved version of the Dimensional Method for computing multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system when the data consist of too many records to fit into memory. Data are spread across parallel disks and processed in sections. We use the Parallel Disk Model for analysis. The simple Dimensional Method performs the 1-dimensional FFTs for each dimension in turn. Between each dimension, an out-of-core permutation is used to rearrange the data to contiguous locations. The improved Dimensional Method processes multiple dimensions at a time. We show that determining an optimal sequence and groupings of dimensions is NP-complete. We then analyze the effects of two modifications to the Dimensional Method independently: processing multiple dimensions at one time, and processing single dimensions in a different order. Finally, we show a lower bound on the I/O complexity of the Dimensional Method and present an algorithm that is approximately asymptotically optimal.
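The idea of transforming one dimension at a time, with a rearrangement between dimensions, can be sketched in memory for a small 2-dimensional array. This is only an illustration under simplifying assumptions: a naive O(n^2) DFT stands in for the FFT, an in-memory transpose stands in for the out-of-core permutation, and the function names are hypothetical rather than taken from the paper.

```python
import cmath


def dft(x):
    """Naive 1-D discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft2_by_dimension(a):
    """2-D DFT computed one dimension at a time."""
    # Transform along the last dimension (each row is contiguous).
    rows = [dft(row) for row in a]
    # Transpose so the other dimension becomes contiguous, then transform.
    cols = [dft(list(col)) for col in zip(*rows)]
    # Transpose back to the original layout.
    return [list(r) for r in zip(*cols)]


def dft2_direct(a):
    """2-D DFT computed straight from the definition, for comparison."""
    n0, n1 = len(a), len(a[0])
    return [
        [
            sum(
                a[j0][j1]
                * cmath.exp(-2j * cmath.pi * (j0 * k0 / n0 + j1 * k1 / n1))
                for j0 in range(n0)
                for j1 in range(n1)
            )
            for k1 in range(n1)
        ]
        for k0 in range(n0)
    ]


data = [[1, 2, 3], [4, 5, 6]]
by_dim = dft2_by_dimension(data)
direct = dft2_direct(data)
max_error = max(
    abs(p - q) for r1, r2 in zip(by_dim, direct) for p, q in zip(r1, r2)
)
```

The two results agree up to floating-point rounding, which is the property the dimension-by-dimension approach relies on.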
Out-of-Core Hydrodynamic Simulations for Cosmological Applications
We present an out-of-core hydrodynamic code for high resolution cosmological
simulations that require terabytes of memory. Out-of-core computation refers to
the technique of using disk space as virtual memory and transferring data in
and out of main memory at high I/O bandwidth. The code is based on a two-level
mesh scheme where short-range physics is solved on a high-resolution, localized
mesh while long-range physics is captured on a lower resolution, global mesh.
The two-level mesh gravity solver allows FFTs to operate on data stored
entirely in memory, which is much faster than the alternative of computing the
transforms out-of-core through non-sequential disk accesses. We also describe
an out-of-core initial conditions generator that is used to prepare large data
sets for cosmological simulations. The out-of-core code is accurate,
cost-effective, and memory-efficient and the current version is implemented to
run in parallel on shared-memory machines. I/O overhead is significantly
reduced down to less than 10% by performing disk operations concurrently with
numerical calculations. The current computational setup, which includes a 32
processor Alpha server and a 3 TB striped SCSI disk array, allows us to run
cosmological simulations with up to 4000^3 grid cells and 2000^3 dark matter
particles.
Comment: 19 pages, 10 figures; accepted by New Astronomy
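The overlap of disk operations with numerical work mentioned in this abstract can be sketched as a simple prefetch loop. This is a generic double-buffering pattern and not the authors' code; `process_chunks`, `load`, and `compute` are hypothetical names, and `load(i)` is assumed to read chunk `i` from disk.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunks(load, compute, num_chunks):
    """Compute on chunk i while chunk i + 1 is loaded in the background."""
    if num_chunks == 0:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(load, 0)
        for i in range(num_chunks):
            chunk = pending.result()  # wait for the current chunk
            if i + 1 < num_chunks:
                # Start reading the next chunk before computing.
                pending = io.submit(load, i + 1)
            results.append(compute(chunk))
    return results


# Toy stand-ins: "loading" fabricates three numbers per chunk.
sums = process_chunks(lambda i: list(range(3 * i, 3 * i + 3)), sum, 3)
```

In a real out-of-core code the benefit depends on the I/O call releasing the interpreter lock or running in a separate process, which this sketch does not address.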
ERS-1 SAR data processing
To take full advantage of the synthetic aperture radar (SAR) to be flown on board the European Space Agency's Remote Sensing Satellite (ERS-1) (1989) and the Canadian Radarsat (1990), the implementation of a receiving station in Alaska is being studied to gather and process SAR data pertaining in particular to regions within the station's range of reception. The current SAR data processing requirement is estimated to be on the order of 5 minutes per day. The Interim Digital SAR Processor (IDP), which was under continual development through Seasat (1978) and SIR-B (1984), can process slightly more than 2 minutes of ERS-1 data per day. On the other hand, the Advanced Digital SAR Processor (ADSP), currently under development for the Shuttle Imaging Radar C (SIR-C, 1988) and the Venus Radar Mapper (VRM, 1988), is capable of processing ERS-1 SAR data at a real-time rate. To better suit the anticipated ERS-1 SAR data processing requirement, both a modified IDP and an ADSP derivative are being examined. For the modified IDP, a pipelined architecture is proposed for the mini-computer plus array processor arrangement to improve throughput. For the ADSP derivative, a simplified version is proposed to enhance ease of implementation and maintainability while maintaining real-time throughput rates. These processing systems are discussed and evaluated.
Dynamic Behavior of Coupled Magnetic Vortices in Magnetic Disk Arrays
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Materials Science and Engineering, 2020. 2. Sang-Koog Kim.
In sub-micrometer-size ferromagnetic structures, the magnetic vortex is a strongly stable ground state characterized by an in-plane curling magnetization around the core and an out-of-plane magnetization in the central region. The magnetic vortex is characterized by clockwise (CW) or counter-clockwise (CCW) curling in-plane magnetizations around a single vortex core, in which the magnetizations are oriented perpendicularly, either upward or downward. In isolated disks, applied external forces induce vortex excitations, among which a translational mode exists in which the vortex core gyrates around its equilibrium position at a characteristic eigenfrequency. Vortex-core switching can be accomplished with low power consumption when vortex gyrations are resonantly excited. Moreover, the gyration modes of individual vortex cores in a periodic array of patterned vortex-state disks are coupled with each other, thus yielding collectively coupled motions of the individual cores. On the basis of such novel dynamic characteristics, non-volatile memory and information processing devices using magnetic vortices have been proposed.
This work focuses on the dynamic interaction between vortex-state ferromagnetic structures and its applications, utilizing micromagnetic simulations, analytical calculations, and experiments. The dynamic behaviors of vortex-gyration-coupled modes, vortex-core switching, and propagation of the vortex-core gyration signal in magnetic-disk-network devices are investigated. Based on combinations of the novel dynamic characteristics of vortices in dipolar-coupled disks, a new-concept RS latch logic device and time- and frequency-division demultiplexer device operations are explored. Magnetic vortices have many advantages, such as non-volatility, almost unlimited endurance, and low-power operation. Furthermore, the rich tunability of magnetic vortices makes them suitable for future spintronics devices. This work can pave the way for possible implementation of logic gates and information processing devices based on coupled magnetic vortices.
1. Introduction
2. Research Background
2.1. Magnetization dynamics and micromagnetics
2.1.1. Landau-Lifshitz-Gilbert equation
2.1.2. Effective fields in the LLG equation
2.2. Vortices in magnetic microstructures and their dynamics
2.2.1. Vortex core gyration
2.2.2. Vortex core switching
2.2.3. Interaction between magnetic vortices
2.3. Experimental methods
2.3.1. Photo lithography
2.3.2. Electron beam lithography
2.3.3. Anisotropic magneto resistance in vortex
3. Vortex Core Switching by Propagation of a Gyration-Coupled Mode
3.1. Micromagnetic simulation conditions
3.2. Coupled modes of gyration for the two types of vortex-state configurations
3.3. Concept design of reset-set latch device
3.4. Magnitude of oscillating magnetic field and radius of disks dependent switching behavior
3.5. Reset-set latch logic operation
4. Control of Gyration Signal Propagation in Coupled Magnetic Vortices
4.1. Dynamics of the single and coupled disk array
4.2. Control of gyration signal propagation by in-plane bias field
4.3. Control of gyration signal propagation by vortex core switching
4.4. Concept design of time-division demultiplexer device and its operation
4.5. Concept design of frequency-division demultiplexer device and its operation
5. Electrical Measurement of the Gyrotropic Resonance of a Magnetic Vortex in Circular and Chopped Disks
5.1. Sample fabrication
5.2. DC AMR measurement
5.3. AC AMR measurement by rectification technique
6. Summary
Bibliography
Publication List
Patent List
Presentations in Conferences
Doctor
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceeding