47,744 research outputs found
Architectural support for task dependence management with flexible software scheduling
The growing complexity of multi-core architectures has motivated a wide range of software mechanisms to improve the orchestration of parallel executions. Task parallelism has become a very attractive approach thanks to its programmability, portability and potential for optimizations. However, with the expected increase in core counts, finer-grained tasking will be required to exploit the available parallelism, which will increase the overheads introduced by the runtime system. This work presents Task Dependence Manager (TDM), a hardware/software co-designed mechanism to mitigate runtime system overheads. TDM introduces a hardware unit, denoted Dependence Management Unit (DMU), and minimal ISA extensions that allow the runtime system to offload costly dependence tracking operations to the DMU and to still perform task scheduling in software. With lower hardware cost, TDM outperforms hardware-based solutions and enhances the flexibility, adaptability and composability of the system. Results show that TDM improves performance by 12.3% and reduces EDP by 20.4% on average with respect to a software runtime system. Compared to a runtime system fully implemented in hardware, TDM achieves an average speedup of 4.2% with 7.3x less area requirements and significant EDP reductions. In addition, five different software schedulers are evaluated with TDM, illustrating its flexibility and performance gains.This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and
Innovation (contracts TIN2015-65316-P, TIN2016-76635-C2-2-R and TIN2016-81840-REDT), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671697 and No. 671610. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047.Peer ReviewedPostprint (author's final draft
Phase synchronization of instrumental music signals
Signal analysis is one of the finest scientific techniques in communication
theory. Some quantitative and qualitative measures describe the pattern of a
music signal, vary from one to another. Same musical recital, when played by
different instrumentalists, generates different types of music patterns. The
reason behind various patterns is the psychoacoustic measures - Dynamics,
Timber, Tonality and Rhythm, varies in each time. However, the psycho-acoustic
study of the music signals does not reveal any idea about the similarity
between the signals. For such cases, study of synchronization of long-term
nonlinear dynamics may provide effective results. In this context, phase
synchronization (PS) is one of the measures to show synchronization between two
non-identical signals. In fact, it is very critical to investigate any other
kind of synchronization for experimental condition, because those are
completely non identical signals. Also, there exists equivalence between the
phases and the distances of the diagonal line in Recurrence plot (RP) of the
signals, which is quantifiable by the recurrence quantification measure
tau-recurrence rate. This paper considers two nonlinear music signals based on
same raga played by two eminent sitar instrumentalists as two non-identical
sources. The psycho-acoustic study shows how the Dynamics, Timber, Tonality and
Rhythm vary for the two music signals. Then, long term analysis in the form of
phase space reconstruction is performed, which reveals the chaotic phase spaces
for both the signals. From the RP of both the phase spaces, tau-recurrence rate
is calculated. Finally by the correlation of normalized tau-recurrence rate of
their 3D phase spaces and the PS of the two music signals has been established.
The numerical results well support the analysis
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate the database system technology in
the near future: increasing main memory capacities of several TB per server and
massively parallel multi-core processing. Many algorithmic and control
techniques in current database technology were devised for disk-based systems
where I/O dominated the performance. In this work we take a new look at the
well-known sort-merge join which, so far, has not been in the focus of research
in scalable massively parallel multi-core data processing as it was deemed
inferior to hash joins. We devise a suite of new massively parallel sort-merge
(MPSM) join algorithms that are based on partial partition-based sorting.
Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a
hard to parallelize final merge step to create one complete sort order. Rather
they work on the independently created runs in parallel. This way our MPSM
algorithms are NUMA-affine as all the sorting is carried out on local memory
partitions. An extensive experimental evaluation on a modern 32-core machine
with one TB of main memory proves the competitive performance of MPSM on large
main memory databases with billions of objects. It scales (almost) linearly in
the number of employed cores and clearly outperforms competing hash join
proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel
query engine by a factor of four.Comment: VLDB201
SKIRT: hybrid parallelization of radiative transfer simulations
We describe the design, implementation and performance of the new hybrid
parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which
has been used extensively for modeling the continuum radiation of dusty
astrophysical systems including late-type galaxies and dusty tori. The hybrid
scheme combines distributed memory parallelization, using the standard Message
Passing Interface (MPI) to communicate between processes, and shared memory
parallelization, providing multiple execution threads within each process to
avoid duplication of data structures. The synchronization between multiple
threads is accomplished through atomic operations without high-level locking
(also called lock-free programming). This improves the scaling behavior of the
code and substantially simplifies the implementation of the hybrid scheme. The
result is an extremely flexible solution that adjusts to the number of
available nodes, processors and memory, and consequently performs well on a
wide variety of computing architectures.Comment: 21 pages, 20 figure
New Concept of PLC Modems: Multi-Carrier System for Frequency Selective Slow-Fading Channels Based on Layered SCCC Turbocodes
The article introduces a novel concept of a PLC modem as a complement to the existing G3 and PRIME standards for communications using medium- or high-voltage overhead or cable lines. The proposed concept is based on the fact that the levels of impulse noise and frequency selectivity are lower on high-voltage lines than on low-voltage ones. Also, the demands for “cost-effective” circuitry design are not so crucial as in the case of modems for low-voltage level. In contract to these positive conditions, however, there is the need to overcome much longer distances and to take into account low SNR on the receiving side. With respect to the listed reasons, our concept makes use of MCM, instead of OFDM. The assumption of low SNR is compensated through the use of an efficient channel coding based on a serially concatenated turbo code. In addition, MCM offers lower latency and PAPR compared to OFDM. Therefore, when using MCM, it is possible to excite the line with higher power. The proposed concept has been verified during experimental transmission of testing data over a real, 5 km long, 22kV overhead line
Method and apparatus for a single channel digital communications system
A method and apparatus are described for synchronizing a received PCM communications signal without requiring a separate synchronizing channel. The technique provides digital correlation of the received signal with a reference signal, first with its unmodulated subcarrier and then with a bit sync code modulated subcarrier, where the code sequence length is equal in duration to each data bit
- …