730 research outputs found
On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution
Technology trends are making the cost of data movement increasingly dominant,
both in terms of energy and time, over the cost of performing arithmetic
operations in computer systems. The fundamental ratio of aggregate data
movement bandwidth to total computational power (also referred to as the
machine balance parameter) in parallel computer systems is decreasing. It is
therefore of considerable importance to characterize the inherent data
movement requirements of parallel algorithms, so that the minimal architectural
balance parameters required to support them on future systems can be well
understood. In this paper, we develop an extension of the well-known red-blue
pebble game to derive lower bounds on the data movement complexity of the
parallel execution of computational directed acyclic graphs (CDAGs) on parallel
systems. We model multi-node multi-core parallel systems, with the total
physical memory distributed across the nodes (which are connected through some
interconnection network) and a multi-level shared cache hierarchy for
processors within a node. We also develop new techniques for lower-bound
characterization of non-homogeneous CDAGs. We demonstrate the use of the
methodology by analyzing the CDAGs of several numerical algorithms to develop
lower bounds on data movement for their parallel execution.
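To make the red-blue pebble game concrete, the sketch below replays a small CDAG under a simple greedy LRU strategy and counts red/blue transfers. The greedy policy and write-back rule are illustrative choices, not the paper's method: the count it returns is an upper bound achieved by one particular strategy, whereas the paper derives lower bounds that any strategy must obey.

```python
from collections import OrderedDict

def red_blue_io(dag, inputs, outputs, S):
    """Greedy LRU play of the red-blue pebble game (illustrative, not optimal).

    dag maps each vertex to its predecessor list; `inputs` start with blue
    pebbles (slow memory) and `outputs` must end with blue pebbles.  At most
    S red pebbles (fast memory) exist; assumes S >= max in-degree + 1.
    Each blue->red load and red->blue store counts as one I/O, so the return
    value upper-bounds the optimal data movement of the CDAG.
    """
    succs = {v: [] for v in dag}
    indeg = {v: len(ps) for v, ps in dag.items()}
    for v, ps in dag.items():
        for p in ps:
            succs[p].append(v)
    ready = [v for v in dag if indeg[v] == 0]
    topo = []
    while ready:                      # Kahn's algorithm: topological order
        v = ready.pop()
        topo.append(v)
        for s in succs[v]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    red = OrderedDict()               # LRU order: oldest entry first
    blue = set(inputs)
    io = 0

    def make_room():
        nonlocal io
        if len(red) >= S:
            victim, _ = red.popitem(last=False)
            if victim not in blue:    # write back values not yet in slow memory
                blue.add(victim)
                io += 1

    for v in topo:
        if v in inputs:
            continue                  # inputs stay blue until first needed
        for p in dag[v]:
            if p in red:
                red.move_to_end(p)
            else:
                make_room()
                red[p] = True         # load: blue -> red
                io += 1
        make_room()
        red[v] = True                 # compute v: place a red pebble
    for v in outputs:                 # results must end in slow memory
        if v not in blue:
            blue.add(v)
            io += 1
    return io
```

On a 4-leaf reduction tree with ample fast memory this counts exactly the compulsory traffic (four input loads plus one output store); shrinking S forces extra spills, which is precisely the effect the lower bounds quantify.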
Evaluating Multicore Algorithms on the Unified Memory Model
One of the challenges to achieving good performance on multicore architectures is effective utilization of the underlying memory hierarchy. While this is an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM seamlessly handles different types of multicore processors with varying degrees of cache sharing at different levels. We demonstrate that our model can be used to study a variety of multicore architectures on a variety of applications. In particular, we use it to analyze an option pricing problem using the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels. We implemented the algorithm on a system with two quad-core Intel Xeon 5310 1.6 GHz processors (8 cores). It achieves a peak performance of 19.5 GFLOPS, which is 38% of the theoretical peak of the multicore system, and outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.5.
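The trinomial recurrence that such an algorithm reorganizes for cache efficiency can be sketched with plain backward induction. This uses Boyle's standard parameterization of the trinomial tree and none of the paper's blocking; it only shows the grid of dependencies whose traversal order the memory-traffic optimization rearranges.

```python
import math

def trinomial_call(S0, K, r, sigma, T, N):
    """European call priced on Boyle's trinomial tree (plain backward induction)."""
    dt = T / N
    u = math.exp(sigma * math.sqrt(2 * dt))          # up factor; down = 1/u
    a = math.exp(r * dt / 2)
    b = math.exp(sigma * math.sqrt(dt / 2))
    pu = ((a - 1 / b) / (b - 1 / b)) ** 2            # risk-neutral branch probs
    pd = ((b - a) / (b - 1 / b)) ** 2
    pm = 1 - pu - pd
    disc = math.exp(-r * dt)
    # payoff at the 2N+1 terminal nodes, price S0 * u^(j-N)
    V = [max(S0 * u ** (j - N) - K, 0.0) for j in range(2 * N + 1)]
    for n in range(N - 1, -1, -1):                   # backward induction
        V = [disc * (pu * V[j + 2] + pm * V[j + 1] + pd * V[j])
             for j in range(2 * n + 1)]
    return V[0]
```

Each level-n value depends on three level-(n+1) neighbors, so the computation is a triangular stencil over an O(N²) grid; blocking that stencil to fit the cache hierarchy is where the near-optimal memory traffic comes from.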
A Lower Bound Technique for Communication in BSP
Communication is a major factor determining the performance of algorithms on
current computing systems; it is therefore valuable to provide tight lower
bounds on the communication complexity of computations. This paper presents a
lower bound technique for the communication complexity in the bulk-synchronous
parallel (BSP) model of a given class of DAG computations. The derived bound is
expressed in terms of the switching potential of a DAG, that is, the number of
permutations that the DAG can realize when viewed as a switching network. The
proposed technique yields tight lower bounds for the fast Fourier transform
(FFT), and for any sorting and permutation network. A stronger bound is also
derived for the periodic balanced sorting network, by applying this technique
to suitable subnetworks. Finally, we demonstrate that the switching potential
captures communication requirements even in computational models different from
BSP, such as the I/O model and the LPRAM.
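The switching potential of a small network can be computed by brute force, which makes the definition concrete: enumerate every pass/cross setting of the 2x2 switches and count the distinct permutations realized. The two wirings below (a 4-input, 2-stage butterfly and a 4-input Benes network) are standard textbook examples chosen for illustration, not networks analyzed in the paper.

```python
from itertools import product

def switching_potential(n, stages):
    """Count distinct permutations a 2x2-switch network can realize.

    `stages` is a list of stages, each a list of disjoint (i, j) wire pairs;
    every switch is independently set to pass or cross, and all settings are
    enumerated (feasible only for tiny networks)."""
    switches = [sw for stage in stages for sw in stage]
    realized = set()
    for setting in product((0, 1), repeat=len(switches)):
        wires = list(range(n))                # wires[k] = item now on wire k
        for (i, j), cross in zip(switches, setting):
            if cross:
                wires[i], wires[j] = wires[j], wires[i]
        realized.add(tuple(wires))
    return len(realized)

# 4-input, 2-stage butterfly: realizes 16 of the 24 permutations
butterfly4 = [[(0, 2), (1, 3)], [(0, 1), (2, 3)]]
# 4-input Benes network (rearrangeable): realizes all 24 permutations
benes4 = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 1), (2, 3)]]
```

The butterfly's potential (16) falls short of 4! = 24, while the rearrangeable Benes network attains it; the lower-bound technique turns such counts into communication bounds for the corresponding DAG computations.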
On Characterizing the Data Access Complexity of Programs
Technology trends will cause data movement to account for the majority of
energy expenditure and execution time on emerging computers. Therefore,
computational complexity will no longer be a sufficient metric for comparing
algorithms, and a fundamental characterization of data access complexity will
be increasingly important. The problem of developing lower bounds for data
access complexity has been modeled using the formalism of Hong & Kung's
red/blue pebble game for computational directed acyclic graphs (CDAGs).
However, previously developed approaches to lower bounds analysis for the
red/blue pebble game are very limited in effectiveness when applied to CDAGs of
real programs, with computations comprising multiple sub-computations with
differing DAG structure. We address this problem by developing an approach for
effectively composing lower bounds based on graph decomposition. We also
develop a static analysis algorithm to derive the asymptotic data-access lower
bounds of programs, as a function of the problem size and cache size.
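The data-access behavior these bounds constrain can be observed directly by replaying a program's address trace through a fully associative LRU cache model: tiled matrix multiplication incurs far fewer misses than the naive loop nest once the three tiles fit in cache. This simulator is an illustrative sketch of the phenomenon, not the paper's static analysis.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative LRU cache of `size` words; counts misses."""
    def __init__(self, size):
        self.size = size
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)      # hit: refresh recency
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.size:
                self.lines.popitem(last=False)  # evict least recently used

def matmul_misses(n, cache_words, tile=None):
    """Replay the address trace of C += A*B (word-granularity accesses)."""
    cache = LRUCache(cache_words)
    b = tile or n                             # tile=None -> naive i,j,k nest
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                for i in range(ii, min(ii + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        for k in range(kk, min(kk + b, n)):
                            cache.access(('A', i, k))
                            cache.access(('B', k, j))
                            cache.access(('C', i, j))
    return cache.misses
```

With n = 16 and a 48-word cache, 4x4 tiles make all three tiles cache-resident per block, cutting misses well below the naive nest; lower-bound analysis shows how far such tiling can possibly go for a given cache size.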
Sparse multinomial kernel discriminant analysis (sMKDA)
Dimensionality reduction via canonical variate analysis (CVA) is important for pattern recognition and has been extended in various ways to permit more flexibility, e.g. by "kernelizing" the formulation. This can lead to over-fitting, usually ameliorated by regularization. Here, a method for sparse multinomial kernel discriminant analysis (sMKDA) is proposed, using a sparse basis to control complexity. It is based on the connection between CVA and least squares, and uses forward selection via orthogonal least squares to approximate a basis, generalizing a similar approach for binomial problems. Classification can be performed directly via minimum Mahalanobis distance in the canonical variates. sMKDA achieves state-of-the-art performance in terms of accuracy and sparseness on 11 benchmark datasets.
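The forward-selection step can be sketched generically: greedily add the basis column that most reduces the residual sum of squares, orthogonalizing each candidate against the columns already chosen. This is a single-response sketch of the orthogonal least-squares selection idea, not the paper's exact multinomial formulation.

```python
import numpy as np

def ols_forward_select(Phi, y, n_select):
    """Greedy forward selection by orthogonal least squares.

    At each step, orthogonalize every unselected column of the basis matrix
    Phi against the columns already chosen and pick the one whose inclusion
    most reduces the residual sum of squares."""
    selected, Q = [], []                      # Q: orthonormalized chosen columns
    residual = y.astype(float).copy()
    for _ in range(n_select):
        best_j, best_gain, best_q = None, -np.inf, None
        for j in range(Phi.shape[1]):
            if j in selected:
                continue
            v = Phi[:, j].astype(float)
            for q in Q:                       # Gram-Schmidt against chosen basis
                v -= (q @ v) * q
            norm = np.linalg.norm(v)
            if norm < 1e-12:
                continue                      # linearly dependent: skip
            gain = (v @ residual) ** 2 / norm ** 2  # RSS drop if j is added
            if gain > best_gain:
                best_j, best_gain, best_q = j, gain, v / norm
        selected.append(best_j)
        Q.append(best_q)
        residual -= (best_q @ residual) * best_q
    return selected, residual
```

Because each candidate is scored after orthogonalization, the gain is exactly the reduction in residual energy its inclusion would buy, which is what makes the greedy sweep cheap compared with refitting the full least-squares problem at every step.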
Practical security limits of continuous-variable quantum key distribution
Discrete-modulation continuous-variable quantum key distribution (DM-CV-QKD)
systems are very attractive for modern quantum cryptography, since they
overcome the disadvantages of Gaussian modulation (GM) systems while
maintaining the advantages of using continuous variables. Nonetheless,
DM-CV-QKD is still underdeveloped, with very limited study of large
constellations. This work aims to extend the understanding of DM-CV-QKD
systems with large constellations, namely irregular and regular M-symbol
amplitude phase shift keying (M-APSK) constellations. To this end, a complete
DM-CV-QKD system was implemented, considering collective attacks and reverse
reconciliation under the realistic scenario in which Bob knows his detector's
noise. Tight security bounds were obtained for M-APSK constellations and GM,
both for the mutual information between Bob and Alice and for the Holevo bound
between Bob and Eve. M-APSK constellations with a binomial distribution can
approximate GM's results for the secret key rate. Without finite-size effects
(FSEs), the regular constellation 256-APSK (reg. 32) with binomial
distribution reaches 242.9 km, only 7.2 km less than GM, for a secret key rate
of 10⁻⁶ photons per symbol. Considering FSEs, 256-APSK (reg. 32) achieves
96.4% of GM's maximum transmission distance (2.3 times more than 4-PSK) and
78.4% of GM's maximum compatible excess noise (10.2 times more than 4-PSK).
Additionally, larger constellations allow higher values of modulation variance
in a practical implementation, i.e., we are no longer subject to the sub-one
limit on the mean number of photons per symbol. Information reconciliation
over a binary symmetric channel, using the sum-product algorithm and
multi-edge-type low-density parity-check matrices constructed from the
progressive edge growth algorithm, allowed the correction of keys up to 18 km.
Multidimensional reconciliation allows 256-APSK (reg. 32) to reconcile keys up
to 55 km. Privacy amplification was carried out by applying fast Fourier
transforms
to the Toeplitz extractor, which was unable to extract keys beyond
approximately 49 km, almost half the theoretical value, or for excess noises
larger than 0.16 SNU, close to the theoretical value.
Mestrado em Engenharia Física (MSc in Physics Engineering)
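The privacy-amplification step described above pairs a Toeplitz universal hash with FFTs. Because an m x n Toeplitz matrix-vector product over GF(2) is a slice of a linear convolution, an FFT evaluates it in O(L log L) instead of O(mn). The sketch below is a generic illustration of that technique, not the thesis's implementation.

```python
import numpy as np

def toeplitz_extract(key_bits, seed_bits, out_len):
    """Privacy amplification by Toeplitz hashing, evaluated with FFTs.

    An out_len x n Toeplitz matrix T with T[i, j] = seed[i - j + n - 1] is
    applied to the raw key over GF(2).  T @ key equals coefficients
    n-1 .. n-1+out_len-1 of the linear convolution seed * key, so a
    real FFT computes it; production code must watch floating-point
    rounding for very long blocks.
    """
    n, m = len(key_bits), out_len
    assert len(seed_bits) == m + n - 1       # seed defines the whole matrix
    L = 1
    while L < m + 2 * n - 2:                 # full linear-convolution length
        L *= 2
    spec = np.fft.rfft(seed_bits, L) * np.fft.rfft(key_bits, L)
    conv = np.rint(np.fft.irfft(spec, L)).astype(np.int64)
    return conv[n - 1 : n - 1 + m] % 2       # row i = convolution coeff i+n-1
```

The seed of m + n - 1 random bits fully specifies the hash, which is why Toeplitz extractors need far less randomness than a general random binary matrix of the same shape.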
- …