29 research outputs found

    Effective Implementation of DGEMM on Modern Multicore CPU

    In this paper we present a detailed study on tuning double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon E5-2680 CPU. We selected an optimal algorithm from the instruction-set perspective, as well as software tools optimized for Intel Advanced Vector Extensions (AVX). Our optimizations included the use of vector memory operations and AVX instructions. Our proposed algorithm achieves a performance improvement of 33% compared to the latest results achieved using the Intel Math Kernel Library DGEMM subroutine.
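    The abstract does not reproduce the kernel itself, so the following is a minimal sketch of the kind of register-blocked AVX micro-kernel such a study typically tunes: a 4x4 block of C is held in 256-bit registers while a strip of A and B is streamed through. The function name, blocking factors, and data layout are illustrative assumptions, not the paper's implementation; note the Xeon E5-2680 supports AVX but not FMA, hence the separate multiply and add.

```cpp
// Illustrative AVX register-blocked DGEMM micro-kernel (not the paper's kernel).
#include <immintrin.h>

// Computes a 4x4 block of C += A * B, where A is 4xK (row-major, lda = K)
// and B is a packed Kx4 panel (row-major, ldb = 4). One 256-bit register per C row.
void micro_kernel_4x4(int K, const double* A, const double* B, double* C, int ldc) {
    __m256d c0 = _mm256_loadu_pd(C + 0 * ldc);
    __m256d c1 = _mm256_loadu_pd(C + 1 * ldc);
    __m256d c2 = _mm256_loadu_pd(C + 2 * ldc);
    __m256d c3 = _mm256_loadu_pd(C + 3 * ldc);
    for (int k = 0; k < K; ++k) {
        __m256d b  = _mm256_loadu_pd(B + k * 4);          // one row of the packed B panel
        __m256d a0 = _mm256_broadcast_sd(A + 0 * K + k);  // broadcast A(i,k)
        __m256d a1 = _mm256_broadcast_sd(A + 1 * K + k);
        __m256d a2 = _mm256_broadcast_sd(A + 2 * K + k);
        __m256d a3 = _mm256_broadcast_sd(A + 3 * K + k);
        // AVX without FMA: separate multiply and add.
        c0 = _mm256_add_pd(c0, _mm256_mul_pd(a0, b));
        c1 = _mm256_add_pd(c1, _mm256_mul_pd(a1, b));
        c2 = _mm256_add_pd(c2, _mm256_mul_pd(a2, b));
        c3 = _mm256_add_pd(c3, _mm256_mul_pd(a3, b));
    }
    _mm256_storeu_pd(C + 0 * ldc, c0);
    _mm256_storeu_pd(C + 1 * ldc, c1);
    _mm256_storeu_pd(C + 2 * ldc, c2);
    _mm256_storeu_pd(C + 3 * ldc, c3);
}
```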

    Cache-oblivious algorithms

    Thesis (S.M.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 67-70). By Harald Prokop.
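    The record above is a catalog entry only; as a small illustration of the recursive divide-and-conquer style that cache-oblivious algorithms rely on, the sketch below transposes a matrix by repeatedly halving the larger dimension until a block is tiny, without any cache parameters appearing in the code. The base-case cutoff and row-major layout are illustrative choices, not taken from the thesis.

```cpp
// Illustrative cache-oblivious recursive matrix transpose.
#include <cstddef>

// Transposes the [r0, r1) x [c0, c1) block of an n x n row-major matrix `in`
// into `out`, recursing on the larger dimension of the block.
void transpose_rec(const double* in, double* out, std::size_t n,
                   std::size_t r0, std::size_t r1, std::size_t c0, std::size_t c1) {
    const std::size_t rows = r1 - r0, cols = c1 - c0;
    if (rows <= 16 && cols <= 16) {            // small constant cutoff, not a cache parameter
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                out[c * n + r] = in[r * n + c];
    } else if (rows >= cols) {                 // split the longer dimension
        std::size_t rm = r0 + rows / 2;
        transpose_rec(in, out, n, r0, rm, c0, c1);
        transpose_rec(in, out, n, rm, r1, c0, c1);
    } else {
        std::size_t cm = c0 + cols / 2;
        transpose_rec(in, out, n, r0, r1, c0, cm);
        transpose_rec(in, out, n, r0, r1, cm, c1);
    }
}
```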

    Literature Study on Analyzing and Designing of Algorithms

    The fundamental goal is problem solution under numerous limitations, such as those imposed by issue size, performance, and cost in terms of both space and time. Designing a quick, effective, and efficient solution to a problem domain is the objective. Certain problems are simple to resolve, while others are challenging. To develop a quick and effective answer, much intelligence is needed. A new technology is required for system design, and the foundation of that new technology is the improvement of an already existing algorithm. The goal of algorithm research is to create effective algorithms that improve scalability, dependability, and availability in addi…

    An Investigation into the Performance Evaluation of Connected Vehicle Applications: From Real-World Experiment to Parallel Simulation Paradigm

    A novel system was developed that provides drivers with lane-merge advisories, using vehicle trajectories obtained through Dedicated Short Range Communication (DSRC). It was successfully tested on a freeway using three vehicles, then targeted for further testing via simulation. The failure of contemporary simulators to effectively model large, complex urban transportation networks then motivated further research into distributed and parallel traffic simulation. An architecture for a closed-loop, parallel simulator was devised, using a new algorithm that accounts for boundary nodes, traffic signals, intersections, road lengths, traffic density, and lane counts; it partitions a sample Tennessee road network more efficiently than tools like METIS, which increase interprocess communication (IPC) overhead by partitioning more transportation corridors. The simulator uses logarithmic accumulation to synchronize parallel simulations, further reducing IPC. Analyses suggest this eliminates up to one-third of the IPC overhead incurred by a linear accumulation model.
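    The abstract does not define "logarithmic accumulation" precisely; one plausible reading, sketched below as an assumption rather than the paper's algorithm, is a pairwise (binary-tree) reduction of per-partition boundary updates, so that a synchronization step costs O(log P) combine rounds instead of a linear gather across all P partitions. The struct fields and function names are illustrative.

```cpp
// Illustrative tree reduction of per-partition boundary updates.
#include <cstddef>
#include <vector>

struct BoundaryUpdate {
    long vehicles_crossing = 0;   // vehicles moving between partitions this time step
    long messages = 0;            // IPC messages represented by this update
};

BoundaryUpdate combine(const BoundaryUpdate& a, const BoundaryUpdate& b) {
    return {a.vehicles_crossing + b.vehicles_crossing, a.messages + b.messages + 1};
}

// Pairwise reduction: ceil(log2(P)) combine rounds over P partition updates.
BoundaryUpdate accumulate_logarithmic(std::vector<BoundaryUpdate> parts) {
    while (parts.size() > 1) {
        std::vector<BoundaryUpdate> next;
        for (std::size_t i = 0; i + 1 < parts.size(); i += 2)
            next.push_back(combine(parts[i], parts[i + 1]));
        if (parts.size() % 2 == 1) next.push_back(parts.back());
        parts.swap(next);
    }
    return parts.empty() ? BoundaryUpdate{} : parts.front();
}
```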

    Unreliable and resource-constrained decoding

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 185-213). By Lav R. Varshney.

    Traditional information theory and communication theory assume that decoders are noiseless and operate without transient or permanent faults. Decoders are also traditionally assumed to be unconstrained in physical resources like material, memory, and energy. This thesis studies how constraining reliability and resources in the decoder limits the performance of communication systems. Five communication problems are investigated. Broadly speaking, these are communication using decoders that are wiring cost-limited, that are memory-limited, that are noisy, that fail catastrophically, and that simultaneously harvest information and energy. For each of these problems, fundamental trade-offs between communication system performance and reliability or resource consumption are established.

    For decoding repetition codes using consensus decoding circuits, the optimal trade-off between decoding speed and quadratic wiring cost is defined and established. Designing optimal circuits is shown to be NP-complete, but is carried out for small circuit size. The natural relaxation to the integer circuit design problem is shown to be a reverse convex program. Random circuit topologies are also investigated.

    Uncoded transmission is investigated when a population of heterogeneous sources must be categorized due to decoder memory constraints. Quantizers that are optimal for mean Bayes risk error, a novel fidelity criterion, are designed. Human decision making in segregated populations is also studied with this framework. The ratio between the costs of false alarms and missed detections is also shown to fundamentally affect the essential nature of discrimination.

    The effect of noise on iterative message-passing decoders for low-density parity check (LDPC) codes is studied. Concentration of decoding performance around its average is shown to hold. Density evolution equations for noisy decoders are derived. Decoding thresholds degrade smoothly as decoder noise increases, and in certain cases, arbitrarily small final error probability is achievable despite decoder noisiness. Precise information storage capacity results for reliable memory systems constructed from unreliable components are also provided.

    Limits to communicating over systems that fail at random times are established. Communication with arbitrarily small probability of error is not possible, but schemes that optimize transmission volume communicated at fixed maximum message error probabilities are determined. System state feedback is shown not to improve performance.

    For optimal communication with decoders that simultaneously harvest information and energy, a coding theorem that establishes the fundamental trade-off between the rates at which energy and reliable information can be transmitted over a single line is proven. The capacity-power function is computed for several channels; it is non-increasing and concave.
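    The abstract mentions density evolution equations for noisy decoders without reproducing them. As a point of reference, the sketch below runs the standard density evolution recursion for a regular (dv, dc) LDPC ensemble over the binary erasure channel, with a simple additive noise floor standing in, purely as an illustrative assumption, for the effect of an unreliable decoder; it is not the thesis's actual noisy-decoder equation.

```cpp
// Illustrative density evolution for a regular (dv, dc) LDPC ensemble over the BEC.
#include <cmath>
#include <cstdio>

// One iteration: x is the erasure probability of a variable-to-check message.
double de_step(double x, double eps, int dv, int dc, double decoder_noise) {
    double check = 1.0 - std::pow(1.0 - x, dc - 1);      // check-to-variable erasure prob.
    double var   = eps * std::pow(check, dv - 1);        // variable-to-check erasure prob.
    return decoder_noise + (1.0 - decoder_noise) * var;  // crude noise floor (illustrative)
}

int main() {
    const int dv = 3, dc = 6;                  // (3,6)-regular ensemble
    const double eps = 0.40;                   // channel erasure probability
    const double decoder_noise = 1e-3;         // assumed decoder unreliability
    double x = eps;
    for (int l = 0; l < 200; ++l) x = de_step(x, eps, dv, dc, decoder_noise);
    std::printf("erasure probability after 200 iterations: %.6f\n", x);
    return 0;
}
```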

    Dynamic hashing technique for bandwidth reduction in image transmission

    Hash functions are widely used in secure communication systems, generating message digests for the detection of unauthorized changes in files. An encrypted hashed message, or digital signature, is used in many applications such as authentication to ensure data integrity. It is almost impossible to ensure authentic messages when sending over large bandwidth in a highly accessible network, especially on insecure channels. Two issues that need to be addressed are the large size of the hashed message and the high bandwidth. A collaborative approach between the encoded hash message and steganography provides highly secure hidden data. The aim of the research is to propose a new method for producing a dynamic and smaller encoded hash message with reduced bandwidth. The encoded hash message is embedded into an image as a stego-image to avoid an additional file, and consequently the bandwidth is reduced. The receiver extracts the encoded hash and the dynamic hashed message from the received file at the same time. If the encrypted hash decoded with the public key and the hashed message from the original file match the received file, it is considered authentic. To enhance the robustness of the hashed message, we compressed it, encoded it, or performed both operations before embedding the hashed data into the image. The proposed algorithm achieved the lowest dynamic size (1 KB), with no fixed length tied to the original file, compared to the MD5, SHA-1, and SHA-2 hash algorithms. The robustness of the hashed message was tested against substitution, replacement, and collision attacks to check whether the same message could be detected in the output. The results show that the probability of the same hashed message appearing in the output is close to 0%, compared to the MD5 and SHA algorithms. Among the benefits of the proposed algorithm is computational efficiency; for messages smaller than 1600 bytes, the hashed file reduced the original file size by up to 8.51%.
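    The paper's own dynamic hashing and encoding scheme is not reproduced in the abstract; as a minimal sketch of the embedding step it describes, the code below hides an already-encoded digest in the least-significant bits of raw image pixel bytes, one common way of producing a stego-image. The payload `digest` and the raw-byte image representation are assumptions for illustration.

```cpp
// Illustrative LSB embedding/extraction of an encoded hash digest in raw pixel bytes.
#include <cstdint>
#include <stdexcept>
#include <vector>

// Writes each bit of `digest` (MSB first within each byte) into the LSB of successive pixels.
void embed_lsb(std::vector<std::uint8_t>& pixels, const std::vector<std::uint8_t>& digest) {
    if (digest.size() * 8 > pixels.size())
        throw std::runtime_error("cover image too small for payload");
    for (std::size_t i = 0; i < digest.size() * 8; ++i) {
        std::uint8_t bit = (digest[i / 8] >> (7 - i % 8)) & 1u;
        pixels[i] = static_cast<std::uint8_t>((pixels[i] & ~1u) | bit);
    }
}

// Receiver side: reads `n_bytes` of payload back out of the pixel LSBs.
std::vector<std::uint8_t> extract_lsb(const std::vector<std::uint8_t>& pixels, std::size_t n_bytes) {
    std::vector<std::uint8_t> out(n_bytes, 0);
    for (std::size_t i = 0; i < n_bytes * 8; ++i)
        out[i / 8] = static_cast<std::uint8_t>((out[i / 8] << 1) | (pixels[i] & 1u));
    return out;
}
```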

    The exploitation of parallelism on shared memory multiprocessors

    PhD Thesis. With the arrival of many general-purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, even though they make use of the synchronisation inherent in the dataflow model by applying the semantics of single assignment when operating on shared data, giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms. The Science and Engineering Research Council.
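    The single-assignment idea behind shared values can be sketched with a small cell that may be written exactly once and that blocks readers until a value is available, giving dataflow-style synchronization on shared data. The class name and interface below are illustrative, not the constructs proposed in the thesis.

```cpp
// Illustrative single-assignment "shared value" cell.
#include <condition_variable>
#include <mutex>
#include <optional>
#include <stdexcept>

template <typename T>
class SharedValue {
public:
    void assign(T v) {                       // single assignment: a second write is an error
        std::lock_guard<std::mutex> lock(m_);
        if (value_) throw std::logic_error("shared value already assigned");
        value_ = std::move(v);
        ready_.notify_all();
    }
    const T& read() {                        // blocks until a producer has assigned the value
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return value_.has_value(); });
        return *value_;                      // safe: the value never changes once assigned
    }
private:
    std::mutex m_;
    std::condition_variable ready_;
    std::optional<T> value_;
};
```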