29 research outputs found

    Effective Implementation of DGEMM on Modern Multicore CPU

    In this paper we present a detailed study on tuning double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon E5-2680 CPU. We selected an optimal algorithm from the instruction-set perspective, as well as software tools optimized for Intel Advanced Vector Extensions (AVX). Our optimizations included the use of vector memory operations and AVX instructions. Our proposed algorithm achieves a performance improvement of 33% compared to the latest results achieved using the Intel Math Kernel Library DGEMM subroutine.
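    The abstract does not reproduce the kernel itself, so the following is a minimal sketch of the kind of register-blocked AVX micro-kernel such a study typically tunes: a 4x4 block of C is held in 256-bit registers while a strip of A and B is streamed through. The function name, blocking factors, and data layout are illustrative assumptions, not the paper's implementation; note the Xeon E5-2680 supports AVX but not FMA, hence the separate multiply and add.

```cpp
// Illustrative AVX register-blocked DGEMM micro-kernel (not the paper's kernel).
#include <immintrin.h>

// Computes a 4x4 block of C += A * B, where A is 4xK (row-major, lda = K)
// and B is a packed Kx4 panel (row-major, ldb = 4). One 256-bit register per C row.
void micro_kernel_4x4(int K, const double* A, const double* B, double* C, int ldc) {
    __m256d c0 = _mm256_loadu_pd(C + 0 * ldc);
    __m256d c1 = _mm256_loadu_pd(C + 1 * ldc);
    __m256d c2 = _mm256_loadu_pd(C + 2 * ldc);
    __m256d c3 = _mm256_loadu_pd(C + 3 * ldc);
    for (int k = 0; k < K; ++k) {
        __m256d b  = _mm256_loadu_pd(B + k * 4);          // one row of the packed B panel
        __m256d a0 = _mm256_broadcast_sd(A + 0 * K + k);  // broadcast A(i,k)
        __m256d a1 = _mm256_broadcast_sd(A + 1 * K + k);
        __m256d a2 = _mm256_broadcast_sd(A + 2 * K + k);
        __m256d a3 = _mm256_broadcast_sd(A + 3 * K + k);
        // AVX without FMA: separate multiply and add.
        c0 = _mm256_add_pd(c0, _mm256_mul_pd(a0, b));
        c1 = _mm256_add_pd(c1, _mm256_mul_pd(a1, b));
        c2 = _mm256_add_pd(c2, _mm256_mul_pd(a2, b));
        c3 = _mm256_add_pd(c3, _mm256_mul_pd(a3, b));
    }
    _mm256_storeu_pd(C + 0 * ldc, c0);
    _mm256_storeu_pd(C + 1 * ldc, c1);
    _mm256_storeu_pd(C + 2 * ldc, c2);
    _mm256_storeu_pd(C + 3 * ldc, c3);
}
```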

    Cache-oblivious algorithms

    Thesis (S.M.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 67-70). By Harald Prokop.
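    The record above is a catalog entry only; as a small illustration of the recursive divide-and-conquer style that cache-oblivious algorithms rely on, the sketch below transposes a matrix by repeatedly halving the larger dimension until a block is tiny, without any cache parameters appearing in the code. The base-case cutoff and row-major layout are illustrative choices, not taken from the thesis.

```cpp
// Illustrative cache-oblivious recursive matrix transpose.
#include <cstddef>

// Transposes the [r0, r1) x [c0, c1) block of an n x n row-major matrix `in`
// into `out`, recursing on the larger dimension of the block.
void transpose_rec(const double* in, double* out, std::size_t n,
                   std::size_t r0, std::size_t r1, std::size_t c0, std::size_t c1) {
    const std::size_t rows = r1 - r0, cols = c1 - c0;
    if (rows <= 16 && cols <= 16) {            // small constant cutoff, not a cache parameter
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                out[c * n + r] = in[r * n + c];
    } else if (rows >= cols) {                 // split the longer dimension
        std::size_t rm = r0 + rows / 2;
        transpose_rec(in, out, n, r0, rm, c0, c1);
        transpose_rec(in, out, n, rm, r1, c0, c1);
    } else {
        std::size_t cm = c0 + cols / 2;
        transpose_rec(in, out, n, r0, r1, c0, cm);
        transpose_rec(in, out, n, r0, r1, cm, c1);
    }
}
```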

    Literature Study on Analyzing and Designing of Algorithms

    The fundamental goal is problem solution under numerous limitations, such as those imposed by issue size, performance, and cost in terms of both space and time. Designing a quick, effective, and efficient solution to a problem domain is the objective. Certain problems are simple to resolve, while others are challenging. To develop a quick and effective answer, much intelligence is needed. A new technology is required for system design, and the foundation of that new technology is the improvement of an already existing algorithm. The goal of algorithm research is to create effective algorithms that improve scalability, dependability, and availability in addi…

    An Investigation into the Performance Evaluation of Connected Vehicle Applications: From Real-World Experiment to Parallel Simulation Paradigm

    A novel system was developed that provides drivers with lane-merge advisories, using vehicle trajectories obtained through Dedicated Short Range Communication (DSRC). It was successfully tested on a freeway using three vehicles, then targeted for further testing via simulation. The failure of contemporary simulators to effectively model large, complex urban transportation networks then motivated further research into distributed and parallel traffic simulation. An architecture for a closed-loop, parallel simulator was devised, using a new algorithm that accounts for boundary nodes, traffic signals, intersections, road lengths, traffic density, and lane counts; it partitions a sample Tennessee road network more efficiently than tools like METIS, which increase interprocess communication (IPC) overhead by partitioning more transportation corridors. The simulator uses logarithmic accumulation to synchronize parallel simulations, further reducing IPC. Analyses suggest this eliminates up to one-third of the IPC overhead incurred by a linear accumulation model.
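    The abstract does not define "logarithmic accumulation" precisely; one plausible reading, sketched below as an assumption rather than the paper's algorithm, is a pairwise (binary-tree) reduction of per-partition boundary updates, so that a synchronization step costs O(log P) combine rounds instead of a linear gather across all P partitions. The struct fields and function names are illustrative.

```cpp
// Illustrative tree reduction of per-partition boundary updates.
#include <cstddef>
#include <vector>

struct BoundaryUpdate {
    long vehicles_crossing = 0;   // vehicles moving between partitions this time step
    long messages = 0;            // IPC messages represented by this update
};

BoundaryUpdate combine(const BoundaryUpdate& a, const BoundaryUpdate& b) {
    return {a.vehicles_crossing + b.vehicles_crossing, a.messages + b.messages + 1};
}

// Pairwise reduction: ceil(log2(P)) combine rounds over P partition updates.
BoundaryUpdate accumulate_logarithmic(std::vector<BoundaryUpdate> parts) {
    while (parts.size() > 1) {
        std::vector<BoundaryUpdate> next;
        for (std::size_t i = 0; i + 1 < parts.size(); i += 2)
            next.push_back(combine(parts[i], parts[i + 1]));
        if (parts.size() % 2 == 1) next.push_back(parts.back());
        parts.swap(next);
    }
    return parts.empty() ? BoundaryUpdate{} : parts.front();
}
```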

    Unreliable and resource-constrained decoding

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 185-213). By Lav R. Varshney.

    Traditional information theory and communication theory assume that decoders are noiseless and operate without transient or permanent faults. Decoders are also traditionally assumed to be unconstrained in physical resources like material, memory, and energy. This thesis studies how constraining reliability and resources in the decoder limits the performance of communication systems. Five communication problems are investigated. Broadly speaking, these are communication using decoders that are wiring cost-limited, that are memory-limited, that are noisy, that fail catastrophically, and that simultaneously harvest information and energy. For each of these problems, fundamental trade-offs between communication system performance and reliability or resource consumption are established.

    For decoding repetition codes using consensus decoding circuits, the optimal trade-off between decoding speed and quadratic wiring cost is defined and established. Designing optimal circuits is shown to be NP-complete, but is carried out for small circuit size. The natural relaxation to the integer circuit design problem is shown to be a reverse convex program. Random circuit topologies are also investigated.

    Uncoded transmission is investigated when a population of heterogeneous sources must be categorized due to decoder memory constraints. Quantizers that are optimal for mean Bayes risk error, a novel fidelity criterion, are designed. Human decision making in segregated populations is also studied with this framework. The ratio between the costs of false alarms and missed detections is also shown to fundamentally affect the essential nature of discrimination.

    The effect of noise on iterative message-passing decoders for low-density parity check (LDPC) codes is studied. Concentration of decoding performance around its average is shown to hold. Density evolution equations for noisy decoders are derived. Decoding thresholds degrade smoothly as decoder noise increases, and in certain cases, arbitrarily small final error probability is achievable despite decoder noisiness. Precise information storage capacity results for reliable memory systems constructed from unreliable components are also provided.

    Limits to communicating over systems that fail at random times are established. Communication with arbitrarily small probability of error is not possible, but schemes that optimize transmission volume communicated at fixed maximum message error probabilities are determined. System state feedback is shown not to improve performance.

    For optimal communication with decoders that simultaneously harvest information and energy, a coding theorem that establishes the fundamental trade-off between the rates at which energy and reliable information can be transmitted over a single line is proven. The capacity-power function is computed for several channels; it is non-increasing and concave.
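    The abstract mentions density evolution equations for noisy decoders without reproducing them. As a point of reference, the sketch below runs the standard density evolution recursion for a regular (dv, dc) LDPC ensemble over the binary erasure channel, with a simple additive noise floor standing in, purely as an illustrative assumption, for the effect of an unreliable decoder; it is not the thesis's actual noisy-decoder equation.

```cpp
// Illustrative density evolution for a regular (dv, dc) LDPC ensemble over the BEC.
#include <cmath>
#include <cstdio>

// One iteration: x is the erasure probability of a variable-to-check message.
double de_step(double x, double eps, int dv, int dc, double decoder_noise) {
    double check = 1.0 - std::pow(1.0 - x, dc - 1);      // check-to-variable erasure prob.
    double var   = eps * std::pow(check, dv - 1);        // variable-to-check erasure prob.
    return decoder_noise + (1.0 - decoder_noise) * var;  // crude noise floor (illustrative)
}

int main() {
    const int dv = 3, dc = 6;                  // (3,6)-regular ensemble
    const double eps = 0.40;                   // channel erasure probability
    const double decoder_noise = 1e-3;         // assumed decoder unreliability
    double x = eps;
    for (int l = 0; l < 200; ++l) x = de_step(x, eps, dv, dc, decoder_noise);
    std::printf("erasure probability after 200 iterations: %.6f\n", x);
    return 0;
}
```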

    Dynamic hashing technique for bandwidth reduction in image transmission

    Hash functions are widely used in secure communication systems, generating message digests for the detection of unauthorized changes in files. An encrypted hashed message, or digital signature, is used in many applications such as authentication to ensure data integrity. It is almost impossible to ensure authentic messages when sending over large bandwidth in a highly accessible network, especially on insecure channels. Two issues that need to be addressed are the large size of the hashed message and the high bandwidth. A collaborative approach between the encoded hash message and steganography provides highly secure hidden data. The aim of the research is to propose a new method for producing a dynamic and smaller encoded hash message with reduced bandwidth. The encoded hash message is embedded into an image as a stego-image to avoid an additional file, and consequently the bandwidth is reduced. The receiver extracts the encoded hash and the dynamic hashed message from the received file at the same time. If the encrypted hash decoded with the public key and the hashed message from the original file match the received file, it is considered authentic. To enhance the robustness of the hashed message, we compressed it, encoded it, or performed both operations before embedding the hashed data into the image. The proposed algorithm achieved the lowest dynamic size (1 KB), with no fixed length tied to the original file, compared to the MD5, SHA-1, and SHA-2 hash algorithms. The robustness of the hashed message was tested against substitution, replacement, and collision attacks to check whether the same message could be detected in the output. The results show that the probability of the same hashed message appearing in the output is close to 0%, compared to the MD5 and SHA algorithms. Among the benefits of the proposed algorithm is computational efficiency; for messages smaller than 1600 bytes, the hashed file reduced the original file size by up to 8.51%.
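    The paper's own dynamic hashing and encoding scheme is not reproduced in the abstract; as a minimal sketch of the embedding step it describes, the code below hides an already-encoded digest in the least-significant bits of raw image pixel bytes, one common way of producing a stego-image. The payload `digest` and the raw-byte image representation are assumptions for illustration.

```cpp
// Illustrative LSB embedding/extraction of an encoded hash digest in raw pixel bytes.
#include <cstdint>
#include <stdexcept>
#include <vector>

// Writes each bit of `digest` (MSB first within each byte) into the LSB of successive pixels.
void embed_lsb(std::vector<std::uint8_t>& pixels, const std::vector<std::uint8_t>& digest) {
    if (digest.size() * 8 > pixels.size())
        throw std::runtime_error("cover image too small for payload");
    for (std::size_t i = 0; i < digest.size() * 8; ++i) {
        std::uint8_t bit = (digest[i / 8] >> (7 - i % 8)) & 1u;
        pixels[i] = static_cast<std::uint8_t>((pixels[i] & ~1u) | bit);
    }
}

// Receiver side: reads `n_bytes` of payload back out of the pixel LSBs.
std::vector<std::uint8_t> extract_lsb(const std::vector<std::uint8_t>& pixels, std::size_t n_bytes) {
    std::vector<std::uint8_t> out(n_bytes, 0);
    for (std::size_t i = 0; i < n_bytes * 8; ++i)
        out[i / 8] = static_cast<std::uint8_t>((out[i / 8] << 1) | (pixels[i] & 1u));
    return out;
}
```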

    The exploitation of parallelism on shared memory multiprocessors

    PhD Thesis. With the arrival of many general-purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, even though they make use of the synchronisation inherent in the dataflow model by applying the semantics of single assignment when operating on shared data, giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms. The Science and Engineering Research Council.
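    The single-assignment idea behind shared values can be sketched with a small cell that may be written exactly once and that blocks readers until a value is available, giving dataflow-style synchronization on shared data. The class name and interface below are illustrative, not the constructs proposed in the thesis.

```cpp
// Illustrative single-assignment "shared value" cell.
#include <condition_variable>
#include <mutex>
#include <optional>
#include <stdexcept>

template <typename T>
class SharedValue {
public:
    void assign(T v) {                       // single assignment: a second write is an error
        std::lock_guard<std::mutex> lock(m_);
        if (value_) throw std::logic_error("shared value already assigned");
        value_ = std::move(v);
        ready_.notify_all();
    }
    const T& read() {                        // blocks until a producer has assigned the value
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return value_.has_value(); });
        return *value_;                      // safe: the value never changes once assigned
    }
private:
    std::mutex m_;
    std::condition_variable ready_;
    std::optional<T> value_;
};
```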