
    Real-time sound synthesis on a multi-processor platform

    Real-time sound synthesis means that the calculation and output of each sound sample for a channel of audio information must be completed within a sample period. At the broadcasting-standard sampling rate of 32,000 Hz, the maximum period available is 31.25 μs. Such requirements demand a large amount of data-processing power. An effective solution to this problem is a multi-processor platform: a parallel and distributed processing system. The suitability of the MIDI (Musical Instrument Digital Interface) standard, published in 1983, as a controller for real-time applications is examined. Many musicians have expressed doubts about the decade-old standard's fitness for real-time performance. These doubts were investigated by measuring the timing of various musical gestures and comparing the results with the subjective characteristics of human perception. The implementation and optimisation of real-time additive synthesis programs on a multi-transputer network are described. A prototype 81-polyphonic-note organ configuration was implemented. By devising and deploying monitoring processes, the network's performance was measured and enhanced, leading to more efficient usage: an 88-note configuration. Since 88 simultaneous notes are rarely necessary in most performances, a scheduling program for dynamic note allocation was then introduced to achieve further efficiency gains. To address remaining calculation redundancies, a multi-sampling-rate approach was applied as a further step towards optimal performance. The theories underlying sound granulation, as a means of constructing complex sounds from grains, and the real-time implementation of this technique are outlined. The idea of sound granulation is closely related to the quantum-wave notion of "acoustic quanta". Despite its conceptual simplicity, the signal-processing requirements set tough demands, providing a challenge for this audio synthesis engine. Three issues arising from the results of the implementations above are discussed: the efficiency of the applications implemented, provisions for new processors, and an optimal network architecture for sound synthesis.
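    As a rough illustration of the figures above, the sketch below (Python, not from the thesis) computes the per-sample time budget at 32 kHz and renders a note by naive additive synthesis; the partial frequencies and amplitudes are hypothetical stand-ins for the organ voices the thesis distributes across a transputer network.

```python
import numpy as np

SAMPLE_RATE = 32_000                      # broadcasting-standard rate cited in the abstract
SAMPLE_PERIOD_US = 1e6 / SAMPLE_RATE      # 31.25 microseconds available per output sample

def additive_synth(freqs_hz, amps, duration_s, sr=SAMPLE_RATE):
    """Naive additive synthesis: sum of sinusoidal partials.

    freqs_hz and amps are per-partial frequencies (Hz) and linear amplitudes;
    the values used below are hypothetical, chosen only for illustration.
    """
    t = np.arange(int(duration_s * sr)) / sr
    out = np.zeros_like(t)
    for f, a in zip(freqs_hz, amps):
        out += a * np.sin(2.0 * np.pi * f * t)
    return out

# Example: one organ-like note built from a few harmonics of 220 Hz.
note = additive_synth([220, 440, 660, 880], [1.0, 0.5, 0.25, 0.125], duration_s=0.1)
print(f"budget per sample: {SAMPLE_PERIOD_US:.2f} us, samples rendered: {note.size}")
```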

    Computational Modeling of Biological Neural Networks on GPUs: Strategies and Performance

    Simulating biological neural networks is an important task for computational neuroscientists attempting to model and analyze brain activity and function. As these networks become larger and more complex, the computational power required grows significantly, often requiring the use of supercomputers or compute clusters. An emerging low-cost, highly accessible alternative to many of these resources is the Graphics Processing Unit (GPU): specialized massively parallel graphics hardware that has seen increasing use as a general-purpose computational accelerator, thanks largely to NVIDIA's CUDA programming interface. We evaluated the relative benefits and limitations of GPU-based tools for large-scale neural network simulation and analysis, first by developing an agent-inspired spiking neural network simulator and then by adapting a neural signal decoding algorithm. Under certain network configurations, the simulator was able to outperform an equivalent MPI-based parallel implementation run on a dedicated compute cluster, while the decoding algorithm implementation consistently outperformed its serial counterpart. Additionally, the GPU-based simulator was able to visualize network spiking activity in real time, due to its close integration with standard computer graphics APIs. The GPU was shown to provide significant performance benefits under certain circumstances while lagging behind in others. Given the complex nature of these research tasks, a hybrid strategy that combines GPU- and CPU-based approaches provides greater performance than either alone.
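    The following sketch is a guess at the kind of per-neuron update such a simulator parallelizes: a vectorized leaky integrate-and-fire step in NumPy, where each array element corresponds to one neuron (the role a GPU kernel would typically assign to one thread). The neuron model and all parameter values are generic textbook choices, not taken from the paper.

```python
import numpy as np

def lif_step(v, i_syn, dt=1e-3, tau=0.02, v_rest=-0.065,
             v_thresh=-0.050, v_reset=-0.065):
    """One leaky integrate-and-fire update over a whole population.

    v     : membrane potentials (V), one entry per neuron
    i_syn : input drive (V/s), one entry per neuron
    Parameter values are generic SI-unit defaults for illustration only.
    """
    v = v + dt * ((v_rest - v) / tau + i_syn)  # leaky integration of the input drive
    spiked = v >= v_thresh                     # detect threshold crossings
    v = np.where(spiked, v_reset, v)           # reset neurons that fired
    return v, spiked

rng = np.random.default_rng(0)
v = np.full(100_000, -0.065)                   # membrane potentials for 100k neurons
total_spikes = 0
for _ in range(100):                           # 100 ms of simulated time at dt = 1 ms
    v, spiked = lif_step(v, i_syn=rng.normal(1.0, 0.3, v.size))
    total_spikes += int(spiked.sum())
print("total spikes:", total_spikes)
```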

    Prefetching and Caching Techniques in File Systems for MIMD Multiprocessors

    The increasing speed of the most powerful computers, especially multiprocessors, makes it difficult to provide sufficient I/O bandwidth to keep them running at full speed for the largest problems. Trends show that the difference in the speed of disk hardware and the speed of processors is increasing, with I/O severely limiting the performance of otherwise fast machines. This widening access-time gap is known as the “I/O bottleneck crisis.” One solution to the crisis, suggested by many researchers, is to use many disks in parallel to increase the overall bandwidth. This dissertation studies some of the file system issues needed to get high performance from parallel disk systems, since parallel hardware alone cannot guarantee good performance. The target systems are large MIMD multiprocessors used for scientific applications, with large files spread over multiple disks attached in parallel. The focus is on automatic caching and prefetching techniques. We show that caching and prefetching can transparently provide the power of parallel disk hardware to both sequential and parallel applications using a conventional file system interface. We also propose a new file system interface (compatible with the conventional interface) that could make it easier to use parallel disks effectively. Our methodology is a mixture of implementation and simulation, using a software testbed that we built to run on a BBN GP1000 multiprocessor. The testbed simulates the disks and fully implements the caching and prefetching policies. Using a synthetic workload as input, we use the testbed in an extensive set of experiments. The results show that prefetching and caching improved the performance of parallel file systems, often dramatically.
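    To make the prefetching idea concrete, here is a toy block cache with LRU eviction and fixed-depth sequential read-ahead, written in Python. The block size, capacity, prefetch depth, and workload are invented for illustration and are not the policies or testbed studied in the dissertation.

```python
from collections import OrderedDict

class ReadAheadCache:
    """Toy block cache with LRU eviction and sequential prefetch."""

    def __init__(self, capacity_blocks=64, prefetch_depth=4):
        self.cache = OrderedDict()          # block number -> data placeholder
        self.capacity = capacity_blocks
        self.prefetch_depth = prefetch_depth
        self.hits = self.misses = 0

    def _install(self, block):
        self.cache[block] = f"<data for block {block}>"
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least-recently-used block

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
        else:
            self.misses += 1                # would be a (slow) disk access
            self._install(block)
        # Prefetch the next few blocks, assuming sequential access.
        for b in range(block + 1, block + 1 + self.prefetch_depth):
            if b not in self.cache:
                self._install(b)
        return self.cache[block]

cache = ReadAheadCache()
for blk in range(256):                      # purely sequential workload
    cache.read(blk)
print(f"hits={cache.hits} misses={cache.misses}")   # nearly every read hits the cache
```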

    Cluster Computing Review

    In the past decade there has been a dramatic shift from mainframe or ‘host-centric’ computing to a distributed ‘client-server’ approach. In the next few years this trend is likely to continue, with further shifts towards ‘network-centric’ computing becoming apparent. All these trends were set in motion by the invention of the mass-reproducible microprocessor by Ted Hoff of Intel some twenty-odd years ago. The present generation of RISC microprocessors is now more than a match for mainframes in terms of cost and performance. The long-foreseen day when collections of RISC microprocessors assembled as a parallel computer could outperform the vector supercomputers has finally arrived. Such high-performance parallel computers incorporate proprietary interconnection networks allowing low-latency, high-bandwidth inter-processor communications. However, for certain types of applications such interconnect optimization is unnecessary and conventional LAN technology is sufficient. This has led to the realization that clusters of high-performance workstations can realistically be used for a variety of applications, either to replace mainframes, vector supercomputers, and parallel computers or to better manage already installed collections of workstations. Whilst it is clear that ‘cluster computers’ have limitations, many institutions and companies are exploring this option. Software to manage such clusters is at an early stage of development, and this report reviews the current state-of-the-art. Cluster computing is a rapidly maturing technology that seems certain to play an important part in the ‘network-centric’ computing future.

    Analysis and resynthesis of polyphonic music

    This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music, and describe and compare possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.
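    The sketch below illustrates the Gabor-atom idea behind the second system: projecting a signal onto windowed complex sinusoids to estimate the strength of individual partials. It is a minimal stand-in assuming simple inner-product analysis; the thesis's actual analysis and resynthesis pipeline is considerably more involved, and the test signal and parameters are invented.

```python
import numpy as np

def gabor_atom(freq_hz, center_s, sigma_s, sr, length):
    """A Gabor atom: a complex sinusoid under a Gaussian window."""
    t = np.arange(length) / sr
    window = np.exp(-0.5 * ((t - center_s) / sigma_s) ** 2)
    return window * np.exp(2j * np.pi * freq_hz * t)

sr = 16_000
t = np.arange(sr) / sr
# A toy "polyphonic" signal: two simultaneous partials at 440 Hz and 660 Hz.
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

# Project the signal onto atoms at a few candidate frequencies.
for f in (330, 440, 550, 660):
    atom = gabor_atom(f, center_s=0.5, sigma_s=0.02, sr=sr, length=signal.size)
    coeff = np.vdot(atom, signal) / np.sum(np.abs(atom) ** 2)   # normalized projection
    print(f"{f} Hz component magnitude ~ {abs(coeff):.2f}")
```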

    A Review of Commercial and Research Cluster Management Software

    In the past decade there has been a dramatic shift from mainframe or ‘host-centric’ computing to a distributed ‘client-server’ approach. In the next few years this trend is likely to continue, with further shifts towards ‘network-centric’ computing becoming apparent. All these trends were set in motion by the invention of the mass-reproducible microprocessor by Ted Hoff of Intel some twenty-odd years ago. The present generation of RISC microprocessors is now more than a match for mainframes in terms of cost and performance. The long-foreseen day when collections of RISC microprocessors assembled as a parallel computer could outperform the vector supercomputers has finally arrived. Such high-performance parallel computers incorporate proprietary interconnection networks allowing low-latency, high-bandwidth inter-processor communications. However, for certain types of applications such interconnect optimization is unnecessary and conventional LAN technology is sufficient. This has led to the realization that clusters of high-performance workstations can realistically be used for a variety of applications, either to replace mainframes, vector supercomputers, and parallel computers or to better manage already installed collections of workstations. Whilst it is clear that ‘cluster computers’ have limitations, many institutions and companies are exploring this option. Software to manage such clusters is at an early stage of development, and this report reviews the current state-of-the-art. Cluster computing is a rapidly maturing technology that seems certain to play an important part in the ‘network-centric’ computing future.

    A PC-based data acquisition system for sub-atomic physics measurements

    Modern particle physics measurements are heavily dependent upon automated data acquisition systems (DAQ) to collect and process experiment-generated information. One research group from the University of Saskatchewan utilizes a DAQ known as the Lucid data acquisition and analysis system. This thesis examines the project undertaken to upgrade the hardware and software components of Lucid. To establish the effectiveness of the system upgrades, several performance metrics were obtained, including the system's dead time and input/output bandwidth. Hardware upgrades to Lucid consisted of replacing its aging digitization equipment with modern, faster-converting Versa-Module Eurobus (VME) technology and replacing the instrumentation processing platform with common PC hardware. The new processor platform is coupled to the instrumentation modules via a fiber-optic bridging device, the sis1100/3100 from Struck Innovative Systems. The software systems of Lucid were also modified to follow suit with the new hardware. Originally constructed to utilize a proprietary real-time operating system, the data acquisition application was ported to run under the freely available Real-Time Executive for Multiprocessor Systems (RTEMS). The device driver software provided with the sis1100/3100 interface also had to be ported for use under the RTEMS-based system. Performance measurements of the upgraded DAQ indicate that the dead time has been reduced from the order of milliseconds to the order of several tens of microseconds. This increased capability means that Lucid's users may acquire significantly more data in a shorter period of time, thereby decreasing both the statistical uncertainties and the data collection duration associated with a given experiment.
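    A back-of-the-envelope calculation shows why the dead-time reduction matters. Assuming the standard non-paralyzable dead-time model and a hypothetical 10 kHz event rate (the rate is not taken from the thesis; the millisecond and tens-of-microseconds figures follow the abstract), the fraction of events actually recorded changes dramatically:

```python
def live_fraction(event_rate_hz, dead_time_s):
    """Recorded fraction of events under the non-paralyzable dead-time model:
    measured/true = 1 / (1 + rate * dead_time)."""
    return 1.0 / (1.0 + event_rate_hz * dead_time_s)

rate = 10_000  # events per second (assumed purely for illustration)
for label, tau in (("old DAQ, ~1 ms dead time", 1e-3),
                   ("upgraded DAQ, ~30 us dead time", 30e-6)):
    print(f"{label}: {100 * live_fraction(rate, tau):.1f}% of events recorded")
```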

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task- and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems.
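    As a sketch of what WCET-sensitivity-driven cache partitioning means in practice, the Python snippet below greedily hands out cache ways to whichever task gains the largest WCET reduction from one more way. The WCET tables are made-up numbers and the greedy rule is only an illustration of the idea, not the TCPS algorithm from the dissertation.

```python
def greedy_cache_partition(wcet_tables, total_ways):
    """Assign cache ways one at a time to the task with the largest WCET gain.

    wcet_tables[i][w] is task i's WCET (arbitrary time units) when given
    w cache ways; the tables passed in below are invented for illustration.
    """
    alloc = [0] * len(wcet_tables)
    for _ in range(total_ways):
        best_task, best_gain = None, 0.0
        for i, table in enumerate(wcet_tables):
            if alloc[i] + 1 < len(table):
                gain = table[alloc[i]] - table[alloc[i] + 1]  # WCET saved by one more way
                if gain > best_gain:
                    best_task, best_gain = i, gain
        if best_task is None:
            break
        alloc[best_task] += 1
    return alloc

# WCET for 0..4 ways: task 0 is highly cache-sensitive, task 1 barely is.
wcet = [
    [1000, 700, 550, 500, 480],
    [400, 390, 385, 383, 382],
]
print(greedy_cache_partition(wcet, total_ways=4))   # the cache-sensitive task gets the ways
```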

    Effective memory management for mobile environments

    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and of optimization techniques specific to mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency-scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS) and just-in-time (JIT) compilation, and we explore the established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (not just locally) and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques.
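    One way to picture a GC-aware frequency-scaling governor is as a small policy function consulted whenever a collection starts or a pause completes. The sketch below is one plausible policy, assumed purely for illustration rather than taken from the dissertation: boost the clock when GC pauses exceed a responsiveness budget, and otherwise let the memory-bound collector run at a lower, energy-saving frequency.

```python
def gc_aware_frequency(base_mhz, gc_active, pause_budget_ms, recent_pause_ms,
                       boost_mhz=1800, min_mhz=600):
    """Pick a CPU frequency given GC state (illustrative policy, hypothetical values)."""
    if not gc_active:
        return base_mhz           # normal governor behavior outside collections
    if recent_pause_ms > pause_budget_ms:
        return boost_mhz          # shorten pauses that are hurting responsiveness
    return min_mhz                # GC is keeping up; trade a little time for energy

# Example decisions for a 60 fps responsiveness budget of ~16 ms per frame.
print(gc_aware_frequency(1200, gc_active=True, pause_budget_ms=16, recent_pause_ms=25))
print(gc_aware_frequency(1200, gc_active=True, pause_budget_ms=16, recent_pause_ms=8))
```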
