
    Using reconfigurable functional units in conventional microprocessors.


    RPPM: Rapid Performance Prediction of Multithreaded Workloads on Multicore Processors

    Analytical performance modeling is a useful complement to detailed cycle-level simulation for quickly exploring the design space in an early design stage. Mechanistic analytical modeling is particularly interesting because it provides deep insight and does not require the expensive offline profiling that empirical modeling does. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict performance across a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.
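    The abstract does not give RPPM's equations, but the general shape of a mechanistic analytical model can be sketched as follows: a handful of microarchitecture-independent workload characteristics are combined with target-machine penalties to produce a cycles-per-instruction estimate. The structure fields, penalty values, and the synchronization correction below are hypothetical placeholders, not RPPM's actual model.
```cpp
// Toy mechanistic CPI model (illustrative only; not RPPM).
#include <cstdio>

struct ThreadProfile {
    double base_cpi;        // CPI assuming perfect caches and branch prediction
    double l2_mpki;         // L2 misses per kilo-instruction (profiled once)
    double branch_mpki;     // branch mispredictions per kilo-instruction
    double sync_fraction;   // fraction of time spent waiting on barriers/locks
};

// Combine the microarchitecture-independent profile with penalties of a
// hypothetical target core to predict per-thread CPI.
double predict_cpi(const ThreadProfile& p,
                   double l2_miss_penalty, double branch_penalty) {
    double compute = p.base_cpi
                   + p.l2_mpki / 1000.0 * l2_miss_penalty
                   + p.branch_mpki / 1000.0 * branch_penalty;
    // Very coarse correction for synchronization overhead.
    return compute / (1.0 - p.sync_fraction);
}

int main() {
    ThreadProfile t{0.8, 2.5, 4.0, 0.15};
    std::printf("predicted CPI: %.2f\n", predict_cpi(t, 200.0, 15.0));
}
```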

    Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

    Scientific applications strive for increased memory and computing performance, requiring massive amounts of data and time to produce results. Applications utilize large-scale, parallel computing platforms with advanced architectures to accommodate their needs. However, developing performance-portable applications for modern, heterogeneous platforms requires considerable effort and expertise in both the application and systems domains. This is especially relevant for unstructured applications whose workflow is not statically predictable due to their heavily data-dependent nature. One possible solution to this problem is the introduction of an intelligent Domain-Specific Language (iDSL) that transparently helps to maintain correctness, hides the idiosyncrasies of low-level hardware, and scales applications. An iDSL includes domain-specific language constructs, a compilation toolchain, and a runtime providing task scheduling, data placement, and workload balancing across and within heterogeneous nodes. In this work, we focus on the runtime framework. We introduce a novel design and extension of a runtime framework, the Parallel Runtime Environment for Multicore Applications. In response to the ever-increasing intra- and inter-node concurrency, the runtime system supports efficient task scheduling and workload balancing at both levels while allowing the development of custom policies. Moreover, the new framework provides abstractions supporting the utilization of heterogeneous distributed nodes consisting of CPUs and GPUs and is extensible to other devices. We demonstrate that by utilizing this work, an application (or the iDSL) can scale its performance on heterogeneous exascale-era supercomputers with minimal effort. A future goal for this framework (out of the scope of this thesis) is to be integrated with machine learning to further improve its decision-making and performance. As a bridge to this goal, since the framework is under development, we experiment with data from nuclear physics particle accelerators and demonstrate the significant improvements achieved by utilizing machine learning in the hit-based track reconstruction process.
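    As a rough illustration of the kind of runtime abstraction described above, the sketch below shows a task queue whose scheduling policy is pluggable, so that custom placement or balancing policies can be supplied by the application. The names and interface are hypothetical and do not reflect the actual Parallel Runtime Environment for Multicore Applications API.
```cpp
// Minimal pluggable-policy task scheduler (illustrative only).
#include <cstddef>
#include <cstdio>
#include <deque>
#include <functional>
#include <utility>

struct Task {
    std::function<void()> body;
    int priority;            // could encode locality or device affinity
};

class Scheduler {
public:
    // A custom policy picks which queued task runs next.
    using Policy = std::function<std::size_t(const std::deque<Task>&)>;
    explicit Scheduler(Policy p) : pick_(std::move(p)) {}

    void submit(Task t) { queue_.push_back(std::move(t)); }

    void run_all() {
        while (!queue_.empty()) {
            std::size_t i = pick_(queue_);
            Task t = std::move(queue_[i]);
            queue_.erase(queue_.begin() + i);
            t.body();
        }
    }
private:
    std::deque<Task> queue_;
    Policy pick_;
};

int main() {
    // Example custom policy: highest priority first.
    Scheduler s([](const std::deque<Task>& q) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < q.size(); ++i)
            if (q[i].priority > q[best].priority) best = i;
        return best;
    });
    s.submit({[] { std::puts("low-priority task"); }, 1});
    s.submit({[] { std::puts("high-priority task"); }, 5});
    s.run_all();
}
```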

    Trends in Sighting Systems for Combat Vehicles

    Search and tracking in dynamic conditions, rapid re-targeting, precision pointing, and long-range engagement in day and night conditions are core requisites of the stabilised sighting systems used in combat vehicles. A complex battlefield requires an integrated fire control system with a stabilised sighting system as its main constituent. It facilitates quick reaction of the fire control system and provides a vital edge in the battlefield scenario. Precision gimbal design, optics design, embedded engineering, control systems, electro-optical sensors, target detection and tracking, panorama generation, auto-alerting, digital image stabilisation, image fusion, and integration are important aspects of sighting system development. In this paper, design considerations for a state-of-the-art stabilised sighting system are presented, including laboratory and field evaluation methods for such systems.

    Adaptive Cognitive Interaction Systems

    Adaptive cognitive interaction systems observe and model the state of their user and adapt the system behaviour accordingly. Such a system consists of three components: the empirical cognitive model, the computational cognitive model, and the adaptive interaction manager. The present work contains numerous contributions to the development of these components as well as to their combination. The results are validated in numerous user studies.

    The Contemporary Affirmation of Taxonomy and Recent Literature on Workflow Scheduling and Management in Cloud Computing

    Cloud computing systems are preferred over traditional forms of computing such as grid computing, utility computing, and autonomic computing for their ease of access to computing, their QoS preferences, SLA conformity, and the security and performance offered with minimal supervision. A cloud workflow schedule, when designed efficiently, achieves optimal resource usage, balanced workloads, deadline-specific execution, cost control according to budget specifications, and efficient energy consumption to meet the performance requirements of today's vast scientific and business applications. Business requirements under recent technologies such as pervasive computing are motivating further advancements in cloud computing. In this paper we discuss some of the important literature published on cloud workflow scheduling.

    Using a Multiobjective Approach to Balance Mission and Network Goals within a Delay Tolerant Network Topology

    This thesis investigates how to incorporate aspects of an Air Tasking Order (ATO), a Communications Tasking Order (CTO), and a Network Tasking Order (NTO) within a cognitive network framework. This was done in an effort to aid the commander and/or network operator by providing automation for battlespace management, improving response time and reducing potentially inconsistent problem resolution. In particular, autonomous weapon systems such as unmanned aerial vehicles (UAVs) were the focus of this research. This work implemented a simple cognitive process by incorporating aspects of behavior-based robotic control principles to solve the multi-objective optimization problem of balancing both network and mission goals. The cognitive process consisted of both a multi-move look-ahead component, in which the future outcomes of decisions were estimated, and a subsumption decision-making architecture in which these decision-outcome pairs were selected so that they co-optimized the dual goals. This was tested within a novel Air Force mission scenario consisting of a UAV surveillance mission within a delay tolerant network (DTN) topology. This scenario used a team of small-scale UAVs (operating as a team but each running the cognitive process independently) to balance the mission goal of maintaining maximum overall UAV time-on-target and the network goal of minimizing the packet end-to-end delays experienced within the DTN. The testing was accomplished within a MATLAB discrete event simulation. The results indicated that the proposed approach could successfully improve both goals simultaneously: the network goal improved by 52% and the mission goal improved by approximately 6%.
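    The decision structure described above can be illustrated with a small sketch: candidate actions are scored against both the mission goal and the network goal over a look-ahead horizon, and a subsumption-style rule arbitrates between the two layers. The scoring values and threshold below are hypothetical placeholders; the thesis's actual estimator and MATLAB simulation are not reproduced.
```cpp
// Toy subsumption-style arbitration between mission and network goals
// (illustrative only).
#include <cstdio>
#include <vector>

struct Action {
    const char* name;
    double mission_gain;   // estimated future time-on-target gain
    double network_gain;   // estimated reduction in end-to-end delay
};

// If some action keeps the network goal above a minimum threshold, pick the
// best mission action among those; otherwise the network layer subsumes the
// mission layer and the best network-repairing action is chosen.
Action choose(const std::vector<Action>& candidates, double net_threshold) {
    const Action* best = nullptr;
    for (const Action& a : candidates)
        if (a.network_gain >= net_threshold &&
            (!best || a.mission_gain > best->mission_gain))
            best = &a;
    if (best) return *best;
    best = &candidates.front();
    for (const Action& a : candidates)
        if (a.network_gain > best->network_gain) best = &a;
    return *best;
}

int main() {
    std::vector<Action> c = {{"loiter over target", 0.9, 0.1},
                             {"relay packets", 0.3, 0.8}};
    std::printf("chosen action: %s\n", choose(c, 0.5).name);
}
```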

    MPI+X: task-based parallelization and dynamic load balance of finite element assembly

    The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we describe the algorithms in the FE context, a similar strategy can be straightforwardly applied to other discretization methods, such as the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute element matrices and right-hand sides and assemble them into the local system of each MPI partition. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring techniques, to circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is a decrease in IPC due to poor spatial locality. The second technique avoids this issue but requires extensive changes to the implementation, which can be cumbersome when several element loops must be treated. We propose an alternative based on task parallelism of the element loop, using some extensions to the OpenMP programming model. The taskification of the assembly solves both aforementioned problems. In addition, dynamic load balance is applied using the DLB library, which is especially efficient in the presence of hybrid meshes, where the relative costs of the different elements are impossible to estimate a priori. This paper presents the proposed methodology, its implementation, and its validation through the solution of large computational mechanics problems on up to 16k cores.
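    A minimal sketch of task-based element-loop assembly with OpenMP tasks is given below. It is not the paper's code: the paper relies on OpenMP task extensions and the DLB library, whereas this sketch simply chunks the element loop into tasks and uses atomic updates to sidestep the assembly race (compile with -fopenmp).
```cpp
// Minimal sketch of task-based FE assembly (illustrative only).
// Each task assembles a chunk of elements; atomic updates avoid the race
// when scattering element contributions into the shared local right-hand side.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Element { int nodes[4]; };   // hypothetical 4-node element

void assemble_rhs(const std::vector<Element>& elems, std::vector<double>& rhs) {
    #pragma omp parallel
    #pragma omp single
    {
        const std::size_t chunk = 64;   // elements per task
        for (std::size_t start = 0; start < elems.size(); start += chunk) {
            #pragma omp task firstprivate(start)
            {
                const std::size_t end = std::min(start + chunk, elems.size());
                for (std::size_t e = start; e < end; ++e) {
                    // Placeholder element computation (a real code integrates the PDE here).
                    const double elem_rhs[4] = {1.0, 1.0, 1.0, 1.0};
                    for (int i = 0; i < 4; ++i) {
                        #pragma omp atomic update
                        rhs[elems[e].nodes[i]] += elem_rhs[i];
                    }
                }
            }
        }
    }   // all tasks are complete at the implicit barrier here
}
```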

    A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method

    We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or in any combination of the two. Performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers, and a comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90, are presented.
    Comment: Accepted by Computer Physics Communications, LaTeX, 34 pages without figures, 15 figures available in PostScript form via WWW at http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm
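    The hybrid decomposition can be pictured as a two-dimensional process grid in which each MPI rank owns one block of electronic orbitals and one slab of the real-space grid. The sketch below only illustrates this indexing; it is not the paper's actual data layout, and the group and slab counts are arbitrary examples.
```cpp
// Illustrative mapping of ranks onto an (orbital block, spatial slab) grid.
#include <cstdio>

int main() {
    const int nranks = 8;
    const int orbital_groups = 4;                        // decomposition over orbitals
    const int spatial_slabs  = nranks / orbital_groups;  // decomposition over space
    for (int rank = 0; rank < nranks; ++rank) {
        int og = rank / spatial_slabs;   // which block of orbitals this rank holds
        int sl = rank % spatial_slabs;   // which slab of the real-space grid it holds
        std::printf("rank %d -> orbital group %d, spatial slab %d\n", rank, og, sl);
    }
}
```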

    Effects of Signal Probability on Multitasking-Based Distraction in Driving, Cyberattack & Battlefield Simulation

    Multitasking-based failures of perception and action are the focus of much research in driving, where they are attributed to distraction. Similar failures occur in contexts where the construct of distraction is little used. Such narrow application was attributed to methodology that cannot precisely account for experimental variables in time and space, limiting distraction's conceptual portability to other contexts. An approach based upon vigilance methodology was put forward as a solution and highlighted a fundamental human performance question: would increasing the signal probability (SP) of a secondary task increase associated performance, as is seen in the prevalence effect associated with vigilance tasks, or would it reduce associated performance, as is seen in driving distraction tasks? A series of experiments weighed these competing assumptions. In the first, a psychophysical task, analysis of accuracy and response data revealed an interaction between the number of concurrent tasks and the SP of presented targets. The question was further tested in the applied contexts of driving, cyberattack, and battlefield target decision-making. In line with previous prevalence-effect inquiry, presentation of stimuli at higher SP led to higher accuracy. In line with existing distraction work, performing higher numbers of concurrent tasks tended to elicit slower response times. In all experiments, raising either the number of concurrent tasks or the SP of targets resulted in greater subjective workload, as measured by the NASA TLX, even when accompanied by improved accuracy. It would seem that distraction in previous experiments has been an aggregate effect including both delayed response time and prevalence-based accuracy effects. These findings support the view that superior experimental control of SP reveals nomothetic patterns of performance that allow better understanding and wider application of the distraction construct both within and in diverse contexts beyond driving.