Search CORE

1,366 research outputs found

Building real-time embedded applications on QduinoMC: a web-connected 3D printer case study

Author: Cheng Zhuoqun
West Richard
Ye Ying
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Single Board Computers (SBCs) are now emerging with multiple cores, ADCs, GPIOs, PWM channels, integrated graphics, and several serial bus interfaces. The low power consumption, small form factor and I/O interface capabilities of SBCs with sensors and actuators makes them ideal in embedded and real-time applications. However, most SBCs run non-realtime operating systems based on Linux and Windows, and do not provide a user-friendly API for application development. This paper presents QduinoMC, a multicore extension to the popular Arduino programming environment, which runs on the Quest real-time operating system. QduinoMC is an extension of our earlier single-core, real-time, multithreaded Qduino API. We show the utility of QduinoMC by applying it to a specific application: a web-connected 3D printer. This differs from existing 3D printers, which run relatively simple firmware and lack operating system support to spool multiple jobs, or interoperate with other devices (e.g., in a print farm). We show how QduinoMC empowers devices with the capabilities to run new services without impacting their timing guarantees. While it is possible to modify existing operating systems to provide suitable timing guarantees, the effort to do so is cumbersome and does not provide the ease of programming afforded by QduinoMC.http://www.cs.bu.edu/fac/richwest/papers/rtas_2017.pdfAccepted manuscrip

Boston University Institutional Repository (OpenBU)

On Designing Multicore-aware Simulators for Biological Systems

Author: Aldinucci Marco
Coppo Mario
Damiani Ferruccio
Drocco Maurizio
Torquati Massimo
Troina Angelo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/10/2010
Field of study

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It often is an enlightening technique, which may however result in being computational expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas of derived from parameter sweep). Proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments) that is carried out according to proposed solutions by way of the FastFlow programming framework making possible fast development and efficient execution on multi-cores.Comment: 19 pages + cover pag

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Institutional Research Information System University of Turin

Creating portable and efficient packet processing applications

Author: A Korobeynikov
AV Aho
B Wun
CW Fraser
D Bernstein
EA Lee
EJ Johnson
Fulvio Risso
G Memik
J Carlstrom
J Wagner
JA Fisher
JL Hennessy
L Ciminiera
L George
M Baldi
M Baldi
MK Chen
N Shah
Olivier Morandi
P Briggs
Paolo Veglia
Pierluigi Rolando
R Cytron
R Ennals
R Morris
Silvio Valenti
SS Muchnick
T Lindholm
Z Budimlic
Publication venue: Springer
Publication date: 01/01/2011
Field of study

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Optimizing simulation on shared-memory platforms: The smart cities case

Author: Cingolani Davide
Ianni Mauro
Marotta Romolo
Pellegrini Alessandro
Quaglia Francesco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Modern advancements in computing architectures have been accompanied by new emergent paradigms to run Parallel Discrete Event Simulation models efficiently. Indeed, many new paradigms to effectively use the available underlying hardware have been proposed in the literature. Among these, the Share-Everything paradigm tackles massively-parallel shared-memory machines, in order to support speculative simulation by taking into account the limits and benefits related to this family of architectures. Previous results have shown how this paradigm outperforms traditional speculative strategies (such as data-separated Time Warp systems) whenever the granularity of executed events is small. In this paper, we show performance implications of this simulation-engine organization when the simulation models have a variable granularity. To this end, we have selected a traffic model, tailored for smart cities-oriented simulation. Our assessment illustrates the effects of the various tuning parameters related to the approach, opening to a higher understanding of this innovative paradigm

Crossref

ART

Archivio della ricerca- Università di Roma La Sapienza

Parallel simulation of Population Dynamics P systems: updates and roadmap

Author: Macías Ramos Luis Felipe
Martínez del Amor Miguel Ángel
Pérez Jiménez Mario de Jesús
Valencia Cabrera Luis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Population Dynamics P systems are a type of multienvironment P systems that serve as a formal modeling framework for real ecosystems. The accurate simulation of these probabilisticmodels, e.g. with Direct distribution based on Consistent Blocks Algorithm, entails large run times. Hence, parallel platforms such as GPUs have been employed to speedup the simulation. In 2012, the first GPU simulator of PDP systems was presented. However, it was able to run only randomly generated PDP systems. In this paper, we present current updates made on this simulator, involving an input modu le for binary files and an output module for CSV files. Finally, the simulator has been experimentally validated with a real ecosystem model, and its performance has been tested with two high-end GPUs: Tesla C1060 and K40.Ministerio de Economía y Competitividad TIN2012-37434Junta de Andalucía P08-TIC-0420

idUS. Depósito de Investigación Universidad de Sevilla

MULTI-SCALE SCHEDULING TECHNIQUES FOR SIGNAL PROCESSING SYSTEMS

Author: Zhou Zheng
Publication venue
Publication date: 01/01/2013
Field of study

A variety of hardware platforms for signal processing has emerged, from distributed systems such as Wireless Sensor Networks (WSNs) to parallel systems such as Multicore Programmable Digital Signal Processors (PDSPs), Multicore General Purpose Processors (GPPs), and Graphics Processing Units (GPUs) to heterogeneous combinations of parallel and distributed devices. When a signal processing application is implemented on one of those platforms, the performance critically depends on the scheduling techniques, which in general allocate computation and communication resources for competing processing tasks in the application to optimize performance metrics such as power consumption, throughput, latency, and accuracy. Signal processing systems implemented on such platforms typically involve multiple levels of processing and communication hierarchy, such as network-level, chip-level, and processor-level in a structural context, and application-level, subsystem-level, component-level, and operation- or instruction-level in a behavioral context. In this thesis, we target scheduling issues that carefully address and integrate scheduling considerations at different levels of these structural and behavioral hierarchies. The core contributions of the thesis include the following. Considering both the network-level and chip-level, we have proposed an adaptive scheduling algorithm for wireless sensor networks (WSNs) designed for event detection. Our algorithm exploits discrepancies among the detection accuracy of individual sensors, which are derived from a collaborative training process, to allow each sensor to operate in a more energy efficient manner while the network satisfies given constraints on overall detection accuracy. Considering the chip-level and processor-level, we incorporated both temperature and process variations to develop new scheduling methods for throughput maximization on multicore processors. In particular, we studied how to process a large number of threads with high speed and without violating a given maximum temperature constraint. We targeted our methods to multicore processors in which the cores may operate at different frequencies and different levels of leakage. We develop speed selection and thread assignment schedulers based on the notion of a core's steady state temperature. Considering the application-level, component-level and operation-level, we developed a new dataflow based design flow within the targeted dataflow interchange format (TDIF) design tool. Our new multiprocessor system-on-chip (MPSoC)-oriented design flow, called TDIF-PPG, is geared towards analysis and mapping of embedded DSP applications on MPSoCs. An important feature of TDIF-PPG is its capability to integrate graph level parallelism and actor level parallelism into the application mapping process. Here, graph level parallelism is exposed by the dataflow graph application representation in TDIF, and actor level parallelism is modeled by a novel model for multiprocessor dataflow graph implementation that we call the Parallel Processing Group (PPG) model. Building on the contribution above, we formulated a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for chip-level MPSoC mapping of DSP systems that are represented as synchronous dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized can be parallelized. For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three work flows --- particle swarm optimization with a mixed integer programming formulation, particle swarm optimization with a simulated annealing engine, and particle swarm optimization with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support general PAS problem which considers the actors cannot be parallelized

Digital Repository at the University of Maryland

Threads and Or-Parallelism Unified

Author: Carro
Correia
Gupta
INÊS DUTRA
Moura
Pontelli
RICARDO ROCHA
Rocha
Santos Costa
Santos Costa
Shen
Stevens
VíTOR SANTOS COSTA
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 26/07/2010
Field of study

One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world applications in low cost multi-core architectures. To achieve these goals, we revive and redesign the YapOr system to exploit or-parallelism based on a multi-threaded implementation. Our new approach takes full advantage of the state-of-the-art fast and optimized YAP Prolog engine and shares the underlying execution environment, scheduler and most of the data structures used to support YapOr's model. Initial experiments with our new approach consistently achieve almost linear speedups for most of the applications, proving itself as a good alternative for exploiting implicit parallelism in the currently available low cost multi-core architectures.Comment: 17 pages, 21 figures, International Conference on Logic Programming (ICLP 2010

arXiv.org e-Print Archive

Crossref

A Survey of Prediction and Classification Techniques in Multicore Processor Systems

Author: Ababei Cristinel
Moghaddam Milad Ghorbani
Publication venue: e-Publications@Marquette
Publication date: 01/05/2019
Field of study

In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems

epublications@Marquette

Recommended from our members

On the conditions for efficient interoperability with threads: An experience with PGAS languages using Cray communication domains

Author: Ibrahim KZ
Yelick K
Publication venue: eScholarship, University of California
Publication date: 01/01/2014
Field of study

Today's high performance systems are typically built from shared memory nodes connected by a high speed network. That architecture, combined with the trend towards less memory per core, encourages programmers to use a mixture of message passing and multithreaded programming. Unfortunately, the advantages of using threads for in-node programming are hindered by their inability to efficiently communicate between nodes. In this work, we identify some of the performance problems that arise in such hybrid programming environments and characterize conditions needed to achieve high communication performance for multiple threads: addressability of targets, separability of communication paths, and full direct reachability to targets. Using the GASNet communication layer on the Cray XC30 as our experimental platform, we show how to satisfy these conditions. We also discuss how satisfying these conditions is influenced by the communication abstraction, implementation constraints, and the interconnect messaging capabilities. To evaluate these ideas, we compare the communication performance of a thread-based node runtime to a process-based runtime. Without our GASNet extensions, thread communication is significantly slower than processes - up to 21x slower. Once the implementation is modified to address each of our conditions, the two runtimes have comparable communication performance. This allows programmers to more easily mix models like OpenMP, CILK, or pthreads with a GASNet-based model like UPC, with the associated performance, convenience and interoperability advantages that come from using threads within a node. © 2014 ACM

eScholarship - University of California