Search CORE

33 research outputs found

A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

Author: Alok Choudhary
Bharath Pattabiraman
Binney
Böker
Chatterjee
Collins
Frederic A. Rasio
Fregeau
Fregeau
Giersz
Gokhan Memik
Goswami
Gropp
Heggie
Heggie
Heggie
Joshi
Joshi
L'Ecuyer
Li
Lightman
Lusk
McLaughlin
Merritt
Miller
Nvidia.
Spitzer
Stefan Umbreit
Stodolkiewicz
Trenti
Umbreit
Vassiliki Kalogera
Wei-keng Liao
Publication venue: 'IOP Publishing'
Publication date: 15/11/2012
Field of study

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within less than 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7. The runtime reaches a saturation with the addition of more processors beyond these limits which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement

arXiv.org e-Print Archive

Crossref

Development and certification of mixed-criticality embedded systems based on probabilistic timing analysis

Author: Agirre Troncoso Irune
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2018
Field of study

An increasing variety of emerging systems relentlessly replaces or augments the functionality of mechanical subsystems with embedded electronics. For quantity, complexity, and use, the safety of such subsystems is an increasingly important matter. Accordingly, those systems are subject to safety certification to demonstrate system's safety by rigorous development processes and hardware/software constraints. The massive augment in embedded processors' complexity renders the arduous certification task significantly harder to achieve. The focus of this thesis is to address the certification challenges in multicore architectures: despite their potential to integrate several applications on a single platform, their inherent complexity imperils their timing predictability and certification. Recently, the Measurement-Based Probabilistic Timing Analysis (MBPTA) technique emerged as an alternative to deal with hardware/software complexity. The innovation that MBPTA brings about is, however, a major step from current certification procedures and standards. The particular contributions of this Thesis include: (i) the definition of certification arguments for mixed-criticality integration upon multicore processors. In particular we propose a set of safety mechanisms and procedures as required to comply with functional safety standards. For timing predictability, (ii) we present a quantitative approach to assess the likelihood of execution-time exceedance events with respect to the risk reduction requirements on safety standards. To this end, we build upon the MBPTA approach and we present the design of a safety-related source of randomization (SoR), that plays a key role in the platform-level randomization needed by MBPTA. And (iii) we evaluate current certification guidance with respect to emerging high performance design trends like caches. Overall, this Thesis pushes the certification limits in the use of multicore and MBPTA technology in Critical Real-Time Embedded Systems (CRTES) and paves the way towards their adoption in industry.Una creciente variedad de sistemas emergentes reemplazan o aumentan la funcionalidad de subsistemas mecánicos con componentes electrónicos embebidos. El aumento en la cantidad y complejidad de dichos subsistemas electrónicos así como su cometido, hacen de su seguridad una cuestión de creciente importancia. Tanto es así que la comercialización de estos sistemas críticos está sujeta a rigurosos procesos de certificación donde se garantiza la seguridad del sistema mediante estrictas restricciones en el proceso de desarrollo y diseño de su hardware y software. Esta tesis trata de abordar los nuevos retos y dificultades dadas por la introducción de procesadores multi-núcleo en dichos sistemas críticos: aunque su mayor rendimiento despierta el interés de la industria para integrar múltiples aplicaciones en una sola plataforma, suponen una mayor complejidad. Su arquitectura desafía su análisis temporal mediante los métodos tradicionales y, asimismo, su certificación es cada vez más compleja y costosa. Con el fin de lidiar con estas limitaciones, recientemente se ha desarrollado una novedosa técnica de análisis temporal probabilístico basado en medidas (MBPTA). La innovación de esta técnica, sin embargo, supone un gran cambio cultural respecto a los estándares y procedimientos tradicionales de certificación. En esta línea, las contribuciones de esta tesis están agrupadas en tres ejes principales: (i) definición de argumentos de seguridad para la certificación de aplicaciones de criticidad-mixta sobre plataformas multi-núcleo. Se definen, en particular, mecanismos de seguridad, técnicas de diagnóstico y reacción de faltas acorde con el estándar IEC 61508 sobre una arquitectura multi-núcleo de referencia. Respecto al análisis temporal, (ii) presentamos la cuantificación de la probabilidad de exceder un límite temporal y su relación con los requisitos de reducción de riesgos derivados de los estándares de seguridad funcional. Con este fin, nos basamos en la técnica MBPTA y presentamos el diseño de una fuente de números aleatorios segura; un componente clave para conseguir las propiedades aleatorias requeridas por MBPTA a nivel de plataforma. Por último, (iii) extrapolamos las guías actuales para la certificación de arquitecturas multi-núcleo a una solución comercial de 8 núcleos y las evaluamos con respecto a las tendencias emergentes de diseño de alto rendimiento (caches). Con estas contribuciones, esta tesis trata de abordar los retos que el uso de procesadores multi-núcleo y MBPTA implican en el proceso de certificación de sistemas críticos de tiempo real y facilita, de esta forma, su adopción por la industria.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

TamaRISC-CS: An Ultra-Low-Power Application-Specific Processor for Compressed Sensing

Author: Andersson Oskar
Atienza Alonso David
Burg Andreas Peter
Constantin Jeremy Hugues-Felix
Dogan Ahmed Yasir
Meinerzhagen Pascal Andreas
Rodrigues Joachim Neves
Publication venue
Publication date: 01/01/2012
Field of study

Compressed sensing (CS) is a universal technique for the compression of sparse signals. CS has been widely used in sensing platforms where portable, autonomous devices have to operate for long periods of time with limited energy resources. Therefore, an ultra-low-power (ULP) CS implementation is vital for these kind of energy-limited systems. Sub-threshold (sub-VT ) operation is commonly used for ULP computing, and can also be combined with CS. However, most established CS implementations can achieve either no or very limited benefit from sub-VT operation. Therefore, we propose a sub-VT application-specific instruction-set processor (ASIP), exploiting the specific operations of CS. Our results show that the proposed ASIP accomplishes 62x speed-up and 11.6x power savings with respect to an established CS implementation running on the baseline low-power processor

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Lund University Publications

Crossref

Side Channel Attacks on IoT Applications

Author: Yan Yan
Publication venue
Publication date: 19/03/2019
Field of study

Explore Bristol Research

Implementing IPsec using the Five-layer security framework and FPGAs.

Author: Wiebe James
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2007
Field of study

Scholarship at UWindsor

LiveSNN: a new ecosystem for HEENS architecture

Author: Oltra Oltra Josep Angel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/09/2020
Field of study

This project proposal and development of a several tools, a communication protocol and an embedded program for giving a neural network called HEENS that it is currently developed by the group ISSET from UPC the capability of being controlled remotely, and to extract the neural . This will provide a faster development, user friendly tools for the development and analysis of spiking neural networks for emulation and verification of biological neural networks and neural models. As such, three new pieces of software are developed called: LiveSNN protocol, which communicates the HEENS architecture to a remote PC for Supervisory Control And Data Acquisition (SCADA) operations, LiveSNN program, which manages the conections and supervises the activity of the HEENS, and HEENS Toolchain Suite (HTS), which is an upgrade of the previous synthesis and assembler tools in order to allow the implementation of more sophisticated neural networks being easy and be capable of optimise the assembler model of the neuron for the architecture. With those developments, the time to generate spiking neural networks, to debug them, and emulate them is increased in addition to the reduction of possible human errors due to the increased automation workflow that those programs give to the end user

UPCommons. Portal del coneixement obert de la UPC

Testing a Random Number Generator: formal properties and automotive application

Author: Mattioli Federico
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 29/03/2019
Field of study

L'elaborato analizza un metodo di validazione dei generatori di numeri casuali (RNG), utilizzati per garantire la sicurezza dei moderni sistemi automotive. Il primo capitolo fornisce una panoramica della struttura di comunicazione dei moderni autoveicoli attraverso l'utilizzo di centraline (ECU): vengono riportati i principali punti di accesso ad un automobile, assieme a possibili tipologie di hacking; viene poi descritto l'utilizzo dei numeri casuali in crittografia, con particolare riferimento a quella utilizzata nei veicoli. Il secondo capitolo riporta le basi di probabilità necessarie all'approccio dei test statistici utilizzati per la validazione e riporta i principali approcci teorici al problema della casualità. Nei due capitoli centrali, viene proposta una descrizione dei metodi probabilistici ed entropici per l'analisi di dati reali utilizzati nei test. Vengono poi descritti e studiati i 15 test statistici proposti dal National Institute of Standards and Technology (NIST). Dopo i primi test, basati su proprietà molto semplici delle sequenze casuali, vengono proposti test più sofisticati, basati sull'uso della trasformata di Fourier (per testare eventuali comportamenti periodici), dell'entropia (strettamente connessi con la comprimibilità della sequenza), o sui random path. Due ulteriori test, permettono di valutare il buon funzionamento del generatore, e non solo delle singole sequenze generate. Infine, il quinto capitolo è dedicato all'implementazione dei test al fine di testare il TRNG delle centraline

AMS Tesi di Laurea

A Solder-Defined Computer Architecture for Backdoor and Malware Resistance

Author: Abel Marc W.
Publication venue: CORE Scholar
Publication date: 01/01/2022
Field of study

This research is about securing control of those devices we most depend on for integrity and confidentiality. An emerging concern is that complex integrated circuits may be subject to exploitable defects or backdoors, and measures for inspection and audit of these chips are neither supported nor scalable. One approach for providing a “supply chain firewall” may be to forgo such components, and instead to build central processing units (CPUs) and other complex logic from simple, generic parts. This work investigates the capability and speed ceiling when open-source hardware methodologies are fused with maker-scale assembly tools and visible-scale final inspection. The author has designed, and demonstrated in simulation, a 36-bit CPU and protected memory subsystem that use only synchronous static random access memory (SRAM) and trivial glue logic integrated circuits as components. The design presently lacks preemptive multitasking, ability to load firmware into the SRAMs used as logic elements, and input/output. Strategies are presented for adding these missing subsystems, again using only SRAM and trivial glue logic. A load-store architecture is employed with four clock cycles per instruction. Simulations indicate that a clock speed of at least 64 MHz is probable, corresponding to 16 million instructions per second (16 MIPS), despite the architecture containing no microprocessors, field programmable gate arrays, programmable logic devices, application specific integrated circuits, or other purchased complex logic. The lower speed, larger size, higher power consumption, and higher cost of an “SRAM minicomputer,” compared to traditional microcontrollers, may be offset by the fully open architecture—hardware and firmware—along with more rigorous user control, reliability, transparency, and auditability of the system. SRAM logic is also particularly well suited for building arithmetic logic units, and can implement complex operations such as population count, a hash function for associative arrays, or a pseudorandom number generator with good statistical properties in as few as eight clock cycles per 36-bit word processed. 36-bit unsigned multiplication can be implemented in software in 47 instructions or fewer (188 clock cycles). A general theory is developed for fast SRAM parallel multipliers should they be needed

CORE

Towards end-to-end security in internet of things based healthcare

Author: Rahimi Moosavi Sanaz
Publication venue: fi=Turku Centre of Computer Science|en=Turku Centre of Computer Science|
Publication date: 05/12/2019
Field of study

Healthcare IoT systems are distinguished in that they are designed to serve human beings, which primarily raises the requirements of security, privacy, and reliability. Such systems have to provide real-time notifications and responses concerning the status of patients. Physicians, patients, and other caregivers demand a reliable system in which the results are accurate and timely, and the service is reliable and secure. To guarantee these requirements, the smart components in the system require a secure and efficient end-to-end communication method between the end-points (e.g., patients, caregivers, and medical sensors) of a healthcare IoT system. The main challenge faced by the existing security solutions is a lack of secure end-to-end communication. This thesis addresses this challenge by presenting a novel end-to-end security solution enabling end-points to securely and efficiently communicate with each other. The proposed solution meets the security requirements of a wide range of healthcare IoT systems while minimizing the overall hardware overhead of end-to-end communication. End-to-end communication is enabled by the holistic integration of the following contributions. The first contribution is the implementation of two architectures for remote monitoring of bio-signals. The first architecture is based on a low power IEEE 802.15.4 protocol known as ZigBee. It consists of a set of sensor nodes to read data from various medical sensors, process the data, and send them wirelessly over ZigBee to a server node. The second architecture implements on an IP-based wireless sensor network, using IEEE 802.11 Wireless Local Area Network (WLAN). The system consists of a IEEE 802.11 based sensor module to access bio-signals from patients and send them over to a remote server. In both architectures, the server node collects the health data from several client nodes and updates a remote database. The remote webserver accesses the database and updates the webpage in real-time, which can be accessed remotely. The second contribution is a novel secure mutual authentication scheme for Radio Frequency Identification (RFID) implant systems. The proposed scheme relies on the elliptic curve cryptography and the D-Quark lightweight hash design. The scheme consists of three main phases: (1) reader authentication and verification, (2) tag identification, and (3) tag verification. We show that among the existing public-key crypto-systems, elliptic curve is the optimal choice due to its small key size as well as its efficiency in computations. The D-Quark lightweight hash design has been tailored for resource-constrained devices. The third contribution is proposing a low-latency and secure cryptographic keys generation approach based on Electrocardiogram (ECG) features. This is performed by taking advantage of the uniqueness and randomness properties of ECG's main features comprising of PR, RR, PP, QT, and ST intervals. This approach achieves low latency due to its reliance on reference-free ECG's main features that can be acquired in a short time. The approach is called Several ECG Features (SEF)-based cryptographic key generation. The fourth contribution is devising a novel secure and efficient end-to-end security scheme for mobility enabled healthcare IoT. The proposed scheme consists of: (1) a secure and efficient end-user authentication and authorization architecture based on the certificate based Datagram Transport Layer Security (DTLS) handshake protocol, (2) a secure end-to-end communication method based on DTLS session resumption, and (3) support for robust mobility based on interconnected smart gateways in the fog layer. Finally, the fifth and the last contribution is the analysis of the performance of the state-of-the-art end-to-end security solutions in healthcare IoT systems including our end-to-end security solution. In this regard, we first identify and present the essential requirements of robust security solutions for healthcare IoT systems. We then analyze the performance of the state-of-the-art end-to-end security solutions (including our scheme) by developing a prototype healthcare IoT system

UTUPub

Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

Author: Constantin Jeremy Hugues-Felix
Publication venue: Lausanne, EPFL
Publication date: 09/11/2016
Field of study

With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided

Infoscience - École polytechnique fédérale de Lausanne