45,529 research outputs found
High-Performance Architecture for Binary-Tree-Based Finite State Machines
A binary-tree-based finite state machine (BT-FSM)
is a state machine with a 1-bit input signal whose state transition
graph is a binary tree. BT-FSMs are useful in those
application areas where searching in a binary tree is required,
such as computer networks, compression, automatic control, or
cryptography. This paper presents a new architecture for implementing
BT-FSMs which is based on the model finite virtual state
machine (FVSM). The proposed architecture has been compared
with the general FVSM and conventional approaches by using
both synthetic test benches and very large BT-FSMs obtained
from a real application. In synthetic test benches, the average
speed improvement of the proposed architecture respect to the
best results of the other approaches achieves 41% (there are
some cases in which the speed is more than double). In the
case of the real application, the average speed improvement
achieves 155%
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph
algorithms is heavily dependent on the input data, there has been surprisingly
little research to quantify and predict the impact the graph structure has on
performance. Parallel graph algorithms, running on many-core systems such as
GPUs, are no exception: most research has focused on how to efficiently
implement and tune different graph operations on a specific GPU. However, the
performance impact of the input graph has only been taken into account
indirectly as a result of the graphs used to benchmark the system.
In this work, we present a case study investigating how to use the properties
of the input graph to improve the performance of the breadth-first search (BFS)
graph traversal. To do so, we first study the performance variation of 15
different BFS implementations across 248 graphs. Using this performance data,
we show that significant speed-up can be achieved by combining the best
implementation for each level of the traversal. To make use of this
data-dependent optimization, we must correctly predict the relative performance
of algorithms per graph level, and enable dynamic switching to the optimal
algorithm for each level at runtime.
We use the collected performance data to train a binary decision tree, to
enable high-accuracy predictions and fast switching. We demonstrate empirically
that our decision tree is both fast enough to allow dynamic switching between
implementations, without noticeable overhead, and accurate enough in its
prediction to enable significant BFS speedup. We conclude that our model-driven
approach (1) enables BFS to outperform state of the art GPU algorithms, and (2)
can be adapted for other BFS variants, other algorithms, or more specific
datasets
Hardware-based Security for Virtual Trusted Platform Modules
Virtual Trusted Platform modules (TPMs) were proposed as a software-based
alternative to the hardware-based TPMs to allow the use of their cryptographic
functionalities in scenarios where multiple TPMs are required in a single
platform, such as in virtualized environments. However, virtualizing TPMs,
especially virutalizing the Platform Configuration Registers (PCRs), strikes
against one of the core principles of Trusted Computing, namely the need for a
hardware-based root of trust. In this paper we show how strength of
hardware-based security can be gained in virtual PCRs by binding them to their
corresponding hardware PCRs. We propose two approaches for such a binding. For
this purpose, the first variant uses binary hash trees, whereas the other
variant uses incremental hashing. In addition, we present an FPGA-based
implementation of both variants and evaluate their performance
Recommended from our members
Performance analysis of a message-oriented knowledge-base
First-order Horn logic is a useful formalism to design knowledge-based systems. When implemented on a sequential von Neumann computer, the main limitation of such systems is performance. We present a message-driven model for function-free Horn logic, where the knowledge base is represented as a network of logical processing elements communicating with one another exclusively through messages. The lack of centralized control and centralized memory makes this model suitable to implementation on a highly-parallel asynchronous computer architecture.The primary contribution of this paper is a performance analysis of this message-driven system and a comparison with a sequential resolution scheme using backtracking. For both approaches, closed form expressions for the performance results are derived and compared
CloudTree: A Library to Extend Cloud Services for Trees
In this work, we propose a library that enables on a cloud the creation and
management of tree data structures from a cloud client. As a proof of concept,
we implement a new cloud service CloudTree. With CloudTree, users are able to
organize big data into tree data structures of their choice that are physically
stored in a cloud. We use caching, prefetching, and aggregation techniques in
the design and implementation of CloudTree to enhance performance. We have
implemented the services of Binary Search Trees (BST) and Prefix Trees as
current members in CloudTree and have benchmarked their performance using the
Amazon Cloud. The idea and techniques in the design and implementation of a BST
and prefix tree is generic and thus can also be used for other types of trees
such as B-tree, and other link-based data structures such as linked lists and
graphs. Preliminary experimental results show that CloudTree is useful and
efficient for various big data applications
HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA
Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host
processor with programmable manycore accelerators (PMCAs) to combine
general-purpose computing with domain-specific, efficient processing
capabilities. While leading companies successfully advance their HESoC
products, research lags behind due to the challenges of building a prototyping
platform that unites an industry-standard host processor with an open research
PMCA architecture. In this work we introduce HERO, an FPGA-based research
platform that combines a PMCA composed of clusters of RISC-V cores, implemented
as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host
processor. The PMCA architecture mapped on the FPGA is silicon-proven,
scalable, configurable, and fully modifiable. HERO includes a complete software
stack that consists of a heterogeneous cross-compilation toolchain with support
for OpenMP accelerator programming, a Linux driver, and runtime libraries for
both host and PMCA. HERO is designed to facilitate rapid exploration on all
software and hardware layers: run-time behavior can be accurately analyzed by
tracing events, and modifications can be validated through fully automated hard
ware and software builds and executed tests. We demonstrate the usefulness of
HERO by means of case studies from our research
Identifying Native Applications with High Assurance
The work described in this paper investigates the problem
of identifying and deterring stealthy malicious processes on
a host. We point out the lack of strong application iden-
tication in main stream operating systems. We solve the
application identication problem by proposing a novel iden-
tication model in which user-level applications are required
to present identication proofs at run time to be authenti-
cated by the kernel using an embedded secret key. The se-
cret key of an application is registered with a trusted kernel
using a key registrar and is used to uniquely authenticate
and authorize the application. We present a protocol for
secure authentication of applications. Additionally, we de-
velop a system call monitoring architecture that uses our
model to verify the identity of applications when making
critical system calls. Our system call monitoring can be
integrated with existing policy specication frameworks to
enforce application-level access rights. We implement and
evaluate a prototype of our monitoring architecture in Linux
as device drivers with nearly no modication of the ker-
nel. The results from our extensive performance evaluation
shows that our prototype incurs low overhead, indicating the
feasibility of our model
- …