4,710 research outputs found
Fluid flow queue models for fixed-mobile network evaluation
A methodology for fast and accurate end-to-end KPI, like throughput and delay, estimation is proposed based on the service-centric traffic flow analysis and the fluid flow queuing model named CURSA-SQ. Mobile network features, like shared medium and mobility, are considered defining the models to be taken into account such as the propagation models and the fluid flow scheduling model. The developed methodology provides accurate computation of these KPIs, while performing orders of magnitude faster than discrete event simulators like ns-3. Finally, this methodology combined to its capacity for performance estimation in MPLS networks enables its application for near real-time converged fixed-mobile networks operation as it is proven in three use case scenarios
Optimal Control of Software Ensuring Safety and Functionality
Existing verification and validation methodologies can detect software violations very effectively but fail to provide any mechanism for correcting faults once they are detected. Detection of faults, their diagnosis and corrective actions are all essential components of any software rectification framework. In this paper, we propose a framework for correction of violations in software systems ensuring that the desired goals of the system are achieved. We describe a stochastic finite state machine used to abstract a software system along with the uncertainty in its operating environment. Safety property violations and satisfaction of functionalities are abstracted using penalties and rewards on the states, respectively. Rectification of software is then formulated as a stochastic optimal control problem over this abstraction. Algorithms polynomial in the size of the abstraction have been developed for solving this optimization problem exactly. The paper also applies the developed framework to a variety of examples from different domains
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis
Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions
Selectable Heaps and Their Application to Lazy Search Trees
We show the O(log n) time extract minimum function of efficient priority queues can be generalized to the extraction of the k smallest elements in O(k log(n/k)) time. We first show the heap-ordered tree selection of Kaplan et al. can be applied on the heap-ordered trees of the classic Fibonacci heap to support the extraction in O(k \log(n/k)) amortized time. We then show selection is possible in a priority queue with optimal worst-case guarantees by applying heap-ordered tree selection on Brodal queues, supporting the operation in O(k log(n/k)) worst-case time.
Via a reduction from the multiple selection problem, Ω(k log(n/k)) time is necessary.
We then apply the result to the lazy search trees of Sandlund & Wild, creating a new interval data structure based on selectable heaps. This gives optimal O(B+n) lazy search tree performance, lowering insertion complexity into a gap Δi to O(log(n/|Δi|))$ time. An O(1)-time merge operation is also made possible under certain conditions. If Brodal queues are used, all runtimes of the lazy search tree can be made worst-case. The presented data structure uses soft heaps of Chazelle, biased search trees, and efficient priority queues in a non-trivial way, approaching the theoretically-best data structure for ordered data
Recommended from our members
Scalable hardware memory disambiguation
This dissertation deals with one of the long-standing problems in Computer Architecture
– the problem of memory disambiguation. Microprocessors typically reorder
memory instructions during execution to improve concurrency. Such microprocessors
use hardware memory structures for memory disambiguation, known as LoadStore
Queues (LSQs), to ensure that memory instruction dependences are satisfied
even when the memory instructions execute out-of-order. A typical LSQ implementation
(circa 2006) holds all in-flight memory instructions in a physically centralized
LSQ and performs a fully associative search on all buffered instructions to ensure
that memory dependences are satisfied. These LSQ implementations do not scale
because they use large, fully associative structures, which are known to be slow and
power hungry. The increasing trend towards distributed microarchitectures further
exacerbates these problems. As on-chip wire delays increase and high-performance
processors become necessarily distributed, centralized structures such as the LSQ
can limit scalability.
This dissertation describes techniques to create scalable LSQs in both centralized
and distributed microarchitectures. The problems and solutions described
in this thesis are motivated and validated by real system designs. The dissertation
starts with a description of the partitioned primary memory system of the TRIPS
processor, of which the LSQ is an important component, and then through a series
of optimizations describes how the power, area, and centralization problems
of the LSQ can be solved with minor performance losses (if at all) even for large
number of in flight memory instructions. The four solutions described in this dissertation
— partitioning, filtering, late binding and efficient overflow management —
enable power-, area-efficient, distributed and scalable LSQs, which in turn enable
aggressive large-window processors capable of simultaneously executing thousands
of instructions.
To mitigate the power problem, we replaced the power-hungry, fully associative
search with a power-efficient hash table lookup using a simple address-based
Bloom filter. Bloom filters are probabilistic data structures used for testing set
membership and can be used to quickly check if an instruction with the same data
address is likely to be found in the LSQ without performing the associative search.
Bloom filters typically eliminate more than 80% of the associative searches and they
are highly effective because in most programs, it is uncommon for loads and stores
to have the same data address and be in execution simultaneously.
To rectify the area problem, we observe the fact that only a small fraction
of all memory instructions are dependent, that only such dependent instructions
need to be buffered in the LSQ, and that these instructions need to be in the LSQ
only for certain parts of the pipelined execution. We propose two mechanisms to
exploit these observations. The first mechanism, area filtering, is a hardware mechanism
that couples Bloom filters and dependence predictors to dynamically identify
and buffer only those instructions which are likely to be dependent. The second
mechanism, late binding, reduces the occupancy and hence size of the LSQ. Both of
these optimizations allows the number of LSQ slots to be reduced by up to one-half
compared to a traditional organization without any performance degradation.
Finally, we describe a new decentralized LSQ design for handling LSQ structural
hazards in distributed microarchitectures. Decentralization of LSQs, and to
a large extent distributed microarchitectures with memory speculation, has proved
to be impractical because of the high performance penalties associated with the
mechanisms for dealing with hazards. To solve this problem, we applied classic
flow-control techniques from interconnection networks for handling resource con-
flicts. The first method, memory-side buffering, buffers the overflowing instructions
in a separate buffer near the LSQs. The second scheme, execution-side NACKing,
sends the overflowing instruction back to the issue window from which it is later
re-issued. The third scheme, network buffering, uses the buffers in the interconnection
network between the execution units and memory to hold instructions when the
LSQ is full, and uses virtual channel flow control to avoid deadlocks. The network
buffering scheme is the most robust of all the overflow schemes and shows less than
1% performance degradation due to overflows for a subset of SPEC CPU 2000 and
EEMBC benchmarks on a cycle-accurate simulator that closely models the TRIPS
processor.
The techniques proposed in this dissertation are independent, architectureneutral
and their cumulative benefits result in LSQs that can be partitioned at a
fine granularity and have low design complexity. Each of these partitions selectively
buffers only memory instructions with true dependences and can be closely coupled
with the execution units thus minimizing power, area, and latency. Such LSQ
designs with near-ideal characteristics are well suited for microarchitectures with
thousands of instructions in-flight and may enable even more aggressive microarchitectures
in the future.Computer Science
- …