194 research outputs found

    Algorithm Development and VLSI Implementation of Energy Efficient Decoders of Polar Codes

    Get PDF
    Owing to their low error floors, polar codes have attracted significant attention as a potential standard error correction code (ECC) for future communication and data storage systems. However, the VLSI implementation complexity of polar code decoders is largely driven by the serial nature of their decoding. This dissertation is dedicated to presenting optimized decoder architectures for polar codes. It addresses several structural properties of polar codes and key properties of their decoding algorithms that are not dealt with in prior research. The underlying concept of the proposed architectures is a paradigm that simplifies and schedules the computations so that hardware is simplified, latency is minimized, and bandwidth is maximized. In pursuit of the above, throughput-centric successive cancellation (TCSC) and overlapping-path list successive cancellation (OPLSC) VLSI architectures, as well as express-journey BP (XJBP) decoders for polar codes, are presented. An arbitrary polar code can be decomposed into a set of shorter polar codes with special characteristics; these shorter polar codes are referred to as constituent polar codes. By exploiting the homogeneity between the decoding processes of different constituent polar codes, TCSC reduces the decoding latency of the SC decoder by 60% for codes of length n = 1024. The error correction performance of SC decoding is inferior to that of list successive cancellation (LSC) decoding. The LSC decoding algorithm delivers the most reliable decoding results; however, it consumes the most hardware resources and decoding cycles. Instead of using multiple instances of decoding cores as in LSC decoders, a single SC decoder is used in the OPLSC architecture. The computations of each path in the LSC are arranged to occupy the decoder hardware stages serially in a streamlined fashion, which yields a significant reduction in hardware complexity. The OPLSC decoder achieves about a 1.4 times improvement in hardware efficiency compared with traditional LSC decoders. Hardware-efficient VLSI architectures for the TCSC and OPLSC polar code decoders are also introduced. Decoders based on SC or LSC algorithms suffer from high latency and limited throughput due to their serial decoding nature. An alternative approach to decoding polar codes is the belief propagation (BP) based algorithm. In the BP algorithm, a graph, usually referred to as a factor graph, is set up to guide how beliefs are propagated and refined. The BP decoding algorithm allows decoding in parallel and thereby achieves much higher throughput. The XJBP decoder facilitates belief propagation by exploiting the specific constituent codes that exist in the conventional factor graph, which results in an express-journey (XJ) decoder. Compared with the conventional BP decoding algorithm for polar codes, the proposed decoder reduces the computational complexity by about 40.6%, which enables an energy-efficient hardware implementation. To further explore the hardware cost of the proposed XJBP decoder, the scheduling of its computations is modeled and analyzed in this dissertation. Considering different hardware scenarios, optimal scheduling plans are developed. A novel memory-distributed micro-architecture of the XJBP decoder is proposed and analyzed to solve the potential memory access problems of the proposed scheduling strategy. Register-transfer level (RTL) models of the XJBP decoder are set up for comparison with other state-of-the-art BP decoders. The results show that the power efficiency of BP decoders is improved by about 3 times.
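
    To make the serial nature of SC decoding concrete, the following is a minimal Python sketch of textbook recursive successive cancellation decoding with the min-sum approximation; it illustrates only the standard algorithm, not the TCSC, OPLSC, or XJBP architectures, and all names in it are illustrative.

    def sc_decode(llrs, frozen):
        # Recursive successive cancellation decoding (min-sum f / g functions).
        # llrs: channel log-likelihood ratios, length n (a power of two).
        # frozen: boolean mask, True where the bit position is frozen to 0.
        # Returns (u_hat, x_hat): decoded bit vector and re-encoded codeword.
        n = len(llrs)
        if n == 1:
            u = 0 if frozen[0] or llrs[0] >= 0 else 1
            return [u], [u]
        half = n // 2
        sign = lambda v: 1.0 if v >= 0 else -1.0
        # f-function: min-sum combination for the upper (left) sub-code
        l_left = [sign(a) * sign(b) * min(abs(a), abs(b))
                  for a, b in zip(llrs[:half], llrs[half:])]
        u_left, x_left = sc_decode(l_left, frozen[:half])
        # g-function: combine using the re-encoded bits of the left sub-code
        l_right = [b + (1 - 2 * xl) * a
                   for a, b, xl in zip(llrs[:half], llrs[half:], x_left)]
        u_right, x_right = sc_decode(l_right, frozen[half:])
        x_hat = [xl ^ xr for xl, xr in zip(x_left, x_right)] + x_right
        return u_left + u_right, x_hat

    For n = 1024 this recursion visits the bit decisions strictly one after another, which is exactly the in-series behavior that constituent-code-aware architectures such as TCSC aim to shorten.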

    A Linux Real-Time Packet Scheduler for Reliable Static SDN Routing

    Get PDF
    In a distributed computing environment, guaranteeing hard deadlines for real-time messages is essential to ensure the schedulability of real-time tasks. Since the capacities of shared transmission resources are limited (e.g., buffer sizes on network devices), it is a challenge to design an effective and feasible resource sharing policy that accounts for both the demand of real-time packet transmissions and the limits of those resources. We address this challenge with two cooperative mechanisms. First, we design a static routing algorithm that finds forwarding paths for packets so as to guarantee their hard deadlines. The routing algorithm employs a validation-based backtracking procedure that derives the demand a set of real-time packets places on each shared network device and checks whether this demand can be met on that device. Second, we design a packet scheduler that runs on network devices and transmits messages according to our routing requirements. We implement these mechanisms on virtual software-defined network (SDN) switches and evaluate them on real hardware in a local cluster to demonstrate the feasibility and effectiveness of our routing algorithm and packet scheduler.
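
    As an illustration of the validation-based backtracking idea, the following Python sketch assigns one candidate path per flow and backtracks whenever the accumulated demand on a device would exceed its capacity; the flow, link, and capacity names are hypothetical, and the sketch omits the paper's actual deadline analysis.

    def assign_paths(demands, candidate_paths, capacity, idx=0, load=None):
        # demands[i]: bandwidth demand of flow i
        # candidate_paths[i]: candidate paths (each a list of link ids) for flow i
        # capacity[link]: capacity of each shared link/device
        # Returns one chosen path per flow, or None if no feasible assignment exists.
        if load is None:
            load = {link: 0.0 for link in capacity}
        if idx == len(demands):
            return []
        d = demands[idx]
        for path in candidate_paths[idx]:
            # Validation step: the added demand must fit on every device along the path.
            if all(load[link] + d <= capacity[link] for link in path):
                for link in path:
                    load[link] += d
                rest = assign_paths(demands, candidate_paths, capacity, idx + 1, load)
                if rest is not None:
                    return [path] + rest
                for link in path:          # backtrack: undo the tentative assignment
                    load[link] -= d
        return None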

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected by a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted in datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters, and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to provide readers with a broad view of the options and factors to consider when evaluating a variety of traffic control mechanisms. We discuss various aspects of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of the paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems. (Accepted for publication in IEEE Communications Surveys and Tutorials.)

    Ku-band signal design study

    Get PDF
    Analytical tools, methods, and techniques for assessing the design and performance of the space shuttle orbiter data processing system (DPS) are provided. The computer data processing network is evaluated in the key areas of queueing behavior, synchronization, and network reliability. The structure of the data processing network is described, as well as the system operation principles and the network configuration. The characteristics of the computer systems are indicated. System reliability measures are defined and studied. System and network invulnerability measures are computed. Communication path and network failure analysis techniques are included.
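
    To give a flavor of the kind of path and network failure analysis such a study performs, the following Python sketch computes textbook series/parallel reliability for redundant communication paths under an assumption of independent component failures; it is illustrative only and is not the report's actual model.

    def path_reliability(component_reliabilities):
        # A series path works only if every component along it works.
        r = 1.0
        for x in component_reliabilities:
            r *= x
        return r

    def network_reliability(paths):
        # With independent redundant paths, the connection fails only if every path fails.
        fail = 1.0
        for components in paths:
            fail *= 1.0 - path_reliability(components)
        return 1.0 - fail

    # Example: two redundant paths, each a chain of three components
    print(network_reliability([[0.99, 0.98, 0.99], [0.97, 0.99, 0.98]]))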

    Affordable techniques for dependable microprocessor design

    Get PDF
    As high computing power is available at an affordable cost, we rely on microprocessor-based systems for a much greater variety of applications. This dependence means that a processor failure can have increasingly diverse impacts on our daily lives, so dependability is becoming an ever more important quality measure of microprocessors. Temporary hardware malfunctions caused by unstable environmental conditions can lead the processor to an incorrect state; this is referred to as a transient error or soft error. Studies have shown that soft errors are the major source of system failures. This dissertation characterizes soft error behavior in microprocessors and presents new microarchitectural approaches that realize high dependability with low overhead. Our fault injection studies using RISC processors have demonstrated that different functional blocks of the processor have distinct susceptibilities to soft errors. This error susceptibility information must be reflected in devising fault tolerance schemes for cost-sensitive applications. Considering the common use of on-chip caches in modern processors, we investigated area-efficient protection schemes for memory arrays. The idea of caching redundant information was exploited to optimize resource utilization for increased dependability. We also developed a mechanism to verify the integrity of data transfers from lower-level memories to the primary caches. The results of this study show that, by exploiting bus idle cycles and information redundancy, an almost complete check of the initial memory data transfer is possible without incurring a performance penalty. For protecting the processor's control logic, which usually remains unprotected, we propose a low-cost reliability enhancement strategy. We classified control logic signals into static and dynamic control depending on their changeability, and applied various techniques including commit-time checking, signature caching, component-level duplication, and control flow monitoring. Our schemes can achieve more than 99% coverage with a very small hardware addition. Finally, a virtual duplex architecture for superscalar processors is presented. In this system-level approach, the processor pipeline is backed up by a partially replicated pipeline. The replication-based checker minimizes the design and verification overheads. For a large-scale superscalar processor, the proposed architecture can bring a 61.4% reduction in die area while sustaining the maximum performance.
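
    To make the signature-based control flow monitoring idea concrete, here is a minimal Python sketch in which each basic block carries a precomputed XOR signature of its instruction words that a monitor re-derives and compares at commit time; the class and method names are hypothetical, and the dissertation's actual schemes (signature caching, commit-time checking) are more involved.

    def block_signature(instruction_words):
        # Reference signature: XOR of the block's instruction words.
        sig = 0
        for w in instruction_words:
            sig ^= w
        return sig

    class ControlFlowMonitor:
        def __init__(self, expected_signatures):
            # expected_signatures: {block_address: precomputed signature}
            self.expected = expected_signatures

        def commit_block(self, block_address, fetched_words):
            # Re-derive the signature from what was actually fetched/executed
            # and flag a possible soft error on mismatch.
            if block_signature(fetched_words) != self.expected.get(block_address):
                raise RuntimeError("soft error detected in block 0x%x" % block_address)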

    Infrastructure for distributed enterprise simulation

    Full text link

    The evaluation of computer performance by means of state-dependent queueing network models

    Get PDF
    Imperial Users only

    Methods and Devices for Mobile Robot Navigation and Mapping in Unstructured Environments

    Get PDF
    The work described in this thesis has been carried out in the context of the exploration of an unknown environment by an autonomous mobile robot. It is rather difficult to imagine a robot that is truly autonomous without being capable of acquiring a model of its environment. This model can be built by the robot exploring the environment and registering the data collected by its sensors over time. In recent decades a lot of progress has been made on techniques aimed at environments that possess a lot of structure. This thesis contributes to the goal of extending existing techniques to unstructured environments by proposing new methods and devices for mapping in real time. The first part of the thesis addresses some of the problems of ultrasonic sensors, which are widely used in mobile robotics for mapping and obstacle detection during exploration. Ultrasonic sensors have two main shortcomings that lead to disappointing performance: uncertainty in target location and multiple reflections. The former is caused by the wide beam width; the latter produces erroneous distance measurements because of spurious echoes not directly related to the target. With the aim of registering a detailed contour of the environment surrounding the robot, a sensing device was developed that focuses the ultrasonic beam of the most common ultrasonic sensor in order to extend its range and improve its spatial resolution. The extended range makes this sensor much more suitable for mapping outdoor environments, which are typically larger. The improved spatial resolution enables the use of recent laser scan matching techniques on the sonar scans of the environment collected with the sensor. Furthermore, an algorithm is proposed to mitigate some undesirable effects and problems of the ultrasonic sensor. The method registers the acquired raw ultrasonic signal in order to obtain a reliable mapping of the environment. A single sonar measurement consists of a number of pulses reflected by an obstacle. Over a series of sensor readings at different sonar angles, the sequence of pulses reflected by the environment changes according to the distance between the sensor and the environment. This results in an image of sonar reflections that can be built by representing the reading angle on the horizontal axis and the echoes acquired by the sensor on the vertical one. The characteristics of a sonar emission result in a texture embedded in the image. The algorithm performs a 2D texture analysis of the sonar reflection image in such a way that texture continuity is analyzed at the overall image scale, enabling correction of the texture continuity by restoring weak or missing reflections. The first part of the algorithm extracts geometric semantic attributes from the image in order to enhance and correct the signal. The second part of the algorithm applies heuristic rules to find the leading pulse of the echo and to estimate the obstacle location at points where this would otherwise not be possible due to noise or lack of signal. The method overcomes inherent problems of ultrasonic sensing in the presence of high irregularities and missing reflections, and it is suitable for map building during mobile robot exploration missions. Its main limitation is its small coverage area; this area, however, increases during exploration as more scans are processed from different positions. Localization and mapping problems are addressed in the second part of the thesis.
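
    Before turning to that second part, the leading-pulse idea described above can be illustrated with a much-simplified Python sketch: for each beam angle it takes the first echo sample above a threshold as the leading pulse and fills isolated gaps from neighbouring angles. The threshold, sampling parameters, and gap-filling rule are illustrative assumptions standing in for the thesis's full 2D texture analysis.

    def leading_pulses(reflection_image, threshold, sample_rate_hz, speed_of_sound=343.0):
        # reflection_image[angle][sample]: echo amplitude for one beam angle over time.
        # Returns one range estimate (metres) per angle, or None when no echo is strong enough.
        ranges = []
        for echo in reflection_image:
            first = next((i for i, a in enumerate(echo) if abs(a) >= threshold), None)
            ranges.append(None if first is None
                          else first * speed_of_sound / (2.0 * sample_rate_hz))
        return ranges

    def fill_gaps(ranges):
        # Crude continuity repair: an isolated missing reading is replaced by the
        # mean of its two neighbours, mimicking restoration of weak reflections.
        out = list(ranges)
        for i in range(1, len(out) - 1):
            if out[i] is None and out[i - 1] is not None and out[i + 1] is not None:
                out[i] = 0.5 * (out[i - 1] + out[i + 1])
        return out
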
The main issue in robot self-localization is how to match sensed data, acquired with devices such as laser range finders or ultrasonic range sensors, against reference map information. In particular, scan matching techniques are used to correct the positional error accumulated by dead reckoning sensors such as odometry and inertial sensors, and thus to cancel out the effects of noise on localization and mapping. Given a reference scan from a known position and a new scan acquired from an unknown or approximately known position, the scan matching algorithm should provide a position estimate close to the true robot position from which the new scan was acquired. A genetic optimization algorithm called GLASM that solves this problem is proposed. It uses a novel fitness function based on a lookup table, requiring little memory, to speed up the search. Instead of searching for corresponding point pairs and then computing the mean of the distances between them, as in other algorithms, the fitness is directly evaluated by counting points which, after projection onto the same coordinate frame, fall in the search window around the previous scan. It has linear computational complexity, whereas algorithms based on correspondences have quadratic cost. The GLASM algorithm has been compared to its closest rivals. The results of the comparison are reported in the thesis and show, in summary, that GLASM outperforms them both in speed and in matching success ratio. GLASM is suitable for feature-poor environments and robust to high sensor noise, as is the case with the sonar readings used in this thesis, which are much noisier than laser scanner readings. The algorithm does not place a high computational burden on the processor, which is important for real-world applications where power consumption is a concern, and it scales easily on multiprocessor systems. The algorithm does not require an initial position estimate and is suitable for unstructured environments. In mobile robotics it is critical to evaluate the above-mentioned methods and devices in real-world applications on systems with limited power and computational resources. In the third part of the thesis, some new theoretical results are derived concerning open problems in non-preemptive scheduling of periodic tasks on a uniprocessor. These results are then used to propose a design methodology, which is applied on a mobile robot. The mobile robot is equipped with an embedded system running a new real-time kernel called Yartek with a non-preemptive scheduler of periodic tasks. The application is described and some preliminary mapping results are presented. The real-time operating system has been developed as collaborative work for an embedded platform based on a ColdFire microcontroller. The operating system allows the creation and running of tasks and offers dynamic management of a contiguous memory using a first-fit criterion. Tasks can be real-time periodic, scheduled with non-preemptive EDF, or non-real-time. In order to improve the usability of the system, a RAM disk is included: it is an array defined in main memory and managed using pointers, so its operation is very fast. The goal was to realize a small autonomous embedded system for implementing real-time algorithms for non-visual robotic sensors, such as infrared, tactile, inertial, or ultrasonic proximity sensors.
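
    For background on the kind of schedulability analysis involved with non-preemptive EDF, the following Python sketch implements the classical test of Jeffay, Stanat, and Martel for non-preemptive EDF of periodic/sporadic tasks; it is the textbook condition, not the new results derived in the thesis, and the task values are illustrative.

    def np_edf_schedulable(tasks):
        # tasks: list of (c, t) pairs, integer worst-case execution time c and period t.
        # Classical Jeffay/Stanat/Martel test: sufficient for periodic task sets,
        # necessary and sufficient for sporadic ones.
        tasks = sorted(tasks, key=lambda ct: ct[1])      # sort by non-decreasing period
        if sum(c / t for c, t in tasks) > 1.0:           # condition 1: utilisation bound
            return False
        t_min = tasks[0][1]
        for i in range(1, len(tasks)):
            c_i, t_i = tasks[i]
            for L in range(t_min + 1, t_i):              # condition 2: all L with T_1 < L < T_i
                demand = c_i + sum((L - 1) // t_j * c_j for c_j, t_j in tasks[:i])
                if L < demand:
                    return False
        return True

    # Example: three periodic tasks (C, T) in the same time unit
    print(np_edf_schedulable([(1, 4), (2, 6), (3, 12)]))
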
The system provides the processing requested by non-visual sensors without imposing a computation burden on the main processor of the robot. In particular, the embedded system described in this thesis provides the robot with the environmental map acquired with the ultrasonic sensors. Yartek has a low footprint and low overhead. In order to compare Yartek with another operating system, RTAI for Linux was ported to the Avnet M5282EVB board and testing procedures were implemented. Tests of context switch time, jitter, and interrupt latency are reported to describe the performance of Yartek. The contributions of this thesis include the presentation of new algorithms and devices, their applications, and some theoretical results. They are briefly summarized as follows: a focused ultrasonic sensing device is developed and used in mapping applications; an algorithm that processes the ultrasonic readings to build a reliable map of the environment is presented; a new genetic algorithm for scan matching, called GLASM, is proposed; schedulability conditions for non-preemptive scheduling in a hard real-time operating system are introduced and a design methodology is proposed; a real-time kernel for embedded systems in mobile robotics is presented; and a practical robotic application is described, with implementation details and trade-offs explained.
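
    As a final illustration, the lookup-table fitness described above for GLASM can be sketched in Python as follows: grid cells around the reference scan are marked once, and a candidate pose is scored by counting how many projected points of the new scan land in marked cells, giving linear cost per evaluation. The grid discretization, window size, and names are assumptions, not the thesis implementation.

    import math

    def build_lookup(reference_scan, cell_size, half_window):
        # Mark grid cells within the search window around each reference point,
        # so each fitness lookup is a constant-time set membership test.
        table = set()
        for x, y in reference_scan:
            cx, cy = int(round(x / cell_size)), int(round(y / cell_size))
            for dx in range(-half_window, half_window + 1):
                for dy in range(-half_window, half_window + 1):
                    table.add((cx + dx, cy + dy))
        return table

    def fitness(new_scan, pose, table, cell_size):
        # pose = (tx, ty, theta): a candidate transform produced by the genetic search.
        # The fitness is the number of projected points landing near the reference scan.
        tx, ty, theta = pose
        c, s = math.cos(theta), math.sin(theta)
        hits = 0
        for x, y in new_scan:
            gx, gy = c * x - s * y + tx, s * x + c * y + ty
            if (int(round(gx / cell_size)), int(round(gy / cell_size))) in table:
                hits += 1
        return hits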