1,293 research outputs found

    CHANNEL CODING TECHNIQUES FOR A MULTIPLE TRACK DIGITAL MAGNETIC RECORDING SYSTEM

    Get PDF
    In magnetic recording greater area) bit packing densities are achieved through increasing track density by reducing space between and width of the recording tracks, and/or reducing the wavelength of the recorded information. This leads to the requirement of higher precision tape transport mechanisms and dedicated coding circuitry. A TMS320 10 digital signal processor is applied to a standard low-cost, low precision, multiple-track, compact cassette tape recording system. Advanced signal processing and coding techniques are employed to maximise recording density and to compensate for the mechanical deficiencies of this system. Parallel software encoding/decoding algorithms have been developed for several Run-Length Limited modulation codes. The results for a peak detection system show that Bi-Phase L code can be reliably employed up to a data rate of 5kbits/second/track. Development of a second system employing a TMS32025 and sampling detection permitted the utilisation of adaptive equalisation to slim the readback pulse. Application of conventional read equalisation techniques, that oppose inter-symbol interference, resulted in a 30% increase in performance. Further investigation shows that greater linear recording densities can be achieved by employing Partial Response signalling and Maximum Likelihood Detection. Partial response signalling schemes use controlled inter-symbol interference to increase recording density at the expense of a multi-level read back waveform which results in an increased noise penalty. Maximum Likelihood Sequence detection employs soft decisions on the readback waveform to recover this loss. The associated modulation coding techniques required for optimised operation of such a system are discussed. Two-dimensional run-length-limited (d, ky) modulation codes provide a further means of increasing storage capacity in multi-track recording systems. For example the code rate of a single track run length-limited code with constraints (1, 3), such as Miller code, can be increased by over 25% when using a 4-track two-dimensional code with the same d constraint and with the k constraint satisfied across a number of parallel channels. The k constraint along an individual track, kx, can be increased without loss of clock synchronisation since the clocking information derived by frequent signal transitions can be sub-divided across a number of, y, parallel tracks in terms of a ky constraint. This permits more code words to be generated for a given (d, k) constraint in two dimensions than is possible in one dimension. This coding technique is furthered by development of a reverse enumeration scheme based on the trellis description of the (d, ky) constraints. The application of a two-dimensional code to a high linear density system employing extended class IV partial response signalling and maximum likelihood detection is proposed. Finally, additional coding constraints to improve spectral response and error performance are discussed.Hewlett Packard, Computer Peripherals Division (Bristol

    Circuit design and analysis for on-FPGA communication systems

    No full text
    On-chip communication system has emerged as a prominently important subject in Very-Large- Scale-Integration (VLSI) design, as the trend of technology scaling favours logics more than interconnects. Interconnects often dictates the system performance, and, therefore, research for new methodologies and system architectures that deliver high-performance communication services across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable Gate Array (FPGA), as a type of ASIC where the hardware can be programmed post-fabrication. Communication across an FPGA will be deteriorating as a result of interconnect scaling. The programmable fabrics, switches and the specific routing architecture also introduce additional latency and bandwidth degradation further hindering intra-chip communication performance. Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs. Communication with programmable interconnect received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems that are built on top of programmable fabrics and proposes methodologies to maximize the interconnect throughput performance. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestions in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves the interconnect throughput by exploiting the fundamental electrical characteristics of the reconfigurable interconnect structures. This new scheme can potentially mitigate the interconnect scaling challenges. (iii) a novel Dynamic Programming (DP)-network to provide adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime optimization for route planning and dynamic routing which, effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance the on-FPGA communication throughput performance that is of vital importance in new technology processes

    SIMULATION OF MANYCORE ARCHITECTURES ON MULTICORE HOSTS

    Get PDF
    Computer architects heavily rely on software simulation to evaluate new and existing processor designs. As target designs become more complex, a growing gap has emerged between single-threaded simulator performance and simulation requirements. Even though modern machines feature multiple cores, most host cores are typically unused or underutilized by state-of-the-art simulators. Parallel simulators are inherently limited by their need to synchronize threads for correctness. In my thesis, I study accurate and efficient parallelization techniques for architecture simulation. This thesis contains several contributions. First, I study synchronization between simulator threads simulating homogeneous hardware structures such as cores or network tiles. Based on this study, I introduce a new synchronization policy, weighted-tuple synchronization, and show that it provides a better performance-accuracy trade-off compared to synchronization currently used by state-of-the-art parallel simulators. Next, I study synchronization between separate simulators responsible for modeling heterogeneous components and introduce reciprocal abstraction. Reciprocal abstraction allows asynchronous simulators to exchange information at runtime for more accurate event timing. Lastly, the reciprocal abstraction model relaxes communication latency restrictions and synchronization requirements; I show how relaxed synchronization requirements allows for coprocessor acceleration

    Parallel-sampling ADC architecture for power-efficient broadband multi-carrier systems

    Get PDF

    Design Techniques for High Performance Wireline Communication and Security Systems

    Full text link
    As the amount of data traffic grows exponentially on the internet, towards thousands of exabytes by 2020, high performance and high efficiency communication and security solutions are constantly in high demand, calling for innovative solutions. Within server communication dominates todays network data transfer, outweighing between-server and server-to-user data transfer by an order of magnitude. Solutions for within-server communication tend to be very wideband, i.e. on the order of tens of gigahertz, equalizers are widely deployed to provide extended bandwidth at reasonable cost. However, using equalizers typically costs the available signal-to-noise ratio (SNR) at the receiver side. What is worse is that the SNR available at the channel becomes worse as data rate increases, making it harder to meet the tight constraint on error rate, delay, and power consumption. In this thesis, two equalization solutions that address optimal equalizer implementations are discussed. One is a low-power high-speed maximum likelihood sequence detection (MLSD) that achieves record energy efficiency, below 10 pico-Joule per bit. The other one is a phase-shaping equalizer design that suppresses inter-symbol interference at almost zero cost of SNR. The growing amount of communication use also challenges the design of security subsystems, and the emerging need for post-quantum security adds to the difficulties. Most of currently deployed cryptographic primitives rely on the hardness of discrete logarithms that could potentially be solved efficiently with a powerful enough quantum computer. Efficient post-quantum encryption solutions have become of substantial value. In this thesis a fast and efficient lattice encryption application-specific integrated circuit is presented that surpasses the energy efficiency of embedded processors by 4 orders of magnitude.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/146092/1/shisong_1.pd

    Energy-efficient analog-to-digital conversion for ultra-wideband radio

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (p. 207-222).In energy constrained signal processing and communication systems, a focus on the analog or digital circuits in isolation cannot achieve the minimum power consumption. Furthermore, in advanced technologies with significant variation, yield is traditionally achieved only through conservative design and a sacrifice of energy efficiency. In this thesis, these limitations are addressed with both a comprehensive mixed-signal design methodology and new circuits and architectures, as presented in the context of an analog-to-digital converter (ADC) for ultra-wideband (UWB) radio. UWB is an emerging technology capable of high-data-rate wireless communication and precise locationing, and it requires high-speed (>500MS/s), low-resolution ADCs. The successive approximation register (SAR) topology exhibits significantly reduced complexity compared to the traditional flash architecture. Three time-interleaved SAR ADCs have been implemented. At the mixed-signal optimum energy point, parallelism and reduced voltage supplies provide more than 3x energy savings. Custom control logic, a new capacitive DAC, and a hierarchical sampling network enable the high-speed operation. Finally, only a small amount of redundancy, with negligible power penalty, dramatically improves the yield of the highly parallel ADC in deep sub-micron CMOS.by Brian P. Ginsburg.Ph.D

    Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

    Get PDF
    Mobility is the key to the global business which requires people to be always connected to a central server. With the exponential increase in smart phones, tablets, laptops, mobile traffic will soon reach in the range of Exabytes per month by 2018. Applications like video streaming, on-demand-video, online gaming, social media applications will further increase the traffic load. Future application scenarios, such as Smart Cities, Industry 4.0, Machine-to-Machine (M2M) communications bring the concepts of Internet of Things (IoT) which requires high-speed low power communication infrastructures. Scientific applications, such as space exploration, oil exploration also require computing speed in the range of Exaflops/s by 2018 which means TB/s bandwidth at each memory node. To achieve such bandwidth, Input/Output (I/O) link speed between two devices needs to be increased to GB/s. The data at high speed between devices can be transferred serially using complex Clock-Data-Recovery (CDR) I/O links or parallely using simple source-synchronous I/O links. Even though CDR is more efficient than the source-synchronous method for single I/O link, but to achieve TB/s bandwidth from a single device, additional I/O links will be required and the source-synchronous method will be more advantageous in terms of area and power requirements as additional I/O links do not require extra hardware resources. At high speed, there are several non-idealities (Supply noise, crosstalk, Inter- Symbol-Interference (ISI), etc.) which create unwanted skew problem among parallel source-synchronous I/O links. To solve these problems, adaptive trainings are used in time domain to synchronize parallel source-synchronous I/O links irrespective of these non-idealities. In this thesis, two novel adaptive training architectures for source-synchronous I/O links are discussed which require significantly less silicon area and power in comparison to state-of-the-art architectures. First novel adaptive architecture is based on the unit delay concept to synchronize two parallel clocks by adjusting the phase of one clock in only one direction. Second novel adaptive architecture concept consists of Phase Interpolator (PI)-based Phase Locked Loop (PLL) which can adjust the phase in both direction and achieve faster synchronization at the expense of added complexity. With an increase in parallel I/O links, clock skew which is generated by the improper clock tree, also affects the timing margin. Incorrect duty cycle further reduces the timing margin mainly in Double Data Rate (DDR) systems which are generally used to increase the bandwidth of a high-speed communication system. To solve clock skew and duty cycle problems, a novel clock tree buffering algorithm and a novel duty cycle corrector are described which further reduce the power consumption of a source-synchronous system

    Inter-module Interfacing techniques for SoCs with multiple clock domains to address challenges in modern deep sub-micron technologies

    Get PDF
    Miniaturization of integrated circuits (ICs) due to the improvement in lithographic techniques in modem deep sub-micron (DSM) technologies allows several complex processing elements to coexist in one IC, which are called System-on-Chip. As a first contribution, this thesis quantitatively analyzes the severity of timing constraints associated with Clock Distribution Network (CDN) in modem DSM technologies and shows that different processing elements may work in different dock domains to alleviate these constraints. Such systems are known as Globally Asynchronous Locally Synchronous (GALS) systems. It is imperative that different processing elements of a GALS system need to communicate with each other through some interfacing technique, and these interfaces can be asynchronous or synchronous. Conventionally, the asynchronous interfaces are described at the Register Transfer Logic (RTL) or system level. Such designs are susceptible to certain design constraints that cannot be addressed at higher abstraction levels; crosstalk glitch is one such constraint. This thesis initially identifies, using an analytical model, the possibility of asynchronous interface malfunction due to crosstalk glitch propagation. Next, we characterize crosstalk glitch propagation under normal operating conditions for two different classes of asynchronous protocols, namely bundled data protocol based and delay insensitive asynchronous designs. Subsequently, we propose a logic abstraction level modeling technique, which provides a framework to the designer to verify the asynchronous protocols against crosstalk glitches. The utility of this modeling technique is demonstrated experimentally on a Xilinx Virtex-II Pro FPGA. Furthermore, a novel methodology is proposed to quench such crosstalk glitch propagation through gating the asynchronous interface from sending the signal during potential glitch vulnerable instances. This methodology is termed as crosstalk glitch gating. This technique is successfully applied to obtain crosstalk glitch quenching in the representative interfaces. This thesis also addresses the dock skew challenges faced by high-performance synchronous interfacing methodologies in modem DSM technologies. The proposed methodology allows communicating modules to run at a frequency that is independent of the dock skew. Leveraging a novel clock-scheduling algorithm, our technique permits a faster module to communicate safely with a slower module without slowing down. Safe data communications for mesochronous schemes and for the cases when communicating modules have dock frequency ratios of integer or coprime numbers are theoretically explained and experimentally demonstrated. A clock-scheduling technique to dynamically accommodate phase variations is also proposed. These methods are implemented to the Xilinx Virtex II Pro technology. Experiments prove that the proposed interfacing scheme allows modules to communicate data safely, for mesochronous schemes, at 350 MHz, which is the limit of the technology used, under a dock skew of more than twice the time period (i.e. a dock skew of 12 ns
    corecore