Search CORE

492 research outputs found

Robust and efficient video/image transmission

Author: Zhang Xi Min
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2003

The Internet has become a primary medium for information transmission. The unreliability of channel conditions, limited channel bandwidth and the explosive growth of information transmission requests, however, hinder its further development. Hence, research on robust and efficient delivery of video/image content is in high demand. Three aspects of this task are investigated in this dissertation: error burst correction, efficient rate allocation and random error protection. A novel technique, called successive packing, is proposed for combating multi-dimensional (M-D) bursts of errors. A new concept, the basis interleaving array, is introduced; by combining different basis arrays, effective M-D interleaving can be realized. It has been shown that, for a given two-dimensional (2-D) array, this algorithm needs to be implemented only once and yet is optimal for a set of error bursts of different sizes. To adapt to variable channel conditions, a novel rate allocation technique is proposed for Fine Granular Scalability (FGS) coded video, in which real-data-based rate-distortion modeling is developed, a constant quality constraint is adopted and a sliding window approach is used to track the variable channel conditions. With the proposed technique, constant quality across frames is achieved by solving a set of linear equations. Thus, significant computational simplification is achieved compared with state-of-the-art techniques, while the overall distortion is also reduced. To combat random errors during transmission, an unequal error protection (UEP) method and a robust error-concealment strategy are proposed for scalable coded video bitstreams.
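A constant-quality allocation can be sketched under a simplifying assumption that does not come from the dissertation. Assume each frame's distortion is linear in its enhancement-layer rate, D_i(R_i) = a_i - b_i R_i, where a_i and b_i are hypothetical per-frame parameters. Requiring the same distortion D in every frame under a fixed total rate then reduces to linear equations with a closed-form solution. The sliding-window helper is also hypothetical: it re-solves over the next few frames and commits only the first frame's rate.

```python
from typing import List, Sequence, Tuple


def constant_quality_allocation(
    a: Sequence[float], b: Sequence[float], total_rate: float
) -> Tuple[float, List[float]]:
    """Split total_rate so that every frame reaches the same distortion.

    Assumes a hypothetical linear model D_i(R_i) = a_i - b_i * R_i with b_i > 0.
    Equal distortion D plus sum(R_i) == total_rate gives
    D = (sum(a_i / b_i) - total_rate) / sum(1 / b_i).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    inv_b_sum = sum(1.0 / bi for bi in b)
    common_d = (sum(ai / bi for ai, bi in zip(a, b)) - total_rate) / inv_b_sum
    rates = [(ai - common_d) / bi for ai, bi in zip(a, b)]
    # A negative rate would mean the common distortion is unreachable for that
    # frame; a practical allocator would clip it and re-solve for the others.
    return common_d, rates


def sliding_window_rates(
    a: Sequence[float], b: Sequence[float], rate_per_frame: float, window: int
) -> List[float]:
    """Re-solve over the next `window` frames and commit only the first rate."""
    committed: List[float] = []
    for start in range(len(a)):
        end = min(start + window, len(a))
        _, rates = constant_quality_allocation(
            a[start:end], b[start:end], rate_per_frame * (end - start)
        )
        committed.append(rates[0])
    return committed


d, r = constant_quality_allocation([10.0, 12.0, 8.0], [1.0, 2.0, 1.0], 6.0)
print(d, r)  # common distortion and per-frame rates
```

Under this model, allocation costs one pass over the frames, with no iterative rate-distortion search. That matches the abstract's point about computational simplification, but the real modeling in the dissertation may differ.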

Digital Commons @ New Jersey Institute of Technology (NJIT)

High performance dense linear algebra on a spatially distributed processor

Author: Behnam Robatmili
Doug Burger
Jeff Diamond
Kazushige Goto
Robert van de Geijn
Stephen W. Keckler
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2008

As technology trends have limited the performance scaling of conventional processors, industry and academic researchers have turned to parallel architectures on a single chip, including distributed uniprocessors and multicore chips. This paper examines how to extend the archetypal operation of dense linear algebra, matrix multiply, to an emerging class of uniprocessor architectures characterized by a large number of independent functional units, register banks, and cache banks connected by a 2-D on-chip network. We extend the well-known algorithm for matrix multiplication by Goto to this spatially distributed class of uniprocessor and describe the optimizations of the innermost kernel, a systolic-like algorithm running on a general-purpose uniprocessor. The resulting implementation yields the first demonstration of high performance in an application executing on the TRIPS processor hardware, a next-generation distributed processor core. We show that such processors are indeed capable of substantial improvements in single-threaded performance, provided their spatial topography is taken into account.
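The blocking-and-packing structure associated with Goto's algorithm can be sketched in plain Python. This is only an illustration under stated assumptions. The block sizes `mc` and `kc` are hypothetical, and Python lists stand in for cache-resident packed buffers. Nothing here models the TRIPS systolic-like inner kernel or its on-chip network.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matmul(A: Matrix, B: Matrix, mc: int = 2, kc: int = 2) -> Matrix:
    """C = A @ B using k-panels and packed blocks of A, in the style of Goto.

    mc and kc are hypothetical block sizes; on real hardware they would be
    tuned so the packed A block stays resident while B panels stream past it.
    """
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, kc):  # loop over kc-wide panels of the k dimension
        p1 = min(p0 + kc, k)
        # Pack the kc x n panel of B column by column so every inner product
        # reads one contiguous list.
        packed_b = [[B[p][j] for p in range(p0, p1)] for j in range(n)]
        for i0 in range(0, m, mc):  # loop over mc-tall blocks of A
            i1 = min(i0 + mc, m)
            packed_a = [A[i][p0:p1] for i in range(i0, i1)]  # mc x kc block
            # Inner kernel: rank-kc update of the corresponding rows of C.
            for ii, a_row in enumerate(packed_a):
                c_row = C[i0 + ii]
                for j, b_col in enumerate(packed_b):
                    c_row[j] += sum(x * y for x, y in zip(a_row, b_col))
    return C


print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The result does not depend on `mc` and `kc`. In a real kernel those sizes change only the order of memory accesses, and tuning them to the memory hierarchy is where the performance comes from.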

CiteSeerX

Crossref

Near-Instantaneously Adaptive HSDPA-Style OFDM Versus MC-CDMA Transceivers for WIFI, WIMAX, and Next-Generation Cellular Systems

Author: Choi B.J.
Hanzo Lajos
Publication date: 01/12/2007

Burst-by-burst (BbB) adaptive high-speed downlink packet access (HSDPA) style multicarrier systems are reviewed, identifying their most critical design aspects. These systems exhibit numerous attractive features, rendering them eminently eligible for employment in next-generation wireless systems. It is argued that BbB-adaptive or symbol-by-symbol adaptive orthogonal frequency division multiplexing (OFDM) modems counteract the near-instantaneous channel quality variations and hence attain an increased throughput or robustness in comparison to their fixed-mode counterparts. Although they act quite differently, various diversity techniques, such as Rake receivers and space-time block coding (STBC), are also capable of mitigating the channel quality variations in their effort to reduce the bit error ratio (BER), provided that the individual antenna elements experience independent fading. By contrast, in the presence of correlated fading imposed by shadowing or time-variant multiuser interference, the benefits of space-time coding erode, and it is unrealistic to expect that a fixed-mode space-time coded system remains capable of maintaining a near-constant BER.
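Burst-by-burst adaptation can be pictured as choosing, for each burst, the highest-throughput modulation mode whose switching threshold the estimated channel quality meets. In the symbol-by-symbol OFDM case, the same choice is made for each subcarrier. The mode set and SNR thresholds below are hypothetical placeholders, not values from the paper; real designs optimise the thresholds for a target BER or throughput.

```python
from typing import List, Tuple

# Hypothetical (mode name, bits per symbol, switching threshold in dB),
# sorted by ascending threshold.
MODES: List[Tuple[str, int, float]] = [
    ("no transmission", 0, float("-inf")),
    ("BPSK", 1, 3.0),
    ("QPSK", 2, 6.0),
    ("16QAM", 4, 12.0),
    ("64QAM", 6, 18.0),
]


def select_mode(snr_db: float) -> Tuple[str, int]:
    """Return the highest-throughput mode whose threshold snr_db meets."""
    name, bits = MODES[0][0], MODES[0][1]
    for mode_name, mode_bits, threshold in MODES:
        if snr_db >= threshold:
            name, bits = mode_name, mode_bits
    return name, bits


def bits_per_subcarrier(snr_db_per_subcarrier: List[float]) -> List[int]:
    """Symbol-by-symbol style: pick a mode independently for each subcarrier."""
    return [select_mode(snr)[1] for snr in snr_db_per_subcarrier]


print(bits_per_subcarrier([2.0, 7.0, 20.0]))  # prints [0, 2, 6]
```

A fixed-mode modem corresponds to a single entry in `MODES`. It must be provisioned for the worst channel, whereas the adaptive scheme raises the rate whenever the near-instantaneous channel quality allows.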

Southampton (e-Prints Soton)

Threshold Error Penalty for Fault Tolerant Computation with Nearest Neighbour Communication

Author: Boykin P. O.
Fan H.
Fong B.
Gyure M.
Roychowdhury V.
Simms G.
Szkopek T.
Yablonovitch E.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/09/2005

The error threshold for fault-tolerant quantum computation with concatenated encoding of qubits is penalized by internal communication overhead. Many quantum computation proposals rely on nearest-neighbour communication, which requires excess gate operations. For a qubit stripe with a width of L+1 physical qubits implementing L levels of concatenation, we find that the error threshold of 2.1x10^-5 without any communication burden is reduced to 1.2x10^-7 when gate errors are the dominant source of error. This ~175X penalty in error threshold translates to an ~13X penalty in the amplitude and timing of gate operation control pulses.
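The two penalty figures are linked by a square root: 2.1x10^-5 / 1.2x10^-7 = 175, and sqrt(175) ≈ 13.2. One plausible reading, which is an assumption here and not stated in the abstract, is that gate error probability grows quadratically with small amplitude or timing errors in the control pulses. A threshold that is F times tighter would then demand pulses about sqrt(F) times more precise.

```python
import math

THRESHOLD_NO_COMM = 2.1e-5   # error threshold without communication burden
THRESHOLD_NEAREST = 1.2e-7   # threshold with nearest-neighbour communication

threshold_penalty = THRESHOLD_NO_COMM / THRESHOLD_NEAREST
# Assumption: gate error ~ (pulse amplitude or timing error)^2, so the
# tolerated pulse error shrinks by the square root of the threshold penalty.
pulse_penalty = math.sqrt(threshold_penalty)

print(f"threshold penalty ~{threshold_penalty:.0f}x, pulse penalty ~{pulse_penalty:.1f}x")
# prints: threshold penalty ~175x, pulse penalty ~13.2x
```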

arXiv.org e-Print Archive

Crossref

Recommended from our members

Compiling Communication-Minimizing Query Plans

Author: Love Eric J
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Because of the low arithmetic intensity of relational database operators, the performance of in-memory column stores ought to be bound by main-memory bandwidth, and in practice, highly optimized operator implementations already achieve close to their peak theoretical performance. By itself, this would imply that hardware acceleration for analytics would be of limited utility, but I show that the emergence of full-query compilation presents new opportunities to reduce memory traffic and trade computation for communication, meaning that database-oriented processors may yet be worth designing. Moreover, the communication costs of queries on a given processor and memory hierarchy are determined by factors below the level of abstraction expressed in traditional query plans, such as how operators are (or are not) fused together, how execution is parallelized and cache-blocked, and how intermediate results are arranged in memory. I present a Scala-embedded programming language called Ressort that exposes these machine-level aspects of query compilation and emits parallel C++/OpenMP code, making it possible to express a greater range of algorithmic variants for each query than would be easy to study by hand.
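The effect of operator fusion on memory traffic can be made concrete with a toy count. The sketch assumes a simple filter-then-sum query over one column. It counts values read or written, treating a materialised intermediate as one write plus one re-read per element. It illustrates fusion in general and is not Ressort's syntax or its generated C++/OpenMP code.

```python
from typing import Callable, List, Tuple


def unfused_filter_sum(column: List[int], pred: Callable[[int], bool]) -> Tuple[int, int]:
    """Operator-at-a-time: materialise the filter output, then scan it again."""
    intermediate = [v for v in column if pred(v)]   # written to memory
    total = sum(intermediate)                       # read back from memory
    traffic = len(column) + 2 * len(intermediate)   # input read + write + re-read
    return total, traffic


def fused_filter_sum(column: List[int], pred: Callable[[int], bool]) -> Tuple[int, int]:
    """Fused pipeline: each value is tested and accumulated in a single pass."""
    total = 0
    for v in column:
        if pred(v):
            total += v
    traffic = len(column)                           # input read only
    return total, traffic


col = list(range(10))
is_even = lambda v: v % 2 == 0
print(unfused_filter_sum(col, is_even), fused_filter_sum(col, is_even))
# prints (20, 20) (20, 10)
```

Both plans compute the same answer. The fused plan moves fewer values through memory, which matters when the operators themselves are bandwidth-bound.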

eScholarship - University of California

Near-capacity MIMOs using iterative detection

Author: El-Hajjar Mohammed H.
Publication date: 28/09/2008

In this thesis, Multiple-Input Multiple-Output (MIMO) techniques designed for transmission over narrowband Rayleigh fading channels are investigated. Specifically, in order to provide a diversity gain while eliminating the complexity of MIMO channel estimation, a Differential Space-Time Spreading (DSTS) scheme is designed that employs non-coherent detection. Additionally, in order to maximise the coding advantage of DSTS, it is combined with Sphere Packing (SP) modulation. The related capacity analysis shows that the DSTS-SP scheme exhibits a higher capacity than its counterpart dispensing with SP. Furthermore, in order to attain additional performance gains, the DSTS system invokes iterative detection, where the outer code is a Recursive Systematic Convolutional (RSC) code. In one of the prototype systems investigated the inner code is an SP demapper, while the other scheme employs a Unity Rate Code (URC) as its inner code in order to eliminate the error floor exhibited by the system dispensing with URC. EXIT charts are used to analyse the convergence behaviour of the iteratively detected schemes, and a novel technique is proposed for computing the maximum achievable rate of the system based on EXIT charts. Explicitly, the four-antenna-aided DSTS-SP system employing no URC precoding attains a coding gain of 12 dB at a BER of 10^-5 and performs within 1.82 dB of the maximum achievable rate limit. By contrast, the URC-aided precoded system operates within 0.92 dB of the same limit. On the other hand, in order to maximise the DSTS system's throughput, an adaptive DSTS-SP scheme is proposed that exploits the advantages of differential encoding, iterative decoding and SP modulation.
The achievable integrity and bit rate enhancements of the system are determined by the following factors: the specific MIMO configuration used for transmitting data from the four antennas, the spreading factor used and the RSC encoder's code rate. Additionally, multi-functional MIMO techniques are designed to provide diversity gains, multiplexing gains and beamforming gains by combining the benefits of space-time codes, V-BLAST and beamforming. First, a system employing Nt=4 transmit Antenna Arrays (AA) with L_AA elements per AA and Nr=4 receive antennas is proposed, which is referred to as a Layered Steered Space-Time Code (LSSTC). Three iteratively detected near-capacity LSSTC-SP receiver structures are proposed, which differ in the number of inner iterations employed between the inner decoder and the SP demapper as well as in the choice of the outer code, which is either an RSC code or an Irregular Convolutional Code (IrCC). The three systems are capable of operating within 0.9, 0.4 and 0.6 dB of the maximum achievable rate limit of the system. A comparison between the three iteratively detected schemes reveals that a carefully designed two-stage iterative detection scheme is capable of operating sufficiently close to capacity at a lower complexity than a three-stage system employing an RSC code or a two-stage system using an IrCC as an outer code. On the other hand, in order to allow the LSSTC scheme to employ fewer receive antennas than transmit antennas, while still accommodating multiple users, a Layered Steered Space-Time Spreading (LSSTS) scheme is proposed that combines the benefits of space-time spreading, V-BLAST, beamforming and generalised MC DS-CDMA.
Furthermore, iteratively detected LSSTS schemes are presented, and an LLR post-processing technique is proposed in order to improve the attainable performance of the iteratively detected LSSTS system. Finally, a distributed turbo coding scheme is proposed that combines the benefits of turbo coding and cooperative communication, where iterative detection is employed by exchanging extrinsic information between the decoders of different single-antenna-aided users. Specifically, the effect of the errors induced in the first phase of cooperation, where the two users exchange their data, on the performance of the uplink is studied, while considering different fading channel characteristics.
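The single-antenna case already shows why differential schemes such as DSTS can dispense with channel estimation. The sketch uses differential BPSK as a deliberately simplified stand-in; it is not the DSTS-SP scheme. The information sits in the phase change between consecutive symbols. The receiver multiplies each sample by the conjugate of the previous one, so an unknown but constant channel gain cancels. The channel value `h` is a hypothetical example, and noise is omitted.

```python
import cmath
from typing import List


def dbpsk_encode(bits: List[int]) -> List[complex]:
    """Differentially encode bits: bit 1 flips the phase, bit 0 keeps it."""
    symbols = [1 + 0j]  # reference symbol
    for bit in bits:
        symbols.append(symbols[-1] * (-1 if bit else 1))
    return symbols


def dbpsk_detect(received: List[complex]) -> List[int]:
    """Non-coherent detection: compare each sample with its predecessor."""
    return [
        1 if (cur * prev.conjugate()).real < 0 else 0
        for prev, cur in zip(received, received[1:])
    ]


# Hypothetical unknown channel gain; the detector never uses it.
h = 0.7 * cmath.exp(1j * 2.1)
bits = [1, 0, 1, 1, 0, 0, 1]
received = [h * s for s in dbpsk_encode(bits)]
print(dbpsk_detect(received) == bits)  # prints True
```

The price of skipping channel estimation is a well-known SNR penalty relative to coherent detection. That is the kind of loss the thesis offsets with sphere packing and iterative decoding.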

Southampton (e-Prints Soton)

Domain specific high performance reconfigurable architecture for a communication platform

Author: Ahmed Imran
Publication venue: The University of Edinburgh
Publication date: 01/01/2007

Edinburgh Research Archive

Memory Access Patterns for Cellular Automata Using GPGPUs

Author: Balasalle James Michael
Publication venue: Digital Commons @ DU
Publication date: 01/01/2011

Today's graphics processing units have hundreds of individual processing cores that can be used for general-purpose computation of mathematical and scientific problems. Due to their hardware architecture, these devices are especially effective when solving problems that exhibit a high degree of spatial locality. Cellular automata use small, local neighborhoods to determine successive states of individual elements and therefore provide an excellent opportunity for the application of general-purpose GPU computing. However, the GPU presents a challenging environment because it lacks many of the features of traditional CPUs, such as automatic, on-chip caching of data. To fully realize the potential of a GPU, specialized memory techniques and patterns must be employed to account for its unique architecture. Several techniques are presented which not only dramatically improve performance but, in many cases, also simplify implementation. Many of the approaches discussed relate to the organization of data in memory or patterns for accessing that data, while others detail methods of increasing the ratio of computation to memory accesses. The ideas presented are generic and applicable to cellular automata models as a whole. Example implementations are given for several problems, including the Game of Life and Gaussian blurring, while performance characteristics, such as instruction and memory access counts, are analyzed and compared. A case study is detailed, showing the effectiveness of the various techniques when applied to a larger, real-world problem. Lastly, the reasoning behind each of the improvements is explained, providing general guidelines for determining when a given technique will be most and least effective.
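GPU cellular-automaton kernels commonly stage a tile of cells plus a one-cell halo into fast local memory. This pattern can be sketched on the CPU. The sketch assumes a fixed dead border, and the `tile` size is a hypothetical parameter. Each tile and its halo are copied into a small buffer standing in for GPU shared memory, so every neighbour read during the Game of Life update hits that buffer. This is an illustration of the general pattern, not the thesis's implementation.

```python
from typing import List

Grid = List[List[int]]


def life_rule(alive: int, neighbours: int) -> int:
    """Conway's Game of Life: birth on 3, survival on 2 or 3."""
    return 1 if neighbours == 3 or (neighbours == 2 and alive) else 0


def naive_step(g: Grid) -> Grid:
    """Reference update reading neighbours straight from the global grid."""
    h, w = len(g), len(g[0])

    def cell(r: int, c: int) -> int:
        return g[r][c] if 0 <= r < h and 0 <= c < w else 0  # dead border

    return [[life_rule(g[r][c], sum(cell(r + dr, c + dc)
                                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                                    if (dr, dc) != (0, 0)))
             for c in range(w)] for r in range(h)]


def tiled_step(g: Grid, tile: int = 4) -> Grid:
    """Same update, computed tile by tile from a buffer that includes a halo."""
    h, w = len(g), len(g[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            th, tw = min(tile, h - r0), min(tile, w - c0)
            # Stage the tile plus a one-cell halo (stand-in for shared memory).
            buf = [[g[r][c] if 0 <= r < h and 0 <= c < w else 0
                    for c in range(c0 - 1, c0 + tw + 1)]
                   for r in range(r0 - 1, r0 + th + 1)]
            for i in range(1, th + 1):
                for j in range(1, tw + 1):
                    n = sum(buf[i + di][j + dj] for di in (-1, 0, 1)
                            for dj in (-1, 0, 1) if (di, dj) != (0, 0))
                    out[r0 + i - 1][c0 + j - 1] = life_rule(buf[i][j], n)
    return out
```

On a GPU, each cell in the halo is loaded once per tile rather than once per neighbouring thread. That is one way of raising the computation-to-memory-access ratio the abstract refers to.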

University of Denver