Hybrid analog-digital transmit beamforming for spectrum sharing backhaul networks

Author: Blanco Botana Luis
Pérez Neira Ana Isabel
Vázquez Oliver Miguel Ángel
Publication date: 01/01/2018

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

This paper deals with the problem of analog-digital transmit beamforming under spectrum sharing constraints for backhaul systems. In contrast to fully digital designs, where the spatial processing is done at the baseband unit with all the flexible computational resources of digital processors, analog-digital beamforming schemes require that certain processing be done through analog components, such as phase shifters or switches. These analog components do not have the same processing flexibility as the digital processor, but they can substantially reduce the cost and complexity of the beamforming solution. This paper presents the joint optimization of the analog and digital parts, which results in a nonconvex, NP-hard, and coupled problem. To solve it, an alternating optimization with a penalized convex-concave method is proposed. According to the simulation results, this novel iterative procedure is able to find a solution that performs close to the fully digital beamforming scheme, which serves as an upper bound.

Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Executing matrix multiply on a process oriented data flow machine

Author: Bic Lubomir
Nagel Mark D.
Roy John M.A.
Publication venue: eScholarship, University of California
Publication date: 01/01/1990

The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on partitioning and mapping of processes.

In PODS, arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there will be only one producer of each element. This producing PE owns that element and will perform the necessary computations to assign it. Using this approach, the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way, PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems.

In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as LU decomposition, convolution, and the Fast Fourier Transform.

The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the work load evenly across the PEs. The key result is that PODS can scale matrix multiply in a near-linear fashion until there is little or no work to be performed for each PE. Then overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points) this limit would be reached at around 256 PEs.
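
The partitioning rule in this abstract (consecutive array elements assigned equally to PEs, with the owning PE computing each element it owns) can be illustrated with a minimal sketch. This is not the PODS simulator: the function names `block_owner` and `matmul_owner_computes` are hypothetical, the PEs are simulated sequentially in one process, and the ceiling-division block size is an assumption about how an uneven split would be handled.

```python
def block_owner(index, n_elements, n_pes):
    """Return the PE owning flat element `index` when consecutive
    elements are split into equal-sized blocks (assumed rule)."""
    block = -(-n_elements // n_pes)  # ceiling division: elements per PE
    return index // block


def matmul_owner_computes(a, b, n_pes):
    """Multiply a (n x m) by b (m x p). Each result element is written
    exactly once (single assignment) and is charged to its owning PE."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[None] * p for _ in range(n)]
    work = [0] * n_pes  # number of result elements produced per PE

    for flat in range(n * p):          # the distributed "filling loop"
        pe = block_owner(flat, n * p, n_pes)
        i, j = divmod(flat, p)         # row-major position of the element
        acc = 0
        for k in range(m):             # loop-carried dependency on acc
            acc += a[i][k] * b[k][j]
        c[i][j] = acc                  # the only write to c[i][j]
        work[pe] += 1
    return c, work


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    product, work = matmul_owner_computes(a, b, n_pes=2)
    print(product, work)
```

The `work` list makes the load-balance claim visible: with equal blocks, each simulated PE produces the same number of result elements whenever the element count divides evenly by the PE count.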

eScholarship - University of California

Flexible compiler-managed L0 buffers for clustered VLIW processors

Author: Gibert Codina Enric
González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the data cache remains centralized. However, as technology evolves, the latency of such a centralized cache increases, leading to an important performance impact. In this paper, we propose to include flexible low-latency buffers in each cluster in order to reduce the performance impact of higher cache latencies. The reduced number of entries in each buffer permits the design of flexible ways to map data from L1 to these buffers. The proposed L0 buffers are managed by the compiler, which is responsible for deciding which memory instructions make use of them. Effective instruction scheduling techniques are proposed to generate code that exploits these buffers. Results for the Mediabench benchmark suite show that the performance of a clustered VLIW processor with a unified L1 data cache is improved by 16% when such buffers are used. In addition, the proposed architecture also shows significant advantages over both MultiVLIW processors and clustered processors with a word-interleaved cache, two state-of-the-art designs with a distributed L1 data cache.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Distributed data cache designs for clustered VLIW processors

Author: Gibert Codina Enric
González Colás Antonio María
Sánchez Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Wire delays are a major concern for current and forthcoming processors. One approach to deal with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the L1 data cache typically remains centralized in what we call partially distributed architectures. However, as technology evolves, the relative latency of such a centralized cache will increase, leading to an important impact on performance. In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. In particular, we propose and evaluate three different configurations: a snoop-based cache coherence scheme, a word-interleaved cache, and flexible L0 buffers managed by the compiler. For each alternative, instruction scheduling techniques targeted to cyclic code are developed. Results for the Mediabench suite show that the performance of such fully distributed architectures is always better than the performance of a partially distributed one with the same amount of resources. In addition, the key aspects of each fully distributed configuration are explored.

Peer Reviewed. Postprint (published version)
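
As a rough illustration of the word-interleaved configuration named in this abstract, the sketch below maps a byte address to the cluster whose cache module would hold it, assuming consecutive words rotate across clusters. The function name `home_cluster`, the 4-byte word size, and the 4-cluster count are assumptions for illustration and are not taken from the paper.

```python
WORD_BYTES = 4   # assumed word size
N_CLUSTERS = 4   # assumed number of clusters


def home_cluster(byte_address, word_bytes=WORD_BYTES, n_clusters=N_CLUSTERS):
    """Return the cluster that holds the word containing `byte_address`
    when consecutive words are interleaved across clusters."""
    word_index = byte_address // word_bytes
    return word_index % n_clusters


if __name__ == "__main__":
    # A unit-stride walk over words visits the clusters in rotation.
    for addr in range(0, 32, WORD_BYTES):
        print(addr, home_cluster(addr))
```

Under this mapping, a scheduler that knows the stride of a memory instruction could in principle predict which cluster each access targets, which is one reason such layouts interact with instruction scheduling.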

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

B+-tree Index Optimization by Exploiting Internal Parallelism of Flash-based Solid State Drives

Author: Kim Sungho
Lee Sang-Won
Park Sanghyun
Roh Hongchan
Shin Mincheol
Publication date: 01/01/2011

Previous research addressed the potential problems of the hard-disk-oriented design of DBMSs on flashSSDs. In this paper, we focus on exploiting the potential benefits of flashSSDs. First, we examine the internal parallelism issues of flashSSDs by conducting benchmarks on various flashSSDs. Then, we suggest algorithm-design principles in order to best benefit from the internal parallelism. We present a new I/O request concept, called psync I/O, that can exploit the internal parallelism of flashSSDs in a single process. Based on these ideas, we introduce B+-tree optimization methods in order to utilize internal parallelism. By integrating the results of these methods, we present a B+-tree variant, PIO B-tree. We confirmed that each optimization method substantially enhances the index performance. Consequently, PIO B-tree enhanced B+-tree's insert performance by a factor of up to 16.3, while improving point-search performance by a factor of 1.2. The range search of PIO B-tree was up to 5 times faster than that of the B+-tree. Moreover, PIO B-tree outperformed other flash-aware indexes in various synthetic workloads. We also confirmed that PIO B-tree outperforms B+-tree in index traces collected inside the PostgreSQL DBMS with the TPC-C benchmark.

Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Large-scale multilayer architecture of single-atom arrays with individual addressability

Author: Birkl Gerhard
de Mello Daniel Ohl
Hambach Moritz
Schlosser Malte
Schäffner Dominik
Tichelmann Sascha
Publication date: 21/05/2019

We report on the realization of large-scale 3D multilayer configurations of planar arrays of individual neutral atoms with immediate applications in quantum science and technology: a microlens-generated Talbot optical lattice. In this novel platform, the single-beam illumination of a microlens array constitutes a structurally robust and wavelength-universal method for the realization of 3D atom arrays with favourable scaling properties due to the inherent self-imaging of the focal structure. Thus, 3D scaling comes without the requirement of extra resources. We demonstrate the trapping and imaging of individual rubidium atoms and the in-plane assembly of defect-free single-atom arrays in several Talbot planes. We present interleaved lattices with dynamic position control and parallelized sub-lattice addressing of spin states.

arXiv.org e-Print Archive

Empirical Comparison of Chirp and Multitones on Experimental UWB Software Defined Radar Prototype

Author: Denoulet J.
Garda P.
Le Kernec J.
Romain O.
Publication date: 01/01/2012

This paper proposes and tests an approach for an unbiased study of the performance of radar waveforms. Using an ultra-wideband software-defined radar prototype, the performances of Chirp and Multitones are compared in range profile and detection range. The architecture was implemented and has performance comparable to the state of the art in software-defined radar prototypes. The experimental results are consistent with the simulations.

HAL Descartes

Enlighten

Hal-Diderot