High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP
This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
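The Smith-Waterman algorithm compared across platforms above can be sketched as a simple dynamic program. This is a minimal score-only version; the scoring values (match=2, mismatch=-1, gap=-1) are illustrative choices, not taken from the paper, and the accelerated implementations it benchmarks would parallelize the anti-diagonals of this matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best local alignment score ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The zero-clamp in the recurrence is what distinguishes local (Smith-Waterman) from global (Needleman-Wunsch) alignment: an alignment may start and end anywhere in either sequence.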
Efficiency Resource Allocation for Device-to-Device Underlay Communication Systems: A Reverse Iterative Combinatorial Auction Based Approach
Peer-to-peer communication has recently attracted attention for local-area services. An innovative resource allocation scheme is proposed to improve the performance of mobile peer-to-peer, i.e., device-to-device (D2D), communications as an underlay in downlink (DL) cellular networks. To optimize the system sum rate over the resource sharing of both D2D and cellular modes, we introduce a reverse iterative combinatorial auction as the allocation mechanism. In the auction, all the spectrum resources are considered as a set of resource units, which compete as bidders to obtain business, while the packages of the D2D pairs are auctioned off as goods in each auction round. We first formulate the valuation of each resource unit as a basis for the proposed auction. We then detail a non-monotonic descending-price auction algorithm based on a utility function that accounts for the channel gain from D2D and the costs for the system. Further, we prove that the proposed auction-based scheme is cheat-proof and converges in a finite number of iteration rounds. We explain the non-monotonicity of the price-update process and show lower complexity compared to a traditional combinatorial allocation. Simulation results demonstrate that the algorithm efficiently achieves good performance on the system sum rate.

Comment: 26 pages, 6 figures; IEEE Journal on Selected Areas in Communications, 201
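The descending-price mechanism described in the abstract can be illustrated with a heavily simplified loop. This is only a sketch: the valuations, starting price, and price decrement below are made-up illustrative numbers, and the paper's actual scheme is combinatorial (packages of D2D pairs, a utility-driven non-monotonic price update) rather than this single-good form.

```python
def descending_price_auction(valuations, start_price, step):
    """Lower the price each round until some resource unit's valuation
    meets it; that unit wins the auctioned package at the current price.

    valuations: dict mapping a resource-unit name to its (hypothetical)
    valuation of the package on offer.
    """
    price = start_price
    rounds = 0
    while price > 0:
        rounds += 1
        # Resource units act as bidders: a unit bids once the price
        # has descended to (or below) its valuation.
        bidders = [u for u, v in valuations.items() if v >= price]
        if bidders:
            winner = max(bidders, key=lambda u: valuations[u])
            return winner, price, rounds
        price -= step
    return None, 0, rounds
```

Because the price strictly decreases by a fixed step each round, the loop is guaranteed to terminate in finitely many rounds, mirroring the convergence property the paper proves for its full scheme.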
Linear attention is (maybe) all you need (to understand transformer optimization)
Transformer training is notoriously difficult, requiring a careful design of
optimizers and use of various heuristics. We make progress towards
understanding the subtleties of training Transformers by carefully studying a
simple yet canonical linearized shallow Transformer model. Specifically, we
train linear Transformers to solve regression tasks, inspired by J.~von Oswald
et al.~(ICML 2023), and K.~Ahn et al.~(NeurIPS 2023). Most importantly, we
observe that our proposed linearized models can reproduce several prominent
aspects of Transformer training dynamics. Consequently, the results obtained in
this paper suggest that a simple linearized Transformer model could actually be
a valuable, realistic abstraction for understanding Transformer optimization.

Comment: Published at ICLR 202
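A linear Transformer layer of the kind this abstract studies is simply attention with the softmax removed, so the token update is a polynomial in the input. The sketch below is an assumption about the general form (one residual linear-attention layer with separate query, key, and value projections); the paper's exact parameterization for the regression task may differ.

```python
import numpy as np

def linear_attention_layer(Z, Wq, Wk, Wv):
    """One linear-attention layer with a residual connection.

    Z: (n, d) matrix of n tokens; Wq, Wk, Wv: (d, d) projections.
    Scores A = (Z Wq)(Z Wk)^T are used directly -- no softmax -- so the
    whole layer is a cubic polynomial in Z.
    """
    n = Z.shape[0]
    A = (Z @ Wq) @ (Z @ Wk).T          # (n, n) attention scores
    return Z + A @ (Z @ Wv) / n        # residual update, averaged over tokens

# Illustrative use: a random "prompt" of 5 tokens of width 4.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = linear_attention_layer(Z, Wq, Wk, Wv)
```

For in-context regression, the tokens would encode (x, y) pairs plus a query x, and training adjusts Wq, Wk, Wv so the layer's output at the query token approximates the regression prediction; removing the softmax is what makes the resulting training dynamics tractable to analyze.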