VLSI-sorting evaluated under the linear model
Abstract: There are several models of computation on which evaluations of VLSI sorting algorithms can be based, and several measures of complexity. This paper revisits, under the linear model, complexity results that were obtained under the constant model. This approach is motivated by expected technological development (see Mangir, 1983; Thompson and Raghavan, 1984; Vitanyi, 1984a, 1984b). For the constant model it is known that for medium-sized keys there are AT²- and AP²-optimal sorting algorithms with T ranging from ω(log n) to O(√(nk)) and P ranging from Ω(1) to O(√(nk)) (Bilardi, 1984). The main results of the asymptotic analysis of sorting algorithms under the linear model are that the lower bounds allow AT²-optimal sorting algorithms only for T = Θ(√(nk)), but allow AP²-optimal algorithms in the same range as under the constant model. Furthermore, the sorting algorithms presented in this paper meet these lower bounds, which proves that the bounds cannot be improved for k = Θ(log n). The building block for realizing these sorting algorithms is a comparison-exchange module that compares r × s bit matrices in time TC = Θ(r + s) on an area AC = Θ(r²) (not including the storage area for the keys). For problem sizes that exceed realistic chip capacities, chip-external sorting algorithms can be used. This paper presents two such algorithms, BBB(S) and TWB(S), designed to be implemented on a single board. They use a sorting chip S to perform the sort-split operation on blocks of data. BBB(S) and TWB(S) are systolic algorithms using local communication only, so their evaluation does not depend on whether the constant or the linear model is used.
Furthermore, their design is clearly technically feasible whenever the sorting chip S is. TWB has optimal asymptotic time complexity, so its existence proves that under the linear model external sorting can be done asymptotically as fast as under the constant model. The time complexity of TWB(S) depends linearly on the speed gS = nS tS. It is shown that the speed, viewed as a function of the chip capacity C, is asymptotically maximal for AT²-optimal sorting algorithms; thus S should be a sorting algorithm similar to the M-M-sorter presented in this paper. A major disadvantage of TWB(S) is that it cannot exploit the maximal throughput dS = nS/pS of a systolic sorting algorithm S. Therefore algorithm BBB(S) is introduced. The time complexity of BBB(S) depends linearly on dS. It is shown that the throughput is maximal for AP²-optimal algorithms. There is a wide range of such sorting algorithms, including algorithms that can be realized independently of the length of the keys. For example, BBB(S) with S being a highly parallel version of odd-even transposition sort has this kind of flexibility. A disadvantage of BBB(S) is that it is asymptotically slower than TWB(S).
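The compare-exchange primitive and the odd-even transposition sort named above can be illustrated with a minimal sequential sketch (Python; in the VLSI realization, the compare-exchange operations of each phase would run in parallel):

```python
def odd_even_transposition_sort(a):
    """Sequential model of odd-even transposition sort.

    Each of the n phases performs independent compare-exchange
    operations on disjoint pairs, which is why the algorithm maps
    naturally onto a highly parallel systolic realization.
    """
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2              # alternate even and odd phases
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:        # the compare-exchange module
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Because every pair touched in a phase is disjoint, n processing elements can execute one phase per clock tick, giving the Θ(n) parallel time that the systolic analyses above rely on.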
VHDL Design of a Scalable VLSI Sorting Device Based on Pipelined Computation
This paper describes the VHDL design of a sorting algorithm, aiming at defining an elementary sorting unit as a building block of VLSI devices which require a huge number of sorting units. As such, care was taken to reach a reasonably low value of the area-time parameter. A sorting VLSI device, in fact, can be built as a cascade of elementary sorting units which process the input stream in a pipeline fashion: as the processing goes on, a wave of sorted numbers propagates towards the output ports. The paper describes the design from an initial theoretical analysis of the algorithm's complexity, through a VHDL behavioural analysis of the proposed architecture and a structural synthesis of a sorting block based on the Alliance tools, to a final silicon synthesis, also carried out with Alliance. Two points in the proposed design are particularly noteworthy. First, the sorting architecture is suitable for treating a continuous stream of input data rather than a block of data as in many other designs. Secondly, the proposed design reaches a reasonable compromise between area and time, as it yields an AT product which compares favourably with the theoretical lower bound.
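A behavioural sketch may clarify the cascade idea (a hypothetical Python model, not the paper's VHDL design): each elementary cell retains the smallest value it has seen and forwards larger values downstream, so the sorted sequence accumulates in the cell registers as the stream flows through.

```python
def pipeline_sort(stream, n_cells):
    """Behavioural model of a cascade of elementary sorting cells.

    Each cell retains the smallest value it has seen and forwards
    larger values to the next cell, so after the whole input stream
    has passed through, the cell registers hold the sorted sequence.
    Requires n_cells >= number of input values.
    """
    held = [None] * n_cells            # one register per elementary unit
    for x in stream:
        for i in range(n_cells):
            if held[i] is None:
                held[i] = x            # an empty cell captures the value
                break
            if x < held[i]:
                held[i], x = x, held[i]  # keep the smaller, pass the larger
    return [v for v in held if v is not None]
```

In hardware all cells act concurrently, one hand-off per clock, so the sorting latency is hidden behind the streaming of the input itself.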
The analysis and synthesis of a parallel sorting engine
This thesis is concerned with the development of a unique
parallel sort-merge system suitable for implementation in VLSI.
Two new sorting subsystems, a high performance VLSI sorter and a
four-way merger, were also realized during the development
process. In addition, the analysis of several existing parallel sorting
architectures and algorithms was carried out.
Algorithmic time complexity, VLSI processor performance, and
chip area requirements for the existing sorting systems were
evaluated. The rebound sorting algorithm was determined to be
the most efficient among those considered, and was implemented
in hardware as a systolic array with external expansion
capability.
The second phase of the research involved analyzing several
parallel merge algorithms and their buffer management schemes.
The dominant considerations for this phase of the research were
minimizing VLSI chip area, design complexity, and logic delay.
It was determined that the proposed merger
architecture could be implemented in several ways. Selecting the
appropriate microarchitecture for the merger, given the constraints
of chip area and performance, was the major problem. The tradeoffs
associated with this process are outlined.
Finally, a pipelined sort-merge system was implemented in
VLSI by combining a rebound sorter and a four-way merger on a
single chip. The final chip size was 416 mils by 432 mils. Two
micron CMOS technology was utilized in this chip realization. An
overall throughput rate of 10M bytes/sec was achieved. The
prototype system developed is capable of sorting thirty two 2-byte
keys during each merge phase. If extended, this system is capable of
economically sorting files of 100M bytes or more in size. In order to
sort larger files, this design should be incorporated in a disk-based
sort-merge system. A simplified disk I/O access model for such a
system was studied. In this study the sort-merge system was
assumed to be part of a disk controller subsystem.
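The sort-merge organisation described above can be sketched in software (a hypothetical model: `sorted()` stands in for the rebound sorter's fixed-capacity blocks and `heapq.merge` for the hardware four-way merger):

```python
import heapq

def sort_merge(keys, block=32):
    """Software model of a pipelined sort-merge system.

    A sorter stage sorts fixed-size blocks (32 two-byte keys in the
    thesis prototype) and a merger stage combines up to four sorted
    runs per pass until a single sorted run remains.
    """
    # Sorter stage: one sorted run per block of input keys.
    runs = [sorted(keys[i:i + block]) for i in range(0, len(keys), block)]
    # Merger stages: each pass merges groups of up to four runs.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 4]))
                for i in range(0, len(runs), 4)]
    return runs[0] if runs else []
```

Because each pass cuts the number of runs by a factor of four, a file of R initial runs needs only ⌈log₄ R⌉ merge passes, which is what makes the disk-based extension to files of 100M bytes or more economical.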
On the synthesis and processing of high quality audio signals by parallel computers
This work concerns the application of new computer architectures to the creation and manipulation of high-quality audio bandwidth signals. The configuration of both the hardware and software in such systems falls under consideration in the three major sections, which present increasing levels of algorithmic concurrency. In the first section, the programs which are described are distributed in identical copies across an array of processing elements; these programs run autonomously, generating data independently, but with control parameters peculiar to each copy: this type of concurrency is referred to as isonomic. The central section presents a structure which distributes tasks across an arbitrary network of processors; the flow of control in such a program is quasi-indeterminate, and controlled on a demand basis by the rate of completion of the slave tasks and their irregular interaction with the master. Whilst that interaction is, in principle, deterministic, it is also data-dependent; the dynamic nature of task allocation demands that no a priori knowledge of the rate of task completion be required. This type of concurrency is called dianomic. Finally, an architecture is described which will support a very high level of algorithmic concurrency. The programs which make efficient use of such a machine are designed not by considering flow of control, but by considering flow of data. Each atomic algorithmic unit is made as simple as possible, which results in the extensive distribution of a program over very many processing elements. Programs designed by considering only the optimum data exchange routes are said to exhibit systolic concurrency. Often neglected in the study of system design are those provisions necessary for practical implementations.
It was intended to provide users with useful application programs in fulfilment of this study; the target group is electroacoustic composers, who use digital signal processing techniques in the context of musical composition. Some of the algorithms in use in this field are highly complex, often requiring a quantity of processing for each sample which exceeds that currently available even from very powerful computers. Consequently, applications tend to operate not in 'real-time' (where the output of a system responds to its input apparently instantaneously), but by the manipulation of sounds recorded digitally on a mass storage device. The first two sections adopt existing, public-domain software, and seek to increase its speed of execution significantly by parallel techniques, with the minimum compromise of functionality and ease of use. Those chosen are the general-purpose direct synthesis program CSOUND, from M.I.T., and a stand-alone phase vocoder system from the C.D.P. In each case, the desired aim is achieved: to increase speed of execution by two orders of magnitude over the systems currently in use by composers. This requires substantial restructuring of the programs, and careful consideration of the best computer architectures on which they are to run concurrently. The third section examines the rationale behind the use of computers in music, and begins with the implementation of a sophisticated electronic musical instrument capable of a degree of expression at least equal to its acoustic counterparts. It seems that the flexible control of such an instrument demands a greater computing resource than the sound synthesis part.
A machine has been constructed with the intention of enabling the 'gestural capture' of performance information in real-time; the structure of this computer, which has one hundred and sixty high-performance microprocessors running in parallel, is expounded; and the systolic programming techniques required to take advantage of such an array are illustrated in the Occam programming language.
Design Space Exploration of Accelerators for Warehouse Scale Computing
With Moore’s law grinding to a halt, accelerators are one of the ways that new silicon can improve performance, and they are already a key component in modern datacenters. Accelerators are integrated circuits that implement parts of an application with the objective of higher energy efficiency compared to execution on a standard general purpose CPU. For any particular workload, many accelerator designs are possible, with a wide range of performance and of costs such as area or power. Exploring these design choices, called Design Space Exploration (DSE), is a crucial step in finding the most efficient accelerator design, the one that produces the largest reduction of the total cost of ownership.
This work aims to improve this design space exploration phase for accelerators and to avoid pitfalls in the process. This dissertation supports the thesis that early design choices – including the level of specialization – are critical for accelerator development and therefore require benchmarks reflective of production workloads. We present three studies that support this thesis. First, we show how to benchmark datacenter applications by creating a benchmark for large video sharing infrastructures. Then, we present two studies focused on accelerators for analytical query processing: the first is an analysis of the impact of Network on Chip specialization, while the second analyses the impact of the level of specialization.
The first part of this dissertation introduces vbench: a video transcoding benchmark tailored to the growing video-as-a-service market. Video transcoding is not accurately represented in current computer architecture benchmarks such as SPEC or PARSEC. Despite posing a big computational burden for cloud video providers, such as YouTube and Facebook, it is not included in cloud benchmarks such as CloudSuite. Using vbench, we found that the microarchitectural profile of video transcoding is highly dependent on the input video, that SIMD extensions provide limited benefits, and that commercial hardware transcoders impose tradeoffs that are not ideal for cloud video providers. Our benchmark should spur architectural innovations for this critical workload. This work shows how to benchmark a real world warehouse scale application and the possible pitfalls in case of a mischaracterization.
When considering accelerators for the different, but no less important, application of analytical query processing, design space exploration plays a critical role. We analyzed the Q100, a class of accelerators for this application domain, using TPC-H as the reference benchmark. We found that not only do the hardware computational blocks have to be tailored to the requirements of the application, but the Network on Chip (NoC) can also be specialized. We developed an algorithm capable of producing more effective Q100 designs by tailoring the NoC to the communication requirements of the system. Our algorithm is capable of producing designs that are Pareto optimal compared to standard NoC topologies. This shows how NoC specialization is highly effective for accelerators and should be an integral part of design space exploration for large accelerator designs.
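The notion of Pareto optimality at the heart of any DSE loop can be illustrated with a minimal sketch (hypothetical (area, latency) pairs; this is not the dissertation's Q100/NoC exploration algorithm):

```python
def pareto_frontier(designs):
    """Return the Pareto-optimal (area, latency) pairs.

    A design survives only if no other design is at least as good in
    both dimensions and different from it (hence strictly better in
    at least one dimension).
    """
    return [d for d in designs
            if not any(o != d and o[0] <= d[0] and o[1] <= d[1]
                       for o in designs)]
```

A DSE tool evaluates each candidate design point, then reports only the frontier; every point off the frontier is dominated by one that is smaller, faster, or both.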
The third part of this dissertation analyzes the impact of the level of specialization, e.g. using an ASIC or a Coarse Grain Reconfigurable Architecture (CGRA) implementation, on accelerator performance. We developed a CGRA architecture capable of executing SQL query plans. We compare this architecture against Q100, an ASIC that targets the same class of workloads. Despite being less specialized, this programmable architecture shows performance comparable to the Q100 given an area and power budget. Resource usage explains this counterintuitive result, since a well programmed, homogeneous array of resources is able to more effectively harness silicon for the workload at hand. This suggests that a balanced accelerator research portfolio must include alternative programmable architectures – and their software stacks.
Microelectronic design of pulse discriminator circuits for the LHCb detector
The aim of this thesis is to present a solution for implementing the front end system of the Scintillator Pad Detector (SPD) of the calorimeter system of the LHCb experiment that will start in 2008 at the Large Hadron Collider (LHC) at CERN. The requirements of this specific system are discussed and an integrated solution is presented, both at system and circuit level. We also report some methodological achievements. First, a method is proposed to study the PSRR (and any transfer function) in fully differential circuits, taking into account the effect of parameter mismatch. Concerning noise analysis, a method to study time-variant circuits in the frequency domain is presented and justified. This opens the possibility of studying the effect of 1/f noise in time-variant circuits. In addition, it is shown that the architecture developed for this system is a general solution for front ends in high luminosity experiments that must be operated with no dead time and must be robust against ballistic deficit.
A bio-inspired computational model for motion detection
Doctoral thesis (Doctoral Programme in Biomedical Engineering). Recent years have witnessed a considerable interest in research dedicated to showing that
solutions to challenges in autonomous robot navigation can be found by taking inspiration
from biology.
Despite their small size and relatively simple nervous systems, insects have evolved
vision systems able to perform the computations required for a safe navigation in dynamic
and unstructured environments, by using simple, elegant and computationally
efficient strategies. Thus, invertebrate neuroscience provides engineers with many
neural circuit diagrams that can potentially be used to solve complicated engineering
control problems.
One major and yet unsolved problem encountered by visually guided robotic platforms
is collision avoidance in complex, dynamic and inconstant light environments.
In this dissertation, the main aim is to draw inspiration from recent and future findings
on insect’s collision avoidance in dynamic environments and on visual strategies
of light adaptation applied by diurnal insects, to develop a computationally efficient
model for robotic control, able to work even in adverse light conditions.
We first present a comparative analysis of three leading
collision avoidance models based on a neural pathway responsible
for signalling collisions, the Lobula Giant Movement
Detector/Descending Contralateral Movement Detector (LGMD/DCMD),
found in the locust visual system. The models are described and
simulated, and the results are compared with biological data
from the literature.
Due to the lack of information related to the way this collision detection neuron
deals with dynamic environments, new visual stimuli were
developed. Locusts (Locusta migratoria) were stimulated with
computer-generated discs that travelled along a combination of
non-colliding and colliding trajectories, placed over a static
and two distinct moving backgrounds, while the DCMD activity was
simultaneously recorded extracellularly.
Based on these results, an innovative model was developed. This model was tested
in specially designed computer simulations, replicating the same visual conditions used
for the biological recordings. The proposed model is shown to be sufficient to give rise to experimentally observed neural insect responses.
Using a different approach, and based on recent findings, we
present a direct method to estimate potential collisions through
sequential computation of the image's power spectra. This
approach has been implemented on a real robotic platform,
showing that distance-dependent variations in image statistics
are likely to be functionally significant.
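The underlying idea can be sketched as follows (a hypothetical 1-D illustration, not the estimator implemented in the thesis): as a looming object fills more of the visual field, spectral energy concentrates in the low spatial frequencies, so tracking low-frequency energy over successive frames hints at an approaching surface.

```python
import cmath

def power_spectrum(signal):
    """Naive DFT power spectrum |X[k]|^2 / n of a 1-D luminance profile."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n)]

def low_freq_energy(signal, cutoff=3):
    """Energy in the lowest non-DC spatial-frequency bins.

    A looming (widening) object shifts energy towards these bins,
    so a rising value over time suggests an approaching surface.
    """
    ps = power_spectrum(signal)
    return sum(ps[1:cutoff + 1])
```

For example, a wide bright region in the profile yields more low-frequency energy than a narrow one of the same brightness, which is the distance-dependent statistic the sketch exploits.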
Maintaining the collision detection performance at lower light levels is not a trivial
task. Nevertheless, some insect visual systems have developed several strategies to
help them to optimize visual performance over a wide range of light intensities. In
this dissertation we address the neural adaptation mechanisms
responsible for improving light capture in a day-active insect,
the bumblebee Bombus terrestris. Behavioural analyses enabled us
to investigate and infer the extent of the spatial and temporal
neural summation applied by these insects to improve image
reliability at the different light levels.
As future work, the collision avoidance model may be coupled with a bio-inspired
light adaptation mechanism and used for robotic autonomous navigation.
Abstracts on Radio Direction Finding (1899 - 1995)
The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography).
Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM.
The contents of these files are:
1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbook format; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format];
2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format];
3) RDFA_CompleteBibliography.pdf: A human readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion
A complex systems approach to education in Switzerland
The insights gained from the study of complex systems in biological, social, and engineered systems enable us not only to observe and understand, but also to actively design systems which will be capable of successfully coping with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment of the complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance.
A “Zero-Time” VLSI Sorter
A hardware sorter suitable for VLSI implementation is proposed. It operates in a parallel and pipelined fashion, with the actual sorting time absorbed by the input/output time. A detailed VLSI implementation is described which has a very favorable device count compared to existing static RAM.