VLSI-sorting evaluated under the linear model
Abstract: There are several models of computation on which evaluations of VLSI sorting algorithms can be based, and several measures of complexity. This paper revisits, under the linear model, complexity results that were obtained under the constant model. This approach is motivated by expected technological development (see Mangir, 1983; Thompson and Raghavan, 1984; Vitanyi, 1984a, 1984b). For the constant model it is known that for medium-sized keys there are AT²- and AP²-optimal sorting algorithms with T ranging from ω(log n) to O(√(nk)) and P ranging from Ω(1) to O(√(nk)) (Bilardi, 1984). The main results of the asymptotic analysis of sorting algorithms under the linear model are that the lower bounds allow AT²-optimal sorting algorithms only for T = Θ(√(nk)), but allow AP²-optimal algorithms in the same range as under the constant model. Furthermore, the sorting algorithms presented in this paper meet these lower bounds, which proves that the bounds cannot be improved for k = Θ(log n). The building block for realizing these sorting algorithms is a comparison-exchange module that compares r × s bit matrices in time TC = Θ(r + s) on an area AC = Θ(r²) (not including the storage area for the keys). For problem sizes that exceed realistic chip capacities, chip-external sorting algorithms can be used. This paper presents two such algorithms, BBB(S) and TWB(S), designed to be implemented on a single board. They use a sorting chip S to perform the sort-split operation on blocks of data. BBB(S) and TWB(S) are systolic algorithms using local communication only, so their evaluation does not depend on whether the constant or the linear model is used.
Furthermore, their design is clearly technically feasible whenever the sorting chip S is. TWB has optimal asymptotic time complexity, so its existence proves that under the linear model external sorting can be done asymptotically as fast as under the constant model. The time complexity of TWB(S) depends linearly on the speed gS = nS tS. It is shown that the speed, viewed as a function of the chip capacity C, is asymptotically maximal for AT²-optimal sorting algorithms; thus S should be a sorting algorithm similar to the M-M-sorter presented in this paper. A major disadvantage of TWB(S) is that it cannot exploit the maximal throughput dS = nS/pS of a systolic sorting algorithm S. Therefore algorithm BBB(S) is introduced. The time complexity of BBB(S) depends linearly on dS. It is shown that the throughput is maximal for AP²-optimal algorithms. There is a wide range of such sorting algorithms, including algorithms that can be realized independently of the length of the keys. For example, BBB(S) with S being a highly parallel version of odd-even transposition sort has this kind of flexibility. A disadvantage of BBB(S) is that it is asymptotically slower than TWB(S).
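The compare-exchange primitive and the odd-even transposition sort named above can be illustrated with a minimal sequential sketch (Python; in the VLSI realization, the compare-exchange operations of each phase would run in parallel):

```python
def odd_even_transposition_sort(a):
    """Sequential model of odd-even transposition sort.

    Each of the n phases performs independent compare-exchange
    operations on disjoint pairs, which is why the algorithm maps
    naturally onto a highly parallel systolic realization.
    """
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2              # alternate even and odd phases
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:        # the compare-exchange module
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Because every pair touched in a phase is disjoint, n processing elements can execute one phase per clock tick, giving the Θ(n) parallel time that the systolic analyses above rely on.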
VHDL Design of a Scalable VLSI Sorting Device Based on Pipelined Computation
This paper describes the VHDL design of a sorting algorithm, aiming at defining an elementary sorting unit as a building block of VLSI devices which require a huge number of sorting units. As such, care was taken to reach a reasonably low value of the area-time parameter. A sorting VLSI device, in fact, can be built as a cascade of elementary sorting units which process the input stream in a pipeline fashion: as the processing goes on, a wave of sorted numbers propagates towards the output ports. The paper describes the design from an initial theoretical analysis of the algorithm's complexity, through a VHDL behavioural analysis of the proposed architecture and a structural synthesis of a sorting block based on the Alliance tools, to a final silicon synthesis, also carried out with Alliance. Two points in the proposed design are particularly noteworthy. First, the sorting architecture is suitable for treating a continuous stream of input data rather than a block of data as in many other designs. Secondly, the proposed design reaches a reasonable compromise between area and time, as it yields an AT product which compares favourably with the theoretical lower bound.
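A behavioural sketch may clarify the cascade idea (a hypothetical Python model, not the paper's VHDL design): each elementary cell retains the smallest value it has seen and forwards larger values downstream, so the sorted sequence accumulates in the cell registers as the stream flows through.

```python
def pipeline_sort(stream, n_cells):
    """Behavioural model of a cascade of elementary sorting cells.

    Each cell retains the smallest value it has seen and forwards
    larger values to the next cell, so after the whole input stream
    has passed through, the cell registers hold the sorted sequence.
    Requires n_cells >= number of input values.
    """
    held = [None] * n_cells            # one register per elementary unit
    for x in stream:
        for i in range(n_cells):
            if held[i] is None:
                held[i] = x            # an empty cell captures the value
                break
            if x < held[i]:
                held[i], x = x, held[i]  # keep the smaller, pass the larger
    return [v for v in held if v is not None]
```

In hardware all cells act concurrently, one hand-off per clock, so the sorting latency is hidden behind the streaming of the input itself.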
The analysis and synthesis of a parallel sorting engine
This thesis is concerned with the development of a unique
parallel sort-merge system suitable for implementation in VLSI.
Two new sorting subsystems, a high performance VLSI sorter and a
four-way merger, were also realized during the development
process. In addition, the analysis of several existing parallel sorting
architectures and algorithms was carried out.
Algorithmic time complexity, VLSI processor performance, and
chip area requirements for the existing sorting systems were
evaluated. The rebound sorting algorithm was determined to be
the most efficient among those considered, and was implemented
in hardware as a systolic array with external expansion
capability.
The second phase of the research involved analyzing several
parallel merge algorithms and their buffer management schemes.
The dominant considerations for this phase of the research were
minimizing VLSI chip area, design complexity, and logic delay.
It was determined that the proposed merger
architecture could be implemented in several ways. Selecting the
appropriate microarchitecture for the merger, given the constraints
of chip area and performance, was the major problem. The tradeoffs
associated with this process are outlined.
Finally, a pipelined sort-merge system was implemented in
VLSI by combining a rebound sorter and a four-way merger on a
single chip. The final chip size was 416 mils by 432 mils. Two
micron CMOS technology was utilized in this chip realization. An
overall throughput rate of 10M bytes/sec was achieved. The
prototype system developed is capable of sorting thirty two 2-byte
keys during each merge phase. If extended, this system is capable of
economically sorting files of 100M bytes or more in size. In order to
sort larger files, this design should be incorporated in a disk-based
sort-merge system. A simplified disk I/O access model for such a
system was studied. In this study the sort-merge system was
assumed to be part of a disk controller subsystem.
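The sort-merge organisation described above can be sketched in software (a hypothetical model: `sorted()` stands in for the rebound sorter's fixed-capacity blocks and `heapq.merge` for the hardware four-way merger):

```python
import heapq

def sort_merge(keys, block=32):
    """Software model of a pipelined sort-merge system.

    A sorter stage sorts fixed-size blocks (32 two-byte keys in the
    thesis prototype) and a merger stage combines up to four sorted
    runs per pass until a single sorted run remains.
    """
    # Sorter stage: one sorted run per block of input keys.
    runs = [sorted(keys[i:i + block]) for i in range(0, len(keys), block)]
    # Merger stages: each pass merges groups of up to four runs.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 4]))
                for i in range(0, len(runs), 4)]
    return runs[0] if runs else []
```

Because each pass cuts the number of runs by a factor of four, a file of R initial runs needs only ⌈log₄ R⌉ merge passes, which is what makes the disk-based extension to files of 100M bytes or more economical.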
On the synthesis and processing of high quality audio signals by parallel computers
This work concerns the application of new computer architectures to the creation and manipulation of high-quality audio bandwidth signals. The configuration of both the hardware and software in such systems falls under consideration in the three major sections, which present increasing levels of algorithmic concurrency. In the first section, the programs which are described are distributed in identical copies across an array of processing elements; these programs run autonomously, generating data independently, but with control parameters peculiar to each copy: this type of concurrency is referred to as isonomic. The central section presents a structure which distributes tasks across an arbitrary network of processors; the flow of control in such a program is quasi-indeterminate, and controlled on a demand basis by the rate of completion of the slave tasks and their irregular interaction with the master. Whilst that interaction is, in principle, deterministic, it is also data-dependent; the dynamic nature of task allocation demands that no a priori knowledge of the rate of task completion be required. This type of concurrency is called dianomic. Finally, an architecture is described which will support a very high level of algorithmic concurrency. The programs which make efficient use of such a machine are designed not by considering flow of control, but by considering flow of data. Each atomic algorithmic unit is made as simple as possible, which results in the extensive distribution of a program over very many processing elements. Programs designed by considering only the optimum data exchange routes are said to exhibit systolic concurrency. Often neglected in the study of system design are those provisions necessary for practical implementations.
It was intended to provide users with useful application programs in fulfilment of this study; the target group is electroacoustic composers, who use digital signal processing techniques in the context of musical composition. Some of the algorithms in use in this field are highly complex, often requiring a quantity of processing for each sample which exceeds that currently available even from very powerful computers. Consequently, applications tend to operate not in 'real-time' (where the output of a system responds to its input apparently instantaneously), but by the manipulation of sounds recorded digitally on a mass storage device. The first two sections adopt existing, public-domain software, and seek to increase its speed of execution significantly by parallel techniques, with the minimum compromise of functionality and ease of use. Those chosen are the general-purpose direct synthesis program CSOUND, from M.I.T., and a stand-alone phase vocoder system from the C.D.P. In each case, the desired aim is achieved: to increase speed of execution by two orders of magnitude over the systems currently in use by composers. This requires substantial restructuring of the programs, and careful consideration of the best computer architectures on which they are to run concurrently. The third section examines the rationale behind the use of computers in music, and begins with the implementation of a sophisticated electronic musical instrument capable of a degree of expression at least equal to its acoustic counterparts. It seems that the flexible control of such an instrument demands a greater computing resource than the sound synthesis part.
A machine has been constructed with the intention of enabling the 'gestural capture' of performance information in real-time; the structure of this computer, which has one hundred and sixty high-performance microprocessors running in parallel, is expounded; and the systolic programming techniques required to take advantage of such an array are illustrated in the Occam programming language.
Design Space Exploration of Accelerators for Warehouse Scale Computing
With Moore’s law grinding to a halt, accelerators are one of the ways that new silicon can improve performance, and they are already a key component in modern datacenters. Accelerators are integrated circuits that implement parts of an application with the objective of higher energy efficiency compared to execution on a standard general purpose CPU. For any particular workload, many accelerator designs are possible, with a wide range of performance and of costs such as area or power. Exploring these design choices, called Design Space Exploration (DSE), is a crucial step in finding the most efficient accelerator design, the one that produces the largest reduction of the total cost of ownership.
This work aims to improve this design space exploration phase for accelerators and to avoid pitfalls in the process. This dissertation supports the thesis that early design choices – including the level of specialization – are critical for accelerator development and therefore require benchmarks reflective of production workloads. We present three studies that support this thesis. First, we show how to benchmark datacenter applications by creating a benchmark for large video sharing infrastructures. Then, we present two studies focused on accelerators for analytical query processing: the first is an analysis of the impact of Network on Chip specialization, while the second analyses the impact of the level of specialization.
The first part of this dissertation introduces vbench: a video transcoding benchmark tailored to the growing video-as-a-service market. Video transcoding is not accurately represented in current computer architecture benchmarks such as SPEC or PARSEC. Despite posing a big computational burden for cloud video providers, such as YouTube and Facebook, it is not included in cloud benchmarks such as CloudSuite. Using vbench, we found that the microarchitectural profile of video transcoding is highly dependent on the input video, that SIMD extensions provide limited benefits, and that commercial hardware transcoders impose tradeoffs that are not ideal for cloud video providers. Our benchmark should spur architectural innovations for this critical workload. This work shows how to benchmark a real world warehouse scale application and the possible pitfalls in case of a mischaracterization.
When considering accelerators for the different, but no less important, application of analytical query processing, design space exploration plays a critical role. We analyzed the Q100, a class of accelerators for this application domain, using TPC-H as the reference benchmark. We found that not only do the hardware computational blocks have to be tailored to the requirements of the application, but the Network on Chip (NoC) can also be specialized. We developed an algorithm capable of producing more effective Q100 designs by tailoring the NoC to the communication requirements of the system. Our algorithm is capable of producing designs that are Pareto optimal compared to standard NoC topologies. This shows how NoC specialization is highly effective for accelerators and should be an integral part of design space exploration for large accelerator designs.
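The notion of Pareto optimality at the heart of any DSE loop can be illustrated with a minimal sketch (hypothetical (area, latency) pairs; this is not the dissertation's Q100/NoC exploration algorithm):

```python
def pareto_frontier(designs):
    """Return the Pareto-optimal (area, latency) pairs.

    A design survives only if no other design is at least as good in
    both dimensions and different from it (hence strictly better in
    at least one dimension).
    """
    return [d for d in designs
            if not any(o != d and o[0] <= d[0] and o[1] <= d[1]
                       for o in designs)]
```

A DSE tool evaluates each candidate design point, then reports only the frontier; every point off the frontier is dominated by one that is smaller, faster, or both.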
The third part of this dissertation analyzes the impact of the level of specialization, e.g. using an ASIC or a Coarse Grain Reconfigurable Architecture (CGRA) implementation, on accelerator performance. We developed a CGRA architecture capable of executing SQL query plans. We compare this architecture against Q100, an ASIC that targets the same class of workloads. Despite being less specialized, this programmable architecture shows performance comparable to the Q100 given an area and power budget. Resource usage explains this counterintuitive result, since a well programmed, homogeneous array of resources is able to more effectively harness silicon for the workload at hand. This suggests that a balanced accelerator research portfolio must include alternative programmable architectures – and their software stacks.
Microelectronic design of pulse discriminator circuits for the LHCb detector
The aim of this thesis is to present a solution for implementing the front end system of the Scintillator Pad Detector (SPD) of the calorimeter system of the LHCb experiment that will start in 2008 at the Large Hadron Collider (LHC) at CERN. The requirements of this specific system are discussed and an integrated solution is presented, both at system and circuit level. We also report some methodological achievements. First, a method is proposed to study the PSRR (and any transfer function) in fully differential circuits, taking into account the effect of parameter mismatch. Concerning noise analysis, a method to study time-variant circuits in the frequency domain is presented and justified. This opens the possibility of studying the effect of 1/f noise in time-variant circuits. In addition, it is shown that the architecture developed for this system is a general solution for front ends in high luminosity experiments that must be operated with no dead time and must be robust against ballistic deficit.
A bio-inspired computational model for motion detection
Doctoral thesis (Doctoral Programme in Biomedical Engineering). Recent years have witnessed a considerable interest in research dedicated to showing that
solutions to challenges in autonomous robot navigation can be found by taking inspiration
from biology.
Despite their small size and relatively simple nervous systems, insects have evolved
vision systems able to perform the computations required for a safe navigation in dynamic
and unstructured environments, by using simple, elegant and computationally
efficient strategies. Thus, invertebrate neuroscience provides engineers with many
neural circuit diagrams that can potentially be used to solve complicated engineering
control problems.
One major and yet unsolved problem encountered by visually guided robotic platforms
is collision avoidance in complex, dynamic and inconstant light environments.
In this dissertation, the main aim is to draw inspiration from recent and future findings
on insect’s collision avoidance in dynamic environments and on visual strategies
of light adaptation applied by diurnal insects, to develop a computationally efficient
model for robotic control, able to work even in adverse light conditions.
We first present a comparative analysis of three leading
collision avoidance models based on a neural pathway responsible
for signalling collisions, the Lobula Giant Movement
Detector/Descending Contralateral Movement Detector (LGMD/DCMD),
found in the locust visual system. The models are described and
simulated, and the results are compared with biological data
from the literature.
Due to the lack of information related to the way this collision detection neuron
deals with dynamic environments, new visual stimuli were
developed. Locusts (Locusta migratoria) were stimulated with
computer-generated discs that travelled along a combination of
non-colliding and colliding trajectories, placed over a static
and two distinct moving backgrounds, while the DCMD activity was
simultaneously recorded extracellularly.
Based on these results, an innovative model was developed. This model was tested
in specially designed computer simulations, replicating the same visual conditions used
for the biological recordings. The proposed model is shown to be sufficient to give rise to experimentally observed neural insect responses.
Using a different approach, and based on recent findings, we
present a direct method to estimate potential collisions through
sequential computation of the image's power spectra. This
approach has been implemented on a real robotic platform,
showing that distance-dependent variations in image statistics
are likely to be functionally significant.
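The underlying idea can be sketched as follows (a hypothetical 1-D illustration, not the estimator implemented in the thesis): as a looming object fills more of the visual field, spectral energy concentrates in the low spatial frequencies, so tracking low-frequency energy over successive frames hints at an approaching surface.

```python
import cmath

def power_spectrum(signal):
    """Naive DFT power spectrum |X[k]|^2 / n of a 1-D luminance profile."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n)]

def low_freq_energy(signal, cutoff=3):
    """Energy in the lowest non-DC spatial-frequency bins.

    A looming (widening) object shifts energy towards these bins,
    so a rising value over time suggests an approaching surface.
    """
    ps = power_spectrum(signal)
    return sum(ps[1:cutoff + 1])
```

For example, a wide bright region in the profile yields more low-frequency energy than a narrow one of the same brightness, which is the distance-dependent statistic the sketch exploits.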
Maintaining the collision detection performance at lower light levels is not a trivial
task. Nevertheless, some insect visual systems have developed several strategies to
help them to optimize visual performance over a wide range of light intensities. In
this dissertation we address the neural adaptation mechanisms
responsible for improving light capture in a day-active insect,
the bumblebee Bombus terrestris. Behavioural analyses enabled us
to investigate and infer the extent of the spatial and temporal
neural summation applied by these insects to improve image
reliability at the different light levels.
As future work, the collision avoidance model may be coupled with a bio-inspired
light adaptation mechanism and used for robotic autonomous navigation.
Abstracts on Radio Direction Finding (1899 - 1995)
The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography).
Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM.
The contents of these files are:
1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbook format; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format];
2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format];
3) RDFA_CompleteBibliography.pdf: A human readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion
A complex systems approach to education in Switzerland
The insights gained from the study of complex systems in biological, social, and engineered systems enable us not only to observe and understand, but also to actively design systems which will be capable of successfully coping with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment of the complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance.
A “Zero-Time” VLSI Sorter
A hardware sorter suitable for VLSI implementation is proposed. It operates in a parallel and pipelined fashion, with the actual sorting time absorbed by the input/output time. A detailed VLSI implementation is described which has a very favorable device count compared to existing static RAM.