A configurable vector processor for accelerating speech coding algorithms
The growing demand for voice-over-packet services such as voice over IP (VoIP), and for multimedia-rich applications, has made the efficient, real-time implementation of low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders substantially reduce bandwidth requirements, enabling dense multichannel gateways in a small form factor. This, however, comes at a high computational cost, which mandates the use of very high-performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, G.729A and G.723.1, through their efficient implementation on a configurable, extensible vector embedded CPU architecture. New scalar and vector instruction-set extensions were introduced, reducing the dynamic instruction count of both workloads by up to 80%. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor, which is tightly coupled to a SPARC V8-compliant CPU. It also presents the optimization and simulation methodologies employed and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths.
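Speech coders such as G.729A and G.723.1 are specified in 16-bit fixed-point arithmetic with saturating operators. Much of their run time goes into multiply-accumulate loops such as autocorrelation and filtering. The Python sketch below models a saturating Q15 multiply-accumulate in the style of the ITU-T basic operator `L_mac`. It contrasts a scalar loop with a hypothetical vector MAC that updates `vlen` lane accumulators per instruction. The function names, `vlen = 4` and the operation counting are illustrative assumptions, not the ISA proposed in the thesis.

```python
MAX32 = 0x7FFFFFFF
MIN32 = -0x80000000


def sat32(x: int) -> int:
    """Clamp an integer to the signed 32-bit range, as fixed-point DSP code does."""
    return max(MIN32, min(MAX32, x))


def l_mac(acc: int, a: int, b: int) -> int:
    """Saturating Q15 x Q15 -> Q31 multiply-accumulate, modeled on the ITU-T L_mac operator."""
    product = sat32(a * b * 2)  # the doubling aligns Q30 to Q31; -32768 * -32768 saturates
    return sat32(acc + product)


def dot_scalar(x, y):
    """Scalar kernel: one MAC instruction per sample pair."""
    acc, ops = 0, 0
    for a, b in zip(x, y):
        acc = l_mac(acc, a, b)
        ops += 1
    return acc, ops


def dot_vector(x, y, vlen=4):
    """Hypothetical vector MAC: one instruction updates `vlen` lane accumulators.

    Saturation is order-dependent, so the result equals dot_scalar only when
    no intermediate sum saturates; a real ISA must specify this behaviour.
    """
    lanes = [0] * vlen
    ops = 0
    for i in range(0, len(x), vlen):
        for lane, (a, b) in enumerate(zip(x[i:i + vlen], y[i:i + vlen])):
            lanes[lane] = l_mac(lanes[lane], a, b)
        ops += 1  # one vector instruction per chunk of vlen samples
    acc = 0
    for v in lanes:  # horizontal reduction: one scalar add per lane
        acc = sat32(acc + v)
        ops += 1
    return acc, ops


x = [1000, -2000, 3000, 4000, 500, 600, 700, 800]
print(dot_scalar(x, x))     # (63480000, 8)
print(dot_vector(x, x, 4))  # (63480000, 6)
```

For a 40-sample block, this toy count gives 40 scalar MACs versus 10 vector instructions plus 4 reduction adds. The up-to-80% reductions reported above come from the thesis's actual ISA extensions, not from this model.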
Statistical Machine Learning Based Modeling Framework for Design Space Exploration and Run-Time Cross-Stack Energy Optimization for Many-Core Processors
The complexity of many-core processors continues to grow as larger numbers of heterogeneous cores are integrated on a single chip. Such systems-on-chip contain computing structures ranging from complex out-of-order cores and simple in-order cores to digital signal processors (DSPs), graphics processing units (GPUs), application-specific processors, hardware accelerators, I/O subsystems, network-on-chip interconnects, and large caches arranged in complex hierarchies. While the industry focus is on putting more cores on a single chip, the key challenge is to optimally architect these many-core processors so that performance, energy, and area constraints are satisfied. The traditional approach to processor design, based on extensive cycle-accurate simulation, is ill-suited to many-core processors because of the large microarchitectural design space that must be explored. Additionally, it is hard to optimize such complex processors, and the applications that run on them, statically at design time so that performance and energy constraints are met under dynamically changing operating conditions.
This dissertation establishes a statistical machine-learning-based modeling framework that enables the efficient design and operation of many-core processors that meet performance, energy, and area constraints. We apply the proposed framework to rapidly design the microarchitecture of a many-core processor for multimedia, computer graphics rendering, finance, and data mining applications derived from the PARSEC benchmark suite. We further demonstrate the application of the framework to the joint run-time adaptation of both the application and the microarchitecture so that energy availability constraints are met.
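The core idea of statistical design space exploration is to simulate only a sample of configurations and fit a cheap model to the results. The model is then queried for the rest of the space instead of running more simulations. The sketch below shows this with ordinary least squares in NumPy. Everything in it is hypothetical and chosen for illustration: the three design parameters (core count, L2 size, issue width), the synthetic `toy_simulator` and the linear feature set. None of it is the dissertation's framework or model.

```python
import itertools

import numpy as np

# Hypothetical design space: (core count, L2 size in MB, issue width) -> 48 configurations.
CORES = [4, 8, 16, 32]
L2_MB = [1, 2, 4, 8]
WIDTH = [1, 2, 4]
SPACE = list(itertools.product(CORES, L2_MB, WIDTH))


def toy_simulator(cores, l2_mb, width):
    """Synthetic stand-in for an expensive cycle-accurate run (illustration only)."""
    return 2.0 * np.log2(cores) + 1.5 * np.log2(l2_mb) + 0.8 * width + 3.0


def features(cfg):
    """Regression features for one configuration: intercept plus transformed parameters."""
    cores, l2_mb, width = cfg
    return [1.0, np.log2(cores), np.log2(l2_mb), float(width)]


# "Simulate" only every fifth configuration (10 of 48) to train the model.
train = SPACE[::5]
X = np.array([features(c) for c in train])
y = np.array([toy_simulator(*c) for c in train])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Query the fitted model for every configuration instead of simulating it.
pred = np.array([features(c) for c in SPACE]) @ coef
best = SPACE[int(np.argmax(pred))]
print(best)  # (32, 8, 4) for this synthetic response
```

The synthetic response here is exactly linear in the features, so the fit is exact. Real performance and energy responses are nonlinear and noisy. A practical framework would therefore need richer models and would have to validate prediction error on held-out simulations.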
Overcoming the Intuition Wall: Measurement and Analysis in Computer Architecture
These are exciting times for computer architecture research. Today there is significant demand to improve the performance and energy efficiency of emerging, transformative applications, which are being hammered out by the hundreds for new computing platforms and usage models. This booming growth in applications and the variety of programming languages used to create them are challenging our ability as architects to characterize these applications rapidly and rigorously. Concurrently, hardware has become more complex with the emergence of accelerators, multicore systems, and heterogeneity caused by further divergence between processor market segments. No single architect can now understand all the complexities of many systems and reason about the full impact of changes or new applications.
To that end, this dissertation presents four case studies in quantitative methods. Each case study targets a different application and proposes a new measurement or analytical technique. In each case study, we find at least one surprising or unintuitive result that would likely not have been found without applying our method.
High-Order Epistasis Detection in High Performance Computing Systems
Official Doctoral Programme in Information Technology Research (Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información), 524V01.
In recent years, Genome-Wide Association Studies (GWAS) have become increasingly popular as a means of finding a genetic explanation for the presence or absence of particular diseases in humans. There is a consensus that genetic interactions take part in the expression of complex diseases, a phenomenon called epistasis. This thesis studies this phenomenon using High-Performance Computing (HPC), starting from a statistical definition of the problem: the deviation of the expression of a phenotype from the sum of the individual contributions of genetic variants. For this purpose, we first developed MPI3SNP, a program that identifies interactions of three variants from an input dataset. MPI3SNP implements an exhaustive search for epistasis using an association test based on Mutual Information, and exploits clusters of CPUs or GPUs to speed up the search. We then used MPI3SNP to evaluate state-of-the-art methods in a study that compares the performance of twenty-seven tools. The most important conclusion of this study is that non-exhaustive approaches are unable to locate epistasis in the absence of marginal effects (small association effects of the individual variants that take part in an epistatic interaction). For this reason, the thesis continued to focus on optimizing the exhaustive search. First, we improved the efficiency of the association test through a vector implementation. Then, we developed a distributed algorithm capable of locating epistatic interactions of any order. These two milestones were achieved in Fiuncho, a program that incorporates all of the research carried out and obtains the best performance on CPU clusters among all state-of-the-art alternatives. In addition, we developed Toxo, a library for simulating particular scenarios with epistasis. Toxo allows the simulation of epistasis that follows existing interaction models for high-order interactions.
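An exhaustive, Mutual-Information-based search of the kind MPI3SNP performs can be sketched as follows. Each SNP genotype is coded as 0, 1 or 2. The joint genotype of the k SNPs under test (up to 27 cells for k = 3) is treated as a single discrete variable, and its mutual information with a binary phenotype is computed from contingency counts. The function names and toy data are hypothetical. This plain Python loop also ignores the vectorized association test and the distributed algorithm described above, which are where the thesis's performance comes from.

```python
import itertools
import math
from collections import Counter


def mutual_information(genotypes, phenotype):
    """I(G; Y) in bits from contingency counts; G is the joint genotype of the SNPs tested."""
    n = len(phenotype)
    joint = Counter(zip(genotypes, phenotype))
    g_counts = Counter(genotypes)
    y_counts = Counter(phenotype)
    mi = 0.0
    for (g, y), count in joint.items():
        # p(g, y) * log2(p(g, y) / (p(g) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (g_counts[g] * y_counts[y]))
    return mi


def exhaustive_search(snps, phenotype, order=3):
    """Score every combination of `order` SNPs and return (best MI, best combination)."""
    best_score, best_combo = -1.0, None
    for combo in itertools.combinations(range(len(snps)), order):
        joint = list(zip(*(snps[i] for i in combo)))  # one genotype tuple per sample
        score = mutual_information(joint, phenotype)
        if score > best_score:
            best_score, best_combo = score, combo
    return best_score, best_combo


# Hypothetical toy data: every genotype combination (0/1/2) of three causal SNPs,
# plus two derived SNPs that only partially carry their information.
rows = list(itertools.product(range(3), repeat=3))
a, b, c = (list(col) for col in zip(*rows))
snps = [a, b, c,
        [x * y % 3 for x, y in zip(a, b)],
        [y * z % 3 for y, z in zip(b, c)]]
phenotype = [(x + y + z) % 2 for x, y, z in rows]  # three-way parity interaction

score, combo = exhaustive_search(snps, phenotype, order=3)
print(combo)  # (0, 1, 2)
```

Raising `order` scores higher-order interactions with the same code. However, the number of combinations grows combinatorially with the order, which is why an exhaustive search of any order calls for vectorized tests and distributed execution.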
Characterizing and Accelerating Bioinformatics Workloads on Modern Microarchitectures
Bioinformatics, the use of computational techniques to analyze biological data, has been a particularly active research field over the last two decades. Advances in this field have led to the collection of enormous amounts of data, and the sheer volume of available data has started to outpace the processing capability of current computer systems. Computer architects therefore need a better understanding of how bioinformatics applications work and which architectural techniques could accelerate these important scientific workloads on future processors. In this dissertation, we develop a bioinformatics benchmark suite and provide a detailed characterization, from a computer architect's point of view, of applications in common use today. We analyze a wide range of execution characteristics on a real architecture, including instruction mix, IPC, and L1 and L2 cache misses, and then analyze the workloads' memory access characteristics. We then concentrate on accelerating a particularly computationally intensive bioinformatics workload on the novel Cell Broadband Engine (Cell BE) multiprocessor architecture. HMMER is used for protein profile searching with hidden Markov models, and most of its execution time is spent running the Viterbi algorithm. We parallelize and partition the HMMER application to implement it on the Cell BE. To run the Viterbi algorithm within the 256 KB local stores of the Cell BE's synergistic processing elements (SPEs), we present a method for developing a fast SIMD implementation of the Viterbi algorithm that significantly reduces its storage requirements. Our Cell BE implementation, Cell-HMMER, exploits the multiple levels of parallelism inherent in the application and runs protein profile searches up to 27.98 times faster than a modern dual-core x86 microprocessor.
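The Viterbi recurrence at the heart of HMMER can be sketched as below. When only the best-path score is needed, each step reads just the previous column, so storage drops from T × N cells to two vectors of length N. This rolling-array trick is shown as one generic way to save memory. It is an assumption here, not necessarily the storage-reduction method the dissertation presents. The sketch uses a generic two-state, log-space HMM with hypothetical parameters rather than HMMER's Plan 7 profile model, and it omits SIMD vectorization.

```python
import math


def viterbi_score(obs, start, trans, emit):
    """Log-probability of the most likely state path, using O(N) working storage.

    start[i]    : log P(first state = i)
    trans[i][j] : log P(next state = j | current state = i)
    emit[i][o]  : log P(observation o | state i)
    """
    n = len(start)
    prev = [start[i] + emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Each column depends only on the previous one, so the T x N table is never stored.
        prev = [max(prev[i] + trans[i][j] for i in range(n)) + emit[j][o]
                for j in range(n)]
    return max(prev)


# Hypothetical two-state model with three observation symbols.
log = math.log
start = [log(0.6), log(0.4)]
trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
emit = [[log(0.5), log(0.4), log(0.1)], [log(0.1), log(0.3), log(0.6)]]
print(math.exp(viterbi_score([0, 1, 2], start, trans, emit)))  # approximately 0.01512
```

Working in log space avoids underflow on long sequences. Because no traceback pointers are kept, this version returns only the score. Recovering the alignment itself would need the full table or some checkpointing scheme.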