Search CORE

13 research outputs found

Accelerating Transactional Memory by Exploiting Platform Specificity

Author: Ruan Wenjia
Publication venue: Lehigh Preserve
Publication date
Field of study

Transactional Memory (TM) is one of the most promising alternatives to lock-based concurrency, but there still remain obstacles that keep TM from being utilized in the real world. Performance, in terms of high scalability and low latency, is always one of the most important keys to general purpose usage. While most of the research in this area focuses on improving a specific single TM implementation and some default platform (a certain operating system, compiler and/or processor), little has been conducted on improving performance more generally, and across platforms.We found that by utilizing platform specificity, we could gain tremendous performance improvement and avoid unnecessary costs due to false assumptions of platform properties, on not only a single TM implementation, but many. In this dissertation, we will present our findings in four sections: 1) we discover and quantify hidden costs from inappropriate compiler instrumentations, and provide sug- gestions and solutions; 2) we boost a set of mainstream timestamp-based TM implementations with the x86-specific hardware cycle counter; 3) we explore compiler opportunities to reduce the transaction abort rate, by reordering read-modify-write operations — the whole technique can be applied to all TM implementations, and could be more effective with some help from compilers; and 4) we coordinate the state-of-the-art Intel Haswell TSX hardware TM with a software TM “Cohorts”, and develop a safe and flexible Hybrid TM, “HyCo”, to be our final performance boost in this dissertation.The impact of our research extends beyond Transactional Memory, to broad areas of concurrent programming. Some of our solutions and discussions, such as the synchronization between accesses of the hardware cycle counter and memory loads and stores, can be utilized to boost concurrent data structures and many timestamp-based systems and applications. Others, such as discussions of compiler instrumentation costs and reordering opportunities, provide additional insights to compiler designers. Our findings show that platform specificity must be taken into consideration to achieve peak performance

Lehigh University: Lehigh Preserve

Conception et implémentation de processeurs dédiés pour des systèmes de traitement vidéo temps réel

Author: Bouyela Ngoyi Gérard Armand
Publication venue
Publication date: 01/03/2009
Field of study

RÉSUMÉ Les systèmes de traitement vidéo se caractérisent par des demandes de performance de plus en plus exigeantes. Les nouvelles normes, telles le HMDI 1.3 (High Definition Media Interface), requièrent des bandes passantes allant jusqu'à 340 Méga-pixels par seconde et par canal. Il en découle que les processeurs traitant ce type d’information doivent être très performants. Les nouvelles méthodologies de conception basées sur un langage de description d’architecture (ADL) apparaissent pour répondre à ces défis. Elles nous permettent de concevoir des processeurs dédiés de bout en bout, avec un maximum de flexibilité. Cette flexibilité, grande force de ce type de langage (tels LISA 2.0), nous permet par ajout d’instructions spécialisées et modification de l’architecture (ajout de registres spécialisés, modification de largeur de bus), de créer un processeur dédié à partir d’architectures de base considérées comme des processeurs d’usage général. Dans le cadre de nos travaux, nous nous sommes concentrés sur un type d’algorithmes de traitement d’image, le désentrelacement. Le désentrelacement est un traitement qui permet de reconstruire une séquence vidéo complète à partir d’une séquence vidéo entrelacée pour des raisons telles que la réduction de bande passante. Tout au long de nos travaux, nous avons eu un souci constant de développer des méthodologies, les plus générales possibles, pouvant être utilisées pour d’autres algorithmes. L’une des contributions de ce mémoire est le développement d’architectures de testcomplètes et modulaires, permettant d’implémenter un processeur de traitement vidéo temps réel. Nous avons également développé une interface de gestion de RAM qui permet au cours du développement des processeurs de les tester sans modifier le système au complet. Le développement de deux méthodologies innovatrices représente un apport supplémentaire dans la conception de processeurs dédiés. Ces deux méthodologies, qui se basent sur un langage ADL, sont synergiques et permettent d’implémenter et d’accélérer des algorithmes de traitements vidéo temps réel. Nous obtenons dans un premier temps un facteur d’accélération de 11 pour la première méthodologie puis un facteur d’accélération de 282 pour la deuxième. ----------ABSTRACT Video processing systems are characterized by rising performance specifications. New standards such as the HDMI 1.3 require bandwidths as high as 340 megapixels per second and per channel, resulting in greater information processing power. New conceptual methodologies based on architectural descriptions (ADL) seem to respond to this challenge. Design methods and languages for architectural descriptions (such as LISA 2.0), allow developing tailor-made high performance processors in a very flexible way. The flexibility of these languages let the user add specialized instructions to an instruction set processor. They also allow modifying its architecture to create a processor with much improved performance compared to some baseline general purpose processsor. Our study focuses on a specific type of video processing algorithm called deinterlacing. Deinterlacing allows reconstructing a complete video sequence from an interlaced video sequence. Despite this algorithmic focus, in the course of this study, we were concerned with developing broadly applicable methodologies usable for other algorithms. This thesis aims to contribute to the existing body of work in the field by developing complete and modular test architectures allowing to implement processors capable of real time video processing. The development of two innovative design methodologies represents an additional contribution. These synergetic methodologies are based on ADL (Architecture Description Language). Our results confirm that they allow implementing processors capable of real-time video processing. We obtained an acceleration factor of 11 with a first design method and the acceleration factor was further improved to 282 with a second method

PolyPublie

Characterization and reduction of memory usage in 64-bit Java Virtual Machines

Author: Venstermans Kris
Publication venue: Ghent University. Faculty of Engineering
Publication date: 01/01/2007
Field of study

Ghent University Academic Bibliography

Feasibility study for a numerical aerodynamic simulation facility. Volume 2: Hardware specifications/descriptions

Author: Green F. M.
Resnick D. R.
Publication venue
Publication date
Field of study

An FMP (Flow Model Processor) was designed for use in the Numerical Aerodynamic Simulation Facility (NASF). The NASF was developed to simulate fluid flow over three-dimensional bodies in wind tunnel environments and in free space. The facility is applicable to studying aerodynamic and aircraft body designs. The following general topics are discussed in this volume: (1) FMP functional computer specifications; (2) FMP instruction specification; (3) standard product system components; (4) loosely coupled network (LCN) specifications/description; and (5) three appendices: performance of trunk allocation contention elimination (trace) method, LCN channel protocol and proposed LCN unified second level protocol

NASA Technical Reports Server

Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

Author: Becker Jürgen
Hübner Michael
Lagadec Loïc
Sander Oliver
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2010
Field of study

ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

KITopen

Virtualization services: scalable methods for virtualizing multicore systems

Author: Raj Himanshu
Publication venue: Georgia Institute of Technology
Publication date: 10/01/2008
Field of study

Multi-core technology is bringing parallel processing capabilities from servers to laptops and even handheld devices. At the same time, platform support for system virtualization is making it easier to consolidate server and client resources, when and as needed by applications. This consolidation is achieved by dynamically mapping the virtual machines on which applications run to underlying physical machines and their processing cores. Low cost processor and I/O virtualization methods efficiently scaled to different numbers of processing cores and I/O devices are key enablers of such consolidation. This dissertation develops and evaluates new methods for scaling virtualization functionality to multi-core and future many-core systems. Specifically, it re-architects virtualization functionality to improve scalability and better exploit multi-core system resources. Results from this work include a self-virtualized I/O abstraction, which virtualizes I/O so as to flexibly use different platforms' processing and I/O resources. Flexibility affords improved performance and resource usage and most importantly, better scalability than that offered by current I/O virtualization solutions. Further, by describing system virtualization as a service provided to virtual machines and the underlying computing platform, this service can be enhanced to provide new and innovative functionality. For example, a virtual device may provide obfuscated data to guest operating systems to maintain data privacy; it could mask differences in device APIs or properties to deal with heterogeneous underlying resources; or it could control access to data based on the ``trust' properties of the guest VM. This thesis demonstrates that extended virtualization services are superior to existing operating system or user-level implementations of such functionality, for multiple reasons. First, this solution technique makes more efficient use of key performance-limiting resource in multi-core systems, which are memory and I/O bandwidth. Second, this solution technique better exploits the parallelism inherent in multi-core architectures and exhibits good scalability properties, in part because at the hypervisor level, there is greater control in precisely which and how resources are used to realize extended virtualization services. Improved control over resource usage makes it possible to provide value-added functionalities for both guest VMs and the platform. Specific instances of virtualization services described in this thesis are the network virtualization service that exploits heterogeneous processing cores, a storage virtualization service that provides location transparent access to block devices by extending the functionality provided by network virtualization service, a multimedia virtualization service that allows efficient media device sharing based on semantic information, and an object-based storage service with enhanced access control.Ph.D.Committee Chair: Schwan, Karsten; Committee Member: Ahamad, Mustaq; Committee Member: Fujimoto, Richard; Committee Member: Gavrilovska, Ada; Committee Member: Owen, Henry; Committee Member: Xenidis, Jim

Scholarly Materials And Research @ Georgia Tech

Okvir za alokaciju softverskih komponenata na heterogenoj računalnoj platformi

Author: Švogor Ivan
Publication venue: University of Zagreb. Faculty of Organization and Informatics Varaždin.
Publication date: 29/04/2016
Field of study

A recent development of heterogeneous platforms (i.e. those containing different types of computing units such as multicore CPUs, GPUs, and FPGAs) has enabled significant improvements in performance of real-time data processing. However, due to increased development efforts for such platforms, they are not fully exploited. To use the full potential of such platforms, we need new frameworks and methods for capturing the optimal configuration of the software. Different configurations, i.e. allocations of software components to different computing unit types can be essential for getting the maximal utilization of the platform. For more complex systems it is difficult to find ad hoc, good enough or the best configuration.This research suggests the application of component based software engineering(CBSE) principles, by which it is possible to achieve the same functionality of software components across various computing units of different types, however with different extrafunctional properties (EFP). The objective of this research is to construct a framework which optimizes the allocation of software components on a heterogeneous computing platform with respect to specified extra-functional requirements.The I-IV allocation framework, proposed by this research, consist of formalisms necessary for modeling of a heterogeneous computing platform and exploring the designspace, which results with an optimal design decision. The I-IV allocation frameworkwas verified in two steps, focusing on two EFPs; the average power consumption andthe average execution time. The experimental platform was a tracked robot, developed for the purpose of this research. It contains a CPU, a GPU and an FPGA, along with 32software components deployable onto these units. Both steps resulted in a positive result confirming the claim that the I-IV framework, along with its Component allocation model M correctly represents the heterogeneous system performance, with consideration to multiple criteria.Usprkos tome da je u posljednjih nekoliko godina povećanje radnog takta središnje procesne jedinice (CPU) usporeno, ako ne i zaustavljeno, performanse suvremenih računala i dalje rastu, ali ne zbog radnog takta. To znaci da se i performanse racunalnih programa više na ovaj nacin ne mogu unaprijediti, čak što više, daljnje povečavanje radnogčtakta CPU-a pokazalo se neučinkovitim. Zbog toga, došlo je do suštinske promjene u građi procesora, odnosno to repliciranja procesnih jezgri te ugradbom dodatnih namjenskih procesnih jedinica koje su specijalizirane za određeni tip zadataka. Najcešce su to graficka procesna jedinica (GPU), programirljiva polja logickih blokova (FPGA), integrirani krugovi specificne namjene (ASIC), itd. Istovremeno, zajednica prepoznala je veliki istraživacki potencijal heterogenih racunalnih sustava, odnosno sustava sa mnoštvom procesnih jedinica razlicitog tipa, obzirom da omogućuju izuzetna poboljšanja performansi softvera.Mnogi se istraživaci već dulje vrijeme bave heterogenim racunalstvom, što znaci da to nije nova ideja, no u posljednjih nekoliko godina, zbog fizickih ogranicenja vezanih uz arhitekturu procesnih jedinica, heterogeno racunalstvo postaje sve popularnija istraživacka tema. Uz izuzetno povećanje procesne moći, heterogeno racunalstvo donosii mnogo izazova, prvenstveno za softverske inženjere. Naime, razvoj softvera za takve sustave vrlo je zahtjevan zbog primjerice, potrebe za rukovanjem sa više razlicitih tipova podataka ili programskih jezika unutar istog racunalnog programa, kompatibilnosti pojedinih procesnih jedinica i konverzije tipova podataka, potrebe za specijaliziranim bibliotekama koda, korištenja razlicitih struktura podataka kroz više arhitekturalni slojeva racunala i racunalnog programa, itd. Osim toga, obzirom na to da se heterogeni sustavi prvenstveno koriste kao elementi ugradbenih racunala u industriji, softverski inženjeri uz funkcionalne zahtjeve, dodatnu pozornost moraju dati nefunkcionalnim zahtjevima (EFP).Kako bi se upravljalo funkcionalnim i nefunkcionalnim zahtjevima softvera, u složenim heterogenim racunalnim sustavima,cesto se primjenjuju nacela komponentno orijentiranog softverskog inženjerstva (CBSE), koja su u softverskoj zajednici dobro poznata i dokazana. CBSE obuhvaća modele, metode i smjernice za softverske inženjere koji razvijaju sustave temeljene na komponentama, odnosno građevnim jedinicama koje komuniciraju putem ugovorno definiranih sucelja, koje se mogu samostalnougra divati i jednostavno zamjenjivati. Time, CBSE daje snažne temelje za prethodno spomenute vezane uz razvoj softvera namijenjenog za heterogene racunalne sustave.U tom kontekstu, CBSE omogućuje postizanje jednake funkcionalnosti komponenata softvera alociranih na (razlicite) procesne jedinice (razlicitog tipa), no sa drugacijimne-funkcionalnim svojstvima. To znaci da pojedine alokacije komponenata softveramogu biti više ili manje ucinkovite obzirom na scenarije njihove primjene, odnosnonjihove ulazne parametre, što za sobom povlaci i pitanje ukupnih performansi sustava. Prema tome, zadatak arhitekta softvera najprije je definirati svojstva najbolje alokacije obzirom na više kriterija, poput dostupnosti resurs, ne-funkcionalna svojstva i ogranicenja, a potom na konkretnoj heterogenoj racunalnoj platformi ucinkovito i pronaći takvu alokaciju.Temeljni cilj ovog istraživanja je konstruirati okvir za optimizaciju alokacije kompo- nenti softvera na heterogenoj racunalnoj platformi, koji uzimajući u obzir ogranicenja resursa dostupnih na racunalnim jedinicama (razlicitog tipa), specifikacije komponenata softvera i ogranicenja koja definira arhitekt sustava ucinkovito pronalazi najbolju alokaciju. Ova disertacija predlaže Alokacijski okvir I-IV sastavljen od formalnih elemenata koji omogućuju stvaranje modela heterogenog racunalnog sustava te pretraživanje prostora potencijalnih alokacija, te definira korake kojima se postiže optimalna arhitektura sustava. Kako u ovom slucaju prostor potencijalnih rješenja, odnosno alokacija eksponencijalno raste (uzmdostupnih racunalnih jedinica tendostupnih komponentisoftvera, prostor rješenja jemn), razvijen je i prototip alata koji automatizira Alokacijskiokvir I-IV, što je inace dugotrajan ili cak neizvediv proces. Za opis nefunkcionalnih svojstava heterogenih sustava, koristi se Model za alokaciju komponenata M. Taj model,primjenom težinske funkcije omogućuje kvantifikaciju pojedinih alokacija čime je omogućena njihova usporedba te procjena prikladnosti korištenja istih. Istovremeno,težinska funkcija daje uvid u performanse sustava u njegovoj ranoj fazi razvoja (cak prije nego su komponente razvijene).Vjerodostojnost Alokacijskog okvira I-IV provjerena je u dva koraka (eksperimenta),pri cemu je fokus bio na dva nefunkcionalna svojstva sustava: prosjecni elektricni učinak elektricne struje i prosjecno vrijeme izvođenja operacija softvera. Eksperimentalna platforma bila su robotska kolica sa heterogenim racunalnim sustavom sacinjenim odCPU-a, GPU-a te FPGA-a, zajedno sa tridesetak komponenata softvera koje je moguće alocirati na te racunalne jedinice.Prvi korak provjere odnosio se na provjeru tocnosti, odnosno procjenu prikladnosti težinske funkcije w da kvantificira performanse pojedine alokacije. Postupak je proveden primjenom šest razlicitih alokacija koje predstavljaju dva razlicita scenarija izvođenja.Odabrane alokacije, nakon što su kvantificirane težinskom funkcijom w, zapisane sutablicno i rangirane prema predviđenim performansama. Nakon toga, te iste alokacije su implementirane na stvarnom sustavu, ranije spomenutim robotskim kolicima. Iscrpnim mjerenje (u intervalu pouzdanosti od 95%), zabilježene su performanse alokacija i ponovno su rangirane u rang listu. Rezultat oba rangiranja bio je jednak,cime slijedi damodel za raspodjelu komponenata M, te njegova težinska funkcija w mogu korektno predvidjeti performanse pojedine alokacije u realnom sustavu. Ovakav ishod, doveo IV je do drugog koraka provjere koji se odnosi na scenarij(e) u kojem postoji izrazito veliki broj komponenti softvera te racunalnih jedinica,cime prostor potencijalnih rješenja postaje toliko velik pronalaženje najbolje alokacije metodom iscrpnog pretraživanja nije moguće ucinkovito provesti.Obzirom da trenutna implementacija Alokacijskog modela I-IV definira heuristicke metode za rješavanje navedenog problema, drugi korak provjere za cilj ima procijeniti sub-optimalno rješenje genetskog algoritma i metode simuliranog kaljenja. Uz heuristicke metode, generirane su i proizvoljne alokacije, jer u nekim slucajevima su takve alokacije podjednako dobre ili cak bolje od heuristickih metoda. U prvoj iteraciji, provjeravala se preciznost navedenih metoda, odnosno njihovo odstupanje od optimalne alokacije dane iscrpnim pretraživanje u prostoru do 512. Pokazalo se kako genetski algoritam daje najbolja rješenja, odnosno alokacije koje minimalno odstupaju od optimalnog rješenja. Nadalje, za prostore rješenja između 1020do3070 gdje iscrpno pretraživanje nije ucinkovito, usporedba je pokazala da obje heuristicke metode daju bolja suboptimalne alokacije od proizvoljno definiranih alokacija i to u najkraćem vremenu.Iako je statisticki vjerojatno, ni u jednom slucaju (u 55 ponavljanja, s povećavanjem prostora rješenja) nije zabilježeno da proizvoljno generirana alokacija daje bolje performanse od alokacije dobivene predloženim heuristickim metodama, cime je završila validacija predloženog okvira i svih njegovih elemenata

University of Zagreb Repository

Okvir za alokaciju softverskih komponenata na heterogenoj računalnoj platformi

Author: Švogor Ivan
Publication venue: University of Zagreb. Faculty of Organization and Informatics Varaždin.
Publication date: 29/04/2016
Field of study

Croatian Digital Dissertations Repository

Okvir za alokaciju softverskih komponenata na heterogenoj računalnoj platformi

Author: Švogor Ivan
Publication venue: University of Zagreb. Faculty of Organization and Informatics Varaždin.
Publication date: 29/04/2016
Field of study

Faculty of Organization and Informatics - Digital Repository

Croatian Digital Dissertations Repository

University of Zagreb Repository

Performance analysis and optimization of the Java memory system

Author: Lebsack Carl Stephen
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2008
Field of study

Digital Repository @ Iowa State University (ISU)