1,307 research outputs found

    Methodology for complex dataflow application development

    This thesis addresses problems inherent to the development of complex applications for reconfigurable systems. Many projects fail to complete, or take much longer than originally estimated, because they rely on the traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies that consider the specific properties of reconfigurable systems do not exist.

The first contribution of this thesis is a design methodology to facilitate the systematic development of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement that resolves identified bottlenecks before a single line of code targeting the reconfigurable hardware is written. It is successfully validated on two real applications, both of which achieve state-of-the-art performance.

The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed: an algorithm is proposed that automatically maps logical memories to heterogeneous physical memories, paying special attention to die boundaries. In the evaluation, only the proposed algorithm managed to successfully place and route all designs, while the second-best algorithm failed on one third of the large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms.
The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain: a Monte-Carlo-based simulation of dose accumulation in human tissue is accelerated using the proposed methodology to meet the real-time requirements of adaptive radiotherapy.
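The thesis's multi-die memory-mapping step can be pictured with a small sketch. The following is a hypothetical greedy placement under an invented cost model (the class names, the fill-ratio cost, and the `crossing_penalty` are all illustrative assumptions, not the thesis's actual algorithm): large logical memories are placed first, and physical memories on a different die than the accessing logic are penalised.

```python
# Hypothetical sketch of multi-die-aware logical-to-physical memory mapping.
# Cost model and names are illustrative assumptions, not the thesis's algorithm.
from dataclasses import dataclass

@dataclass
class PhysicalMemory:
    name: str
    die: int
    capacity_bits: int
    used_bits: int = 0

@dataclass
class LogicalMemory:
    name: str
    size_bits: int
    preferred_die: int  # die of the logic that accesses this memory

def map_memories(logical, physical, crossing_penalty=4):
    """Greedily place each logical memory, penalising die-boundary crossings."""
    placement = {}
    # Place the largest memories first: they are the hardest to fit.
    for lm in sorted(logical, key=lambda m: -m.size_bits):
        def cost(pm):
            if pm.capacity_bits - pm.used_bits < lm.size_bits:
                return float("inf")           # does not fit at all
            c = pm.used_bits / pm.capacity_bits  # prefer emptier memories
            if pm.die != lm.preferred_die:
                c += crossing_penalty            # discourage die crossings
            return c
        best = min(physical, key=cost)
        if cost(best) == float("inf"):
            raise RuntimeError(f"cannot place {lm.name}")
        best.used_bits += lm.size_bits
        placement[lm.name] = best.name
    return placement
```

A real mapper must also model ports, widths, and routing congestion; the sketch only captures the capacity-plus-die-boundary trade-off described in the abstract.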

    Supercomputing Frontiers

    This open access book constitutes the refereed proceedings of the 7th Asian Supercomputing Conference, SCFA 2022, which took place in Singapore in March 2022. The 8 full papers presented in this book were carefully reviewed and selected from 21 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platforms, container image configuration workflows, large-scale applications, and scheduling.

    MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    We have developed MC64-ClustalWP2, a new implementation of the Clustal W algorithm that integrates a novel parallelization strategy and significantly increases performance when aligning long sequences on architectures with many cores. In such a process, a detailed analysis of both software and hardware features and peculiarities is of paramount importance to reveal the key points for exploiting and optimizing the full potential of parallelism in many-core CPU systems. The new parallelization approach focuses on the most time-consuming stages of the algorithm. In particular, the so-called progressive alignment stage shows a drastic performance improvement, thanks to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key decision was to implement the new algorithm on a hybrid computing system, integrating an Intel Xeon multi-core CPU with a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. As a result, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, and is publicly available through a web service.

In addition, these developments have been deployed on cost-effective personal computers and should be useful for life-science researchers in applications such as identifying identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies, and the development of molecular markers for paternity testing, germplasm management and protection, breeding assistance, illegal-traffic control, fraud prevention, and the protection of intellectual property (identification/traceability), including protected designations of origin.
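Why the progressive-alignment stage parallelises at all can be seen in a generic dynamic-programming sketch (this is plain Needleman-Wunsch-style scoring, not the MC64-ClustalWP2 code): all cells on one anti-diagonal of the scoring matrix depend only on earlier diagonals, so they are mutually independent and can be computed in parallel. The loop below sweeps diagonals sequentially for clarity; the inner loop is the parallelisable one.

```python
# Generic anti-diagonal (wavefront) sweep of a pairwise-alignment DP matrix.
# Illustrates the parallelisable structure only; not the paper's implementation.
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = i * gap
    for j in range(m + 1):
        H[0][j] = j * gap
    # Sweep anti-diagonals: every cell (i, j) with i + j == d depends only
    # on diagonals d-1 and d-2, so this inner loop has no dependencies and
    # is safe to distribute across cores.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # substitution / match
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
    return H[n][m]
```

For the >10 kb sequences mentioned above, the matrix has millions of cells per pair, which is exactly where a many-core sweep pays off.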

    Recent Advances in Embedded Computing, Intelligence and Applications

    The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software, and artificial-intelligence capabilities deployed in real sensing contexts empowers the edge-intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence, among them hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    Sustainable modular IoT solution for smart cities applications supported by machine learning algorithms

    The Internet of Things (IoT) and Smart Cities are nowadays a major trend, but with the proliferation of these systems several challenges have started to appear that jeopardize their acceptance by the population, mainly in terms of sustainability and environmental issues. This thesis introduces a new system composed of a modular IoT smart node that is self-configurable and sustainable with the support of machine-learning techniques, together with the research and development needed to achieve an innovative solution spanning data analysis, wireless communications, and hardware and software development. For each of these, the key concepts are introduced, and the research methodologies, tests, and results are presented and discussed, along with the development and implementation.

The research shows that Random Forest was the best choice for the data analysis behind the self-configuration of the hardware and communication systems, and that edge computing has an advantage in terms of energy efficiency and latency. The autonomous communication system was able to create a node 65% more sustainable in terms of energy consumption, with only a 13% decrease in quality of service. The modular approach for the smart node presented advantages in the integration, scalability, and implementation of smart-city projects compared with traditional implementations, reducing the energy consumption of the overall system by up to 45% and the number of messages exchanged by 60%, without compromising system performance. The deployment of this new system will help Smart Cities worldwide to reduce their environmental impact and comply with rules and regulations on CO2 emissions.
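The node's ML-driven self-configuration can be illustrated with a minimal sketch. The thesis trained a Random Forest model; here we simply hard-code the kind of policy such a model might learn, so the feature names, thresholds, and returned profile fields are all invented for the example.

```python
# Hedged sketch of node self-configuration: pick a radio profile from
# observed node state. The thesis used a trained Random Forest; these
# hand-written rules only illustrate the decision it automates.
def choose_radio_profile(battery_pct, payload_bytes, link_rssi_dbm):
    """Trade energy for quality of service based on node state."""
    if battery_pct < 25:
        # Low battery: sleep more and batch messages, accepting some QoS
        # loss (the thesis reports ~65% energy savings for ~13% QoS loss).
        return {"duty_cycle": 0.05, "batch_messages": True}
    if payload_bytes > 128 or link_rssi_dbm < -90:
        # Large payloads or a weak link: moderate duty cycle, batching on.
        return {"duty_cycle": 0.25, "batch_messages": True}
    # Healthy node and link: prioritise responsiveness.
    return {"duty_cycle": 0.50, "batch_messages": False}
```

In the deployed system this decision would be re-evaluated periodically on the node itself, which is where the edge-computing latency and energy advantage reported above comes from.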

    Statistical speech translation system based on voice recognition optimization using multimodal sources of knowledge and characteristics vectors

    The synergic combination of different sources of knowledge is a key issue in the development of modern statistical translators. In this work, a statistical speech-translation system is presented that adds other-than-voice information to a voice translation system. The additional information serves as the basis for the log-linear combination of several statistical models. We describe the theoretical framework of the problem, summarize the overall architecture of the system, and show how the system is enhanced with the additional information. Our prototype implements a real-time speech-translation system from Spanish to English that is adapted to specific teaching-related environments. This work has been partially supported by the Generalitat Valenciana and the Universidad Politecnica de Valencia.
    Canovas Solbes, A.; TomĂĄs GironĂ©s, J.; Lloret, J.; GarcĂ­a Pineda, M. (2013). Statistical speech translation system based on voice recognition optimization using multimodal sources of knowledge and characteristics vectors. Computer Standards and Interfaces, 35(5):490-506. doi:10.1016/j.csi.2012.09.003
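The log-linear combination mentioned above has a standard shape: each knowledge source contributes a feature score h_i(e), weighted by λ_i, and the hypothesis maximising the weighted sum wins. The sketch below shows that shape with invented feature names, weights, and scores (the "visual_context" feature stands in for the paper's other-than-voice information; none of this is the paper's actual model).

```python
# Sketch of log-linear model combination for hypothesis rescoring.
# Feature names, weights, and log-probabilities are illustrative only.
def log_linear_score(hypothesis, features, weights):
    """score(e) = sum_i lambda_i * h_i(e), with h_i log-probabilities."""
    return sum(weights[name] * h(hypothesis) for name, h in features.items())

def best_hypothesis(hypotheses, features, weights):
    """Pick the hypothesis with the highest weighted combined score."""
    return max(hypotheses, key=lambda e: log_linear_score(e, features, weights))

# Stand-ins for the real models: translation model, language model, and
# the additional non-speech context source.
features = {
    "translation_model": lambda e: e["log_p_tm"],
    "language_model":    lambda e: e["log_p_lm"],
    "visual_context":    lambda e: e["log_p_ctx"],
}
weights = {"translation_model": 1.0, "language_model": 0.8, "visual_context": 0.5}

hyps = [
    {"text": "the slide shows",  "log_p_tm": -2.0, "log_p_lm": -1.5, "log_p_ctx": -0.5},
    {"text": "the slight shoes", "log_p_tm": -1.8, "log_p_lm": -3.0, "log_p_ctx": -4.0},
]
```

Here the extra context feature rescues the fluent reading even though the acoustically closer hypothesis has a better translation-model score, which is exactly the effect the additional information is meant to have.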

    HAIL: An Algorithm for the Hardware Accelerated Identification of Languages, Master's Thesis, May 2006

    This thesis examines in detail the Hardware-Accelerated Identification of Languages (HAIL) project. The goal of HAIL is to provide an accurate means of identifying the language and encoding used in streaming content, such as documents passed over a high-speed network. HAIL has been implemented on the Field-programmable Port eXtender (FPX), an open hardware platform developed at Washington University in St. Louis. HAIL can accurately identify the primary languages and encodings used in text at rates much higher than can be achieved by software algorithms running on microprocessors.
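A common software baseline for this task, which hardware like HAIL accelerates, is n-gram profile matching: score a text by how many of its character n-grams appear in each language's profile. The sketch below uses tiny invented training strings purely for illustration; HAIL's actual tables and streaming FPGA pipeline are far larger and operate at line rate.

```python
# Toy n-gram-profile language identification (software baseline, not HAIL).
from collections import Counter

def ngram_profile(text, n=3):
    """Character n-gram counts of a (lowercased) training string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def identify(text, profiles, n=3):
    """Pick the language whose profile shares the most n-grams with text."""
    grams = ngram_profile(text, n)
    def score(profile):
        return sum(count for g, count in grams.items() if g in profile)
    return max(profiles, key=lambda lang: score(profiles[lang]))

# Illustrative mini-corpora; real systems train on large text collections.
profiles = {
    "english": ngram_profile("the quick brown fox jumps over the lazy dog and then the cat"),
    "spanish": ngram_profile("el rĂĄpido zorro marrĂłn salta sobre el perro perezoso y el gato"),
}
```

In hardware the inner lookup becomes a parallel table match per byte of the stream, which is why an FPGA can sustain far higher rates than this loop.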

    Profile-directed specialisation of custom floating-point hardware

    We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size, while still retaining performance and IEEE-754 compliance. Our system has three key parts: a profiling tool, a set of customisable floating-point units, and a selection of system-integration methods. The profiling tool for floating-point behaviour identifies arithmetic operations where fundamental elements of IEEE-754 floating point may be compromised without generating erroneous results in the common case. In the uncommon case, simple detection logic determines when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable floating-point implementation, either on-chip or by returning calculations to a host processor, and we present system-integration methods to achieve this error correction. Thus the system suffers no compromise in IEEE-754 compliance, even when the synthesised hardware alone would generate erroneous results.

In particular, we identify from the input operands the shift amounts required for input-operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel shifters. We also propose optimisations that take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit the observed values. We present profiling results for a range of applications, including a selection of computational-science programs, SPECfp95 benchmarks, and the FFMPEG media-processing tool, indicating which would be amenable to our method. Selected applications that demonstrate potential for optimisation are then taken through to a hardware implementation. We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error rate of less than 3%, even on non-profiled datasets.
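The core profiling measurement can be sketched in a few lines: for a floating-point addition, the operand-alignment shift equals the exponent difference, so recording how large those differences get over a run tells you whether a reduced-width barrel shifter would cover the common case. This is a hedged software illustration of the idea, not the thesis's profiling tool; the function names and the `max_shift` threshold are assumptions.

```python
# Sketch of profiling alignment-shift amounts for floating-point addition.
# Illustrates the idea behind reduced-size barrel shifters; names invented.
import math

def alignment_shift(a, b):
    """Exponent difference = mantissa positions to shift before adding."""
    if a == 0.0 or b == 0.0:
        return 0
    ea = math.frexp(a)[1]   # frexp returns (mantissa, exponent)
    eb = math.frexp(b)[1]
    return abs(ea - eb)

def profile_adds(operand_pairs, max_shift=8):
    """Fraction of additions a reduced-width shifter could handle exactly."""
    in_range = sum(1 for a, b in operand_pairs
                   if alignment_shift(a, b) <= max_shift)
    return in_range / len(operand_pairs)
```

A high in-range fraction justifies synthesising the small shifter, with the rare out-of-range additions detected at runtime and diverted to the fully capable fallback path described above.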