1,307 research outputs found

    Methodology for complex dataflow application development

    This thesis addresses problems inherent to the development of complex applications for reconfigurable systems. Many projects fail to complete, or take much longer than originally estimated, because they rely on the traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies that consider the specific properties of reconfigurable systems do not exist.

The first contribution of this thesis is a design methodology to facilitate the systematic development of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement that resolves identified bottlenecks before a single line of code targeting the reconfigurable hardware is written. It is successfully validated on two real applications, both of which achieve state-of-the-art performance.

The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed: an algorithm is proposed that automatically maps logical memories to heterogeneous physical memories, paying special attention to die boundaries. In the evaluation, only the proposed algorithm managed to successfully place and route all designs, while the second-best algorithm failed on one third of the large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms.
The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain: a Monte-Carlo-based simulation of dose accumulation in human tissue is accelerated using the proposed methodology to meet the real-time requirements of adaptive radiotherapy.
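The thesis's multi-die memory-mapping step can be pictured with a small sketch. The following is a hypothetical greedy placement under an invented cost model (the class names, the fill-ratio cost, and the `crossing_penalty` are all illustrative assumptions, not the thesis's actual algorithm): large logical memories are placed first, and physical memories on a different die than the accessing logic are penalised.

```python
# Hypothetical sketch of multi-die-aware logical-to-physical memory mapping.
# Cost model and names are illustrative assumptions, not the thesis's algorithm.
from dataclasses import dataclass

@dataclass
class PhysicalMemory:
    name: str
    die: int
    capacity_bits: int
    used_bits: int = 0

@dataclass
class LogicalMemory:
    name: str
    size_bits: int
    preferred_die: int  # die of the logic that accesses this memory

def map_memories(logical, physical, crossing_penalty=4):
    """Greedily place each logical memory, penalising die-boundary crossings."""
    placement = {}
    # Place the largest memories first: they are the hardest to fit.
    for lm in sorted(logical, key=lambda m: -m.size_bits):
        def cost(pm):
            if pm.capacity_bits - pm.used_bits < lm.size_bits:
                return float("inf")           # does not fit at all
            c = pm.used_bits / pm.capacity_bits  # prefer emptier memories
            if pm.die != lm.preferred_die:
                c += crossing_penalty            # discourage die crossings
            return c
        best = min(physical, key=cost)
        if cost(best) == float("inf"):
            raise RuntimeError(f"cannot place {lm.name}")
        best.used_bits += lm.size_bits
        placement[lm.name] = best.name
    return placement
```

A real mapper must also model ports, widths, and routing congestion; the sketch only captures the capacity-plus-die-boundary trade-off described in the abstract.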

    Supercomputing Frontiers

    This open access book constitutes the refereed proceedings of the 7th Asian Supercomputing Conference, SCFA 2022, which took place in Singapore in March 2022. The 8 full papers presented in this book were carefully reviewed and selected from 21 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platforms, container image configuration workflows, large-scale applications, and scheduling.

    MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    We have developed MC64-ClustalWP2, a new implementation of the Clustal W algorithm that integrates a novel parallelization strategy and significantly increases performance when aligning long sequences on architectures with many cores. In such a process, a detailed analysis of both software and hardware features and peculiarities is of paramount importance to reveal the key points for exploiting and optimizing the full potential of parallelism in many-core CPU systems. The new parallelization approach focuses on the most time-consuming stages of the algorithm. In particular, the so-called progressive alignment stage shows a drastic performance improvement, thanks to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key decision was to implement the new algorithm on a hybrid computing system, integrating an Intel Xeon multi-core CPU with a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. As a result, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, and is publicly available through a web service.

In addition, these developments have been deployed on cost-effective personal computers and should be useful for life-science researchers in applications such as identifying identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies, and the development of molecular markers for paternity testing, germplasm management and protection, breeding assistance, illegal-traffic control, fraud prevention, and the protection of intellectual property (identification/traceability), including protected designations of origin.
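Why the progressive-alignment stage parallelises at all can be seen in a generic dynamic-programming sketch (this is plain Needleman-Wunsch-style scoring, not the MC64-ClustalWP2 code): all cells on one anti-diagonal of the scoring matrix depend only on earlier diagonals, so they are mutually independent and can be computed in parallel. The loop below sweeps diagonals sequentially for clarity; the inner loop is the parallelisable one.

```python
# Generic anti-diagonal (wavefront) sweep of a pairwise-alignment DP matrix.
# Illustrates the parallelisable structure only; not the paper's implementation.
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = i * gap
    for j in range(m + 1):
        H[0][j] = j * gap
    # Sweep anti-diagonals: every cell (i, j) with i + j == d depends only
    # on diagonals d-1 and d-2, so this inner loop has no dependencies and
    # is safe to distribute across cores.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # substitution / match
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
    return H[n][m]
```

For the >10 kb sequences mentioned above, the matrix has millions of cells per pair, which is exactly where a many-core sweep pays off.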

    Recent Advances in Embedded Computing, Intelligence and Applications

    The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software, and artificial-intelligence capabilities deployed in real sensing contexts empowers the edge-intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence, among them hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    Sustainable modular IoT solution for smart cities applications supported by machine learning algorithms

    The Internet of Things (IoT) and Smart Cities are nowadays a major trend, but with the proliferation of these systems several challenges have started to appear that jeopardize their acceptance by the population, mainly in terms of sustainability and environmental issues. This thesis introduces a new system composed of a modular IoT smart node that is self-configurable and sustainable with the support of machine-learning techniques, together with the research and development needed to achieve an innovative solution spanning data analysis, wireless communications, and hardware and software development. For each of these, the key concepts are introduced, and the research methodologies, tests, and results are presented and discussed, along with the development and implementation.

The research shows that Random Forest was the best choice for the data analysis behind the self-configuration of the hardware and communication systems, and that edge computing has an advantage in terms of energy efficiency and latency. The autonomous communication system was able to create a node 65% more sustainable in terms of energy consumption, with only a 13% decrease in quality of service. The modular approach for the smart node presented advantages in the integration, scalability, and implementation of smart-city projects compared with traditional implementations, reducing the energy consumption of the overall system by up to 45% and the number of messages exchanged by 60%, without compromising system performance. The deployment of this new system will help Smart Cities worldwide to reduce their environmental impact and comply with rules and regulations on CO2 emissions.
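The node's ML-driven self-configuration can be illustrated with a minimal sketch. The thesis trained a Random Forest model; here we simply hard-code the kind of policy such a model might learn, so the feature names, thresholds, and returned profile fields are all invented for the example.

```python
# Hedged sketch of node self-configuration: pick a radio profile from
# observed node state. The thesis used a trained Random Forest; these
# hand-written rules only illustrate the decision it automates.
def choose_radio_profile(battery_pct, payload_bytes, link_rssi_dbm):
    """Trade energy for quality of service based on node state."""
    if battery_pct < 25:
        # Low battery: sleep more and batch messages, accepting some QoS
        # loss (the thesis reports ~65% energy savings for ~13% QoS loss).
        return {"duty_cycle": 0.05, "batch_messages": True}
    if payload_bytes > 128 or link_rssi_dbm < -90:
        # Large payloads or a weak link: moderate duty cycle, batching on.
        return {"duty_cycle": 0.25, "batch_messages": True}
    # Healthy node and link: prioritise responsiveness.
    return {"duty_cycle": 0.50, "batch_messages": False}
```

In the deployed system this decision would be re-evaluated periodically on the node itself, which is where the edge-computing latency and energy advantage reported above comes from.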

    Statistical speech translation system based on voice recognition optimization using multimodal sources of knowledge and characteristics vectors

    The synergic combination of different sources of knowledge is a key issue in the development of modern statistical translators. In this work, a statistical speech-translation system is presented that adds other-than-voice information to a voice translation system. The additional information serves as the basis for the log-linear combination of several statistical models. We describe the theoretical framework of the problem, summarize the overall architecture of the system, and show how the system is enhanced with the additional information. Our prototype implements a real-time speech-translation system from Spanish to English that is adapted to specific teaching-related environments. This work has been partially supported by the Generalitat Valenciana and the Universidad Politecnica de Valencia.
    Canovas Solbes, A.; TomĂĄs GironĂ©s, J.; Lloret, J.; GarcĂ­a Pineda, M. (2013). Statistical speech translation system based on voice recognition optimization using multimodal sources of knowledge and characteristics vectors. Computer Standards and Interfaces, 35(5):490-506. doi:10.1016/j.csi.2012.09.003
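The log-linear combination mentioned above has a standard shape: each knowledge source contributes a feature score h_i(e), weighted by λ_i, and the hypothesis maximising the weighted sum wins. The sketch below shows that shape with invented feature names, weights, and scores (the "visual_context" feature stands in for the paper's other-than-voice information; none of this is the paper's actual model).

```python
# Sketch of log-linear model combination for hypothesis rescoring.
# Feature names, weights, and log-probabilities are illustrative only.
def log_linear_score(hypothesis, features, weights):
    """score(e) = sum_i lambda_i * h_i(e), with h_i log-probabilities."""
    return sum(weights[name] * h(hypothesis) for name, h in features.items())

def best_hypothesis(hypotheses, features, weights):
    """Pick the hypothesis with the highest weighted combined score."""
    return max(hypotheses, key=lambda e: log_linear_score(e, features, weights))

# Stand-ins for the real models: translation model, language model, and
# the additional non-speech context source.
features = {
    "translation_model": lambda e: e["log_p_tm"],
    "language_model":    lambda e: e["log_p_lm"],
    "visual_context":    lambda e: e["log_p_ctx"],
}
weights = {"translation_model": 1.0, "language_model": 0.8, "visual_context": 0.5}

hyps = [
    {"text": "the slide shows",  "log_p_tm": -2.0, "log_p_lm": -1.5, "log_p_ctx": -0.5},
    {"text": "the slight shoes", "log_p_tm": -1.8, "log_p_lm": -3.0, "log_p_ctx": -4.0},
]
```

Here the extra context feature rescues the fluent reading even though the acoustically closer hypothesis has a better translation-model score, which is exactly the effect the additional information is meant to have.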

    HAIL: An Algorithm for the Hardware Accelerated Identification of Languages, Master's Thesis, May 2006

    This thesis examines in detail the Hardware-Accelerated Identification of Languages (HAIL) project. The goal of HAIL is to provide an accurate means of identifying the language and encoding used in streaming content, such as documents passed over a high-speed network. HAIL has been implemented on the Field-programmable Port eXtender (FPX), an open hardware platform developed at Washington University in St. Louis. HAIL can accurately identify the primary languages and encodings used in text at rates much higher than can be achieved by software algorithms running on microprocessors.
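A common software baseline for this task, which hardware like HAIL accelerates, is n-gram profile matching: score a text by how many of its character n-grams appear in each language's profile. The sketch below uses tiny invented training strings purely for illustration; HAIL's actual tables and streaming FPGA pipeline are far larger and operate at line rate.

```python
# Toy n-gram-profile language identification (software baseline, not HAIL).
from collections import Counter

def ngram_profile(text, n=3):
    """Character n-gram counts of a (lowercased) training string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def identify(text, profiles, n=3):
    """Pick the language whose profile shares the most n-grams with text."""
    grams = ngram_profile(text, n)
    def score(profile):
        return sum(count for g, count in grams.items() if g in profile)
    return max(profiles, key=lambda lang: score(profiles[lang]))

# Illustrative mini-corpora; real systems train on large text collections.
profiles = {
    "english": ngram_profile("the quick brown fox jumps over the lazy dog and then the cat"),
    "spanish": ngram_profile("el rĂĄpido zorro marrĂłn salta sobre el perro perezoso y el gato"),
}
```

In hardware the inner lookup becomes a parallel table match per byte of the stream, which is why an FPGA can sustain far higher rates than this loop.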

    Profile-directed specialisation of custom floating-point hardware

    We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size, while still retaining performance and IEEE-754 compliance. Our system has three key parts: a profiling tool, a set of customisable floating-point units, and a selection of system-integration methods. The profiling tool for floating-point behaviour identifies arithmetic operations where fundamental elements of IEEE-754 floating point may be compromised without generating erroneous results in the common case. In the uncommon case, simple detection logic determines when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable floating-point implementation, either on-chip or by returning calculations to a host processor, and we present system-integration methods to achieve this error correction. Thus the system suffers no compromise in IEEE-754 compliance, even when the synthesised hardware alone would generate erroneous results.

In particular, we identify from the input operands the shift amounts required for input-operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel shifters. We also propose optimisations that take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit the observed values. We present profiling results for a range of applications, including a selection of computational-science programs, SPECfp95 benchmarks, and the FFMPEG media-processing tool, indicating which would be amenable to our method. Selected applications that demonstrate potential for optimisation are then taken through to a hardware implementation. We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error rate of less than 3%, even on non-profiled datasets.
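The core profiling measurement can be sketched in a few lines: for a floating-point addition, the operand-alignment shift equals the exponent difference, so recording how large those differences get over a run tells you whether a reduced-width barrel shifter would cover the common case. This is a hedged software illustration of the idea, not the thesis's profiling tool; the function names and the `max_shift` threshold are assumptions.

```python
# Sketch of profiling alignment-shift amounts for floating-point addition.
# Illustrates the idea behind reduced-size barrel shifters; names invented.
import math

def alignment_shift(a, b):
    """Exponent difference = mantissa positions to shift before adding."""
    if a == 0.0 or b == 0.0:
        return 0
    ea = math.frexp(a)[1]   # frexp returns (mantissa, exponent)
    eb = math.frexp(b)[1]
    return abs(ea - eb)

def profile_adds(operand_pairs, max_shift=8):
    """Fraction of additions a reduced-width shifter could handle exactly."""
    in_range = sum(1 for a, b in operand_pairs
                   if alignment_shift(a, b) <= max_shift)
    return in_range / len(operand_pairs)
```

A high in-range fraction justifies synthesising the small shifter, with the rare out-of-range additions detected at runtime and diverted to the fully capable fallback path described above.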