14 research outputs found

    FPGA-BASED MULTI-CORE PROCESSOR

    Get PDF
    The paper presents the results of investigations concerning the possibilities of using programmable logic devices (FPGA) for building virtual multi-core processors dedicated to the chosen application. The paper shows the designed architecture of multi-core processor specialized for performing a particular task and discuss its computation efficiency depending on the number of cores being used. The evaluation of the results are also discussed

    Revisiting the high-performance reconfigurable computing for future datacenters

    Get PDF
    Modern datacenters are reinforcing the computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of communication architecture and virtualization method with the required features in order to meet the high-end objective. Consequently, in the last decade, academia and industry proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, and mainly, multitenancy. This paper provides an extensive survey covering three important aspects-discussion on non-standard terms used in existing literature, network-on-chip evaluation choices as a mean to explore the communication architecture, and virtualization methods under latest classification. The purpose is to emphasize the importance of choosing appropriate communication architecture, virtualization technique and standard language to evolve the multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in one writing. Open problems are indicated for scientific community as well

    Класифікація та архітектурні особливості програмованих мультипроцесорних систем-на-кристалі

    Get PDF
    Provided general information on embedded multiprocessor systems-on-chip based on FPGA (FPGA-MPSoC). Completed a comprehensive analysis of the architectural features and provided Shih rock classification FPGA-MPSoC. Powered overview of recent research in the development of FPGA-MPSoC. A wide circle of such systems in order to study trends in architecture and all problems solvedПредоставлено общую информацию о встроенных мультипроцессорных систем-на-кристалле на базе ПЛИС (FPGA-MPSoC). Выполнено всесторонний анализ архитектурных особенностей и предоставлена ​​широкая классификация FPGA-MPSoC. Приведены обзор последних исследований в области разработки FPGA-MPSoC. Представлен широкий круг таких систем с целью исследования всех тенденциях архитектуры и решаемых задачПредоставлено общую информацию о встроенных мультипроцессорных систем-на-кристалле на базе ПЛИС (FPGA-MPSoC). Выполнено всесторонний анализ архитектурных особенностей и предоставлена ​​широкая классификация FPGA-MPSoC. Приведены обзор последних исследований в области разработки FPGA-MPSoC. Представлен широкий круг таких систем с целью исследования всех тенденциях архитектуры и решаемых зада

    Optimising runtime reconfigurable designs for high performance applications

    Get PDF
    This thesis proposes novel optimisations for high performance runtime reconfigurable designs. For a reconfigurable design, the proposed approach investigates idle resources introduced by static design approaches, and exploits runtime reconfiguration to eliminate the inefficient resources. The approach covers the circuit level, the function level, and the system level. At the circuit level, a method is proposed for tuning reconfigurable designs with two analytical models: a resource model for computational and memory resources and memory bandwidth, and a performance model for estimating execution time. This method is applied to tuning implementations of finite-difference algorithms, optimising arithmetic operators and memory bandwidth based on algorithmic parameters, and eliminating idle resources by runtime reconfiguration. At the function level, a method is proposed to automatically identify and exploit runtime reconfiguration opportunities while optimising resource utilisation. The method is based on Reconfiguration Data Flow Graph, a new hierarchical graph structure enabling runtime reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and runtime solution generation. At the system level, a method is proposed for optimising reconfigurable designs by dynamically adapting the designs to available runtime resources in a reconfigurable system. This method includes two steps: compile-time optimisation and runtime scaling, which enable efficient workload distribution, asynchronous communication scheduling, and domain-specific optimisations. It can be used in developing effective servers for high performance applications.Open Acces

    FedBiometric: Image Features Based Biometric Presentation Attack Detection Using Hybrid CNNs-SVM in Federated Learning

    Get PDF
    In the past few years, biometric identification systems have become popular for personal, national, and global security. In addition to other biometric modalities, facial and fingerprint recognition have gained popularity due to their uniqueness, stability, convenience, and cost-effectiveness compared to other biometric modalities. However, the evolution of fake biometrics, such as printed materials, 2D or 3D faces, makeup, and cosmetics, has brought new challenges. As a result of these modifications, several facial and fingerprint Presentation Attack Detection methods have been proposed to distinguish between live and spoof faces or fingerprints. Federated learning can play a significant role in this problem due to its distributed learning setting and privacy-preserving advantages. This work proposes a hybrid ResNet50-SVM based federated learning model for facial Presentation Attack Detection utilizing Local Binary Pattern (LBP), or Gabor filter-based extracted image features. For fingerprint Presentation Attack Detection (PAD), this work proposes a hybrid CNN-SVM based federated learning model utilizing Local Binary Pattern (LBP), or Histograms of Oriented Gradient (HOG)-based extracted image features

    Parallel architectures and runtime systems co-design for task-based programming models

    Get PDF
    The increasing parallelism levels in modern computing systems has extolled the need for a holistic vision when designing multiprocessor architectures taking in account the needs of the programming models and applications. Nowadays, system design consists of several layers on top of each other from the architecture up to the application software. Although this design allows to do a separation of concerns where it is possible to independently change layers due to a well-known interface between them, it is hampering future systems design as the Law of Moore reaches to an end. Current performance improvements on computer architecture are driven by the shrinkage of the transistor channel width, allowing faster and more power efficient chips to be made. However, technology is reaching physical limitations were the transistor size will not be able to be reduced furthermore and requires a change of paradigm in systems design. This thesis proposes to break this layered design, and advocates for a system where the architecture and the programming model runtime system are able to exchange information towards a common goal, improve performance and reduce power consumption. By making the architecture aware of runtime information such as a Task Dependency Graph (TDG) in the case of dataflow task-based programming models, it is possible to improve power consumption by exploiting the critical path of the graph. Moreover, the architecture can provide hardware support to create such a graph in order to reduce the runtime overheads and making possible the execution of fine-grained tasks to increase the available parallelism. Finally, the current status of inter-node communication primitives can be exposed to the runtime system in order to perform a more efficient communication scheduling, and also creates new opportunities of computation and communication overlap that were not possible before. An evaluation of the proposals introduced in this thesis is provided and a methodology to simulate and characterize the application behavior is also presented.El aumento del paralelismo proporcionado por los sistemas de cómputo modernos ha provocado la necesidad de una visión holística en el diseño de arquitecturas multiprocesador que tome en cuenta las necesidades de los modelos de programación y las aplicaciones. Hoy en día el diseño de los computadores consiste en diferentes capas de abstracción con una interfaz bien definida entre ellas. Las limitaciones de esta aproximación junto con el fin de la ley de Moore limitan el potencial de los futuros computadores. La mayoría de las mejoras actuales en el diseño de los computadores provienen fundamentalmente de la reducción del tamaño del canal del transistor, lo cual permite chips más rápidos y con un consumo eficiente sin apenas cambios fundamentales en el diseño de la arquitectura. Sin embargo, la tecnología actual está alcanzando limitaciones físicas donde no será posible reducir el tamaño de los transistores motivando así un cambio de paradigma en la construcción de los computadores. Esta tesis propone romper este diseño en capas y abogar por un sistema donde la arquitectura y el sistema de tiempo de ejecución del modelo de programación sean capaces de intercambiar información para alcanzar una meta común: La mejora del rendimiento y la reducción del consumo energético. Haciendo que la arquitectura sea consciente de la información disponible en el modelo de programación, como puede ser el grafo de dependencias entre tareas en los modelos de programación dataflow, es posible reducir el consumo energético explotando el camino critico del grafo. Además, la arquitectura puede proveer de soporte hardware para crear este grafo con el objetivo de reducir el overhead de construir este grado cuando la granularidad de las tareas es demasiado fina. Finalmente, el estado de las comunicaciones entre nodos puede ser expuesto al sistema de tiempo de ejecución para realizar una mejor planificación de las comunicaciones y creando nuevas oportunidades de solapamiento entre cómputo y comunicación que no eran posibles anteriormente. Esta tesis aporta una evaluación de todas estas propuestas, así como una metodología para simular y caracterizar el comportamiento de las aplicacionesPostprint (published version

    Optimal program variant generation for hybrid manycore systems

    Get PDF
    Field Programmable Gate Arrays promise to deliver superior energy efficiency in heterogeneous high performance computing, as compared to multicore CPUs and GPUs. The rate of adoption is however hampered by the relative difficulty of programming FPGAs. High-level synthesis tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a Hardware Description Languages representation from a high-level specification of the application, given in programming languages such as OpenCL C, typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they fail to also lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, meaning that traditional HLS solutions do not offer a solution to the issue of performance portability. This state of fact prompted the development of compiler frameworks such as TyTra that operate at an even higher level of abstraction that is amenable to the use of Design Space Exploration (DSE). With DSE the initial program specification can be seen as the starting location in a search-space of correct-by-construction program transformations. In TyTra the search-space is generated from the transitive-closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. They however suffer from the very practical issue that the generated space is often too large to fully explore. As a consequence, the globally optimal solution may be overlooked. In this work we provide a novel solution to issue performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework. We make use of categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run-time for our DSE strategy has practically negligible coefficients leading to sub-second exploration times for realistic applications

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Securing the Internet with digital signatures

    Get PDF
    The security and reliability of the Internet are essential for many functions of a modern society. Currently, the Internet lacks efficient network level security solutions and is vulnerable to various attacks, especially to distributed denial-of-service attacks. Traditional end-to-end security solutions such as IPSec only protect the communication end-points and are not effective if the underlying network infrastructure is attacked and paralyzed. This thesis describes and evaluates Packet Level Authentication (PLA), which is a novel method to secure the network infrastructure and provide availability with public key digital signatures. PLA allows any node in the network to verify independently the authenticity and integrity of every received packet, without previously established relationships with the sender or intermediate nodes that have handled the packet. As a result, various attacks against the network and its users can be more easily detected and mitigated, before they can cause significant damage or disturbance. PLA is compatible with the existing Internet infrastructure, and can be used with complementary end-to-end security solutions, such as IPSec and HIP. While PLA was originally designed for securing current IP networks, it is also suitable for securing future data-oriented networking approaches. PLA has been designed to scale from lightweight wireless devices to Internet core network, which is a challenge since public key cryptography operations are very resource intensive. Nevertheless, this work shows that digital signature algorithms and their hardware implementations developed for PLA are scalable to fast core network routers. Furthermore, the additional energy consumption of cryptographic operations is significantly lower than the energy cost of wireless transmission, making PLA feasible for lightweight wireless devices. Digital signature algorithms used by PLA also offer small key and signature sizes and therefore PLA's bandwidth overhead is relatively low. Strong security mechanisms offered by PLA can also be utilized for various other tasks. This work investigates how PLA can be utilized for controlling incoming connections, secure user authentication and billing, and for providing a strong accountability without an extensive data retention by network service providers

    Pseudo-contractions as Gentle Repairs

    Get PDF
    Updating a knowledge base to remove an unwanted consequence is a challenging task. Some of the original sentences must be either deleted or weakened in such a way that the sentence to be removed is no longer entailed by the resulting set. On the other hand, it is desirable that the existing knowledge be preserved as much as possible, minimising the loss of information. Several approaches to this problem can be found in the literature. In particular, when the knowledge is represented by an ontology, two different families of frameworks have been developed in the literature in the past decades with numerous ideas in common but with little interaction between the communities: applications of AGM-like Belief Change and justification-based Ontology Repair. In this paper, we investigate the relationship between pseudo-contraction operations and gentle repairs. Both aim to avoid the complete deletion of sentences when replacing them with weaker versions is enough to prevent the entailment of the unwanted formula. We show the correspondence between concepts on both sides and investigate under which conditions they are equivalent. Furthermore, we propose a unified notation for the two approaches, which might contribute to the integration of the two areas
    corecore