    Exploiting Reconfigurable SWP Operators for Multimedia Applications

    Implementing image processing applications in embedded systems is a difficult challenge due to the drastic constraints in terms of cost, energy consumption and real-time execution. Reconfigurable architectures are good candidates to take up this challenge, especially when the architecture is able to support different pixel word-lengths through Sub-Word Parallelism (SWP) capabilities. Exploiting the diversity of supported data types requires automation tools able to optimize the data word-length under an accuracy constraint. In this paper, a new approach for word-length optimization in the case of SWP operations is proposed. Compared to existing approaches, the optimization time is significantly reduced without sacrificing the quality of the optimized solution. The results show the ability of our approach to exploit the SWP capabilities associated with multimedia processors
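As a rough illustration of what sub-word parallelism means in practice, the sketch below adds four 8-bit pixels packed into one 32-bit word with a single pass of word-wide arithmetic. It is not taken from the paper: the lane width, the lane count and the names `pack`, `unpack` and `swp_add` are assumptions chosen for the example, and each lane wraps modulo 256 rather than saturating.

```python
# Illustrative SWP sketch: four 8-bit lanes inside one 32-bit word.
# Lane width and function names are assumptions, not the paper's operators.
HIGH = 0x80808080  # top bit of every 8-bit lane
LOW = 0x7F7F7F7F   # low seven bits of every 8-bit lane


def pack(pixels):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 32-bit word back into four 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def swp_add(a, b):
    """Add all four lanes at once; each lane wraps modulo 256."""
    # Add the low 7 bits of each lane so no carry can cross a lane boundary,
    # then restore the top bit of each lane with XOR.
    low = (a & LOW) + (b & LOW)
    return (low ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


result = unpack(swp_add(pack([10, 200, 255, 1]), pack([5, 100, 1, 2])))
```

Choosing a narrower word-length per pixel lets more lanes fit in the same word, which is the trade-off the word-length optimization has to balance against accuracy.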

    Study and development of innovative strategies for energy-efficient cross-layer design of digital VLSI systems based on Approximate Computing

    The increasing demand for high performance and energy efficiency in modern digital systems has led to research into new design approaches that are able to go beyond the established energy-performance tradeoff. In the scientific literature, the Approximate Computing paradigm has been particularly prolific. Many applications in the domains of signal processing, multimedia, computer vision and machine learning are known to be particularly resilient to errors occurring in their input data and during computation, producing outputs that, although degraded, are still largely acceptable from the point of view of quality. The Approximate Computing design paradigm leverages the characteristics of this group of applications to develop circuits, architectures and algorithms that, by relaxing design constraints, perform their computations in an approximate or inexact manner, reducing energy consumption. This PhD research aims to explore the design of hardware/software architectures based on Approximate Computing techniques, filling the gap in the literature regarding effective applicability and deriving a systematic methodology to characterize its benefits and tradeoffs.
The main contributions of this work are:
- the introduction of approximate memory management inside the Linux OS, allowing dynamic allocation and de-allocation of approximate memory at user level, as for normal exact memory;
- the development of an emulation environment for platforms with approximate memory units, where faults are injected during the simulation based on models that reproduce the effects on memory cells of circuit-level and architectural techniques for approximate memories;
- the implementation and analysis of the impact of approximate memory hardware on real applications: the H.264 video encoder, internally modified to allocate selected data buffers in approximate memory, and signal processing applications (digital filter) using approximate memory for input/output buffers and tap registers;
- the development of a fully reconfigurable and combinatorial floating point unit, which can work with reduced precision formats
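The fault-injection idea behind the emulation environment can be sketched in a few lines. The abstract does not give the fault models used, so the version below assumes the simplest possible one, independent bit flips at a fixed bit error rate; the function name `inject_faults` and its parameters are hypothetical.

```python
import random


def inject_faults(buffer: bytes, bit_error_rate: float, rng: random.Random) -> bytes:
    """Return a copy of `buffer` as it might read back from approximate memory.

    Assumed model (not from the thesis): every bit flips independently
    with probability `bit_error_rate`.
    """
    if not 0.0 <= bit_error_rate <= 1.0:
        raise ValueError("bit_error_rate must be between 0 and 1")
    out = bytearray(buffer)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < bit_error_rate:
                out[i] ^= 1 << bit  # flip this bit
    return bytes(out)


# Usage: corrupt a buffer that an application chose to place in approximate memory.
rng = random.Random(1234)  # seeded so a simulation run is repeatable
exact = bytes(range(16))
approximate = inject_faults(exact, 0.01, rng)
```

A more faithful model would make the flip probability depend on the cell technology and on the stored value, which is the kind of detail the thesis attributes to its circuit-level and architectural models.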

    Segment Routing: a Comprehensive Survey of Research Activities, Standardization Efforts and Implementation Results

    Fixed and mobile telecom operators, enterprise network operators and cloud providers strive to face the challenging demands coming from the evolution of IP networks (e.g. huge bandwidth requirements, integration of billions of devices and millions of services in the cloud). Proposed in the early 2010s, Segment Routing (SR) architecture helps face these challenging demands, and it is currently being adopted and deployed. SR architecture is based on the concept of source routing and has interesting scalability properties, as it dramatically reduces the amount of state information to be configured in the core nodes to support complex services. SR architecture was first implemented with the MPLS dataplane and then, quite recently, with the IPv6 dataplane (SRv6). IPv6 SR architecture (SRv6) has been extended from the simple steering of packets across nodes to a general network programming approach, making it very suitable for use cases such as Service Function Chaining and Network Function Virtualization. In this paper we present a tutorial and a comprehensive survey on SR technology, analyzing standardization efforts, patents, research activities and implementation results. We start with an introduction on the motivations for Segment Routing and an overview of its evolution and standardization. Then, we provide a tutorial on Segment Routing technology, with a focus on the novel SRv6 solution. We discuss the standardization efforts and the patents providing details on the most important documents and mentioning other ongoing activities. We then thoroughly analyze research activities according to a taxonomy. We have identified 8 main categories during our analysis of the current state of play: Monitoring, Traffic Engineering, Failure Recovery, Centrally Controlled Architectures, Path Encoding, Network Programming, Performance Evaluation and Miscellaneous...
    Comment: Submitted to IEEE Communications Surveys & Tutorials
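The source-routing idea can be shown with a toy model in which the packet itself carries the ordered list of segments, so intermediate nodes need no per-flow state. This is a simplified sketch, not the MPLS label stack or the SRv6 header encoding; `Packet`, `process_at` and `steer` are names invented for the example, and segments are plain node identifiers.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    segments: list      # ordered node IDs chosen by the source
    active: int = 0     # index of the segment currently being pursued
    payload: str = ""


def process_at(node: str, pkt: Packet):
    """Handle `pkt` at `node`; return the next segment to steer toward, or None."""
    if pkt.segments[pkt.active] == node:
        pkt.active += 1  # this segment is complete
    if pkt.active == len(pkt.segments):
        return None      # last segment reached: deliver locally
    return pkt.segments[pkt.active]


def steer(source_segments):
    """Follow a non-empty segment list and return the segment endpoints visited."""
    pkt = Packet(segments=list(source_segments))
    visited = []
    target = pkt.segments[0]
    while target is not None:
        visited.append(target)
        target = process_at(target, pkt)
    return visited


path = steer(["A", "C", "F"])
```

All path state lives in the packet, which is the property the abstract credits for reducing the state configured in core nodes.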

    Customising compilers for customisable processors

    The automatic generation of instruction set extensions to provide application-specific acceleration for embedded processors has been a productive area of research in recent years. There have been incremental improvements in the quality of the algorithms that discover and select which instructions to add to a processor. The use of automatic algorithms, however, results in instructions which are radically different from those found in conventional, human-designed, RISC or CISC ISAs. This has resulted in a gap between the hardware's capabilities and the compiler's ability to exploit them. This thesis proposes and investigates the use of a high-level compiler pass that uses graph-subgraph isomorphism checking to exploit these complex instructions. Operating in a separate pass permits techniques to be applied that are uniquely suited for mapping complex instructions, but unsuitable for conventional instruction selection. The existing, mature, compiler back-end can then handle the remainder of the compilation. With this method, the high-level pass was able to use 1965 different automatically produced instructions to obtain an initial average speed-up of 1.11x over 179 benchmarks evaluated on a hardware-verified cycle-accurate simulator. This result was improved following an investigation of how the produced instructions were being used by the compiler. It was established that the models the automatic tools were using to develop instructions did not take account of how well the compiler could realistically use them. Adding parameters to the search heuristic to account for compiler issues increased the speed-up from 1.11x to 1.24x. An alternative approach using a re-designed hardware interface was also investigated and this achieved a speed-up of 1.26x while reducing hardware and compiler complexity.
A complementary, high-level, method of exploiting dual memory banks was created to increase memory bandwidth to accommodate the increased data-processing bandwidth provided by extension instructions. Finally, the compiler was considered for use in a non-conventional role where rather than generating code it is used to apply source-level transformations prior to the generation of extension instructions and thus affect the shape of the instructions that are generated

    Architectures and technologies for quality of service provisioning in next generation networks

    An NGN is a telecommunication network that differs from classical dedicated networks because of its capability to provide voice, video, data and cellular services on the same infrastructure (Quadruple-Play). The ITU-T standardization body has defined the NGN architecture in three different and well-defined strata: the transport stratum, which takes care of maintaining end-to-end connectivity; the service stratum, which is responsible for enabling the creation and the delivery of services; and finally the application stratum, where applications can be created and executed. The most important separation in this architecture is between the transport and service strata. The aim is to enable the flexibility to add, maintain and remove services without any impact on the transport layer; to enable the flexibility to add, maintain and remove transport technologies without any impact on the access to service, application, content and information; and finally the efficient coexistence of multiple terminals, access technologies and core transport technologies. The Service Oriented Architecture (SOA) is a paradigm often used in systems deployment and integration for organizing and utilizing distributed capabilities under the control of different ownership domains. In this thesis, the SOA technologies in network architectures are surveyed following the NGN functional architecture as defined by the ITU-T. Within each stratum, the main logical functions that have been the subject of investigation according to a service-oriented approach are highlighted. Moreover, a new definition of the NGN transport stratum functionalities according to the SOA paradigm is proposed; an implementation of the relevant service interfaces, analyzed with experimental results, gives some insight into the potential of the proposed strategy.
Within the NGN architectures research topic, especially in IP-based network architectures, Traffic Engineering (TE) refers to a set of policies and algorithms aimed at balancing network traffic load so as to improve network resource utilization and guarantee the service-specific end-to-end QoS. DS-TE technology extends TE functionalities to a per-class implementation by introducing a higher level of traffic classification which associates with each class type (CT) a constraint on bandwidth utilization. These constraints are set by defining and configuring a bandwidth constraint (BC) model which drives resource utilization, aiming at better load balancing, higher QoS performance and a lower call blocking rate. Default TE implementations rely on a centralized approach to bandwidth and routing management, which requires external management entities that periodically collect network status information and provide management actions. However, due to increasing network complexity, it is desirable that nodes automatically discover their environment, self-configure and update to adapt to changes. In this thesis the bandwidth management problem is tackled with an autonomic and distributed approach. Each node has a self-management module, which monitors the unreserved bandwidth in adjacent nodes and adjusts the local bandwidth constraints so as to reduce the differences in the unreserved bandwidth of neighbor nodes. With this distributed and autonomic algorithm, BCs are dynamically modified to drive routing decisions toward traffic balancing while respecting the QoS constraints of each class-type traffic request. Finally, Video on Demand (VoD) is a service that provides a video whenever the customer requests it.
Realizing a VoD system over the Internet requires architectures tailored to video features such as guaranteed bandwidths and constrained transmission delays: these are hard to provide in the traditional Internet architecture, which is not designed to provide an adequate quality of service (QoS) and quality of experience (QoE) to the final user. Typical VoD solutions can be grouped into four categories: centralized, proxy-based, Content Delivery Network (CDN) and hybrid architectures. Hybrid architectures combine the use of a centralized server with that of a peer-to-peer (P2P) network. This approach can effectively reduce the server load and avoid network congestion close to the server site because the peers support the delivery of the video to other peers using a cache-and-relay strategy that makes use of their upload bandwidth. However, in a peer-to-peer network each peer is free to join and leave the network without notice, leading to the phenomenon of peer churn. These dynamics are dangerous for VoD architectures, affecting the integrity and retainability of the service. In this thesis, a study aimed at evaluating the impact of peer churn on system performance is proposed. Starting from important relationships between system parameters such as playback buffer length, peer request rate, peer average lifetime and server upload rate, four different analytic models are proposed

    FPGAs in Bioinformatics: Implementation and Evaluation of Common Bioinformatics Algorithms in Reconfigurable Logic

    Life. Much effort is taken to grant humanity a little insight into this fascinating and complex but fundamental topic. In order to understand the relations and to derive consequences, humans have begun to sequence their genomes, i.e. to determine their DNA sequences to infer information, e.g. related to genetic diseases. The process of DNA sequencing as well as subsequent analysis presents a computational challenge for current computing systems due to the large amounts of data alone. Runtimes of more than one day for the analysis of simple datasets are common, even if the process is already run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute-intensive problems are highlighted: sequence alignment, SNP interaction analysis and genotype imputation. In the area of sequence alignment, the software BLASTp for protein database searches is presented as an example, implemented and evaluated. SNP interaction analysis is presented with three applications performing an exhaustive search for interactions including the corresponding statistical tests: BOOST, iLOCi and the mutual information measurement. All applications are implemented in FPGA hardware and evaluated, resulting in an impressive speedup of more than three orders of magnitude when compared to standard computers. The last topic of genotype imputation presents a two-step process composed of the phasing step and the actual imputation step. The focus lies on the phasing step, which is targeted by the SHAPEIT2 application. SHAPEIT2 is discussed with its underlying mathematical methods in detail, and finally implemented and evaluated. A remarkable speedup of 46 is reached here as well

    NFC based remote control of services for interactive spaces

    Ubiquitous computing (one person, many computers) is the third era in the history of computing. It follows the mainframe era (many people, one computer) and the PC era (one person, one computer). Ubiquitous computing empowers people to communicate with services by interacting with their surroundings. Most of these so-called smart environments contain sensors that sense users' actions and try to predict the users' intentions and needs based on sensor data. The main drawback of this approach is that the system might perform unexpected or unwanted actions, making the user feel out of control. In this master's thesis we propose a different procedure based on Interactive Spaces: instead of predicting users' intentions based on sensor data, the system reacts to users' explicit predefined actions. To that end, we present REACHeS, a server platform which enables communication among services, resources and users located in the same environment. With REACHeS, a user controls services and resources by interacting with everyday objects and using a mobile phone as a mediator between himself/herself, the system and the environment. REACHeS' interfaces with a user are built upon NFC (Near Field Communication) technology. NFC tags are attached to objects in the environment. A tag stores commands that are sent to services when a user touches the tag with his/her NFC-enabled device. The prototypes and usability tests presented in this thesis show the great potential of NFC to build such user interfaces

    A predictive fault-tolerance framework for IoT systems

    As Internet of Things (IoT) systems scale, attributes such as availability, reliability, safety, maintainability, security, and performance become increasingly important. A key challenge in realising IoT is how to provide a dependable infrastructure for the billions of expected IoT devices. A dependable IoT system is one that can defensibly be trusted to deliver its intended service within a given time period. To define a fault-tolerance (FT) support solution that is applicable to all IoT systems, it is important that error definition is a generic, language-agnostic process, so that FT can be applied as a software pattern. It must also be interoperable, so that FT support can be easily 'plugged into' any existing IoT system, which is facilitated by an adherence to standards and protocols. Lastly, it is important that FT support is, itself, fault tolerant, so that it can be depended on to provide correct support for IoT systems. The work in this thesis considers how real-time and historical data analysis techniques can be combined to monitor an IoT environment and analyse its short- and long-term data to make the system as resilient to failure as possible. Specifically, complex event processing (CEP) is proposed for real-time error detection based on the analysis of stream data in an IoT system, where errors are defined as nondeterministic finite automata (NFA). For long-term error analysis, machine learning (ML) is proposed to predict when an error is likely to occur and mitigate imminent system faults based on previous experience of erroneous system behaviour in the IoT system.
The contribution is threefold: (1) a language-agnostic approach to error definition using NFAs, designed to provide 'FT as a service' for easy deployment and integration into existing IoT systems; (2) an implementation of NFAs on a bespoke CEP system, BoboCEP, that provides distributed, resilient event processing at the network edge via active replication; and (3) a ML approach to intelligent FT that can learn from system errors over time to ensure correct long-term FT support. The proposed solution was evaluated using two vertical-farming testbeds and a dataset from a real-world vertical farm. Results showed that the proposed solution could detect and predict the successful detection and recovery of erroneous system behaviours. A performance analysis of BoboCEP was conducted with favourable results
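Defining an error as an NFA over a stream of events can be sketched as follows. This is a generic illustration and does not use the BoboCEP API; the function `make_detector`, the state names and the example events (a temperature reading followed by a fan switching off) are all assumptions made for the example.

```python
def make_detector(transitions, start, accepting):
    """Build an event-stream matcher for an error pattern given as an NFA.

    `transitions` maps a state to a list of (predicate, next_state) pairs.
    Hypothetical helper, not part of any CEP library.
    """
    active = {start}

    def feed(event):
        nonlocal active
        reached = set(active)  # every partial match may keep waiting
        for state in active:
            for predicate, target in transitions.get(state, []):
                if predicate(event):
                    reached.add(target)  # and may also advance: nondeterminism
        matched = bool(reached & accepting)
        active = reached - accepting
        return matched

    return feed


# Example pattern: a hot reading later followed by the fan switching off.
def is_hot(event):
    return event.get("temp", 0) > 40


def fan_off(event):
    return event.get("fan") == "off"


detector = make_detector(
    transitions={"start": [(is_hot, "hot")], "hot": [(fan_off, "error")]},
    start="start",
    accepting={"error"},
)
stream = [{"temp": 20}, {"fan": "off"}, {"temp": 45}, {"fan": "off"}]
alerts = [detector(event) for event in stream]
```

Because the pattern is plain data (states plus predicates), it can be declared without tying it to one programming language, which matches the language-agnostic goal stated above. A production system would also need time windows so that stale partial matches expire; this sketch keeps them forever.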

    FPGAs in der Bioinformatik: Implementierung und Evaluierung bekannter bioinformatischer Algorithmen in rekonfigurierbarer Logik

    Life. Much effort is taken to grant humanity a little insight into this fascinating and complex but fundamental topic. In order to understand the relations and to derive consequences, humans have begun to sequence their genomes, i.e. to determine their DNA sequences to infer information, e.g. related to genetic diseases. The process of DNA sequencing as well as subsequent analysis presents a computational challenge for current computing systems due to the large amounts of data alone. Runtimes of more than one day for the analysis of simple datasets are common, even if the process is already run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute-intensive problems are highlighted: sequence alignment, SNP interaction analysis and genotype imputation. In the area of sequence alignment, the software BLASTp for protein database searches is presented as an example, implemented and evaluated. SNP interaction analysis is presented with three applications performing an exhaustive search for interactions including the corresponding statistical tests: BOOST, iLOCi and the mutual information measurement. All applications are implemented in FPGA hardware and evaluated, resulting in an impressive speedup of more than three orders of magnitude when compared to standard computers. The last topic of genotype imputation presents a two-step process composed of the phasing step and the actual imputation step. The focus lies on the phasing step, which is targeted by the SHAPEIT2 application. SHAPEIT2 is discussed with its underlying mathematical methods in detail, and finally implemented and evaluated. A remarkable speedup of 46 is reached here as well