2,369 research outputs found

    Hardware acceleration of the trace transform for vision applications

    Computer vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images, as pixels and bits, serve no purpose of their own. It is only by interpreting the data and extracting higher-level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility that suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required for a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.
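
The median and weighted median filters mentioned in the abstract can be illustrated with a small software reference model. This is only a sketch of the filtering operation itself, not of the thesis's hardware architectures; the function names, the integer weights, and the edge handling (the window is simply truncated at the signal borders) are assumptions made for illustration.

```python
from typing import List, Sequence


def weighted_median(values: Sequence[float], weights: Sequence[int]) -> float:
    """Return the smallest value whose cumulative weight reaches half the total.

    Weights are assumed to be non-negative integers with a positive sum.
    With all weights equal to 1 and an odd count this is the ordinary median.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    running = 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if 2 * running >= total:
            return value
    raise AssertionError("unreachable when the total weight is positive")


def median_filter_1d(signal: Sequence[float], window: int) -> List[float]:
    """Sliding-window median; the window is truncated at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(weighted_median(signal[lo:hi], [1] * (hi - lo)))
    return out
```

For example, `median_filter_1d([1, 1, 9, 1, 1], 3)` suppresses the isolated spike, while `weighted_median([1, 2, 10], [1, 1, 5])` is pulled towards the heavily weighted sample.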

    Noise-agnostic adaptive image filtering without training references on an evolvable hardware platform

    One of the main concerns in evolvable and adaptive systems is the need for a training mechanism, which normally relies on a training reference and a test input. The fitness function optimized during the evolution (training) phase is obtained by comparing the output of each candidate system against the reference. The adaptivity that such systems provide by re-evolving during operation is especially important for applications whose conditions vary at runtime. However, fully automated self-adaptivity poses additional problems. For instance, in some cases no such reference is available because the changes in the environment are unknown, so it becomes difficult to identify autonomously which problem needs to be solved and, hence, which conditions would be representative for an adequate re-evolution. In this paper, a solution that removes this dependency is presented and analyzed. The system consists of an image filter application mapped onto an evolvable hardware platform, able to evolve using two consecutive frames from a camera as the test and reference images. The system is entirely mapped onto an FPGA, and native dynamic and partial reconfiguration is used for evolution. It is also shown that using such images, both of them noisy, as input and reference in the evolution phase is equivalent to, or even better than, evolving the filter with offline images. The combination of both techniques results in the completely autonomous, noise type- and level-agnostic filtering system, with no reference image requirement, described in the paper.
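
The idea of scoring candidate filters against the next noisy frame, rather than against a clean reference, can be sketched as follows. This is a minimal illustration under stated assumptions: the mean absolute error as fitness, the two hand-written candidate filters, and all function names are hypothetical, and a real evolvable system would generate and mutate candidates instead of choosing from a fixed list.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]
Filter = Callable[[Image], Image]


def mean_absolute_error(a: Image, b: Image) -> float:
    """Average absolute pixel difference between two equally sized images."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def identity(img: Image) -> Image:
    """Hypothetical candidate that leaves the frame unchanged."""
    return [row[:] for row in img]


def box_filter_rows(img: Image) -> Image:
    """Hypothetical candidate: 3-tap horizontal mean with edge replication."""
    out = []
    for row in img:
        n = len(row)
        out.append([(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) // 3
                    for i in range(n)])
    return out


def select_best(candidates: Sequence[Filter], frame_t: Image, frame_t1: Image) -> Filter:
    """Pick the candidate whose output on frame t is closest to noisy frame t+1."""
    return min(candidates, key=lambda f: mean_absolute_error(f(frame_t), frame_t1))


# Two consecutive noisy frames of a flat scene; the impulse moves between frames.
frame_t = [[10, 10, 40, 10, 10]]
frame_t1 = [[10, 40, 10, 10, 10]]
best = select_best([identity, box_filter_rows], frame_t, frame_t1)
```

Because the noise in the two frames is uncorrelated while the scene is the same, a candidate that smooths noise tends to score better than one that passes it through, which is the intuition behind using the second frame as a stand-in reference.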

    Hardware support for real-time network security and packet classification using field programmable gate arrays

    Deep packet inspection and packet classification are the most computationally expensive operations in a Network Intrusion Detection (NID) system. Deep packet inspection involves content matching, where the payload of each incoming packet is matched against a set of signatures in a database. Packet classification involves inspection of the packet header fields and is essentially a multi-dimensional matching problem. Matching in software is very slow in comparison to current network speeds, and both problems need a solution that is scalable and can work at high speeds. Due to the high complexity of these matching problems, only Field-Programmable Gate Array (FPGA) or Application-Specific Integrated Circuit (ASIC) platforms can facilitate efficient designs. Two novel FPGA-based NID solutions were developed and implemented that not only carry out pattern matching at high speed but also allow changes to the set of stored patterns without resource or hardware reconfiguration; the solutions can also be readily adopted in software or ASIC approaches. In both solutions, the proposed NID system can run while pattern updates occur. The designs can operate at 2.4 Gbps line rates, with a memory consumption of around 17 bits per character and a logic cell usage of around 0.05 logic cells per character, both lower than any other existing FPGA-based solution. In addition to these solutions for pattern matching, a novel packet classification algorithm was developed and implemented on an FPGA. The method matches two header fields at a time and then combines the constituent results to identify longer matches involving more header fields. The design can achieve a throughput larger than 9.72 Gbps and has an on-chip memory consumption of around 256 Kbytes when dealing with more than 10,000 rules (without using external RAM). This memory consumption is the lowest among all previously proposed FPGA-based designs for packet classification.

    Statically-analyzed stream monitoring for cyber-physical Systems

    Cyber-physical systems are digital systems interacting with the physical world. Even though this induces an inherent complexity, they are responsible for safety-critical tasks like governing nuclear power plants or controlling autonomous vehicles. To preserve trust in the safety of such systems, this thesis presents a runtime verification approach designed to generate trustworthy monitors from a formal specification. These monitors are responsible for observing the cyber-physical system during runtime and ensuring its safety. As the underlying language, I present the asynchronous real-time specification language RTLola. It contains primitives for arithmetic properties and grants precise control over the timing of the monitor. With this, it enables specifiers to express properties relevant to cyber-physical systems. The thesis further presents a static analysis that identifies inconsistencies in the specification and provides insights into the dynamic behavior of the monitor. As a result, the resource consumption of the monitor becomes predictable. The generation of the monitor produces either a hardware description synthesizable onto programmable hardware, or Rust code with verification annotations. These annotations allow for proving the correctness of the monitor with respect to the semantics of RTLola. Last, I present the construction of a conservative hybrid model of the underlying system using information extracted from the specification. This model enables further verification steps.

    An investigation into alternative methods for the simulation and analysis of growth models

    Complex systems are a rapidly growing area of research covering numerous disciplines, including economics and even cancer research; as such, optimising the simulation of these systems is important. This thesis looks specifically at two cellular-automata-based growth models: the Eden growth model and the Invasion Percolation model. These models tend to be simulated by storing the cluster in a simple array. This work demonstrates that, for models which are highly sparse, this method has drawbacks in both the memory consumed and the overall runtime of the system. It demonstrates that more modern data structures such as the HSH tree can offer considerable benefits to these models. Next, instead of optimising the software simulation of the Eden growth model, we detail a memristive-based cellular automata architecture capable of simulating the Eden growth model, called the MEden model. It is demonstrated that this method is not only faster, up to 12,704 times faster than the software simulation, but also allows the same system to be used for the simulation of both EdenB and EdenC clusters without the need to be reconfigured; this is achieved through two parameters present in the model, Pmax and Pchance. This gives the model a broader range of possible clusters, which can aid Monte Carlo simulations of the model. Finally, two methods were developed to identify differences between fractally identical clusters: connected component labelling and convolutional neural networks. It is demonstrated that both of these methods allow individual Eden clusters to be identified and classified as either an EdenA, EdenB, or EdenC cluster, a highly nontrivial matter with current methods. They are also able to tell when a cluster is not an Eden cluster even though it falls in the fractal range of an Eden cluster. These features mean that the verification of a new method for the simulation of the Eden model could now be automated.
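
The contrast between dense array storage and sparse storage can be made concrete with a small Eden-style growth simulation that keeps only occupied cells in a hash set. This is a sketch under stated assumptions: Python's built-in `set` stands in for a sparse structure and is not the HSH tree evaluated in the thesis, and the growth rule used here (every empty site adjacent to the cluster is equally likely to be occupied next) is one common Eden variant whose exact correspondence to the thesis's EdenA, EdenB, and EdenC definitions is not established by the abstract.

```python
import random
from typing import Set, Tuple

Cell = Tuple[int, int]
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_eden_cluster(n_cells: int, seed: int = 0) -> Set[Cell]:
    """Grow a cluster of n_cells on an unbounded square lattice.

    Only occupied cells and the current frontier are stored, so memory
    scales with the cluster size rather than with a bounding array.
    """
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    rng = random.Random(seed)
    cluster: Set[Cell] = {(0, 0)}
    frontier = list(NEIGHBOURS)        # list gives O(1) uniform random choice
    frontier_set = set(frontier)       # set gives O(1) membership tests
    while len(cluster) < n_cells:
        idx = rng.randrange(len(frontier))
        # Swap-remove the chosen frontier site in constant time.
        frontier[idx], frontier[-1] = frontier[-1], frontier[idx]
        site = frontier.pop()
        frontier_set.discard(site)
        cluster.add(site)
        x, y = site
        for dx, dy in NEIGHBOURS:
            nb = (x + dx, y + dy)
            if nb not in cluster and nb not in frontier_set:
                frontier.append(nb)
                frontier_set.add(nb)
    return cluster
```

Calling `grow_eden_cluster(200, seed=1)` returns a connected set of 200 lattice coordinates; no grid dimensions need to be chosen in advance, which is the practical benefit of sparse storage for clusters that occupy a small fraction of their bounding box.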

    Accelerating Event Stream Processing in On- and Offline Systems

    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. 
We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that, in terms of resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores.

    Zolotarev polynomials utilization in spectral analysis

    This thesis is focused on selected problems of symmetrical Zolotarev polynomials and their use in spectral analysis. Basic properties of symmetrical Zolotarev polynomials, including orthogonality, are described. The numerical properties of algorithms generating even Zolotarev polynomials are also explored. Regarding the application of Zolotarev polynomials to spectral analysis, the Approximated Discrete Zolotarev Transform is implemented so that it enables the computation of a spectrogram (zologram) in real time. Moreover, the Approximated Discrete Zolotarev Transform is modified to perform better in the analysis of damped exponential signals. Finally, a novel Discrete Zolotarev Transform implemented fully in the time domain is proposed. This transform also shows that some features observed using the Approximated Discrete Zolotarev Transform are a consequence of using Zolotarev polynomials.

    Novel dense stereo algorithms for high-quality depth estimation from images

    This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and popular research topic for several reasons. First, real-time applications such as autonomous driving require depth estimation that is both accurate and fast, which remains one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieve photorealistic results. However, due to matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identifying corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms that are motivated by the specific requirements and limitations imposed by different applications. In addressing high-speed depth estimation from images, we present a stereo algorithm that achieves high-quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism of commodity graphics hardware to speed up this process by over two orders of magnitude. In addressing high-accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths, referred to in the stereo literature as Ground Control Points (GCPs). Our formulation explicitly models the influences of GCPs in a Markov Random Field. A novel regularization prior is naturally integrated into a global inference framework in a principled way using Bayes' rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors. In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions, BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed under several lighting configurations in which only the lighting intensity varies.
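
The vertical cost aggregation with a weighting based on color and distance proximity can be sketched on the CPU as follows. The abstract does not give the weighting formula, so the exponential form below, borrowed from the adaptive support-weight literature, is an assumption, as are the parameter names `gamma_c` and `gamma_d`, their default values, and the function names; the dissertation's actual implementation runs on graphics hardware.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def support_weight(color_p: Color, color_q: Color, dist: float,
                   gamma_c: float = 10.0, gamma_d: float = 10.0) -> float:
    """Weight that decays with color difference and spatial distance (assumed form)."""
    dc = math.sqrt(sum((a - b) ** 2 for a, b in zip(color_p, color_q)))
    return math.exp(-(dc / gamma_c + dist / gamma_d))


def aggregate_column(costs: Sequence[float], colors: Sequence[Color],
                     radius: int) -> List[float]:
    """Aggregate matching costs along one image column for a single disparity.

    Each output is the weighted average of the costs within `radius` rows,
    so pixels of similar color that lie close by contribute the most.
    """
    n = len(costs)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = support_weight(colors[i], colors[j], abs(i - j))
            num += w * costs[j]
            den += w
        out.append(num / den)
    return out
```

With a sharp color edge in the column, costs from the other side of the edge receive a negligible weight, so the aggregation smooths costs within a surface without blurring across object boundaries. The nested loop also shows why the scheme is expensive per pixel and why it parallelizes well.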

    Application analyses of ultra-low-energy processor

    Low energy consumption has become a critical design feature in modern systems. The Internet of Things, wearables, and other portable devices create increasing demand for low-power design, where device size is dictated by the battery and low energy consumption means longer battery life and smaller physical size. These are crucial features for wearables and especially for implantable medical devices. Several low-power and energy-efficient techniques are applied at different abstraction levels of system design. One technique that usually combines software control with hardware features is DVFS (dynamic voltage and frequency scaling), a dynamic power management technique which decreases the processor clock frequency and supply voltage. The reduction in energy consumption is achieved at the cost of reduced performance. One of the questions with DVFS is how the execution frequencies are defined. This thesis presents a method for frequency optimization for applications executed on a single-core processor. Execution trace data is used to profile the application. The FreeRTOS operating system is used, although tracing can be implemented with any real-time operating system that executes tasks as separate threads. Based on profiling and user-defined data, task execution frequencies are defined assuming that execution time scales linearly with the frequency. A near-threshold ARM Cortex-M3 with integrated power management and a phase-locked loop is used for measurements. The measurements show that energy savings can be achieved without affecting correct application execution. However, the reduction in energy consumption depends highly on the system used and the application execution profile. Iterative testing and frequency optimization are required to ensure adequate performance. For energy efficiency optimization, energy consumption needs to be considered in every phase of the design.
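
The assumption that execution time scales linearly with clock frequency leads to a simple per-task calculation, sketched below. Here the assumption is read as the cycle count being fixed, so a task profiled at a reference frequency takes proportionally longer at a lower one. The function names, the per-task deadline input, and all numeric values are hypothetical and are not taken from the thesis's measurements.

```python
from typing import Sequence


def min_frequency_hz(t_ref_s: float, f_ref_hz: float, deadline_s: float) -> float:
    """Lowest frequency that still meets the deadline.

    Assumes a fixed cycle count: t(f) = t_ref * f_ref / f, so the deadline
    is met when f >= f_ref * t_ref / deadline.
    """
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return f_ref_hz * t_ref_s / deadline_s


def pick_frequency(t_ref_s: float, f_ref_hz: float, deadline_s: float,
                   available_hz: Sequence[int]) -> int:
    """Choose the slowest available clock setting that meets the deadline."""
    needed = min_frequency_hz(t_ref_s, f_ref_hz, deadline_s)
    feasible = [f for f in sorted(available_hz) if f >= needed]
    if not feasible:
        raise ValueError("no available frequency meets the deadline")
    return feasible[0]


# Hypothetical task: profiled at 2 ms when running at 20 MHz, must finish in 10 ms.
available = [1_000_000, 5_000_000, 10_000_000, 20_000_000]
chosen = pick_frequency(0.002, 20_000_000, 0.010, available)
```

In this example the task needs roughly 4 MHz, so the 5 MHz setting is selected. As the abstract notes, such a model only gives a starting point: the linear-scaling assumption can break down for memory- or peripheral-bound code, which is why iterative testing is still required.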