9 research outputs found

    Systematic Design Space Exploration of Dynamic Dataflow Programs for Multi-core Platforms

    Get PDF
    The limitations of clock frequency and power dissipation of deep sub-micron CMOS technology have led to the development of massively parallel computing platforms. They consist of dozens or hundreds of processing units and offer a high degree of parallelism. Taking advantage of that parallelism and transforming it into high program performances requires the usage of appropriate parallel programming models and paradigms. Currently, a common practice is to develop parallel applications using methods evolving directly from sequential programming models. However, they lack the abstractions to properly express the concurrency of the processes. An alternative approach is to implement dataflow applications, where the algorithms are described in terms of streams and operators thus their parallelism is directly exposed. Since algorithms are described in an abstract way, they can be easily ported to different types of platforms. Several dataflow models of computation (MoCs) have been formalized so far. They differ in terms of their expressiveness (ability to handle dynamic behavior) and complexity of analysis. So far, most of the research efforts have focused on the simpler cases of static dataflow MoCs, where many analyses are possible at compile-time and several optimization problems are greatly simplified. At the same time, for the most expressive and the most difficult to analyze dynamic dataflow (DDF), there is still a dearth of tools supporting a systematic and automated analysis minimizing the programming efforts of the designer. The objective of this Thesis is to provide a complete framework to analyze, evaluate and refactor DDF applications expressed using the RVC-CAL language. The methodology relies on a systematic design space exploration (DSE) examining different design alternatives in order to optimize the chosen objective function while satisfying the constraints. The research contributions start from a rigorous DSE problem formulation. This provides a basis for the definition of a complete and novel analysis methodology enabling systematic performance improvements of DDF applications. Different stages of the methodology include exploration heuristics, performance estimation and identification of refactoring directions. All of the stages are implemented as appropriate software tools. The contributions are substantiated by several experiments performed with complex dynamic applications on different types of physical platforms

    Performance Estimation Based Multicriteria Partitioning Approach for Dynamic Dataflow Programs

    Get PDF

    High-level synthesis of dataflow programs for heterogeneous platforms:design flow tools and design space exploration

    Get PDF
    The growing complexity of digital signal processing applications implemented in programmable logic and embedded processors make a compelling case the use of high-level methodologies for their design and implementation. Past research has shown that for complex systems, raising the level of abstraction does not necessarily come at a cost in terms of performance or resource requirements. As a matter of fact, high-level synthesis tools supporting such a high abstraction often rival and on occasion improve low-level design. In spite of these successes, high-level synthesis still relies on programs being written with the target and often the synthesis process, in mind. In other words, imperative languages such as C or C++, most used languages for high-level synthesis, are either modified or a constrained subset is used to make parallelism explicit. In addition, a proper behavioral description that permits the unification for hardware and software design is still an elusive goal for heterogeneous platforms. A promising behavioral description capable of expressing both sequential and parallel application is RVC-CAL. RVC-CAL is a dataflow programming language that permits design abstraction, modularity, and portability. The objective of this thesis is to provide a high-level synthesis solution for RVC-CAL dataflow programs and provide an RVC-CAL design flow for heterogeneous platforms. The main contributions of this thesis are: a high-level synthesis infrastructure that supports the full specification of RVC-CAL, an action selection strategy for supporting parallel read and writes of list of tokens in hardware synthesis, a dynamic fine-grain profiling for synthesized dataflow programs, an iterative design space exploration framework that permits the performance estimation, analysis, and optimization of heterogeneous platforms, and finally a clock gating strategy that reduces the dynamic power consumption. Experimental results on all stages of the provided design flow, demonstrate the capabilities of the tools for high-level synthesis, software hardware Co-Design, design space exploration, and power optimization for reconfigurable hardware. Consequently, this work proves the viability of complex systems design and implementation using dataflow programming, not only for system-level simulation but real heterogeneous implementations

    Feasibility Study of High-Level Synthesis : Implementation of a Real-Time HEVC Intra Encoder on FPGA

    Get PDF
    High-Level Synthesis (HLS) on automatisoitu suunnitteluprosessi, joka pyrkii parantamaan tuottavuutta perinteisiin suunnittelumenetelmiin verrattuna, nostamalla suunnittelun abstraktiota rekisterisiirtotasolta (RTL) käyttäytymistasolle. Erilaisia kaupallisia HLS-työkaluja on ollut markkinoilla aina 1990-luvulta lähtien, mutta vasta äskettäin ne ovat alkaneet saada hyväksyntää teollisuudessa sekä akateemisessa maailmassa. Hidas käyttöönottoaste on johtunut pääasiassa huonommasta tulosten laadusta (QoR) kuin mitä on ollut mahdollista tavanomaisilla laitteistokuvauskielillä (HDL). Uusimmat HLS-työkalusukupolvet ovat kuitenkin kaventaneet QoR-aukkoa huomattavasti. Tämä väitöskirja tutkii HLS:n soveltuvuutta videokoodekkien kehittämiseen. Se esittelee useita HLS-toteutuksia High Efficiency Video Coding (HEVC) -koodaukselle, joka on keskeinen mahdollistava tekniikka lukuisille nykyaikaisille mediasovelluksille. HEVC kaksinkertaistaa koodaustehokkuuden edeltäjäänsä Advanced Video Coding (AVC) -standardiin verrattuna, saavuttaen silti saman subjektiivisen visuaalisen laadun. Tämä tyypillisesti saavutetaan huomattavalla laskennallisella lisäkustannuksella. Siksi reaaliaikainen HEVC vaatii automatisoituja suunnittelumenetelmiä, joita voidaan käyttää rautatoteutus- (HW ) ja varmennustyön minimoimiseen. Tässä väitöskirjassa ehdotetaan HLS:n käyttöä koko enkooderin suunnitteluprosessissa. Dataintensiivisistä koodaustyökaluista, kuten intra-ennustus ja diskreetit muunnokset, myös enemmän kontrollia vaativiin kokonaisuuksiin, kuten entropiakoodaukseen. Avoimen lähdekoodin Kvazaar HEVC -enkooderin C-lähdekoodia hyödynnetään tässä työssä referenssinä HLS-suunnittelulle sekä toteutuksen varmentamisessa. Suorituskykytulokset saadaan ja raportoidaan ohjelmoitavalla porttimatriisilla (FPGA). Tämän väitöskirjan tärkein tuotos on HEVC intra enkooderin prototyyppi. Prototyyppi koostuu Nokia AirFrame Cloud Server palvelimesta, varustettuna kahdella 2.4 GHz:n 14-ytiminen Intel Xeon prosessorilla, sekä kahdesta Intel Arria 10 GX FPGA kiihdytinkortista, jotka voidaan kytkeä serveriin käyttäen joko peripheral component interconnect express (PCIe) liitäntää tai 40 gigabitin Ethernettiä. Prototyyppijärjestelmä saavuttaa reaaliaikaisen 4K enkoodausnopeuden, jopa 120 kuvaa sekunnissa. Lisäksi järjestelmän suorituskykyä on helppo skaalata paremmaksi lisäämällä järjestelmään käytännössä minkä tahansa määrän verkkoon kytkettäviä FPGA-kortteja. Monimutkaisen HEVC:n tehokas mallinnus ja sen monipuolisten ominaisuuksien mukauttaminen reaaliaikaiselle HW HEVC enkooderille ei ole triviaali tehtävä, koska HW-toteutukset ovat perinteisesti erittäin aikaa vieviä. Tämä väitöskirja osoittaa, että HLS:n avulla pystytään nopeuttamaan kehitysaikaa, tarjoamaan ennen näkemätöntä suunnittelun skaalautuvuutta, ja silti osoittamaan kilpailukykyisiä QoR-arvoja ja absoluuttista suorituskykyä verrattuna olemassa oleviin toteutuksiin.High-Level Synthesis (HLS) is an automated design process that seeks to improve productivity over traditional design methods by increasing design abstraction from register transfer level (RTL) to behavioural level. Various commercial HLS tools have been available on the market since the 1990s, but only recently they have started to gain adoption across industry and academia. The slow adoption rate has mainly stemmed from lower quality of results (QoR) than obtained with conventional hardware description languages (HDLs). However, the latest HLS tool generations have substantially narrowed the QoR gap. This thesis studies the feasibility of HLS in video codec development. It introduces several HLS implementations for High Efficiency Video Coding (HEVC) , that is the key enabling technology for numerous modern media applications. HEVC doubles the coding efficiency over its predecessor Advanced Video Coding (AVC) standard for the same subjective visual quality, but typically at the cost of considerably higher computational complexity. Therefore, real-time HEVC calls for automated design methodologies that can be used to minimize the HW implementation and verification effort. This thesis proposes to use HLS throughout the whole encoder design process. From data-intensive coding tools, like intra prediction and discrete transforms, to more control-oriented tools, such as entropy coding. The C source code of the open-source Kvazaar HEVC encoder serves as a design entry point for the HLS flow, and it is also utilized in design verification. The performance results are gathered with and reported for field programmable gate array (FPGA) . The main contribution of this thesis is an HEVC intra encoder prototype that is built on a Nokia AirFrame Cloud Server equipped with 2.4 GHz dual 14-core Intel Xeon processors and two Intel Arria 10 GX FPGA Development Kits, that can be connected to the server via peripheral component interconnect express (PCIe) generation 3 or 40 Gigabit Ethernet. The proof-of-concept system achieves real-time. 4K coding speed up to 120 fps, which can be further scaled up by adding practically any number of network-connected FPGA cards. Overcoming the complexity of HEVC and customizing its rich features for a real-time HEVC encoder implementation on hardware is not a trivial task, as hardware development has traditionally turned out to be very time-consuming. This thesis shows that HLS is able to boost the development time, provide previously unseen design scalability, and still result in competitive performance and QoR over state-of-the-art hardware implementations

    Architectures for Adaptive Low-Power Embedded Multimedia Systems

    Get PDF
    This Ph.D. thesis describes novel hardware/software architectures for adaptive low-power embedded multimedia systems. Novel techniques for run-time adaptive energy management are proposed, such that both HW & SW adapt together to react to the unpredictable scenarios. A complete power-aware H.264 video encoder was developed. Comparison with state-of-the-art demonstrates significant energy savings while meeting the performance constraint and keeping the video quality degradation unnoticeable

    Exploring MPEG HEVC decoder parallelism for the efficient porting onto many-core platforms

    No full text
    MPEG High Efficient Video Coding (HEVC) is likely to emerge as the video coding standard for HD and Ultra-HD TV resolutions. The two elements that push HEVC beyond the previous standards are a higher compression efficiency of about a factor of two, and the introduction of new coding tools, tiles and wavefront that are intended to ease the largely increased encoding complexity particularly for Ultra HD resolutions such as 4K and 8K. However, for HEVC decoder implementations, the achievement of the desired performance on massive parallel platforms cannot rely on the use of such optional (not enforced by MPEG profiles) tools. This paper reports results about the intrinsic parallelism of compliant HEVC decoding algorithms obtained by analyzing a dataflow implementation written using the standard language specified in ISO/IEC 23001-4 and structured attempting to maximize the algorithmic potential parallelism. The experimental results show what is the parallelism achieved by different dataflow architectures and how it can be further combined with the parallelism achieved by relying on tiles and wavefront, whenever they would be available, for porting a compliant HEVC decoder on massive parallel many-core platforms. © 2014 IEEE

    Entrega de conteúdos multimédia em over-the-top: caso de estudo das gravações automáticas

    Get PDF
    Doutoramento em Engenharia EletrotécnicaOver-The-Top (OTT) multimedia delivery is a very appealing approach for providing ubiquitous, exible, and globally accessible services capable of low-cost and unrestrained device targeting. In spite of its appeal, the underlying delivery architecture must be carefully planned and optimized to maintain a high Qualityof- Experience (QoE) and rational resource usage, especially when migrating from services running on managed networks with established quality guarantees. To address the lack of holistic research works on OTT multimedia delivery systems, this Thesis focuses on an end-to-end optimization challenge, considering a migration use-case of a popular Catch-up TV service from managed IP Television (IPTV) networks to OTT. A global study is conducted on the importance of Catch-up TV and its impact in today's society, demonstrating the growing popularity of this time-shift service, its relevance in the multimedia landscape, and tness as an OTT migration use-case. Catch-up TV consumption logs are obtained from a Pay-TV operator's live production IPTV service containing over 1 million subscribers to characterize demand and extract insights from service utilization at a scale and scope not yet addressed in the literature. This characterization is used to build demand forecasting models relying on machine learning techniques to enable static and dynamic optimization of OTT multimedia delivery solutions, which are able to produce accurate bandwidth and storage requirements' forecasts, and may be used to achieve considerable power and cost savings whilst maintaining a high QoE. A novel caching algorithm, Most Popularly Used (MPU), is proposed, implemented, and shown to outperform established caching algorithms in both simulation and experimental scenarios. The need for accurate QoE measurements in OTT scenarios supporting HTTP Adaptive Streaming (HAS) motivates the creation of a new QoE model capable of taking into account the impact of key HAS aspects. By addressing the complete content delivery pipeline in the envisioned content-aware OTT Content Delivery Network (CDN), this Thesis demonstrates that signi cant improvements are possible in next-generation multimedia delivery solutions.A entrega de conteúdos multimédia em Over-The-Top (OTT) e uma proposta atractiva para fornecer um serviço flexível e globalmente acessível, capaz de alcançar qualquer dispositivo, com uma promessa de baixos custos. Apesar das suas vantagens, e necessario um planeamento arquitectural detalhado e optimizado para manter níveis elevados de Qualidade de Experiência (QoE), em particular aquando da migração dos serviços suportados em redes geridas com garantias de qualidade pré-estabelecidas. Para colmatar a falta de trabalhos de investigação na área de sistemas de entrega de conteúdos multimédia em OTT, esta Tese foca-se na optimização destas soluções como um todo, partindo do caso de uso de migração de um serviço popular de Gravações Automáticas suportado em redes de Televisão sobre IP (IPTV) geridas, para um cenário de entrega em OTT. Um estudo global para aferir a importância das Gravações Automáticas revela a sua relevância no panorama de serviços multimédia e a sua adequação enquanto caso de uso de migração para cenários OTT. São obtidos registos de consumos de um serviço de produção de Gravações Automáticas, representando mais de 1 milhão de assinantes, para caracterizar e extrair informação de consumos numa escala e âmbito não contemplados ate a data na literatura. Esta caracterização e utilizada para construir modelos de previsão de carga, tirando partido de sistemas de machine learning, que permitem optimizações estáticas e dinâmicas dos sistemas de entrega de conteúdos em OTT através de previsões das necessidades de largura de banda e armazenamento, potenciando ganhos significativos em consumo energético e custos. Um novo mecanismo de caching, Most Popularly Used (MPU), demonstra um desempenho superior as soluções de referencia, quer em cenários de simulação quer experimentais. A necessidade de medição exacta da QoE em streaming adaptativo HTTP motiva a criaçao de um modelo capaz de endereçar aspectos específicos destas tecnologias adaptativas. Ao endereçar a cadeia completa de entrega através de uma arquitectura consciente dos seus conteúdos, esta Tese demonstra que são possíveis melhorias de desempenho muito significativas nas redes de entregas de conteúdos em OTT de próxima geração

    Personality Identification from Social Media Using Deep Learning: A Review

    Get PDF
    Social media helps in sharing of ideas and information among people scattered around the world and thus helps in creating communities, groups, and virtual networks. Identification of personality is significant in many types of applications such as in detecting the mental state or character of a person, predicting job satisfaction, professional and personal relationship success, in recommendation systems. Personality is also an important factor to determine individual variation in thoughts, feelings, and conduct systems. According to the survey of Global social media research in 2018, approximately 3.196 billion social media users are in worldwide. The numbers are estimated to grow rapidly further with the use of mobile smart devices and advancement in technology. Support vector machine (SVM), Naive Bayes (NB), Multilayer perceptron neural network, and convolutional neural network (CNN) are some of the machine learning techniques used for personality identification in the literature review. This paper presents various studies conducted in identifying the personality of social media users with the help of machine learning approaches and the recent studies that targeted to predict the personality of online social media (OSM) users are reviewed
    corecore