427 research outputs found

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

    Mobile graphics: SIGGRAPH Asia 2017 course

    Get PDF
    Peer ReviewedPostprint (published version

    Computer vision algorithms on reconfigurable logic arrays

    Full text link

    Decoupled Sampling for Graphics Pipelines

    Get PDF
    We propose a generalized approach to decoupling shading from visibility sampling in graphics pipelines, which we call decoupled sampling. Decoupled sampling enables stochastic supersampling of motion and defocus blur at reduced shading cost, as well as controllable or adaptive shading rates which trade off shading quality for performance. It can be thought of as a generalization of multisample antialiasing (MSAA) to support complex and dynamic mappings from visibility to shading samples, as introduced by motion and defocus blur and adaptive shading. It works by defining a many-to-one hash from visibility to shading samples, and using a buffer to memoize shading samples and exploit reuse across visibility samples. Decoupled sampling is inspired by the Reyes rendering architecture, but like traditional graphics pipelines, it shades fragments rather than micropolygon vertices, decoupling shading from the geometry sampling rate. Also unlike Reyes, decoupled sampling only shades fragments after precise computation of visibility, reducing overshading. We present extensions of two modern graphics pipelines to support decoupled sampling: a GPU-style sort-last fragment architecture, and a Larrabee-style sort-middle pipeline. We study the architectural implications of decoupled sampling and blur, and derive end-to-end performance estimates on real applications through an instrumented functional simulator. We demonstrate high-quality motion and defocus blur, as well as variable and adaptive shading rates

    Rinnakkainen toteutus H.265 videokoodaus standardille

    Get PDF
    The objective of this study was to research the scalability of the parallel features in the new H.265 video compression standard, also know as High Efficiency Video Coding (HEVC). Compared to its predecessor, the H.264 standard, H.265 typically achieves around 50% bitrate reduction for the same subjective video quality. Especially videos with higher resolution (Full HD and beyond) achieve better compression ratios. Also a better utilization of parallel computing resources is provided. H.265 introduces two novel parallelization features: Tiles and Wavefront Parallel Processing (WPP). In Tiles, each video frame is divided into areas that can be decoded without referencing to other areas in the same frame. In WPP, the relations between code blocks in a frame are encoded so that the decoding process can progress through the frame as a front using multiple threads. In this study, the reference implementation for the H.265 decoder was augmented to support both of these parallelization features. The performance of the parallel implementations was measured using three different setups. From the measurement results it could be seen that the introduction of more CPU cores reduced the total decode time of the video frames to a certain point. When using the Tiles feature, it was observed that the encoding geometry, i.e. how each frame was divided into individually decodable areas, had a noticeable effect on the decode times with certain thread counts. When using WPP, it was observed that what was mostly synchronization overhead, sometimes had a negative effect on the decode times when using larger (4-12) amounts of threads.Tämän tutkimuksen aiheena oli tutkia uuden H.265 videonpakkausstandardin (tunnetaan myös nimellä HEVC (engl. High Efficiency Video Coding)) rinnakkaisuusominaisuuksien skaalautuvuutta. Verrattuna edeltäjäänsä, H.264 videonpakkaustandardiin, H.265 tyypillisesti saavuttaa samalla kuvanlaadulla noin 50% pienemmän pakkauskoon. Erityisesti suuren resoluution videoilla (Full HD ja suuremmat) pakkaustehokkuuden paremmuus korostuu. Huomiota on kiinnitetty myös moniydinprosessoreiden hyödyntämiseen videokoodauksessa. H.265 tarjoaa kaksi uutta rinnakkaisuusominaisuutta: niin kutsutut Tiles- ja WPP-menetelmät (engl. \emph{Wavefront Parallel Processing}). Tiles-menetelmässä jokainen videon kuva jaetaan alueisiin, jotka voidaan purkaa viittaamatta saman kuvan muihin alueisiin. WPP-menetelmässä suhteet kuvan lohkoihin pakataan siten että purkamisprosessi pystyy etenemään kuvan läpi rintamana hyödyntäen useampia säikeitä. Tässä tutkimuksessa H.265 videodekooderin referenssitoteutusta laajennettiin tukemaan molempia näistä rinnakkaisuusominaisuuksista. Suorituskykyä mitattiin käyttäen kolmea eri mittausasetelmaa. Mittaustuloksista ilmeni, että prosessoriydinten lukumäärän kasvattaminen nopeutti videoiden purkamista tiettyyn pisteeseen asti. Tiles-menetelmää mitatessa havaittiin, että alueiden geometrialla, eli kuinka kuva jaettiin riippumattomiin alueisiin, on huomattava vaikutus purkamisnopeuteen tietyillä säiemäärillä. WPP-menetelmää mitattaessa havaittiin että korkeampiin säiemääriin (4-12) siirryttäessä purkamisnopeus alkoi hidastua. Tämä johtui pääasiassa säikeiden keskinäiseen synkronointiin kuluvasta ajasta

    A Survey of multimedia streaming in wireless sensor networks: progress, issues and design challenges

    Full text link
    Advancements in Complementary Metal Oxide Semiconductor (CMOS) technology have enabled Wireless Sensor Networks (WSN) to gather, process and transport multimedia (MM) data as well and not just limited to handling ordinary scalar data anymore. This new generation of WSN type is called Wireless Multimedia Sensor Networks (WMSNs). Better and yet relatively cheaper sensors that are able to sense both scalar data and multimedia data with more advanced functionalities such as being able to handle rather intense computations easily have sprung up. In this paper, the applications, architectures, challenges and issues faced in the design of WMSNs are explored. Security and privacy issues, over all requirements, proposed and implemented solutions so far, some of the successful achievements and other related works in the field are also highlighted. Open research areas are pointed out and a few solution suggestions to the still persistent problems are made, which, to the best of my knowledge, so far have not been explored yet

    Decoupled Sampling for Real-Time Graphics Pipelines

    Get PDF
    We propose decoupled sampling, an approach that decouples shading from visibility sampling in order to enable motion blur and depth-of-field at reduced cost. More generally, it enables extensions of modern real-time graphics pipelines that provide controllable shading rates to trade off quality for performance. It can be thought of as a generalization of GPU-style multisample antialiasing (MSAA) to support unpredictable shading rates, with arbitrary mappings from visibility to shading samples as introduced by motion blur, depth-of-field, and adaptive shading. It is inspired by the Reyes architecture in offline rendering, but targets real-time pipelines by driving shading from visibility samples as in GPUs, and removes the need for micropolygon dicing or rasterization. Decoupled Sampling works by defining a many-to-one hash from visibility to shading samples, and using a buffer to memoize shading samples and exploit reuse across visibility samples. We present extensions of two modern GPU pipelines to support decoupled sampling: a GPU-style sort-last fragment architecture, and a Larrabee-style sort-middle pipeline. We study the architectural implications and derive end-to-end performance estimates on real applications through an instrumented functional simulator. We demonstrate high-quality motion blur and depth-of-field, as well as variable and adaptive shading rates
    corecore