407 research outputs found

    A 58.6 mW 30 Frames/s Real-Time Programmable Multiobject Detection Accelerator With Deformable Parts Models on Full HD 1920×1080 Videos

    This paper presents a programmable, energy-efficient, real-time object detection hardware accelerator for low-power, high-throughput applications. It uses deformable parts models, which give 2x higher detection accuracy than traditional rigid body models. Three methods address the high computational complexity of detecting eight deformable parts: classification pruning for 33x fewer part classifications, vector quantization for a 15x reduction in memory size, and feature basis projection for a 2x reduction in the cost of each classification. The chip was fabricated in a 65 nm CMOS technology and can process full high definition 1920×1080 videos at 60 frames/s without any off-chip storage. The chip has two programmable classification engines (CEs) for multiobject detection. At 30 frames/s, the chip consumes only 58.6 mW (0.94 nJ/pixel, 1168 GOPS/W). At a higher throughput of 60 frames/s, the CEs can be time multiplexed to detect more than two object classes. The proposed accelerator enables object detection to be as energy-efficient as video compression, which is found in most cameras today. United States. Defense Advanced Research Projects Agency; Texas Instruments Incorporate
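
The 0.94 nJ/pixel figure can be reproduced from the power and throughput quoted in the abstract. The following is a minimal Python check, assuming energy per pixel is simply average power divided by pixel rate; the function name is illustrative and does not come from the paper.

```python
def energy_per_pixel_nj(power_w, fps, width, height):
    """Energy per pixel in nanojoules: power divided by pixels processed per second."""
    pixels_per_second = fps * width * height
    return power_w / pixels_per_second * 1e9


# Figures quoted in the abstract: 58.6 mW at 30 frames/s on 1920x1080 video.
print(round(energy_per_pixel_nj(58.6e-3, 30, 1920, 1080), 2))  # 0.94
```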

    Network vector quantization

    We present an algorithm for designing locally optimal vector quantizers for general networks. We discuss the algorithm's implementation and compare the performance of the resulting "network vector quantizers" to traditional vector quantizers (VQs) and to rate-distortion (R-D) bounds where available. While some special cases of network codes, such as multiresolution (MR) and multiple description (MD) codes, have been studied in the literature, here we present a unifying approach that both includes these existing solutions as special cases and provides solutions to previously unsolved examples.
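
For orientation, traditional VQs of the kind used as a baseline here are commonly designed with the generalized Lloyd algorithm, which also converges only to a locally optimal codebook. The sketch below is that textbook single-encoder iteration in Python, not the network algorithm from the paper; the names `lloyd_vq` and `nearest` and the choice of squared-error distortion are assumptions made for illustration.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def lloyd_vq(samples, codebook, iterations=20):
    """Refine an initial codebook by alternating two steps:
    assign each sample to its nearest codeword, then move each
    codeword to the centroid of the samples assigned to it."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for x in samples:
            cells[nearest(codebook, x)].append(x)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


# Two well-separated clusters and a deliberately poor starting codebook.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
print(lloyd_vq(samples, [[0.0, 0.0], [1.0, 1.0]]))
```

The result depends on the starting codebook, which is what "locally optimal" means in practice: a different initialization can settle on a different, possibly worse, codebook.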

    Lossless image compression exploiting the streaming model for execution on multicore computers

    Image and video coding play a critical role in present multimedia systems, ranging from entertainment to specialized applications such as telemedicine. Usually, they are hand-customized for every intended architecture in order to meet performance requirements. This approach is neither portable nor scalable. With the advent of multicores, new challenges emerged for programmers related to both efficient utilization of additional resources and scalable performance. For image and video processing applications, the streaming model of computation has proven effective in tackling these challenges. In this paper, we report our efforts to improve the execution performance of CBPC, our compute-intensive lossless image compression algorithm described in [1]. The algorithm is based on highly adaptive and predictive modeling, outperforming many other methods in compression efficiency, although with increased complexity. We employ a high-level performance optimization approach which exploits the streaming model for scalability and portability. We achieve this by detecting computationally demanding parts of the algorithm and implementing them in StreamIt, an architecture-independent stream language whose goal is to improve programming productivity and parallelization efficiency by exposing the parallelism and communication pattern. We developed an interface that enables the integration and hosting of streaming kernels in a host application developed in a general-purpose language. Image data processing procedures are extremely widespread in existing multimedia systems, from entertainment systems to specialized applications in telemedicine. Very often, because of their computational demands, these program segments are heavily optimized at a low level, which hampers the portability and scalability of the final solution. 
With the advent of multicore computers, new challenges arise, such as using the compute cores efficiently and keeping the solution scalable as the number of cores increases. This paper presents a new approach to improving the execution performance of the lossless image compression method CBPC, which features an adaptive prediction model that achieves better compression ratios at the cost of increased computational complexity [1]. The approach consists of implementing the computationally demanding prediction model in a streaming programming language that enables parallelization of the source program. A prediction model designed in this way can be used through an interface we developed, which allows streaming compute modules to be called and executed in parallel across multiple cores.

    A graph-based framework for optimal semantic web service composition

    Web services are self-described, loosely coupled software components that are network-accessible through standardized web protocols and whose characteristics are described in XML. One of the key promises of Web services is to provide better interoperability and to enable faster integration between systems. To build robust service-oriented architectures, automatic composition algorithms are required that combine the functionality of many single services into composite services able to respond to demanding user requests, even when no single service is capable of performing the task. Service composition combines single services into composite services that are executed in sequence or in a different order, imposed by a set of control constructs that can be specified using standard languages such as OWL-S or BPEL4WS. In recent years several papers have dealt with the composition of web services. Some approaches treat service composition as a planning problem, where a sequence of actions leads from an initial state to a goal state. However, most of these proposals have some drawbacks: high complexity, high computational cost and inability to maximize the parallel execution of web services. Other approaches consider the problem as a graph search problem, where search algorithms are applied over a web service dependency graph in order to find a solution for a particular request. These proposals are simpler than their counterparts, and many can also exploit the parallel execution of web services. However, most of these approaches rely on very complex dependency graphs that have not been optimized to remove data redundancy, which may negatively affect the overall performance and scalability of these techniques in large service registries. 
Therefore, it is necessary to identify, characterize and optimize the different tasks involved in the automatic service composition process in order to develop better strategies to efficiently obtain optimal solutions. The main goal of this dissertation is to develop a graph-based framework for automatic service composition that generates optimal input-output based compositions, not only in terms of the complexity of the solutions but also in terms of their overall quality of service. More specifically, the objectives of this thesis are: (1) Analysis of the characteristics of services and compositions. The aim of this objective is to characterize and identify the main steps that are part of the service composition process. (2) Framework for automatic graph-based composition. This objective will focus on developing a framework that enables efficient input-output based service composition, exploring the integration with other tasks that are part of the composition process, such as service discovery. (3) Development of optimal algorithms for automatic service composition. This objective focuses on the development of a set of algorithms and optimization techniques for the generation of optimal compositions, optimizing the complexity of the solutions and the overall Quality-of-Service. (4) Validation of the algorithms with standard datasets so they can be compared with other proposals.

    Hyperspectral image compression techniques on reconfigurable hardware

    Thesis of the Universidad Complutense de Madrid, Facultad de Informática, defended on 18-12-2020. Sensors are nowadays present in all aspects of human life. When possible, sensors are used remotely. This is less intrusive, avoids interference in the measuring process, and is more convenient for the scientist. One of the most recurrent concerns in the last decades has been the sustainability of the planet, and how the changes it is facing can be monitored. Remote sensing of the earth has seen an explosion in activity, with satellites now being launched on a weekly basis to perform remote analysis of the earth, and planes surveying vast areas for closer analysis... Sensors today appear in every aspect of our lives. When possible, they are used remotely. This is less intrusive, avoids interference in the measurement process, and also makes scientific work easier. One of the recurring concerns of recent decades has been the sustainability of the planet and how to monitor the changes it faces. Remote studies of the earth have grown greatly, with satellites launched weekly to analyze the surface and planes flying over large areas for more precise analysis... Fac. de Informática

    Online Modeling and Tuning of Parallel Stream Processing Systems

    Writing performant computer programs is hard. Code for high performance applications is profiled, tweaked, and re-factored for months specifically for the hardware for which it is to run. Consumer application code doesn't get the benefit of endless massaging that benefits high performance code, even though heterogeneous processor environments are beginning to resemble those in more performance oriented arenas. This thesis offers a path to performant, parallel code (through stream processing) which is tuned online and automatically adapts to the environment it is given. This approach has the potential to reduce the tuning costs associated with high performance code and brings the benefit of performance tuning to consumer applications where otherwise it would be cost prohibitive. This thesis introduces a stream processing library and multiple techniques to enable its online modeling and tuning. Stream processing (also termed data-flow programming) is a compute paradigm that views an application as a set of logical kernels connected via communications links or streams. Stream processing is increasingly used by computational-x and x-informatics fields (e.g., biology, astrophysics) where the focus is on safe and fast parallelization of specific big-data applications. A major advantage of stream processing is that it enables parallelization without necessitating manual end-user management of non-deterministic behavior often characteristic of more traditional parallel processing methods. Many big-data and high performance applications involve high throughput processing, necessitating usage of many parallel compute kernels on several compute cores. Optimizing the orchestration of kernels has been the focus of much theoretical and empirical modeling work. Purely theoretical parallel programming models can fail when the assumptions implicit within the model are mis-matched with reality (i.e., the model is incorrectly applied). 
Often it is unclear if the assumptions are actually being met, even when verified under controlled conditions. Full empirical optimization solves this problem by extensively searching the range of likely configurations under native operating conditions. This, however, is expensive in both time and energy. For large, massively parallel systems, even deciding which modeling paradigm to use is often prohibitively expensive and unfortunately transient (with workload and hardware). In an ideal world, a parallel run-time will re-optimize an application continuously to match its environment, with little additional overhead. This work presents methods aimed at doing just that through low overhead instrumentation, modeling, and optimization. Online optimization provides a good trade-off between static optimization and online heuristics. To enable online optimization, modeling decisions must be fast and relatively accurate. Online modeling and optimization of a stream processing system first requires the existence of a stream processing framework that is amenable to the intended type of dynamic manipulation. To fill this void, we developed the RaftLib C++ template library, which enables usage of the stream processing paradigm for C++ applications (it is the run-time which is the basis of almost all the work within this dissertation). An application topology is specified by the user; however, almost everything else is optimizable by the run-time. RaftLib takes advantage of the knowledge gained during the design of several prior streaming languages (notably Auto-Pipe). The resultant framework enables online migration of tasks, auto-parallelization, online buffer-reallocation, and other useful dynamic behaviors that were not available in many previous stream processing systems. Several benchmark applications have been designed to assess the performance gains through our approaches and compare performance to other leading stream processing frameworks. 
Information is essential to any modeling task; to that end, a low-overhead instrumentation framework has been developed which is both dynamic and adaptive. Discovering a fast and relatively optimal configuration for a stream processing application often necessitates solving for buffer sizes within a finite capacity queueing network. We show that a generalized gain/loss network flow model can bootstrap the process under certain conditions. Any modeling effort requires that a model be selected, which is often a highly manual task involving many expensive operations. This dissertation demonstrates that machine learning methods (such as a support vector machine) can successfully select models at run-time for a streaming application. The full set of approaches is incorporated into the open source RaftLib framework.
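
The abstract above describes stream processing as a set of logical kernels connected by streams, with buffer sizes as a tuning parameter. The sketch below illustrates that paradigm in Python with threads and bounded queues. It is not RaftLib and does not use its API: `kernel`, `run_pipeline` and the `capacity` parameter are hypothetical names chosen for illustration, and in CPython the threads mostly interleave rather than compute in parallel.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of a stream


def kernel(fn, inbox, outbox):
    """One compute kernel: apply fn to each item from inbox, forward to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(fn(item))


def run_pipeline(source, stages, capacity=4):
    """Chain the stage functions with bounded FIFO queues and collect the output."""
    streams = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]

    def feed():
        for item in source:
            streams[0].put(item)  # blocks while the first buffer is full
        streams[0].put(_DONE)

    workers = [threading.Thread(target=feed)]
    workers += [
        threading.Thread(target=kernel, args=(fn, streams[i], streams[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()

    results = []
    while True:
        item = streams[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))  # [2, 4, 6, 8, 10]
```

Each kernel only sees its input and output queues, so the topology is fixed by the caller while buffer capacity stays a free parameter, which is the kind of knob an online tuner could adjust.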

    Efficient Point-Cloud Processing with Primitive Shapes

    This thesis presents methods for efficient processing of point-clouds based on primitive shapes. The set of considered simple parametric shapes consists of planes, spheres, cylinders, cones and tori. The algorithms developed in this work are targeted at scenarios in which the occurring surfaces can be well represented by this set of shape primitives, which is the case in many man-made environments such as industrial compounds, cities or building interiors. A primitive subsumes a set of corresponding points in the point-cloud and serves as a proxy for them. Therefore primitives are well suited to directly address the unavoidable oversampling of large point-clouds and lay the foundation for efficient point-cloud processing algorithms. The first contribution of this thesis is a novel shape primitive detection method that is efficient even on very large and noisy point-clouds. Several applications for the detected primitives are subsequently explored, resulting in a set of novel algorithms for primitive-based point-cloud processing in the areas of compression, recognition and completion. Each of these applications directly exploits and benefits from one or more of the detected primitives' properties, such as approximation, abstraction, segmentation and continuability.