143,513 research outputs found

    A Flexible Architecture for the Computation of Direct and Inverse Transforms in H.264/AVC Video Codecs

    A new high-throughput and scalable architecture for unified transform coding in H.264/AVC is proposed in this paper. This flexible structure is capable of computing all the 4x4 and 2x2 transforms for Ultra High Definition Video (UHDV) applications (4320x7680 @ 30 fps) in real time and with low hardware cost. These significantly high performance levels were proven with the implementation of several different configurations of the proposed structure using both FPGA and ASIC 90 nm technologies. In addition, this experimental evaluation also demonstrated the high area efficiency of the proposed architecture, which in terms of Data Throughput per Unit of Area (DTUA) is at least 1.5 times more efficient than its most prominent related designs (1).
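    For context, the sketch below shows the arithmetic such a unified transform unit computes: the standard H.264/AVC 4x4 forward integer core transform and the 2x2 Hadamard transform for chroma DC coefficients, written in NumPy. It is a minimal software reference for the transforms named in the abstract, not a model of the proposed hardware architecture.

    ```python
    import numpy as np

    # H.264/AVC 4x4 forward core transform matrix (integer approximation of the DCT).
    CF = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]])

    # 2x2 Hadamard transform applied to chroma DC coefficients.
    H2 = np.array([[1,  1],
                   [1, -1]])

    def forward_4x4(residual):
        """Forward 4x4 integer transform: Y = Cf * X * Cf^T."""
        return CF @ residual @ CF.T

    def forward_2x2(dc_block):
        """2x2 Hadamard transform of a block of chroma DC terms."""
        return H2 @ dc_block @ H2.T

    if __name__ == "__main__":
        block = np.random.randint(-255, 256, size=(4, 4))
        print(forward_4x4(block))
    ```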

    Scalable Multiprocessor for High-Speed Computing in Space

    A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard real-time applications" here means applications, like real-time radar signal processing, in which the data to be processed are generated at hundreds of pulses per second, with each pulse requiring millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with the analog instrumentation and controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipelined fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
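    The broadcast-offset idea in the last sentence can be illustrated with a minimal software sketch. The class and method names below are hypothetical; the actual system realizes this in tightly coupled DSP hardware and software, not a Python loop.

    ```python
    import time

    class SlaveClock:
        """Tracks the offset between a broadcast master clock and the local clock."""

        def __init__(self):
            self.offset = 0.0  # seconds; master_time - local_time at the last broadcast

        def on_master_broadcast(self, master_time):
            # When a master time signal arrives, record how far the local clock deviates.
            self.offset = master_time - time.monotonic()

        def now(self):
            # Local clock corrected into the master time base.
            return time.monotonic() + self.offset
    ```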

    Performance-Engineered Network Overlays for High Quality Interaction in Virtual Worlds

    Overlay hosting systems such as PlanetLab, and cloud computing environments such as Amazon’s EC2, provide shared infrastructures within which new applications can be developed and deployed on a global scale. This paper explores how systems of this sort can be used to enable advanced network services and sophisticated applications that use those services to enhance performance and provide a high-quality user experience. Specifically, we investigate how advanced overlay hosting environments can be used to provide network services that enable scalable virtual world applications and other large-scale distributed applications requiring consistent, real-time performance. We propose a novel network architecture called Forest, built around per-session tree-structured communication channels that we call comtrees. Comtrees are provisioned and support both unicast and multicast packet delivery. The multicast mechanism is designed to be highly scalable and lightweight enough to support the rapid changes to multicast subscriptions needed for efficient support of state updates within virtual worlds. We evaluate performance using a combination of analysis and experimental measurement of a partial system prototype that supports fully functional distributed game sessions. Our results provide the data needed to enable accurate projections of performance for a variety of session and system configurations.
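    As a toy illustration of the subscription state a comtree's multicast mechanism must keep per session, consider the sketch below. The class, its in-memory representation, and all names are illustrative assumptions, not the Forest implementation.

    ```python
    from collections import defaultdict

    class Comtree:
        """Toy model of a per-session comtree: endpoints can rapidly subscribe to and
        unsubscribe from multicast groups, and a published packet reaches only the
        endpoints currently subscribed to its group."""

        def __init__(self, session_id):
            self.session_id = session_id
            self.subscribers = defaultdict(set)  # group id -> set of endpoint ids

        def subscribe(self, endpoint, group):
            self.subscribers[group].add(endpoint)

        def unsubscribe(self, endpoint, group):
            self.subscribers[group].discard(endpoint)

        def multicast(self, group, packet, send):
            # Deliver the packet to every endpoint currently subscribed to the group.
            for endpoint in self.subscribers[group]:
                send(endpoint, packet)
    ```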

    Different aspects of workflow scheduling in large-scale distributed systems

    As large-scale distributed systems gain momentum, the scheduling of workflow applications with multiple requirements on such computing platforms has become a crucial area of research. In this paper, we investigate the workflow scheduling problem in large-scale distributed systems from the Quality of Service (QoS) and data locality perspectives. We present a scheduling approach that considers two models of synchronization for the tasks in a workflow application: (a) communication through the network and (b) communication through temporary files. Specifically, we investigate via simulation the performance of a heterogeneous distributed system, where multiple soft real-time workflow applications arrive dynamically. The applications are scheduled under various tardiness bounds, taking into account the communication cost in the first case study and the I/O cost and data locality in the second. The work presented in this paper has been partially supported by the EU, under the COST program Action IC1305, “Network for Sustainable Ultrascale Computing (NESUS)”, and by the Ministerio de Economía y Competitividad, Spain, under the project TIN2013-41350-P, “Scalable Data Management Techniques for High-End Computing Systems”.
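    For illustration only, the sketch below shows the tardiness measure implied by scheduling under tardiness bounds, together with a simple earliest-deadline-first dispatch order. It is a generic example under assumed dictionary-based workflow records, not the scheduling approach proposed in the paper.

    ```python
    def tardiness(completion_time, deadline):
        """How far past its deadline a workflow finished (zero if it met the bound)."""
        return max(0.0, completion_time - deadline)

    def earliest_deadline_first(pending_workflows):
        """Toy dispatch order: serve pending workflows in order of their deadlines."""
        return sorted(pending_workflows, key=lambda w: w["deadline"])

    if __name__ == "__main__":
        jobs = [{"name": "wf-a", "deadline": 12.0}, {"name": "wf-b", "deadline": 7.5}]
        print([w["name"] for w in earliest_deadline_first(jobs)])
        print(tardiness(completion_time=9.0, deadline=7.5))
    ```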

    Massively Parallel Algorithms for Point Cloud Based Object Recognition on Heterogeneous Architecture

    With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the possibilities of using modern heterogeneous computing platforms and their supporting ecosystems, such as massively parallel architectures (MPA), computing clusters, the compute unified device architecture (CUDA), and multithreaded programming, to accelerate point cloud based object recognition. These computing platforms do not yield high performance unless their specific features are properly utilized; failing that, the result is actually inferior performance. To achieve high-speed performance in image descriptor computing, indexing, and matching for point cloud based object recognition, this work explores both coarse- and fine-grain parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms is designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of the k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, and parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time of feature computing, indexing, and matching. Meanwhile, this work demonstrates that heterogeneous computing architectures, with appropriate architecture-specific algorithm design and optimization, have distinct advantages for improving the performance of multimedia applications.
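    The descriptor indexing and matching step can be sketched on the CPU with SciPy's KD-tree, as below. This single-threaded approximation only illustrates the data structure and the accuracy/speed trade-off exposed by the eps parameter; it is not the massively parallel GPU implementation described in the thesis, and the descriptor shapes are assumptions.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def match_descriptors(scene_desc, model_desc, eps=0.5):
        """Approximate nearest-neighbour matching of scene descriptors against model
        descriptors; eps > 0 permits an approximate search that trades accuracy for speed."""
        tree = cKDTree(model_desc)                 # build the KD-tree index over model descriptors
        dist, idx = tree.query(scene_desc, k=1, eps=eps)
        return idx, dist

    if __name__ == "__main__":
        model = np.random.rand(10000, 33)          # e.g. 33-dimensional FPFH-style descriptors
        scene = np.random.rand(500, 33)
        idx, dist = match_descriptors(scene, model)
        print(idx[:5], dist[:5])
    ```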

    On Optimal and Fair Service Allocation in Mobile Cloud Computing

    This paper studies the optimal and fair service allocation for a variety of mobile applications (single as well as group and collaborative mobile applications) in mobile cloud computing. We exploit the observation that using tiered clouds, i.e., clouds at multiple levels (local and public), can increase the performance and scalability of mobile applications. We propose a novel framework that models mobile applications as location-time workflows (LTWs) of tasks, in which users' mobility patterns are translated into mobile service usage patterns. We show that an optimal mapping of LTWs to tiered cloud resources considering multiple QoS goals, such as application delay, device power consumption, and user cost/price, is an NP-hard problem for both single and group-based applications. We propose an efficient heuristic algorithm called MuSIC that is able to perform well (73% of optimal, 30% better than simple strategies) and to scale to a large number of users while ensuring high mobile application QoS. We evaluate MuSIC and the 2-tier mobile cloud approach via implementation on real-world clouds and extensive simulations using rich mobile applications such as intensive signal processing, video streaming, and multimedia file sharing. Our experimental and simulation results indicate that MuSIC supports scalable operation (100+ concurrent users executing complex workflows) while improving QoS. We observe about 25% lower delay and power consumption (under fixed price constraints) and about a 35% decrease in price (considering fixed delay) in comparison to using only the public cloud. Our studies also show that MuSIC performs quite well under different mobility patterns, e.g., the random waypoint and Manhattan models.
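    A greedy placement sketch of the underlying idea, weighing delay, power, and price per cloud tier, is shown below. All names, the cost model, and the tier labels are illustrative assumptions; this is not the MuSIC heuristic itself.

    ```python
    def assign_tasks(tasks, tiers, w_delay=1.0, w_power=1.0, w_price=1.0):
        """Greedy sketch: place each workflow task on the tier with the lowest weighted
        sum of delay, device power, and monetary price. Illustrative only."""
        mapping = {}
        for task, costs_per_tier in tasks.items():
            def weighted(tier, costs=costs_per_tier):
                c = costs[tier]
                return w_delay * c["delay"] + w_power * c["power"] + w_price * c["price"]
            mapping[task] = min(tiers, key=weighted)
        return mapping

    if __name__ == "__main__":
        tasks = {"decode": {"local":  {"delay": 2.0, "power": 5.0, "price": 0.0},
                            "public": {"delay": 6.0, "power": 1.0, "price": 3.0}}}
        print(assign_tasks(tasks, ["local", "public"]))
    ```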

    Navigation Recommender: Real-Time iGNSS QoS Prediction for Navigation Services

    Global Navigation Satellite Systems (GNSSs), especially the Global Positioning System (GPS), have become commonplace in mobile devices and are the most preferred geo-positioning sensors for many location-based applications. Besides GPS, other GNSSs under development or deployment are GLONASS, Galileo, and Compass. These four GNSSs are planned to be integrated in the near future. It is anticipated that integrated GNSSs (iGNSSs) will improve overall satellite-based geo-positioning performance. However, one major shortcoming of any GNSS, and of iGNSSs, is Quality of Service (QoS) degradation due to signal blockage and attenuation by the surrounding environment, particularly in obstructed areas. GNSS QoS uncertainty is the root cause of positioning ambiguity, poor localization performance, application freezes, and incorrect guidance in navigation applications. In this research, a methodology called iGNSS QoS prediction is developed that can provide GNSS QoS on desired and prospective routes. Six iGNSS QoS parameters suitable for navigation are defined: visibility, availability, accuracy, continuity, reliability, and flexibility. The iGNSS QoS prediction methodology, which includes a set of algorithms, encompasses four modules: segment sampling, point-based iGNSS QoS prediction, tracking-based iGNSS QoS prediction, and iGNSS QoS segmentation. Given that iGNSS QoS prediction is data- and compute-intensive and that navigation applications require real-time solutions, an efficient satellite selection algorithm is developed, and distributed computing platforms, mainly grids and clouds, are explored for achieving real-time performance. The proposed methodology is unique in several respects: it specifically addresses the iGNSS positioning requirements of navigation systems/services; it provides a new means for route choices and routing in navigation systems/services; it is suitable for different modes of travel such as driving and walking; it takes high-resolution 3D data into account for GNSS positioning; and it is based on efficient algorithms and can utilize high-performance and scalable computing platforms such as grids and clouds to provide real-time solutions. A number of experiments were conducted to evaluate the developed methodology and the algorithms using real field test data (GPS coordinates). The experimental results show that the methodology can predict iGNSS QoS in various areas, especially in problematic areas.
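    Of the six parameters, visibility and availability admit a compact sketch: a satellite counts as visible if its elevation exceeds a mask angle and its line of sight is not obstructed, and at least four usable satellites are needed for a 3D position fix. The function names, inputs, and the default mask angle below are illustrative assumptions, not the dissertation's algorithms.

    ```python
    def visible_satellites(elevations_deg, mask_angle_deg=15.0, blocked=frozenset()):
        """Satellites above the elevation mask whose line of sight is not obstructed.

        elevations_deg: mapping of satellite id -> elevation angle in degrees
        blocked:        ids whose line of sight a 3D environment model marks as obstructed
        """
        return [sv for sv, elev in elevations_deg.items()
                if elev >= mask_angle_deg and sv not in blocked]

    def fix_available(elevations_deg, **kwargs):
        """A 3D position fix requires at least four usable satellites."""
        return len(visible_satellites(elevations_deg, **kwargs)) >= 4

    if __name__ == "__main__":
        elevations = {"G01": 42.0, "G07": 11.0, "G12": 28.5, "G19": 63.2, "G24": 17.9}
        print(visible_satellites(elevations, blocked={"G12"}))
        print(fix_available(elevations, blocked={"G12"}))
    ```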