Kinetic AGN Feedback Effects on Cluster Cool Cores Simulated using SPH
We implement novel numerical models of AGN feedback in the SPH code GADGET-3, in which the energy from a supermassive black hole (BH) is coupled to the surrounding gas in kinetic form. Gas particles lying inside a bi-conical volume around the BH receive a one-time velocity increment of 10,000 km/s. We perform hydrodynamical simulations of an isolated cluster (total mass 10^14 h^-1 M_sun), which is initially evolved to form a dense cool core with central T < 10^6 K. A BH residing at the cluster center then ejects energy. The feedback-driven fast wind shocks against the slower-moving gas, thermalizing the imparted kinetic energy. Bipolar bubble-like outflows form and propagate radially outward to distances of a few hundred kpc. The radial profiles of the median gas properties are influenced by BH feedback in the inner regions (r < 20-50 kpc). BH kinetic feedback with a large feedback efficiency depletes the inner cool gas and reduces the hot gas content, such that the initial cool core of the cluster is heated up within 1.9 Gyr: the core median temperature rises above 10^7 K and the central entropy profile flattens. Our implementation of BH thermal feedback (using the same efficiency as the kinetic case), within the star-formation model, cannot produce this heating, and the cool core remains. Including cold gas accretion in the simulations naturally produces an AGN duty cycle with a periodicity of 100 Myr.
Comment: 22 pages, 11 figures, version accepted for publication in MNRAS, references and minor revisions added
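The bi-conical kick described above can be sketched in a few lines of NumPy; the array layout, opening angle, and function name below are illustrative assumptions for clarity, not the paper's actual GADGET-3 implementation.

```python
import numpy as np

def apply_biconical_kick(pos, vel, bh_pos, axis,
                         half_angle_deg=30.0, v_kick=10000.0):
    """Impart a one-time radial velocity increment (km/s) to particles
    lying inside a bi-cone around the BH.  The opening angle and array
    layout are illustrative, not the paper's GADGET-3 values."""
    r = pos - bh_pos                            # BH-to-particle vectors
    r_hat = r / np.linalg.norm(r, axis=1)[:, None]
    axis = axis / np.linalg.norm(axis)
    # |cos(theta)| test selects BOTH lobes of the bi-cone
    in_cone = np.abs(r_hat @ axis) >= np.cos(np.radians(half_angle_deg))
    vel = vel.copy()
    vel[in_cone] += v_kick * r_hat[in_cone]     # kick is radially outward
    return vel, in_cone
```

The thermalization happens downstream of this step, when the kicked wind shocks against the ambient gas in the hydrodynamical solver.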
Energy Efficient Load Latency Tolerance: Single-Thread Performance for the Multi-Core Era
Around 2003, newly activated power constraints caused single-thread performance growth to slow dramatically. The multi-core era was born with an emphasis on explicitly parallel software. Continuing to grow single-thread performance is still important in the multi-core context, but it must be done in an energy efficient way.
One significant impediment to performance growth in both out-of-order and in-order processors is the long latency of last-level cache misses. Prior work introduced the idea of load latency tolerance---the ability to dynamically remove miss-dependent instructions from critical execution structures, continue execution under the miss, and re-execute miss-dependent instructions after the miss returns. However, previously proposed designs were unable to improve performance in an energy-efficient way---they introduced too many new large, complex structures and re-executed too many instructions.
This dissertation describes a new load latency tolerant design that is energy-efficient and applicable to both in-order and out-of-order cores. Key novel features include the formulation of slice re-execution as an alternative use of multi-threading support, efficient schemes for register and memory state management, and new pruning mechanisms for drastically reducing load latency tolerance's dynamic execution overheads.
Area analysis shows that energy-efficient load latency tolerance increases the footprint of an out-of-order core by only a few percent, while cycle-level simulation shows that it significantly improves the performance of memory-bound programs. Energy-efficient load latency tolerance is more energy-efficient than---and synergistic with---existing performance techniques such as dynamic voltage and frequency scaling (DVFS).
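A minimal sketch of the slice idea, under the illustrative assumption that instructions are (destination, sources) register tuples: instructions transitively dependent on the miss are drained into a slice buffer for later re-execution, while independent instructions proceed under the miss. This is a toy model of the partitioning step, not the dissertation's hardware design.

```python
def split_miss_slice(instructions, miss_dest_regs):
    """Partition a program-ordered instruction stream into miss-independent
    instructions (executed under the miss) and a miss-dependent slice
    (re-executed after the miss returns)."""
    poisoned = set(miss_dest_regs)   # registers whose values depend on the miss
    independent, slice_buf = [], []
    for dest, srcs in instructions:
        if poisoned & set(srcs):
            poisoned.add(dest)       # dependence propagates to the destination
            slice_buf.append((dest, srcs))
        else:
            poisoned.discard(dest)   # dest overwritten by an independent result
            independent.append((dest, srcs))
    return independent, slice_buf
```

Note how a register overwritten by an independent instruction leaves the poisoned set; tracking this precisely is one way re-execution overhead is kept small.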
A multi-viewpoint feature-based re-identification system driven by skeleton keypoints
Thanks to the increasing popularity of 3D sensors, robotic vision has seen major improvements across a wide range of applications and systems in recent years. Besides the many benefits, this migration has caused incompatibilities with systems that cannot rely on range sensors, such as intelligent video surveillance systems, since the two kinds of sensor data lead to different representations of people and objects. This work goes in the direction of bridging that gap, presenting a novel re-identification system that takes advantage of multiple video flows to enhance the performance of a skeletal tracking algorithm, which in turn drives the re-identification. A new, geometry-based method is introduced for joining the detections provided by the skeletal tracker across multiple video flows; it can handle many people in the scene while coping with the errors the skeletal tracker introduces in each view. The method is highly general and can be applied to any body pose estimation algorithm. The system was tested on a public dataset for video surveillance applications, demonstrating the improvements achieved by the multi-viewpoint approach in the accuracy of both body pose estimation and re-identification. The proposed approach was also compared with a skeletal tracking system working on 3D data; the comparison confirmed the good performance of the multi-viewpoint approach. This suggests that the lack of the rich information provided by 3D sensors can be compensated by the availability of more than one viewpoint.
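One simple way to join per-view detections geometrically is greedy association by 3D proximity in a common world frame. The sketch below illustrates that idea only; the data layout, threshold, and names are assumptions, not the paper's actual method.

```python
def join_detections(views, max_dist=0.5):
    """Greedily associate per-view person detections by 3D proximity.
    `views` is a list (one entry per camera) of lists of (x, y, z) skeleton
    centroids already expressed in a common world frame; each returned
    group collects the detections believed to be the same person."""
    groups = []
    for detections in views:
        for p in detections:
            for g in groups:
                # compare against the group's running centroid
                cx = sum(q[0] for q in g) / len(g)
                cy = sum(q[1] for q in g) / len(g)
                cz = sum(q[2] for q in g) / len(g)
                d = ((p[0]-cx)**2 + (p[1]-cy)**2 + (p[2]-cz)**2) ** 0.5
                if d <= max_dist:
                    g.append(p)
                    break
            else:
                groups.append([p])   # no nearby group: a new person
    return groups
```

Averaging over the members of a group is also what lets multiple viewpoints smooth out the per-view errors of the skeletal tracker.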
Scalable hardware memory disambiguation
This dissertation deals with one of the long-standing problems in Computer Architecture
– the problem of memory disambiguation. Microprocessors typically reorder
memory instructions during execution to improve concurrency. Such microprocessors
use hardware memory structures for memory disambiguation, known as Load-Store
Queues (LSQs), to ensure that memory instruction dependences are satisfied
even when the memory instructions execute out-of-order. A typical LSQ implementation
(circa 2006) holds all in-flight memory instructions in a physically centralized
LSQ and performs a fully associative search on all buffered instructions to ensure
that memory dependences are satisfied. These LSQ implementations do not scale
because they use large, fully associative structures, which are known to be slow and
power hungry. The increasing trend towards distributed microarchitectures further
exacerbates these problems. As on-chip wire delays increase and high-performance
processors become necessarily distributed, centralized structures such as the LSQ
can limit scalability.
This dissertation describes techniques to create scalable LSQs in both centralized
and distributed microarchitectures. The problems and solutions described
in this thesis are motivated and validated by real system designs. The dissertation
starts with a description of the partitioned primary memory system of the TRIPS
processor, of which the LSQ is an important component, and then through a series
of optimizations describes how the power, area, and centralization problems
of the LSQ can be solved with minor performance losses (if any) even for large
numbers of in-flight memory instructions. The four solutions described in this dissertation
— partitioning, filtering, late binding and efficient overflow management —
enable power-, area-efficient, distributed and scalable LSQs, which in turn enable
aggressive large-window processors capable of simultaneously executing thousands
of instructions.
To mitigate the power problem, we replaced the power-hungry, fully associative
search with a power-efficient hash table lookup using a simple address-based
Bloom filter. Bloom filters are probabilistic data structures used for testing set
membership and can be used to quickly check if an instruction with the same data
address is likely to be found in the LSQ without performing the associative search.
Bloom filters typically eliminate more than 80% of the associative searches and they
are highly effective because in most programs, it is uncommon for loads and stores
to have the same data address and be in execution simultaneously.
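The filtering step above can be sketched as follows; the bit-array size and hash choices are illustrative, and a real LSQ filter would also need a way to clear state as instructions retire (e.g., a counting variant or periodic flushes), which is omitted here.

```python
class AddressBloomFilter:
    """Address-based Bloom filter in front of the LSQ's associative search.
    A negative answer is exact (no instruction with this address is
    buffered), so the expensive search is skipped; a positive answer may
    be a false positive, in which case the search simply proceeds."""

    def __init__(self, n_bits=1024, n_hashes=2):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = [0] * n_bits

    def _indices(self, addr):
        # deterministic per-hash bit indices derived from the data address
        for i in range(self.n_hashes):
            yield hash((addr, i)) % self.n_bits

    def insert(self, addr):
        for ix in self._indices(addr):
            self.bits[ix] = 1

    def may_contain(self, addr):
        return all(self.bits[ix] for ix in self._indices(addr))
```

Because same-address loads and stores rarely coexist in flight, most lookups miss in the filter and the associative search is avoided.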
To rectify the area problem, we observe that only a small fraction
of all memory instructions are dependent, that only such dependent instructions
need to be buffered in the LSQ, and that these instructions need to be in the LSQ
only for certain parts of the pipelined execution. We propose two mechanisms to
exploit these observations. The first mechanism, area filtering, is a hardware mechanism
that couples Bloom filters and dependence predictors to dynamically identify
and buffer only those instructions which are likely to be dependent. The second
mechanism, late binding, reduces the occupancy and hence size of the LSQ. Both of
these optimizations allow the number of LSQ slots to be reduced by up to one-half
compared to a traditional organization without any performance degradation.
Finally, we describe a new decentralized LSQ design for handling LSQ structural
hazards in distributed microarchitectures. Decentralization of LSQs, and to
a large extent distributed microarchitectures with memory speculation, has proved
to be impractical because of the high performance penalties associated with the
mechanisms for dealing with hazards. To solve this problem, we applied classic
flow-control techniques from interconnection networks for handling resource conflicts.
The first method, memory-side buffering, buffers the overflowing instructions
in a separate buffer near the LSQs. The second scheme, execution-side NACKing,
sends the overflowing instruction back to the issue window from which it is later
re-issued. The third scheme, network buffering, uses the buffers in the interconnection
network between the execution units and memory to hold instructions when the
LSQ is full, and uses virtual channel flow control to avoid deadlocks. The network
buffering scheme is the most robust of all the overflow schemes and shows less than
1% performance degradation due to overflows for a subset of SPEC CPU 2000 and
EEMBC benchmarks on a cycle-accurate simulator that closely models the TRIPS
processor.
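The execution-side NACKing scheme can be illustrated with a toy cycle-by-cycle model; the capacities and issue/retire rates below are arbitrary illustrative parameters, not TRIPS values.

```python
from collections import deque

def run_with_nacking(instructions, lsq_capacity=2, issues_per_cycle=2,
                     retires_per_cycle=1):
    """Toy model of execution-side NACKing: a memory instruction arriving
    at a full LSQ is sent back to the issue window and re-issued later."""
    issue = deque(instructions)
    lsq = deque()
    completed, nacks, cycle = [], 0, 0
    while issue or lsq:
        cycle += 1
        # retire the oldest LSQ entries, freeing slots
        for _ in range(min(retires_per_cycle, len(lsq))):
            completed.append(lsq.popleft())
        # try to issue; a full LSQ triggers a NACK
        for _ in range(min(issues_per_cycle, len(issue))):
            inst = issue.popleft()
            if len(lsq) < lsq_capacity:
                lsq.append(inst)
            else:
                nacks += 1
                issue.append(inst)   # NACK: return to the issue window
    return completed, nacks, cycle
```

The network-buffering scheme differs in that the rejected instruction waits in interconnect buffers rather than returning to the issue window, which is why it shows the smallest overflow penalty.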
The techniques proposed in this dissertation are independent and architecture-neutral,
and their cumulative benefits result in LSQs that can be partitioned at a
fine granularity and have low design complexity. Each of these partitions selectively
buffers only memory instructions with true dependences and can be closely coupled
with the execution units thus minimizing power, area, and latency. Such LSQ
designs with near-ideal characteristics are well suited for microarchitectures with
thousands of instructions in-flight and may enable even more aggressive microarchitectures
in the future.
A High-Speed Range-Matching TCAM for Storage-Efficient Packet Classification
A critical issue in the use of TCAMs for packet
classification is how to efficiently represent rules with ranges,
known as range matching. A range-matching ternary content
addressable memory (RM-TCAM) including a highly functional
range-matching cell (RMC) is presented in this paper. By offering
various range operators, the RM-TCAM can reduce the storage
expansion ratio from 4.21 to 1.01 compared with conventional
TCAMs, under real-world packet classification rule sets, which
results in reduced power consumption and die area. A new pre-discharging
match-line scheme is used to realize high-speed searching
in a dynamic match-line structure. An additional charge-recycling
driver further reduces the power consumption of search lines.
Simulation results for a 256 × 64-bit range-matching TCAM implemented
in 0.13-μm CMOS technology show a 1.99-ns search time with an energy
efficiency of 1.26 fJ/bit/search. While a TCAM using a range-encoding
approach requires an additional SRAM or DRAM, the RM-TCAM improves
storage efficiency without any extra components and also reduces the die area.
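The storage expansion the RM-TCAM avoids comes from the fact that a conventional TCAM must break each range into ternary prefixes. The sketch below shows the standard range-to-prefix expansion; for example, the range [1, 14] over 4-bit fields needs 6 TCAM entries instead of 1.

```python
def range_to_prefixes(lo, hi, width):
    """Expand the integer range [lo, hi] into the minimal set of ternary
    prefixes ('0'/'1'/'*' strings) a conventional TCAM needs to match it;
    this per-range blow-up is the source of the ~4x storage expansion."""
    prefixes = []
    while lo <= hi:
        size = (lo & -lo) or (1 << width)   # largest aligned block at lo
        while lo + size - 1 > hi:           # shrink until it fits the range
            size >>= 1
        bits = width - (size.bit_length() - 1)   # fixed (non-'*') bit count
        if bits:
            prefixes.append(format(lo >> (width - bits), f'0{bits}b')
                            + '*' * (width - bits))
        else:
            prefixes.append('*' * width)
        lo += size
    return prefixes
```

A range-matching cell stores the range bounds directly, so each rule costs a single entry regardless of how many prefixes the range would expand into.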
Theory and Implementation of RF-Input Outphasing Power Amplification
Conventional outphasing power amplifier systems require both a radio frequency (RF) carrier input and a separate baseband input to synthesize a modulated RF output. This work presents an RF-input/RF-output outphasing power amplifier that directly amplifies a modulated RF input, eliminating the need for the multiple costly IQ modulators and baseband signal component separation of previous outphasing systems. An RF signal decomposition network directly synthesizes the phase- and amplitude-modulated signals used to drive the branch power amplifiers (PAs). With this approach, a modulated RF signal including zero-crossings can be applied to the single RF input port of the outphasing RF amplifier system. The proposed technique is demonstrated at 2.14 GHz in a four-way lossless outphasing amplifier with a transmission-line power combiner. The RF decomposition network is implemented using a transmission-line resistance compression network with nonlinear loads designed to provide the necessary amplitude and phase decomposition. The resulting proof-of-concept outphasing power amplifier has a peak CW output power of 93 W, a peak drain efficiency of 70%, and performance on par with a previously demonstrated outphasing and power combining system requiring four IQ modulators and a digital signal component separator.
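For intuition about what the decomposition network computes, here is the classic two-way outphasing (LINC) math at baseband: an amplitude-modulated sample is split into two constant-envelope components that sum back to it. The paper's system is four-way and performs the decomposition in the RF domain, so this is a simplified illustration, not its implementation.

```python
import cmath
import math

def outphase(sample, a_max=1.0):
    """Two-way outphasing decomposition: return constant-envelope
    components s1, s2 with |s1| = |s2| = a_max/2 and s1 + s2 == sample."""
    a = abs(sample)
    if a > a_max:
        raise ValueError("sample exceeds the maximum output amplitude")
    phi = cmath.phase(sample)
    theta = math.acos(a / a_max)            # outphasing angle
    s1 = (a_max / 2) * cmath.exp(1j * (phi + theta))
    s2 = (a_max / 2) * cmath.exp(1j * (phi - theta))
    return s1, s2
```

Because each branch sees a constant-envelope drive, the branch PAs can run saturated at high efficiency while amplitude modulation reappears only after combining.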
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks - stereo, optical flow, visual odometry and
motion segmentation leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.

Comment: 15 pages. To appear at the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017). Our results were submitted to the KITTI 2015 Stereo
Scene Flow Benchmark in November 201
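The step of identifying regions inconsistent with the estimated camera motion can be sketched as a per-pixel residual test between the observed flow and the rigid flow predicted from depth and ego-motion; the threshold value here is an illustrative assumption, not the paper's setting.

```python
import numpy as np

def rigid_inconsistency_mask(flow_obs, flow_rigid, tau=3.0):
    """Flag pixels (H x W x 2 flow fields) whose observed optical flow
    disagrees with the flow predicted from the disparity map and camera
    ego-motion; only these regions need a per-pixel flow estimate."""
    residual = np.linalg.norm(flow_obs - flow_rigid, axis=-1)
    return residual > tau                   # True = likely moving object
```

Restricting the expensive per-pixel flow computation to this mask is what allows the method's large speedup over whole-image scene flow estimation.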
Model-Based Robot Control and Multiprocessor Implementation
Model-based control of robot manipulators has been gaining momentum in recent years. Unfortunately, there are very few experimental validations to accompany simulation results, and as such the majority of conclusions drawn lack the credibility associated with a real control implementation.