Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential
Emerging computer architectures will feature drastically decreased bytes/flop
(ratio of memory bandwidth to peak processing rate) as highlighted by recent
studies on Exascale architectural trends. Further, flops are getting cheaper
while the energy cost of data movement is increasingly dominant. The
understanding and characterization of data locality properties of computations
is critical in order to guide efforts to enhance data locality. Reuse distance
analysis of memory address traces is a valuable tool to perform data locality
characterization of programs. A single reuse distance analysis can be used to
estimate the number of cache misses in a fully associative LRU cache of any
size, thereby providing estimates on the minimum bandwidth requirements at
different levels of the memory hierarchy to avoid being bandwidth bound.
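The miss estimate described above can be illustrated with a minimal sketch. It assumes a trace given as a Python list of addresses and uses a naive LRU stack with linear scans. The function names `reuse_distances` and `lru_misses` are illustrative and do not come from the article; a practical tool would use a tree-based stack-distance structure instead.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The reuse distance of an access is the number of distinct addresses
    touched since the previous access to the same address. A first-time
    access gets float("inf"), i.e. a cold miss in a cache of any size.
    """
    stack = []  # LRU stack: most recently used address at the end
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Addresses above `pos` were all used after the previous access.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(float("inf"))
        stack.append(addr)
    return distances


def lru_misses(distances, cache_size):
    """Misses in a fully associative LRU cache holding `cache_size` lines.

    An access hits iff fewer than `cache_size` distinct addresses were
    touched since its previous use, i.e. its reuse distance < cache_size.
    """
    return sum(1 for d in distances if d >= cache_size)


trace = ["a", "b", "c", "a", "b", "d", "a"]
dists = reuse_distances(trace)
for size in (1, 2, 3, 4):
    print(f"{size} lines: {lru_misses(dists, size)} misses")
```

One pass over the trace yields the miss count for every cache size at once: here caches of 1 or 2 lines miss on all 7 accesses, while 3 or 4 lines leave only the 4 cold misses.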
However, such an analysis only holds for the particular execution order that
produced the trace. It cannot estimate potential improvement in data locality
through dependence preserving transformations that change the execution
schedule of the operations in the computation. In this article, we develop a
novel dynamic analysis approach to characterize the inherent locality
properties of a computation and thereby assess the potential for data locality
enhancement via dependence preserving transformations. The execution trace of a
code is analyzed to extract a computational directed acyclic graph (CDAG) of
the data dependences. The CDAG is then partitioned into convex subsets, and the
convex partitioning is used to reorder the operations in the execution trace to
enhance data locality. The approach enables us to go beyond reuse distance
analysis of a single specific order of execution of the operations of a
computation in characterizing its data locality properties. It can serve a
valuable role in identifying promising code regions for manual transformation,
as well as assessing the effectiveness of compiler transformations for data
locality enhancement. We demonstrate the effectiveness of the approach using a
number of benchmarks, including case studies where the potential shown by the
analysis is exploited to achieve lower data movement costs and better
performance.
Comment: Transactions on Architecture and Code Optimization (2014
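To make the CDAG idea in the abstract above concrete, here is a minimal sketch. It assumes each traced operation is recorded as a hypothetical `(output_address, input_addresses)` tuple. It builds only true (read-after-write) dependence edges, checks whether a proposed reordering preserves them, and tests whether a subset of operations is convex. It does not reproduce the authors' partitioning heuristic; it only shows the properties their reordering must respect.

```python
def build_cdag(trace):
    """Return preds[i], the set of operations whose values operation i reads.

    `trace` is a list of (output_addr, input_addrs) tuples, one per executed
    operation, indexed by position. Only read-after-write (flow) edges are
    kept: anti- and output dependences vanish once storage is renamed.
    """
    last_writer = {}  # address -> index of op that produced its current value
    preds = [set() for _ in trace]
    for i, (out, ins) in enumerate(trace):
        for addr in ins:
            if addr in last_writer:  # addresses never written are program inputs
                preds[i].add(last_writer[addr])
        last_writer[out] = i
    return preds


def preserves_dependences(preds, order):
    """True if `order` is a permutation of all ops that respects every edge."""
    if sorted(order) != list(range(len(preds))):
        return False
    position = {op: k for k, op in enumerate(order)}
    return all(position[p] < position[i]
               for i in range(len(preds)) for p in preds[i])


def is_convex(preds, subset):
    """True if no dependence path leaves `subset` and later re-enters it."""
    succs = [[] for _ in preds]
    for i, ps in enumerate(preds):
        for p in ps:
            succs[p].append(i)

    def reachable(start, neighbours):
        seen, stack = set(), list(start)
        while stack:
            v = stack.pop()
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    inside = set(subset)
    below = reachable(inside, succs) - inside  # reachable from the subset
    above = reachable(inside, preds) - inside  # can reach the subset
    return not (below & above)


# Two independent chains recorded in interleaved order.
trace = [("x1", ["a"]), ("y1", ["b"]), ("x2", ["x1"]), ("y2", ["y1"])]
preds = build_cdag(trace)
print(preds)                                       # [set(), set(), {0}, {1}]
print(preserves_dependences(preds, [0, 2, 1, 3]))  # True: chains grouped
print(preserves_dependences(preds, [2, 0, 1, 3]))  # False: x2 before x1
```

Grouping each chain keeps `x1` live only until `x2` consumes it, which is the kind of reuse-distance reduction a dependence-preserving reorder can expose.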
Autonomous Tissue Scanning under Free-Form Motion for Intraoperative Tissue Characterisation
In Minimally Invasive Surgery (MIS), tissue scanning with imaging probes is
required for subsurface visualisation to characterise the state of the tissue.
However, scanning of large tissue surfaces in the presence of deformation is a
challenging task for the surgeon. Recently, robot-assisted local tissue
scanning has been investigated for motion stabilisation of imaging probes to
facilitate the capture of good-quality images and reduce the surgeon's
cognitive load. Nonetheless, these approaches require the tissue surface to be
static or deform with periodic motion. To eliminate these assumptions, we
propose a visual servoing framework for autonomous tissue scanning, able to
deal with free-form tissue deformation. The 3D structure of the surgical scene
is recovered and a feature-based method is proposed to estimate the motion of
the tissue in real time. A desired scanning trajectory is manually defined on a
reference frame and continuously updated using projective geometry to follow
the tissue motion and control the movement of the robotic arm. The advantage of
the proposed method is that it does not require learning the tissue
motion prior to scanning and can deal with free-form deformation. We deployed
this framework on the da Vinci surgical robot using the da Vinci Research Kit
(dVRK) for ultrasound tissue scanning. Since the framework does not rely on
information from the ultrasound data, it can be easily extended to other
probe-based imaging modalities.
Comment: 7 pages, 5 figures, ICRA 202
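The trajectory update in the abstract above can be approximated, for a roughly planar tissue patch, by a homography estimated from tracked features and applied to the reference trajectory. The following is a minimal pure-Python sketch under that simplifying assumption. The paper itself recovers the 3D scene structure. The function names, the four-point direct linear solve, and the synthetic motion are illustrative assumptions, not the authors' implementation or the dVRK API.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - acc) / M[r][r]
    return x


def homography_from_points(src, dst):
    """Estimate the 3x3 homography (h33 = 1) mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, point):
    """Map an image point through H using homogeneous coordinates."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Synthetic tissue motion, used only to generate "tracked" feature positions.
synthetic_motion = [[1.1, 0.05, 2.0], [-0.02, 0.95, 1.0], [0.01, 0.02, 1.0]]
ref_features = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
cur_features = [apply_homography(synthetic_motion, p) for p in ref_features]

# Scanning trajectory drawn once on the reference frame.
ref_trajectory = [(2.0, 5.0), (5.0, 5.0), (8.0, 5.0)]

H = homography_from_points(ref_features, cur_features)
cur_trajectory = [apply_homography(H, p) for p in ref_trajectory]
print(cur_trajectory)
```

In a live loop the features would be re-tracked and `H` re-estimated every frame. With noisy tracks, one would use more than four correspondences and a least-squares or robust fit rather than an exact four-point solve.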
Automation of Determination of Optimal Intra-Compute Node Parallelism
Maximizing the productivity of modern multicore and manycore chips requires optimizing parallelism at the compute node level. This is, however, a complex multi-step process: an iterative method that requires determining optimal degrees of parallel scalability and optimizing memory access behavior. Further, there are multiple cases to be considered: programs that use only MPI or OpenMP, and hybrid (MPI+OpenMP) programs. This paper presents a set of three coordinated workflows for determining the optimal parallelism at the program level for MPI programs and at the loop level for hybrid (MPI+OpenMP) cases. The paper also details mostly automated implementations of these workflows using the PerfExpert infrastructure. Finally, the paper presents case studies demonstrating both the applicability and the effectiveness of optimizing parallelism at the compute node level. The results shown in the paper provide valuable information for further progress toward full automation of the workflows. The software implementing the parallelism scalability optimization is open source and available for download.
Texas Advanced Computing Center (TACC); Computer Science
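The scalability step of such a workflow can be pictured as a sweep over thread counts followed by a selection rule. The sketch below is hypothetical: the runtimes are made-up placeholders rather than measurements, and the 5% tolerance rule and function names are illustrative choices, not PerfExpert's documented behaviour.

```python
def parallel_efficiency(runtimes):
    """Efficiency of each run relative to the smallest thread count measured.

    `runtimes` maps thread count -> wall-clock seconds.
    """
    t0 = min(runtimes)
    base = runtimes[t0] * t0
    return {t: base / (s * t) for t, s in sorted(runtimes.items())}


def choose_thread_count(runtimes, tolerance=0.05):
    """Smallest thread count whose runtime is within `tolerance` of the best.

    Preferring fewer threads avoids spending cores on negligible gains.
    """
    best = min(runtimes.values())
    return min(t for t, s in runtimes.items() if s <= best * (1 + tolerance))


# Placeholder timings for one code region (not real measurements).
runtimes = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 16.5, 32: 18.0}
for t, e in parallel_efficiency(runtimes).items():
    print(f"{t:2d} threads: efficiency {e:.2f}")
print("chosen:", choose_thread_count(runtimes))  # chosen: 8
```

The tolerance trades a little runtime for freed cores, which matters in hybrid runs where the remaining cores can host additional MPI ranks.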
Intelligent design guidance
This paper presents results from an investigation regarding the use of the Design Structure Matrix (DSM) as a means to guide a designer through the calculation of numerical relationships within the early design system Designer. Characteristics, relationships and goals are used within Designer to enable the evaluation and approximation of the design model and are represented within the system as a digraph. Despite being a useful representation of the interactions within the design model, the digraph does not aid the designer in identifying a sequence of activities that need to be performed in order to evaluate the model. The DSM system was used to represent the characteristics and the dependencies obtained through the relationships. The sequence of characteristics within the DSM was optimised and used to produce a design process to guide the designer in model evaluation. The objective of the optimisation was to minimise the amount of iteration within the design process. The process enabled a designer who is unfamiliar with the model to evaluate it and satisfy the design goals and requirements. Both the DSM system and the Designer system are generic in nature and may be applied to any design problem.
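The sequencing step can be illustrated with the standard DSM partitioning approach: group mutually dependent characteristics into strongly connected components and order the components so that each is evaluated after its inputs, which confines iteration to the coupled groups. This is a minimal sketch, assuming dependencies are given as a Python dict; the characteristic names are hypothetical, and the paper's own optimisation may use a different algorithm.

```python
def sequence_dsm(deps):
    """Partition and sequence a DSM given as element -> set of input elements.

    Returns blocks in evaluation order: every block comes after the blocks
    it needs. A block with several elements is a strongly connected
    component, i.e. a coupled group that must be solved iteratively.
    Uses Tarjan's algorithm, which emits a component only after every
    component it depends on has been emitted.
    """
    index, low = {}, {}
    stack, on_stack, blocks = [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in sorted(deps.get(v, ())):  # sorted for a reproducible order
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            block = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                block.append(w)
                if w == v:
                    break
            blocks.append(sorted(block))

    for v in sorted(deps):
        if v not in index:
            visit(v)
    return blocks


# Hypothetical design characteristics and the inputs each one needs.
deps = {
    "length": set(),
    "volume": {"length"},
    "density": set(),
    "mass": {"volume", "density"},
    "thickness": {"stress"},
    "stress": {"thickness", "mass"},
}
print(sequence_dsm(deps))
# [['density'], ['length'], ['volume'], ['mass'], ['stress', 'thickness']]
```

Only the `stress`/`thickness` block needs iteration; everything else is evaluated once, in order. Ordering elements inside a coupled block to minimise feedback marks is a further step this sketch does not attempt.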
Dynamic selection and estimation of the digital predistorter parameters for power amplifier linearization
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This paper presents a new technique that dynamically estimates and updates the coefficients of a digital predistorter (DPD) for power amplifier (PA) linearization. The proposed technique is dynamic in the sense that, at every iteration of the coefficient update, it estimates only the minimum necessary parameters according to a criterion based on the residual estimation error. In the first step, the original basis functions defining the DPD in the forward path are orthonormalized for DPD adaptation in the feedback path by means of a precalculated principal component analysis (PCA) transformation. The robustness and reliability of the precalculated PCA transformation (i.e., a PCA transformation matrix obtained offline and only once) are tested and verified. Then, in the second step, a properly modified partial least squares (PLS) method, named dynamic partial least squares (DPLS), is applied to obtain the minimum set of most relevant transformed components required for updating the coefficients of the DPD linearizer. The combination of the PCA transformation with the DPLS extraction of components is equivalent to a canonical correlation analysis (CCA) updating solution, which is optimum in the sense of generating components with maximum correlation (instead of maximum covariance as in the case of the DPLS extraction alone). The proposed dynamic extraction technique is evaluated and compared in terms of computational cost and performance with the commonly used QR decomposition approach for solving the least squares (LS) problem.
Experimental results show that the proposed method (i.e., combining PCA with DPLS) drastically reduces the number of DPD coefficients to be estimated while maintaining the same linearization performance.
Peer Reviewed
Postprint (author's final draft
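The idea of estimating only the coefficients that the residual error calls for can be sketched with simpler tools. The example below swaps in modified Gram-Schmidt orthonormalization for the paper's precalculated PCA transformation, and a greedy selection ranked by the magnitude of each component's projection onto the target for DPLS. It is real-valued and uses a hypothetical odd-order polynomial basis, so it illustrates the selection criterion rather than reproducing the authors' method.

```python
import math


def orthonormalize(columns):
    """Modified Gram-Schmidt over equal-length real vectors; drops dependent ones."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:
            basis.append([a / norm for a in v])
    return basis


def select_components(basis, target, tol):
    """Keep only as many orthonormal components as the residual error requires.

    With an orthonormal basis each coefficient is an inner product, and
    including component k lowers the residual energy by coeff_k ** 2.
    Components are taken in order of decreasing |coeff| until the
    residual energy is at most `tol`.
    """
    coeffs = [sum(a * b for a, b in zip(q, target)) for q in basis]
    order = sorted(range(len(basis)), key=lambda k: -abs(coeffs[k]))
    residual = sum(t * t for t in target)
    chosen = {}
    for k in order:
        if residual <= tol:
            break
        chosen[k] = coeffs[k]
        residual -= coeffs[k] ** 2
    return chosen, max(residual, 0.0)


# Odd-order polynomial basis of a real test signal (a simplification of a
# complex-baseband DPD model).
x = [k / 10 for k in range(-10, 11)]
columns = [[s ** p for s in x] for p in (1, 3, 5)]
basis = orthonormalize(columns)

# Synthetic target that needs only components 0 and 2.
target = [2.0 * a + 0.5 * c for a, c in zip(basis[0], basis[2])]
chosen, residual = select_components(basis, target, tol=1e-9)
print(sorted(chosen))  # [0, 2]
```

Because the components are orthonormal, dropping one never changes the coefficients of the others, which is what makes estimating a reduced set at each update cheap compared with re-solving the full least-squares problem.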