    A Computational Model to Predict In Vivo Lower Limb Kinetics and Assess Total Knee Arthroplasty Design Parameters

    Evaluating the success of a total knee arthroplasty implant design generally requires years of patient follow-up studies, which are both inefficient and costly. Although computational modeling is used during the implant design phase, it has yet to be fully exploited to predict the post-implantation kinetics associated with various design parameters. The objective of this study was to construct a three-dimensional computational model of the human lower limb that predicts in vivo kinetics from subject-specific input kinematics. The model was constructed using Kane's theory of dynamics and applied in two clinical sub-studies. First, axial tibiofemoral forces over a deep knee bend were compared between subjects with normal knees and those with implanted knees. Second, kinematics were obtained for a sample subject performing a deep knee bend, and the subject's femoral rollback (-1.86 mm) was varied to evaluate the resulting change in the axial tibiofemoral contact force and the quadriceps force. The mean axial tibiofemoral contact force was 1.35xBW for the normal subjects and 2.99xBW for the implanted subjects, a significant difference (p = 0.0023). Increasing femoral rollback to -6 mm decreased both the axial tibiofemoral contact force (-8.97%) and the quadriceps load (-11.84%); decreasing rollback to 6 mm increased both the contact force (22.45%) and the quadriceps load (27.14%). These initial studies provide evidence that the model accurately predicts in vivo kinetics and that kinetics depend on implant design and patient kinematics.

    SLC: Memory Access Granularity Aware Selective Lossy Compression for GPUs

    Memory compression is a promising approach for reducing memory bandwidth requirements and increasing performance; however, memory compression techniques often achieve a low effective compression ratio due to the large memory access granularity (MAG) exhibited by GPUs. Our analysis of the distribution of compressed blocks shows that a significant percentage of blocks are compressed to a size only a few bytes above a multiple of the MAG, yet a whole burst is fetched from memory. These few extra bytes significantly reduce the compression ratio and the performance gain that could otherwise result from a higher raw compression ratio. To increase the effective compression ratio, we propose a novel MAG-aware Selective Lossy Compression (SLC) technique for GPUs. The key idea of SLC is that when lossless compression yields a compressed size a few bytes above a multiple of the MAG, we approximate these extra bytes so that the compressed size becomes a multiple of the MAG. This way, SLC mostly retains the quality of lossless compression and occasionally trades a small amount of accuracy for higher performance. We show a speedup of up to 35% normalized to a state-of-the-art lossless compression technique, with a low loss in accuracy. Furthermore, average energy consumption and energy-delay product are reduced by 8.3% and 17.5%, respectively.
    EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU
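The rounding idea behind SLC can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the MAG value, the `max_drop` threshold, and the function names are all assumptions made for the example.

```python
MAG = 32  # assumed memory access granularity in bytes (illustrative)

def bursts_needed(size, mag=MAG):
    """Number of MAG-sized bursts fetched for a compressed block."""
    return -(-size // mag)  # ceiling division

def slc_compress(lossless_size, max_drop=4, mag=MAG):
    """If a losslessly compressed block sits only a few bytes above a
    multiple of MAG, approximate those extra bytes away so the block
    fits in one fewer burst; otherwise keep it lossless."""
    extra = lossless_size % mag
    if 0 < extra <= max_drop:
        return lossless_size - extra, True   # lossy: rounded down to a MAG multiple
    return lossless_size, False              # lossless: not worth approximating
```

For example, a 68-byte block needs three 32-byte bursts when stored losslessly, but only two after SLC drops the 4 trailing bytes; a block at 80 bytes (16 bytes over a multiple) is left lossless.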

    Approximating Memory-bound Applications on Mobile GPUs

    Accepted for the 2019 International Conference on High Performance Computing & Simulation (HPCS).
    Approximate computing techniques are often used to improve the performance of applications that can tolerate some inaccuracy in their calculations or data. In the context of embedded and mobile systems, a broad range of applications have exploited approximation techniques to improve performance and overcome the limited capabilities of the hardware. On such systems, even small performance improvements can be sufficient to meet scheduling requirements such as hard real-time deadlines. We study the approximation of memory-bound applications on mobile GPUs using kernel perforation, an approximation technique that exploits the availability of fast GPU local memory to provide high performance with more accurate results. Using this technique, we approximated six applications and evaluated them on two mobile GPU architectures with very different memory layouts: a Qualcomm Adreno 506 and an ARM Mali T860 MP2. Results show that, even when local memory is not mapped to dedicated fast memory in hardware, kernel perforation is still capable of a 1.25x speedup because of improved memory layout and caching effects. Mobile GPUs with local memory show a speedup of up to 1.38x.
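The principle of kernel perforation can be illustrated on a toy 1-D stencil: skip a fraction of the input loads and reconstruct the missing values from neighbours before running the kernel. The stencil, the perforation pattern, and the reconstruction below are illustrative assumptions, not the paper's actual kernels.

```python
def blur_exact(data):
    """3-point average, loading every input element."""
    n = len(data)
    return [(data[max(i - 1, 0)] + data[i] + data[min(i + 1, n - 1)]) / 3
            for i in range(n)]

def blur_perforated(data):
    """Skip every other input load and reconstruct the missing values
    from their neighbours; on a GPU the reconstructed tile would sit in
    fast local memory, so the extra work is cheap."""
    n = len(data)
    approx = list(data)
    for i in range(1, n, 2):                          # perforated (skipped) loads
        approx[i] = (data[i - 1] + data[min(i + 1, n - 1)]) / 2
    return blur_exact(approx)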

    Analytical and computational estimation of patellofemoral forces in the knee under squatting and isometric motion

    This study presents an intermediate step in prosthesis design by introducing a newly developed two-dimensional analytical knee model and a three-dimensional computational knee model. The analytical model is derived from Newton's laws via the equilibrium equations, and is thus based on theoretical assumptions and experimentally obtained parameters. The numerical model is built from an existing prosthesis, comprising three parts (patella, femur, and tibia), and is currently under development. The models can predict, with their standard deviations, the patellofemoral (and, numerically, also the tibiofemoral) forces in the knee joint during squatting. Squatting is investigated because of its relative simplicity and because the joint forces reach their extremes during this movement. The obtained forces, as functions of the flexion angle, serve first as a foundation for the knee design method, and second to extend results for the existing isometric kinetics, where one of the newly obtained functions appears as an essential, and so far missing, input function. Most results are compared with and validated against those found in the relevant literature and are put into dimensionless form to give them more general meaning.

    MLCapsule: Guarded Offline Deployment of Machine Learning as a Service

    Full text link
    With the widespread use of machine learning (ML) techniques, ML as a service has become increasingly popular. In this setting, an ML model resides on a server and users can query it with their data via an API. However, if the user's input is sensitive, sending it to the server is undesirable and sometimes not even legally possible. Equally, the service provider does not want to share the model by sending it to the client, in order to protect its intellectual property and its pay-per-query business model. In this paper, we propose MLCapsule, a guarded offline deployment of machine learning as a service. MLCapsule executes the model locally on the user's side, so the data never leaves the client. At the same time, MLCapsule offers the service provider the same level of control and security over its model as the commonly used server-side execution. In addition, MLCapsule is applicable to offline applications that require local execution. Beyond protecting against direct model access, we couple the secure offline deployment with defenses against advanced attacks on machine learning models such as model stealing, reverse engineering, and membership inference.

    ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance.

    Providing reliability is becoming a challenge for chip manufacturers, who must simultaneously improve miniaturization, performance, and energy efficiency. This leads to very large voltage and frequency margins, designed to avoid errors even in the worst case, along with significant hardware expenditure on eliminating voltage spikes and other forms of transient error, causing considerable inefficiency in power consumption and performance. We flip traditional ideas about reliability and performance around by exploring the use of error resilience for power and performance gains. ParaMedic is a recent architecture that provides reliability with low overheads via automatic hardware error recovery. It works by splitting checking across many small cores in a heterogeneous multicore system with hardware logging support. However, its design is based on the assumption that errors are exceptional. We transform ParaMedic into ParaDox, which performs well in both error-intensive and error-scarce scenarios, thus allowing correct execution even when undervolted and overclocked. Evaluation within error-intensive simulation environments confirms the error resilience of ParaDox and its low recovery cost. We estimate that, compared to a non-resilient system with margins, ParaDox can reduce energy-delay product by 15% through undervolting, while completely recovering from any induced errors.