
1,468 research outputs found

Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes

Author: Alienin Oleg
Gordienko Yuri
Kochura Yuriy
Novotarskiy Michail
Rojbi Anis
Stirenko Sergii
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/06/2017

arXiv.org e-Print Archive

Crossref

Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

Author: Alienin Oleg
Gordienko Yuri
Kochura Yuriy
Novotarskiy Michail
Stirenko Sergii
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/08/2017

The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deeplearning4j, and H2O) are considered and compared. A comparative analysis was performed, and conclusions were drawn about the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out on the H2O framework for deep learning algorithms designed for CPU and GPU platforms, in single-threaded and multithreaded modes of operation. We also present the results of testing neural network architectures on the H2O platform for various activation functions, stopping metrics, and other parameters of the machine learning algorithm. For the use case of the MNIST database of handwritten digits in single-threaded mode, it was demonstrated that blind selection of these parameters can hugely increase the runtime (by 2-3 orders of magnitude) without a significant increase in precision. This result can have a crucial influence on the optimization of available and new machine learning methods, especially for image recognition problems.

Comment: 15 pages, 11 figures, 4 tables; this paper summarizes the activities which were started recently and described briefly in the previous conference presentations arXiv:1706.02248 and arXiv:1707.04940; it is accepted for Springer book series "Advances in Intelligent Systems and Computing

arXiv.org e-Print Archive

Crossref

New Image Processing Methods for Ultrasound Musculoskeletal Applications

Author: Yang Xu
Publication venue
Publication date: 17/01/2019

In the past few years, ultrasound (US) imaging modalities have received increasing interest as diagnostic tools for orthopedic applications. The goal of many of these novel ultrasonic methods is to create three-dimensional (3D) bone visualizations non-invasively, safely, and with high accuracy and spatial resolution. Accurate bone segmentation and 3D reconstruction methods would help to correctly interpret complex bone morphology and would facilitate quantitative analysis. However, in vivo ultrasound images of bones may have poor quality due to uncontrollable motion, high ultrasonic attenuation and the presence of imaging artifacts, which can affect the quality of the bone segmentation and reconstruction results. In this study, we investigate the use of novel ultrasonic processing methods that can significantly improve bone visualization, segmentation and 3D reconstruction in ultrasound volumetric data acquired in vivo. Specifically, we investigate new elastography-based, Doppler-based and statistical shape model-based methods that can be applied to ultrasound bone imaging, with the overall goal of obtaining fast yet accurate 3D bone reconstructions. This study is composed of three projects, all of which have the potential to contribute significantly to this goal. The first project deals with the fast and accurate implementation of correlation-based elastography and poroelastography techniques for real-time assessment of the mechanical properties of musculoskeletal tissues. The rationale behind this project is that, in the future, elastography-based features can be used to reduce false positives in ultrasonic bone segmentation methods, based on the differences between the mechanical properties of soft tissues and those of hard tissues.
In this project, a hybrid computation model is designed, implemented and tested to achieve real-time performance without compromising elastographic image quality. In the second project, a Power Doppler-based signal enhancement method is designed and tested with the intent of increasing the contrast between soft tissue and bone while suppressing the contrast between soft tissue and connective tissue, which is often a cause of false positives in ultrasonic bone segmentation problems. Both in vitro and in vivo experiments are performed to statistically analyze the performance of this method. In the third project, a statistical shape model-based bone surface segmentation method is proposed and investigated. This method uses statistical models to determine whether a curve detected in a segmented ultrasound image belongs to a bone surface. Both in vitro and in vivo experiments are performed to statistically analyze the performance of this method. I conclude this dissertation with a discussion of possible future work in the field of ultrasound bone imaging and assessment.
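
The abstract does not say how its correlation-based elastography is implemented. As a rough illustration of the general idea only, the sketch below estimates the local shift between a pre-compression and a post-compression signal by maximizing the normalized cross-correlation over a small range of lags. The function names, window size, lag range and synthetic signal are all assumptions made for this example, not details from the dissertation.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def estimate_shift(pre, post, start, window, max_lag):
    """Return (lag, score) for the integer lag that best aligns a window
    of `pre` starting at `start` with the signal `post`."""
    ref = pre[start:start + window]
    best_lag, best_score = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        lo = start + lag
        if lo < 0 or lo + window > len(post):
            continue  # candidate window falls outside the signal
        score = ncc(ref, post[lo:lo + window])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Synthetic test signal (hypothetical): `post` is `pre` delayed by 3 samples.
pre = [math.sin(0.05 * n * n) for n in range(64)]
post = [0.0] * 3 + pre[:-3]
lag, score = estimate_shift(pre, post, start=20, window=16, max_lag=5)
```

Repeating the estimate for windows at successive depths gives a displacement profile, and its spatial gradient is the usual strain estimate. Each window is independent of the others, which is what makes this kind of computation a natural candidate for parallel execution.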

Texas A&M Repository

Graphics Hardware Implementation of the Parameter-Less Self-Organising Map

Author: Berglund Erik
Campbell Alexander
Streit Tali
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

This paper presents a highly parallel implementation of a new type of Self-Organising Map (SOM) using graphics hardware. The Parameter-Less SOM smoothly adapts to new data while preserving the mapping formed by previous data. It is therefore in principle highly suited for interactive use; however, for large data sets the computational requirements are prohibitive. This paper presents an implementation on commodity graphics hardware which uses two forms of parallelism to significantly reduce this barrier. The performance is analysed experimentally and algorithmically. An advantage of using graphics hardware is that visualisation is essentially "free", thus increasing its suitability for interactive exploration of large data sets.
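
The abstract does not give the update rule, so the following is only a simplified sketch of one SOM training step in which the learning rate and neighbourhood width are scaled by the current fitting error rather than by a hand-tuned schedule. The exact scaling, the constants `beta` and `theta_min`, and all function and variable names are assumptions for illustration and may differ from the paper.

```python
import math

def plsom_like_step(weights, grid, x, r_max, beta=2.0, theta_min=0.5):
    """One error-scaled SOM update (illustrative, not the paper's exact rule).

    weights: list of weight vectors, one per map node (updated in place)
    grid:    list of node coordinates on the map lattice
    x:       input vector
    r_max:   largest best-match distance seen so far
    Returns (index of best-matching unit, scaled error, updated r_max).
    """
    # 1. Find the best-matching unit: one independent distance per node.
    dists = [math.dist(w, x) for w in weights]
    c = min(range(len(weights)), key=dists.__getitem__)

    # 2. Scale the error by the largest error seen so far, giving eps in [0, 1].
    r_max = max(r_max, dists[c])
    eps = dists[c] / r_max if r_max > 0 else 0.0
    theta = max(beta * eps, theta_min)  # neighbourhood width shrinks with eps

    # 3. Pull every node towards x: one independent update per node.
    for i, w in enumerate(weights):
        h = math.exp(-math.dist(grid[i], grid[c]) ** 2 / theta ** 2)
        weights[i] = [wi + eps * h * (xi - wi) for wi, xi in zip(w, x)]
    return c, eps, r_max

# Hypothetical 2x2 map with 2-D weights.
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.9, 0.2]
c1, eps1, r = plsom_like_step(weights, grid, x, r_max=0.0)
c2, eps2, r = plsom_like_step(weights, grid, x, r_max=r)
```

An input the map already fits well yields a small `eps` and therefore a small update, which is one way to read the abstract's claim that earlier mappings are preserved. Steps 1 and 3 are per-node operations with no dependencies between nodes, the kind of structure that maps well onto graphics hardware.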

Queensland University of Technology ePrints Archive

Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision

Author: Grinberg Michael
Kollmann Matthias
Monka Sebastian
Ruf Boitumelo
Publication venue: 'Copernicus GmbH'
Publication date: 01/01/2018

In order to improve usability and safety, modern unmanned aerial vehicles (UAVs) are equipped with sensors to monitor the environment, such as laser scanners and cameras. One important aspect of this monitoring process is to detect obstacles in the flight path in order to avoid collisions. Since a large number of consumer UAVs suffer from tight weight and power constraints, our work focuses on obstacle avoidance based on a lightweight stereo camera setup. We use disparity maps, which are computed from the camera images, to locate obstacles and to automatically steer the UAV around them. For disparity map computation we optimize the well-known semi-global matching (SGM) approach for deployment on an embedded FPGA. The disparity maps are then converted into simpler representations, the so-called U-/V-Maps, which are used for obstacle detection. Obstacle avoidance is based on a reactive approach which finds the shortest path around the obstacles as soon as they come within a critical distance of the UAV. One of the fundamental goals of our work was the reduction of development costs by closing the gap between application development and hardware optimization. Hence, we aimed at using high-level synthesis (HLS) for porting our algorithms, which are written in C/C++, to the embedded FPGA. We evaluated our implementation of the disparity estimation on the KITTI Stereo 2015 benchmark. The integrity of the overall real-time reactive obstacle avoidance algorithm has been evaluated by using Hardware-in-the-Loop testing in conjunction with two flight simulators.

Comment: Accepted in the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Scienc
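
To make the U-/V-Map step concrete, here is a minimal sketch of the common definition: a U-map is a per-column histogram of disparity values and a V-map is a per-row histogram. The authors implement their pipeline in C/C++ on an FPGA; the Python below, including the function name, the integer disparity format and the toy input, is an assumption made purely for illustration.

```python
def uv_disparity_maps(disparity, max_disp):
    """Build U- and V-disparity histograms from a dense integer disparity map.

    disparity: list of rows, each a list of ints; negative values mark
               invalid pixels and are skipped.
    Returns (u_map, v_map) where
      u_map[d][u] counts pixels in image column u with disparity d, and
      v_map[v][d] counts pixels in image row v with disparity d.
    """
    height = len(disparity)
    width = len(disparity[0])
    u_map = [[0] * width for _ in range(max_disp + 1)]
    v_map = [[0] * (max_disp + 1) for _ in range(height)]
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if 0 <= d <= max_disp:
                u_map[d][u] += 1
                v_map[v][d] += 1
    return u_map, v_map


# Toy 4x5 disparity map: background at disparity 1 and a nearer obstacle
# (disparity 3) covering columns 1-2 of rows 1-3; -1 marks an invalid pixel.
disp = [
    [1, 1, 1, 1, 1],
    [1, 3, 3, 1, 1],
    [1, 3, 3, 1, 1],
    [1, 3, 3, 1, -1],
]
u_map, v_map = uv_disparity_maps(disp, max_disp=3)
```

In this toy case the obstacle shows up as a run of high counts in `u_map[3]` (columns 1 and 2) and as a repeated count in the disparity-3 bin of `v_map` rows 1 to 3. An obstacle at roughly constant depth therefore becomes a short line segment in each map, which is simpler to detect than a region in the full disparity image.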

arXiv.org e-Print Archive

KITopen

Fraunhofer-ePrints

Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors

Author
Publication venue
Publication date: 01/01/2018

General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of their interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%. In GPGPUs, it is the L1 data cache rather than the LLC that has a pronounced impact on performance, because it acts as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache, which performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC, which considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity.
ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications. Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain the performance scalability of GPGPUs in the face of a slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, an MCM-GPU can achieve within 7% of the performance of a similar but hypothetical monolithic-die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects will form the key energy-efficiency bottleneck in the future. In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.

Dissertation/Thesis; Doctoral Dissertation; Computer Science 201
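
The abstract describes ReMAP only at a high level: evict lines that have no reuse or that are cheap to fetch again. The sketch below is a toy victim-selection routine in that spirit. The `CacheLine` fields, the way reuse and cost are represented, and the tie-breaking order are all hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    predicted_reuse: bool  # hypothetical output of some reuse predictor
    refetch_cost: int      # hypothetical estimate of memory access cost (cycles)

def choose_victim(cache_set):
    """Return the index of the line to evict from one cache set."""
    # Prefer a line that is not expected to be reused at all.
    for i, line in enumerate(cache_set):
        if not line.predicted_reuse:
            return i
    # Otherwise evict the line that would be cheapest to fetch again.
    return min(range(len(cache_set)), key=lambda i: cache_set[i].refetch_cost)

# Every line is expected to be reused, so the cheapest one (tag 0xB) is evicted.
all_reused = [
    CacheLine(tag=0xA, predicted_reuse=True, refetch_cost=300),
    CacheLine(tag=0xB, predicted_reuse=True, refetch_cost=120),
    CacheLine(tag=0xC, predicted_reuse=True, refetch_cost=250),
]
victim_by_cost = choose_victim(all_reused)

# A line with no predicted reuse is evicted first, whatever its cost.
with_dead_line = all_reused + [CacheLine(tag=0xD, predicted_reuse=False, refetch_cost=400)]
victim_by_reuse = choose_victim(with_dead_line)
```

The point of the example is the coupling the abstract argues for: the replacement decision takes into account what a miss would cost in main memory, rather than looking at cache-local recency alone.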

ASU Digital Repository