Search CORE

18 research outputs found

People tracking by cooperative fusion of RADAR and camera sensors

Author: Dimitrievski Martin
Jacobs Lennert
Philips Wilfried
Veelaert Peter
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Accurate 3D tracking of objects from monocular camera poses challenges due to the loss of depth during projection. Although ranging by RADAR has proven effective in highway environments, people tracking remains beyond the capability of single sensor systems. In this paper, we propose a cooperative RADAR-camera fusion method for people tracking on the ground plane. Using average person height, joint detection likelihood is calculated by back-projecting detections from the camera onto the RADAR Range-Azimuth data. Peaks in the joint likelihood, representing candidate targets, are fed into a Particle Filter tracker. Depending on the association outcome, particles are updated using the associated detections (Tracking by Detection), or by sampling the raw likelihood itself (Tracking Before Detection). Utilizing the raw likelihood data has the advantage that lost targets are continuously tracked even if the camera or RADAR signal is below the detection threshold. We show that in single target, uncluttered environments, the proposed method entirely outperforms camera-only tracking. Experiments in a real-world urban environment also confirm that the cooperative fusion tracker produces significantly better estimates, even in difficult and ambiguous situations

Crossref

Ghent University Academic Bibliography

Accelerating iterative CT reconstruction algorithms using Tensor Cores

Author: Goossens Bart
Nourazar Mohsen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

Tensor Cores are specialized hardware units added to recent NVIDIA GPUs to speed up matrix multiplication-related tasks, such as convolutions and densely connected layers in neural networks. Due to their specific hardware implementation and programming model, Tensor Cores cannot be straightforwardly applied to other applications outside machine learning. In this paper, we demonstrate the feasibility of using NVIDIA Tensor Cores for the acceleration of a non-machine learning application: iterative Computed Tomography (CT) reconstruction. For large CT images and real-time CT scanning, the reconstruction time for many existing iterative reconstruction methods is relatively high, ranging from seconds to minutes, depending on the size of the image. Therefore, CT reconstruction is an application area that could potentially benefit from Tensor Core hardware acceleration. We first studied the reconstruction algorithm's performance as a function of the hardware related parameters and proposed an approach to accelerate reconstruction on Tensor Cores. The results show that the proposed method provides about 5 x increase in speed and energy saving using the NVIDIA RTX 2080 Ti GPU for the parallel projection of 32 images of size 512 x 512. The relative reconstruction error due to the mixed-precision computations was almost equal to the error of single-precision (32-bit) floating- point computations. We then presented an approach for real-time and memory-limited applications by exploiting the symmetry of the system (i.e., the acquisition geometry). As the proposed approach is based on the conjugate gradient method, it can be generalized to extend its application to many research and industrial fields

Ghent University Academic Bibliography

An interactive ImageJ plugin for semi-automated image denoising in electron microscopy

Author: Aelterman Jan
Gonçalves Amanda
Goossens Bart
Kremer Anna
Lippens Saskia
Luong Hiep
Philips Wilfried
Roels Joris
Saeys Yvan
Vernaillen Frank
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

The recent advent of 3D in electron microscopy (EM) has allowed for detection of nanometer resolution structures. This has caused an explosion in dataset size, necessitating the development of automated workflows. Moreover, large 3D EM datasets typically require hours to days to be acquired and accelerated imaging typically results in noisy data. Advanced denoising techniques can alleviate this, but tend to be less accessible to the community due to low-level programming environments, complex parameter tuning or a computational bottleneck. We present DenoisEM: an interactive and GPU accelerated denoising plugin for ImageJ that ensures fast parameter tuning and processing through parallel computing. Experimental results show that DenoisEM is one order of magnitude faster than related software and can accelerate data acquisition by a factor of 4 without significantly affecting data quality. Lastly, we show that image denoising benefits visualization and (semi-)automated segmentation and analysis of ultrastructure in various volume EM datasets

Ghent University Academic Bibliography

Archivsystem Ask23

Autonomous Machine을 위한 실시간 스트림 처리와 센서 퓨전을 지원하는 Splash 프로그래밍 언어의 설계

Author: 노순현
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 전기·컴퓨터공학부,2020. 2. 홍성수.Autonomous machines have begun to be widely used in various application domains due to recent remarkable advances in machine intelligence. As these autonomous machines are equipped with diverse sensors, multicore processors and distributed computing nodes, the complexity of the underlying software platform is increasing at a rapid pace, overwhelming the developers with implementation details. This leads to a demand for a new programming framework that has an easy-to-use programming abstraction. In this thesis, we present a graphical programming framework named Splash that explicitly addresses the programming challenges that arise during the development of an autonomous machine. We set four design goals to solve the challenges. First, Splash should provide an easy-to-use, effective programming abstraction. Second, it must support real-time stream processing for deep-learning based machine learning intelligence. Third, it must provide programming support for real-time control system of autonomous machines such as sensor fusion and mode change. Finally, it should support performance optimization of software system running on a heterogeneous multicore distributed computing platform. Splash allows programmers to specify genuine, end-to-end timing constraints. Also, it provides a best-effort runtime system that tries to meet the annotated timing constraints and exception handling mechanisms to monitor the violation of such constraints. To implement these runtime mechanisms, Splash provides underlying timing semantics: (1) it provides an abstract global clock that is shared by machines in the distributed system and (2) it supports programmers to write birthmark on every stream data item. Splash offers a multithreaded process model to support concurrent programming. In the multithreaded process model, a programmer can write a multithreaded program using Splash threads we call sthreads. An sthread is a logical entity of independent execution. In addition, Splash provides a language construct named build unit that allows programmers to allocate sthreads to processes and threads of an underlying operating system. Splash provides three additional language semantics to support real-time stream processing and real-time control systems. First, it provides rate control semantics to solve uncontrolled jitter and an unbounded FIFO queue problem due to the variability in communication delay and execution time. Second, it supports fusion semantics to handle timing issues caused by asynchronous sensors in the system. Finally, it provides mode change semantics to meet varying requirements in the real-time control systems. In this paper, we describe each language semantics and runtime mechanism that realizes such semantics in detail. To show the utility of our framework, we have written a lane keeping assist system (LKAS) in Splash as an example. We evaluated rate control, sensor fusion, mode change and build unit-based allocation. First, using rate controller, the jitter was reduced from 30.61 milliseconds to 1.66 milliseconds. Also, average lateral deviation and heading angle is reduced from 0.180 meters to 0.016 meters and 0.043 rad to 0.008 rad, respectively. Second, we showed that the fusion operator works normally as intended, with a run-time overhead of only 7 microseconds on average. Third, the mode change mechanism operated correctly and incurred a run-time overhead of only 0.53 milliseconds. Finally, as we increased the number of build units from 1 to 8, the average end-to-end latency was increased from 75.79 microseconds to 2022.96 microseconds. These results show that the language semantics and runtime mechanisms proposed in this thesis are designed and implemented correctly, and Splash can be used to effectively develop applications for an autonomous machine.딥 러닝 기반 machine intelligence의 비약적인 발전으로 인해 autonomous machine들이 다양한 분야에서 활용되고 있다. 이런 기기들은 다양한 센서, 멀티코어 프로세서, 분산 컴퓨팅 노드를 장착하고 있기 때문에, 이들을 지원하기 위한 기반 소프트웨어 플랫폼의 복잡도는 빠른 속도로 증가하는 추세이다. 이에 따라 개발자들이 복잡한 소프트웨어 구조를 효과적으로 다룰 수 있도록 해주는 프로그래밍 프레임워크의 필요성이 대두되고 있다. 본 학위논문은 autonomous machine의 개발 과정에서 발생하는 문제들을 해결하기 위한 그래픽 기반 프로그래밍 프레임워크인 Splash를 제안한다. Splash라는 이름은 stream processing language for autonomous machine에서 앞의 세 단어의 첫 문자들을 따서 지어졌다. 이 이름은 물과 같이 흐르는 스트림 데이터를 다루기 위한 프로그래밍 언어와 런타임 시스템을 개발하겠다는 의도를 가진다. 본 논문에서는 복잡한 소프트웨어 구조를 효과적으로 다루기 위해 네 가지 디자인 목표를 설정한다. 첫째, Splash는 개발자에게 세부적인 구현 이슈를 숨기고, 쉽게 사용할 수 있는 프로그래밍 추상화를 제공하여야 한다. 둘째, Splash는 machine intelligence를 위한 실시간 스트림 처리를 지원할 수 있어야 한다. 셋째, Splash는 실시간 제어 시스템에서 널리 사용되는 센서 퓨전, 모드 변경, 예외 처리와 같은 기능들을 위한 지원을 제공하여야 한다. 넷째, Splash는 이기종 멀티코어 분산 컴퓨팅 플랫폼에서 수행되는 소프트웨어 시스템의 성능 최적화를 지원하여야 한다. Splash는 실시간 스트림 처리를 위해 개발자가 프로그램 상에 본질적인 end-to-end timing constraints를 명시할 수 있도록 한다. 그리고 개발자가 명시한 timing constraints를 인지하고 이를 최대한 지켜주는 best-effort 런타임 시스템과 timing constraints의 위반을 모니터링하고 처리해주는 예외 처리 메커니즘을 함께 제공한다. 이런 런타임 메커니즘들을 구현하기 위해 Splash는 두 가지 기본적인 timing semantics를 제공한다. 첫째, 분산 시스템 상에서 모든 머신들이 공유할 수 있는 global time base를 제공한다. 둘째, Splash 상에 들어오는 모든 스트림 데이터 아이템에 자신의 birthmark를 기록하도록 한다. Splash는 동시성 프로그래밍을 지원하기 위한 멀티 쓰레디드 처리 모델을 제공한다. Splash 프로그래머는 sthread라는 논리적인 수행 단위를 사용하여 프로그램을 개발할 수 있다. 그리고 Splash는 sthread들을 실제 운영체제의 수행 단위인 프로세스와 쓰레드에게 할당하는 과정을 돕기 위한 빌드 유닛이라는 language construct를 제공한다. Splash는 timing semantics와 멀티 쓰레디드 처리 모델을 기반으로 실시간 스트림 처리와 실시간 제어 시스템을 지원하기 위한 세 가지 language semantics를 추가로 지원한다. 첫째는 스트림 데이터의 통신이나 처리 지연으로 인해 발생하는 지터나 바운드 되지 않는 큐 문제를 해결하기 위한 rate 제어 semantics이다. 둘째는 센서 퓨전 과정에서 시간적으로 동기화되지 않은 센서 입력들로 인한 타이밍 이슈들을 해결하기 위한 퓨전 semantics이다. 마지막은 가변적인 제어 시스템의 요구사항을 충족시키기 위해 수행 로직의 변경을 지원하는 모드 변경 semantics이다. 본 논문에서는 각각의 language semantics를 구체적으로 설명하고, 이를 실현하기 위한 런타임 메커니즘을 설계하고 구현한다. Splash의 효용성을 검증하기 위해서, 본 논문은 Splash를 사용하여 LKAS 응용을 개발하고 이를 Splash 런타임 시스템 상에서 수행시키며 실험을 진행하였다. 본 논문에서는 rate 제어 메커니즘, 센서 퓨전 메커니즘, 모드 변경 메커니즘, 빌드 유닛 기반 allocation을 각각 선정된 성능 지표들을 사용하여 검증하였다. 첫째, Splash의 rate 제어기를 사용하면 지터가 30.61ms에서 1.66ms로 감소되었고, 이로 인해 주행 차량의 측면 편차와 방향각이 각각 0.180m에서 0.016m, 0.043rad에서 0.008rad으로 개선된다는 것을 확인하였다. 둘째, 센서 퓨전을 위해 제안된 퓨전 연산자가 설계된 의도대로 정상 동작하고, 평균 7us의 낮은 오버헤드만을 유발한다는 것을 확인하였다. 셋째, 모드 변경 기능의 정상 동작을 검증하였고 그 과정에서 발생하는 시간적 오버헤드는 평균 0.53ms에 불과하였다. 마지막으로, synthetic workload에 대해 컴포넌트들에 매핑된 빌드 유닛 개수를 1개, 2개, 4개, 8개로 증가시킴에 따라 평균 end-to-end 지연 시간은 75.79us, 330.80us, 591.87us, 2022.96us로 증가하는 것을 확인하였다. 이러한 결과들은 본 논문에서 제안하는 language semantics와 런타임 메커니즘들이 의도대로 설계, 구현되었고, 이를 통해 autonomous machine의 응용들을 효과적으로 개발할 수 있다는 것을 보여준다.Chapter 1 Introduction p.1 1.1 Motivation p.2 1.2 Splash Overview p.5 1.3 Organization of This Dissertation p.9 Chapter 2 Related Work p.10 2.1 Kahn Process Network p.10 2.2 Firing Rule Applied to a Process p.13 2.3 Programming Framework for an Autonomous Machine p.14 2.4 Runtime Software for an Autonomous Machine p.16 2.5 Rate Control p.18 2.5.1 Traffic Shaping p.20 2.5.2 Traffic Policing p.22 2.6 Sensor Fusion p.23 2.6.1 Measurement Fusion p.24 2.6.2 Situation Fusion p.27 2.7 Mode Change p.30 2.7.1 Synchronous Mode Change p.32 2.7.2 Asynchronous Mode Change p.32 Chapter 3 Motivation and Contributions p.34 3.1 Problem Description p.34 3.2 Limitations of Kahn Process Network p.36 3.3 Contributions of this Dissertation p.38 Chapter 4 Underlying Timing Semantics of Splash p.41 4.1 End-to-End Timing Constraints p.41 4.2 Global Time Base and In-order Delivery p.42 4.3 Integrating Three Distinct Computing Models p.43 Chapter 5 Splash Language Constructs p.45 5.1 Processing Component p.46 5.2 Port p.49 5.3 Channel and Clink p.52 5.4 Fusion Operator p.54 5.5 Factory and Mode Change p.60 5.6 Build Unit p.65 5.7 Exception Handling p.67 Chapter 6 Splash Runtime Mechanisms p.69 6.1 Rate Control Mechanism p.69 6.2 Sensor Fusion Mechanism p.70 6.3 Mode Change Mechanism p.77 Chapter 7 Code Generation and Runtime System p.80 7.1 Build Unit-based Allocation p.80 7.2 Code Generation Template p.82 7.3 Splash Runtime System p.84 Chapter 8 Experimental Evaluation p.86 8.1 LKAS Program p.86 8.2 Experimental Environment p.91 8.3 Evaluating Rate Control p.92 8.4 Evaluating Sensor Fusion p.96 8.5 Evaluating Mode Change p.97 8.6 Evaluating Build Unit-based Allocation p.99 Chapter 9 Conclusion p.102 Bibliography p.104 Abstract in Korean p.113Docto

SNU Open Repository and Archive

Recommended from our members

End-to-end deep reinforcement learning in computer systems

Author: Schaarschmidt Michael
Publication venue: University of Cambridge
Publication date: 13/04/2020
Field of study

Abstract The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications. In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping. Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications

Apollo (Cambridge)

On the Future of Cloud Engineering

Author: Bermbach Bermbach
Brandic Ivona
Cavalcante Everton
Chandra Abishek
Gokhale Aniruddha
Guo Tian
Krintz Chandra
Slominski Aleksander
Thamsen Lauritz
Wolski Rich
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2021
Field of study

Ever since the commercial offerings of the Cloud started appearing in 2006, the landscape of cloud computing has been undergoing remarkable changes with the emergence of many different types of service offerings, developer productivity enhancement tools, and new application classes as well as the manifestation of cloud functionality closer to the user at the edge. The notion of utility computing, however, has remained constant throughout its evolution, which means that cloud users always seek to save costs of leasing cloud resources while maximizing their use. On the other hand, cloud providers try to maximize their profits while assuring service-level objectives of the cloud-hosted applications and keeping operational costs low. All these outcomes require systematic and sound cloud engineering principles. The aim of this paper is to highlight the importance of cloud engineering, survey the landscape of best practices in cloud engineering and its evolution, discuss many of the existing cloud engineering advances, and identify both the inherent technical challenges and research opportunities for the future of cloud computing in general and cloud engineering in particular

Enlighten

High-Performance Modelling and Simulation for Big Data Applications

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2021
Field of study

This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)“ project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction raises to have a better discernment of the domain at hand, their representation gets increasingly demanding for computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for their members and distinguished guests to openly discuss novel perspectives and topics of interests for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications

Directory of Open Access Books (DOAB)

High-Performance Modelling and Simulation for Big Data Applications

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

OAPEN Library

Robust and stable region-of-interest tomographic reconstruction using a robust width prior

Author: Bodmann Bernhard G.
Goossens Bart
Labate Demetrio
Publication venue: 'American Institute of Mathematical Sciences (AIMS)'
Publication date: 01/01/2020
Field of study

Region-of-interest computed tomography (ROI CT) aims at reconstructing a region within the field of view by using only ROI-focused projections. The solution of this inverse problem is challenging and methods of tomographic reconstruction that are designed to work with full projection data may perform poorly or fail when applied to this setting. In this work, we study the ROI CT problem in the presence of measurement noise and formulate the reconstruction problem by relaxing data fidelity and consistency requirements. Under the assumption of a robust width prior that provides a form of stability for data satisfying appropriate sparsity-inducing norms, we derive reconstruction performance guarantees and controllable error bounds. Based on this theoretical setting, we introduce a novel iterative reconstruction algorithm from ROI-focused projection data that is guaranteed to converge with controllable error while satisfying predetermined fidelity and consistency tolerances. Numerical tests on experimental data show that our algorithm for ROI CT is competitive with state-of-the-art methods especially when the ROI radius is small

Ghent University Academic Bibliography

Recommended from our members

Transiency-driven Resource Management for Cloud Computing Platforms

Author: Sharma Prateek
Publication venue: ScholarWorks@UMass Amherst
Publication date: 25/10/2018
Field of study

Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built using the implicit assumption that the underlying servers will be stable and normally available, barring for occasional faults. In many emerging scenarios, however, data centers and clouds only provide transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered using renewable intermittent sources, and cloud platforms that provide lower-cost transient servers which can be unilaterally revoked by the cloud operator. Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies to suit application characteristics and requirements, can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems, which allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and can reduce cloud computing costs by up to 90\%. This thesis makes contributions to four areas of computer systems research: transiency-specific fault-tolerance, resource allocation, abstractions, and resource reclamation. For reducing the impact of transient server revocations, I develop two fault-tolerance techniques that are tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application-state between servers of different types. Similarly, for distributed data processing applications, I investigate the use of application level periodic checkpointing to reduce the performance impact of server revocations. For managing and reducing the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements. Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation, by looking into alternative resource reclamation techniques. I develop resource deflation, wherein a server\u27s resources are fractionally reclaimed, allowing the application to continue execution albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation

ScholarWorks@UMass Amherst