A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning-based
approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the results obtained, a fine-grained classification of the different
approaches and, finally, the influential papers of the field.
Comment: version 5.0 (updated September 2018). Preprint of the accepted journal
article at ACM CSUR 2018 (42 pages). This survey will be updated quarterly
(send newly published papers to be added in subsequent versions).
History: Received November 2016; Revised August 2017; Revised February 2018;
Accepted March 2018
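As a minimal illustration of the phase-ordering problem described above, the sketch below exhaustively searches pass orderings against a stand-in cost model. The pass names and the cost function are hypothetical; real autotuners replace both with measured runtimes or learned predictors, and with search strategies that scale beyond tiny pass sets.

```python
import itertools

PASSES = ["inline", "licm", "gvn", "unroll"]  # illustrative pass names

def simulated_cost(ordering):
    # Toy stand-in for a measured runtime: reward running "inline"
    # before "licm" and "gvn" before "unroll".
    cost = 10.0
    if ordering.index("inline") < ordering.index("licm"):
        cost -= 2.0
    if ordering.index("gvn") < ordering.index("unroll"):
        cost -= 1.0
    return cost

def best_phase_order(passes):
    # Exhaustive search is only feasible for tiny pass sets; the ML
    # approaches surveyed replace it with learned predictions.
    return min(itertools.permutations(passes), key=simulated_cost)

order = best_phase_order(PASSES)
```

Even this toy version shows why the space is hard: the number of orderings grows factorially with the number of passes.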
SCALABLE AND DISTRIBUTED METHODS FOR LARGE-SCALE VISUAL COMPUTING
The objective of this research work is to develop efficient, scalable, and distributed methods to meet the challenges associated with processing the immense growth in visual data
such as images and videos. The motivation stems from the fact that existing computer
vision approaches are computation-intensive and cannot scale up to carry out analysis on
large collections of data or to perform real-time inference on resource-constrained
devices. Some of the issues encountered are: 1) increased computation time for obtaining
high-level representations from low-level features, 2) increased training time for
classification methods, and 3) carrying out analysis in real time on live video streams in a
city-scale surveillance network. The issue of scalability can be addressed by model
approximation and distributed implementation of computer vision algorithms. However,
existing scalable approaches suffer from high loss in model approximation and from
communication overhead. In this thesis, our aim is to address some of these issues by
proposing efficient methods for reducing the training time over large datasets in a
distributed environment, and for real-time inference on resource-constrained devices by
scaling up computation-intensive methods using model approximation.
A scalable method, Fast-BoW, is presented for reducing the computation time of
bag-of-visual-words (BoW) feature generation for both hard and soft vector quantization,
with time complexities O(|h| log₂ k) and O(|h| k), respectively, where |h| is the size of the
hash table used in the proposed approach and k is the vocabulary size. We replace the
process of finding the closest cluster center with a softmax classifier, which improves the
cluster boundaries over k-means and can be used for both hard and soft BoW encoding. To
make the model compact and faster, the real-valued weights are quantized into integer
weights that can be represented using only a few bits (2-8). Hashing is then applied to the
quantized weights to reduce the number of multiplications, which accelerates the entire
process. Further, the effectiveness of the video representation is improved by exploiting
the structural information among different entities, or the same entity over time, which
is generally ignored by the BoW representation. The interactions of the entities in a video
are formulated as a graph of geometric relations among space-time interest points. The
activities represented as graphs are recognized using an SVM with low-complexity graph
kernels, namely the random walk kernel (O(n³)) and the Weisfeiler-Lehman kernel (O(n)).
The use of graph kernels provides robustness to slight topological deformations, which
may occur due to the presence of noise and viewpoint variation in the data. Further issues,
such as the computation and storage of the large kernel matrix, are addressed using the
Nyström method for kernel linearization.
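A minimal sketch of the softmax-based BoW encoding described above, omitting the weight-quantization and hashing steps. The weights W and b stand in for a classifier trained on cluster assignments, and the vocabulary size and descriptor dimension are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bow_encode(descriptors, W, b, hard=True):
    # Score each descriptor against every visual word with a linear
    # softmax classifier instead of nearest-cluster-centre search.
    probs = softmax(descriptors @ W.T + b)          # (n, k) word posteriors
    if hard:
        # Hard assignment: one count per descriptor for its top word.
        hist = np.bincount(probs.argmax(axis=1),
                           minlength=W.shape[0]).astype(float)
    else:
        # Soft assignment: accumulate posterior probabilities.
        hist = probs.sum(axis=0)
    return hist / hist.sum()                        # normalized histogram

rng = np.random.RandomState(0)
descriptors = rng.randn(30, 8)                      # 30 local descriptors, 8-D
W, b = rng.randn(5, 8), np.zeros(5)                 # 5-word vocabulary (illustrative)
hard_hist = bow_encode(descriptors, W, b, hard=True)
soft_hist = bow_encode(descriptors, W, b, hard=False)
```

The same classifier serves both encodings, which is the property the thesis exploits; the speed-up itself comes from the quantization and hashing steps not shown here.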
The second major contribution is in reducing the time taken to learn a kernel support
vector machine (SVM) from large datasets using a distributed implementation while
sustaining classification performance. We propose Genetic-SVM, which makes use of a
distributed genetic algorithm to reduce the time taken in solving the SVM objective
function. Further, data-partitioning approaches achieve better speed-up than distributed
algorithm approaches but invariably lead to a loss in classification accuracy, as global
support vectors may not have been chosen as local support vectors in their respective
partitions. Hence, we propose DiP-SVM, a distribution-preserving kernel SVM in which the
first- and second-order statistics of the entire dataset are retained in each of the partitions.
This helps in obtaining local decision boundaries that are in agreement with the global
decision boundary, thereby reducing the chance of missing important global support
vectors. However, the task of combining the local SVMs hinders the training speed. To
address this issue, we propose Projection-SVM, which uses subspace partitioning: a
decision tree is constructed on a projection of the data along the direction of maximum
variance to obtain smaller partitions of the dataset. On each of these partitions, a kernel
SVM is trained independently, thereby reducing the overall training time. It also reduces
the prediction time significantly.
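The subspace-partitioning idea behind Projection-SVM can be sketched as follows, assuming a median split along the maximum-variance direction at each tree node. The depth and the data are illustrative, and training a kernel SVM per leaf is only indicated in a comment:

```python
import numpy as np

def max_variance_direction(X):
    # First principal component: direction of maximum variance.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def subspace_partition(X, depth):
    # Recursively split at the median of the projection onto the
    # maximum-variance direction, as in a depth-limited decision tree.
    if depth == 0 or len(X) < 2:
        return [X]
    proj = X @ max_variance_direction(X)
    thr = np.median(proj)
    left, right = X[proj <= thr], X[proj > thr]
    if len(left) == 0 or len(right) == 0:
        return [X]
    return (subspace_partition(left, depth - 1)
            + subspace_partition(right, depth - 1))

# A kernel SVM would then be trained independently on each partition.
parts = subspace_partition(np.random.RandomState(0).randn(200, 5), depth=2)
```

Because each leaf holds a fraction of the data, each per-partition SVM solves a much smaller kernel problem, which is where the training-time reduction comes from.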
Another issue addressed is the recognition of traffic violations and incidents in real time
in a city-scale surveillance scenario. The major issues are accurate detection and real-time
inference. Central computing infrastructures are unable to perform in real time due to the
large network delay from the video sensors to the central computing server. We propose an
efficient framework using edge computing for deploying large-scale visual computing
applications, which reduces the latency and the communication overhead in a camera
network. This framework is implemented for two surveillance applications, namely
detection of motorcyclists without helmets and accident detection. An efficient cascade of
convolutional neural networks (CNNs) is proposed for incrementally detecting motorcyclists
and their helmets in both sparse and dense traffic. This cascade of CNNs shares a common
representation in order to avoid extra computation and over-fitting. Vehicle accidents are
modeled as unusual incidents. The deep representation is extracted using denoising
stacked auto-encoders trained on spatio-temporal video volumes of normal traffic videos.
The possibility of an accident is determined based on the reconstruction error and the
likelihood of the deep representation. For the likelihood of the deep representation, an
unsupervised model is trained using a one-class SVM. Also, the intersection points of the
vehicles' trajectories are used to reduce the false alarm rate and increase the reliability of
the overall system. Both approaches are evaluated on real traffic videos collected from the
video surveillance network of Hyderabad city in India. The experiments on these videos
demonstrate the efficacy of the proposed approaches.
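The fusion of the two anomaly cues described above can be sketched minimally. The OR rule and both thresholds are assumptions for illustration, not the thesis's exact decision procedure; in practice the thresholds would be tuned on normal-traffic validation clips:

```python
import numpy as np

def accident_flags(recon_error, ocsvm_score, err_thresh, svm_thresh):
    # Flag a clip as a possible accident when its auto-encoder
    # reconstruction error is high OR its one-class SVM decision
    # score falls below threshold (illustrative fusion rule).
    return (recon_error > err_thresh) | (ocsvm_score < svm_thresh)

# Hypothetical scores for three video clips: normal, high-error, low-likelihood.
flags = accident_flags(np.array([0.1, 0.9, 0.2]),
                       np.array([0.5, 0.4, -0.3]),
                       err_thresh=0.5, svm_thresh=0.0)
```

Trajectory-intersection checks, as described above, would then be applied to the flagged clips to suppress false alarms.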
An approach to detecting network traffic anomalies using images and high-performance computing techniques
The variety and complexity of today's Internet traffic exceed what the designers of its architecture imagined. Detecting possible attacks on the network requires technologies for traffic classification, associating data flows with the applications that generate them. One of the current challenges is working with a dataset that grows faster than the capacity to process it. Using and processing images to represent network traffic in order to detect anomalous traffic offers the advantage of providing not only a traffic visualization tool but also the properties of images and image processing: well-known techniques and the parallel nature of the computations. This work presents the architecture of an efficient tool for detecting anomalies in network traffic through computational work on images, applying high-performance computing techniques, and shows some results. V Workshop de Seguridad Informática. Red de Universidades con Carreras en Informática (RedUNCI)
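One way the image-based traffic representation described above might look in code, as a rough sketch: per-interval byte counts are reshaped into a grayscale image, and unusually bright pixels mark suspicious intervals. The row-major layout, 0-255 scaling, and z-score detector are illustrative assumptions, not the tool's actual pipeline:

```python
import numpy as np

def traffic_to_image(byte_counts, width):
    # Map a 1-D sequence of per-interval byte counts into a 2-D
    # grayscale image (row-major), scaled to 0-255.
    arr = np.asarray(byte_counts, dtype=float)
    arr = np.pad(arr, (0, (-len(arr)) % width))    # pad to full rows
    img = arr.reshape(-1, width)
    peak = img.max()
    return (255 * img / peak).astype(np.uint8) if peak > 0 else img.astype(np.uint8)

def anomalous_pixels(img, z=3.0):
    # Simple per-image detector: pixels far above the mean intensity.
    mu, sigma = img.mean(), img.std() + 1e-9
    return np.argwhere(img > mu + z * sigma)

# 63 quiet intervals plus one traffic spike become an 8x8 image.
img = traffic_to_image([10] * 63 + [1000], width=8)
hot = anomalous_pixels(img)
```

Both steps are embarrassingly parallel across images, which is the property the high-performance-computing angle of the work exploits.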
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures
As many-core accelerators keep integrating more processing units, it becomes increasingly difficult for a parallel application to make effective use of all available resources. An effective way of improving hardware utilization is to exploit spatial and temporal sharing of the heterogeneous processing units by multiplexing computation and communication tasks - a strategy known as heterogeneous streaming. Achieving effective heterogeneous streaming requires carefully partitioning hardware among tasks and matching the granularity of task parallelism to the resource partition. However, finding the right resource partitioning and task granularity is extremely challenging, because there is a large number of possible solutions and the optimal solution varies across programs and datasets. This article presents an automatic approach to quickly derive a good solution for hardware resource partitioning and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a performance model to estimate the resulting performance of the target application under a given resource partition and task granularity configuration. The model is used as a utility to quickly search for a good configuration at runtime. Instead of hand-crafting an analytical model that requires expert insight into low-level hardware details, we employ machine learning techniques to learn it automatically. We achieve this by first learning a predictive model offline using training programs. The learnt model can then be used to predict the performance of any unseen program at runtime. We apply our approach to 39 representative parallel applications and evaluate it on two representative heterogeneous many-core platforms: a CPU-XeonPhi platform and a CPU-GPU platform. Compared to the single-stream version, our approach achieves, on average, a 1.6x and 1.1x speedup on the XeonPhi and the GPU platform, respectively. These results translate to over 93% of the performance delivered by a theoretically perfect predictor.
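A sketch of the model-guided configuration search described above. The closed-form `predicted_speedup` is a stand-in for the learned regressor trained offline, and the candidate resource shares and task granularities are illustrative:

```python
import itertools

def predicted_speedup(cpu_share, granularity):
    # Stand-in for the learned performance model: in the article this
    # is a regressor trained offline on profiling data, not a formula.
    balance = 1.0 - abs(cpu_share - 0.3)   # toy model: favour ~30% CPU share
    overhead = 0.05 * granularity          # larger tasks, less overlap
    return balance - overhead

def search_configuration():
    # Enumerate candidate resource partitions and task granularities
    # and pick the configuration the model predicts to be fastest.
    shares = [i / 10 for i in range(1, 10)]
    grains = [1, 2, 4, 8]
    return max(itertools.product(shares, grains),
               key=lambda cfg: predicted_speedup(*cfg))

best = search_configuration()
```

Because the model is cheap to evaluate, this search can run at program start-up, which is what makes the runtime approach practical.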
Dynamic adaptation and distribution of binaries to heterogeneous architectures
Real-time multimedia workloads require progressively more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real-time multimedia workloads. When even they are unable to satisfy processing-power requirements, network distribution can provide many workloads with even more computing power.
In this thesis, we present solutions that make it practical to use the processing power that networks of many-core architectures can provide. The research focuses on solutions that can be included in our Parallel Processing Graphs (P2G) project.
We have developed the foundation for network distribution in P2G, and we have suggested a viable solution for the execution of workloads on heterogeneous multi-core architectures.
PERICLES Deliverable 4.3: Content Semantics and Use Context Analysis Techniques
The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and the subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of the digital objects, as well as their ability to be accurately interpreted as initially intended.