848 research outputs found

    A general guide to applying machine learning to computer architecture

    Get PDF
    The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data which are extremely difficult to achieve manually, helps to produce effective predictive models. Whilst computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve the computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide to future computer architecture research seeking to make use of machine learning models for improving system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models for improving system performance and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of data generation every execution quantum and parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and provide an evaluation of implementations for the purpose of creating a workload performance predictor for different core types in an x86 processor. The predictions can then be exploited by a scheduler for heterogeneous processors to improve the system throughput. The algorithms of focus are stochastic gradient descent based linear regression, decision trees, random forests, artificial neural networks, and k-nearest neighbors.This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreemnt 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P).Peer ReviewedPostprint (published version

    Parallel Traversal of Large Ensembles of Decision Tree

    Get PDF
    Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality-of-service is high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x. IEE

    MASSIVELY PARALLEL ALGORITHMS FOR POINT CLOUD BASED OBJECT RECOGNITION ON HETEROGENEOUS ARCHITECTURE

    Get PDF
    With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the possibilities of using modern heterogeneous computing platforms and its supporting ecosystem such as massively parallel architecture (MPA), computing cluster, compute unified device architecture (CUDA), and multithreaded programming to accelerate the point cloud based object recognition. The aforementioned computing platforms would not yield high performance unless the specific features are properly utilized. Failing that the result actually produces an inferior performance. To achieve the high-speed performance in image descriptor computing, indexing, and matching in point cloud based object recognition, this work explores both coarse and fine grain level parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms are designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time performance of feature computing, indexing, and matching. Meanwhile, this work demonstrates that the heterogeneous computing architectures, with appropriate architecture specific algorithms design and optimization, have the distinct advantages of improving the performance of multimedia applications

    Modeling Algorithm Performance on Highly-threaded Many-core Architectures

    Get PDF
    The rapid growth of data processing required in various arenas of computation over the past decades necessitates extensive use of parallel computing engines. Among those, highly-threaded many-core machines, such as GPUs have become increasingly popular for accelerating a diverse range of data-intensive applications. They feature a large number of hardware threads with low-overhead context switches to hide the memory access latencies and therefore provide high computational throughput. However, understanding and harnessing such machines places great challenges on algorithm designers and performance tuners due to the complex interaction of threads and hierarchical memory subsystems of these machines. The achieved performance jointly depends on the parallelism exploited by the algorithm, the effectiveness of latency hiding, and the utilization of multiprocessors (occupancy). Contemporary work tries to model the performance of GPUs from various aspects with different emphasis and granularity. However, no model considers all of these factors together at the same time. This dissertation presents an analytical framework that jointly addresses parallelism, latency-hiding, and occupancy for both theoretical and empirical performance analysis of algorithms on highly-threaded many-core machines so that it can guide both algorithm design and performance tuning. In particular, this framework not only helps to explore and reduce the runtime configuration space for tuning kernel execution on GPUs, but also reflects performance bottlenecks and predicts how the runtime will trend as the problem and other parameters scale. The framework consists of a pair of analytical models with one focusing on higher-level asymptotic algorithm performance on GPUs and the other one emphasizing lower-level details about scheduling and runtime configuration. Based on the two models, we have conducted extensive analysis of a large set of algorithms. Two analysis provides interesting results and explains previously unexplained data. In addition, the two models are further bridged and combined as a consistent framework. The framework is able to provide an end-to-end methodology for algorithm design, evaluation, comparison, implementation, and prediction of real runtime on GPUs fairly accurately. To demonstrate the viability of our methods, the models are validated through data from implementations of a variety of classic algorithms, including hashing, Bloom filters, all-pairs shortest path, matrix multiplication, FFT, merge sort, list ranking, string matching via suffix tree/array, etc. We evaluate the models\u27 performance across a wide spectrum of parameters, data values, and machines. The results indicate that the models can be effectively used for algorithm performance analysis and runtime prediction on highly-threaded many-core machines

    Estimating UK House Prices using Machine Learning

    Get PDF
    House price estimation is an important subject for property owners, property developers, investors and buyers. It has featured in many academic research papers and some government and commercial reports. The price of a house may vary depending on several features including geographic location, tenure, age, type, size, market, etc. Existing studies have largely focused on applying single or multiple machine learning techniques to single or groups of datasets to identify the best performing algorithms, models and/or most important predictors, but this paper proposes a cumulative layering approach to what it describes as a Multi-feature House Price Estimation (MfHPE) framework. The MfHPE is a process-oriented, data-driven and machine learning based framework that does not just identify the best performing algorithms or features that drive the accuracy of models but also exploits a cumulative multi-feature layering approach to creating machine learning models, optimising and evaluating them so as to produce tangible insights that enable the decision-making process for stakeholders within the housing ecosystem for a more realistic estimation of house prices. Fundamentally, the MfHPE framework development leverages the Design Science Research Methodology (DSRM) and HM Land Registry’s Price Paid Data is ingested as the base transactions data. 1.1 million London-based transaction records between January 2011 and December 2020 have been exploited for model design, optimisation and evaluation, while 84,051 2021 transactions have been used for model validation. With the capacity for updates to existing datasets and the introduction of new datasets and algorithms, the proposed framework has also leveraged a range of neighbourhood and macroeconomic features including the location of rail stations, supermarkets, bus stops, inflation rate, GDP, employment rate, Consumer Price Index (CPIH) and unemployment rate to explore their impact on the estimation of house prices and their influence on the behaviours of machine learning algorithms. Five machine learning algorithms have been exploited and three evaluation metrics have been used. Results show that the layered introduction of new variety of features in multiple tiers led to improved performance in 50% of models, a change in the best performing models as new variety of features are introduced, and that the choice of evaluation metrics should not just be based on technical problem types but on three components: (i) critical business objectives or project goals; (ii) variety of features; and (iii) machine learning algorithms

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    • …
    corecore