
    Financial Forecasting Using Evolutionary Computational Techniques

    Financial forecasting, and especially stock market prediction, has lately become one of the most active fields of research owing to its commercial applications, the high stakes involved, and the attractive benefits it has to offer. In this project we have analyzed several evolutionary computation algorithms for forecasting financial data. The financial data were taken from a large database of stock prices from leading stock exchanges; our models are based on historical data from the Bombay Stock Exchange (BSE), the S&P 500 (Standard and Poor's), and the Dow Jones Industrial Average (DJIA). We designed three models and compared them using historical data from these three exchanges. The models were based on: 1. a Radial Basis Function network with parameters updated by Particle Swarm Optimization; 2. a Radial Basis Function network with parameters updated by the Least Mean Square algorithm; 3. a FLANN (Functional Link Artificial Neural Network) with parameters updated by Particle Swarm Optimization. The raw inputs for the experiments are the historical daily open, close, high, low, and volume of the index concerned; the actual inputs to the models, however, are parameters derived from these data. The results of the experiments are depicted with suitable curves, and a comparative analysis of the models is carried out on criteria including error convergence and the Mean Absolute Percentage Error (MAPE). Key Words: Radial Basis Functions, FLANN, PSO, LMS
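    The abstract does not include an implementation, but the first model (RBF parameters tuned by PSO, scored with MAPE) can be sketched compactly. The sketch below is a minimal illustration under assumed choices (Gaussian basis functions, a plain global-best PSO, and invented hyperparameters such as `n_centers` and the inertia/acceleration constants); it is not the authors' code.

```python
# Minimal sketch of an RBF forecaster whose parameters are tuned by PSO and
# scored with MAPE. All names and hyperparameters are illustrative assumptions.
import numpy as np

def rbf_forecast(params, X, n_centers):
    """Map lagged index features X to a forecast (e.g. next-day close) with a Gaussian RBF net."""
    d = X.shape[1]
    centers = params[:n_centers * d].reshape(n_centers, d)
    widths  = np.abs(params[n_centers * d:n_centers * (d + 1)]) + 1e-6
    weights = params[n_centers * (d + 1):]
    # Squared distance of every sample to every centre, then Gaussian activations.
    dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-dist2 / (2.0 * widths ** 2))
    return phi @ weights

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, the comparison metric named in the abstract."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def pso_train(X, y, n_centers=5, n_particles=30, iters=200, seed=0):
    """Plain global-best PSO over the flattened RBF parameter vector (centres, widths, weights)."""
    rng = np.random.default_rng(seed)
    dim = n_centers * X.shape[1] + n_centers + n_centers
    pos = rng.normal(size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_err = pos.copy(), np.full(n_particles, np.inf)
    gbest, gbest_err = pos[0].copy(), np.inf
    for _ in range(iters):
        for i in range(n_particles):
            err = mape(y, rbf_forecast(pos[i], X, n_centers))
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i].copy(), err
            if err < gbest_err:
                gbest, gbest_err = pos[i].copy(), err
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
    return gbest, gbest_err
```

    Under the same assumptions, one common way to realize the second model is to keep the RBF centres and widths fixed and adapt only the output weights with a per-sample LMS (gradient-style) update in place of the PSO loop.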

    Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads

    This is the artifact for the paper "Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads", to appear in MICRO 2023.

    Address-stride assisted approximate load value prediction in GPUs

    Value prediction holds the promise of significantly improving performance and energy efficiency. However, if values are predicted incorrectly, significant performance overheads are incurred due to execution rollbacks. To address these overheads, value approximation has been introduced; it leverages the observation that rollbacks are unnecessary as long as the application-level loss in quality caused by value misprediction is acceptable to the user. However, in the context of Graphics Processing Units (GPUs), our evaluations show that existing approximate value predictors are suboptimal in prediction accuracy because they do not consider memory request order, a key characteristic in determining the accuracy of value prediction. As a result, the overall data-movement reduction benefits are capped, since the percentage of predicted values (i.e., the prediction coverage) must be limited to keep the application-level error acceptable. To this end, we propose a new Address-Stride Assisted Approximate Value Predictor (ASAP) that explicitly considers memory addresses and their request-order information so as to provide high value prediction accuracy. We take advantage of our new observation that the stride between memory request addresses and the stride between their corresponding data values are highly correlated in several applications. Therefore, ASAP predicts values only for those requests that have regular strides in their addresses. We evaluate ASAP on a diverse set of GPGPU applications. The results show that ASAP can significantly improve value prediction accuracy over previously proposed mechanisms at the same coverage, or can achieve higher coverage (leading to higher performance/energy improvements) under a fixed error threshold.
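    To make the mechanism concrete, below is a rough software sketch (an assumption-laden illustration, not the paper's hardware design) of the core ASAP idea: a prediction is issued only when an incoming load address continues a confidently detected address stride, and the value is extrapolated using the stride observed between the last two returned data values. All structure, field names, and thresholds are illustrative.

```python
# Illustrative stride-assisted approximate value predictor. Structure and the
# confidence threshold are assumptions, not the paper's exact design.
class StrideValuePredictor:
    def __init__(self, confidence_threshold=2):
        self.last_addr = None          # last observed load address
        self.addr_stride = None        # current address stride hypothesis
        self.stride_hits = 0           # consecutive confirmations of that stride
        self.last_value = None         # last returned data value
        self.value_stride = 0          # stride between the last two data values
        self.confidence_threshold = confidence_threshold

    def observe(self, addr, value):
        """Train on a completed load: update address-stride confidence and value stride."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.addr_stride:
                self.stride_hits += 1
            else:
                self.addr_stride, self.stride_hits = stride, 0
        if self.last_value is not None:
            self.value_stride = value - self.last_value
        self.last_addr, self.last_value = addr, value

    def predict(self, addr):
        """Return a predicted value only for confidently strided addresses, else None."""
        confident = (
            self.addr_stride is not None
            and self.stride_hits >= self.confidence_threshold
            and self.last_addr is not None
            and addr == self.last_addr + self.addr_stride
        )
        if confident and self.last_value is not None:
            return self.last_value + self.value_stride
        return None  # no prediction: fall back to waiting for memory
```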

    OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance

    Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread-level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, the available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunities for improving performance. A major cause of this is the inefficiency of current warp scheduling policies in tolerating long memory latencies. In this paper, we identify that the scheduling decisions made by such policies are agnostic to thread-block, or cooperative thread array (CTA), behavior, and are inefficient as a result. We present a coordinated CTA-aware scheduling policy that uses four schemes to minimize the impact of long memory latencies. The first two schemes, CTA-aware two-level warp scheduling and locality-aware warp scheduling, enhance per-core performance by effectively reducing cache contention and improving latency-hiding capability. The third scheme, bank-level-parallelism-aware warp scheduling, improves overall GPGPU performance by enhancing DRAM bank-level parallelism. The fourth scheme employs opportunistic memory-side prefetching to further enhance performance by taking advantage of open DRAM rows. Evaluations on a 28-core GPGPU platform with highly memory-intensive applications indicate that our proposed mechanism can provide a 33% average performance improvement compared to the commonly employed round-robin warp scheduling policy.
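    As an illustration of the first scheme only, the following simulator-style sketch groups warps by CTA, keeps a small set of CTA groups in an "active" level, and demotes a group when all of its warps are stalled. This is an assumed rendering of the policy's general shape, not the paper's exact hardware algorithm; all names and the group limit are invented for illustration.

```python
# Simulator-style sketch of CTA-aware two-level warp scheduling (illustrative only).
from collections import deque

class CtaAwareTwoLevelScheduler:
    def __init__(self, warps_by_cta, active_group_limit=2):
        # warps_by_cta: {cta_id: [warp_id, ...]} for one GPU core.
        self.warps_by_cta = warps_by_cta
        self.groups = deque(warps_by_cta.keys())   # pending CTA groups (FIFO)
        self.active = deque()                      # CTA groups eligible for issue
        self.limit = active_group_limit
        while self.groups and len(self.active) < self.limit:
            self.active.append(self.groups.popleft())

    def pick_warp(self, is_ready):
        """Issue from warps of the active CTAs first; is_ready(warp_id) -> bool."""
        for _ in range(len(self.active)):
            cta = self.active[0]
            for w in self.warps_by_cta[cta]:
                if is_ready(w):
                    self.active.rotate(-1)         # fair rotation among active CTAs
                    return w
            # Every warp of this CTA is stalled (e.g. waiting on memory):
            # demote the whole group and promote another CTA to hide the latency.
            self.active.popleft()
            self.groups.append(cta)
            if self.groups:
                self.active.append(self.groups.popleft())
        return None                                # nothing ready this cycle
```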

    Managing GPU Concurrency in Heterogeneous Architectures

    Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPUs are projected to be the dominant computing platforms for many classes of applications. The design of such systems is more complex than that of homogeneous architectures because maximizing resource utilization while minimizing shared resource interference between CPU and GPU applications is difficult. We show that GPU applications tend to monopolize the shared hardware resources, such as memory and network, because of their high thread-level parallelism (TLP), and discuss the limitations of existing GPU-based concurrency management techniques when employed in heterogeneous systems. To solve this problem, we propose an integrated concurrency management strategy that modulates the TLP in GPUs to control the performance of both CPU and GPU applications. This mechanism considers both GPU core state and system-wide memory and network congestion information to dynamically decide on the level of GPU concurrency to maximize system performance. We propose and evaluate two schemes: one (CM-CPU) for boosting CPU performance in the presence of GPU interference, the other (CM-BAL) for improving both CPU and GPU performance in a balanced manner and thus overall system performance. Our evaluations show that the first scheme improves average CPU performance by 24%, while reducing average GPU performance by 11%. The second scheme provides 7% average performance improvement for both CPU and GPU applications. We also show that our solution allows the user to control performance trade-offs between CPUs and GPUs.
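    The abstract describes a feedback mechanism rather than giving an algorithm; the sketch below illustrates the general idea of epoch-based TLP modulation with assumed metric names, thresholds, and step size. It is not the paper's CM-CPU or CM-BAL logic.

```python
# Illustrative epoch-based GPU TLP controller: throttle GPU concurrency when shared
# memory/network resources are congested, restore it when GPU cores are starved.
# Thresholds, step size, and metric definitions are assumptions for this sketch.
def adjust_gpu_tlp(active_warps, max_warps, min_warps,
                   mem_congestion, net_congestion, gpu_idle_fraction,
                   congestion_high=0.8, idle_high=0.3, step=2):
    """Return the per-core warp limit to use for the next sampling epoch.

    mem_congestion / net_congestion: utilization-style metrics in [0, 1].
    gpu_idle_fraction: fraction of cycles GPU cores had no warp ready to issue.
    """
    congested = max(mem_congestion, net_congestion) > congestion_high
    if congested and active_warps > min_warps:
        # Shared resources are saturated: throttle GPU TLP to relieve CPU traffic.
        return max(min_warps, active_warps - step)
    if not congested and gpu_idle_fraction > idle_high and active_warps < max_warps:
        # GPU cores are starved and the system has headroom: restore concurrency.
        return min(max_warps, active_warps + step)
    return active_warps
```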