Search CORE

287 research outputs found

Evidence Fusion using D-S Theory: utilizing a progressively evolving reliability factor in wireless networks

Author: Dissanayake Aqila
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2010

The Dempster-Shafer (D-S) theory provides a method to combine evidence from multiple nodes to estimate the likelihood of an intrusion. The theory's rule of combination gives a numerical method to fuse multiple pieces of information to derive a conclusion. However, D-S theory has shortcomings when the evidence is highly conflicting. Although the observers may have different levels of uncertainty in the observed data, D-S theory treats all observers as equally trustworthy. This thesis introduces a new method of combination, based on D-S theory and the Consensus method, that takes into account the reliability of the evidence used in data fusion. The new method's results have been compared against three other methods of evidence fusion to analyze objectively how they perform under Denial of Service attacks and Xmas tree scan attacks.

Scholarship at UWindsor
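The standard Dempster rule of combination that the abstract critiques can be sketched as follows. This is the textbook rule, not the thesis's reliability-aware method; the two-hypothesis frame {attack, normal} and the mass values are hypothetical, chosen only to illustrate the computation.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its basic probability mass; the masses of each function sum to 1.
    Returns the normalised combined masses and the conflict K.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Mass assigned to contradictory pairs (empty intersection).
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalise by 1 - K so the combined masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


# Hypothetical frame of discernment for intrusion detection.
ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
THETA = ATTACK | NORMAL  # mass on THETA expresses "don't know"

node1 = {ATTACK: 0.6, NORMAL: 0.1, THETA: 0.3}
node2 = {ATTACK: 0.5, NORMAL: 0.2, THETA: 0.3}
fused, k = combine_dempster(node1, node2)
# k is about 0.17; fused[ATTACK] is 0.63 / 0.83, about 0.759
```

Both sources enter the products symmetrically, so the rule has no way to give a less reliable observer less weight; that is the limitation the thesis sets out to address.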

Efficient Matrix Profile Computation Using Different Distance Functions

Author: Akbarinia Reza
Cloez Bertrand
Publication date: 17/01/2019

Matrix profile has recently been proposed as a promising technique for the problem of all-pairs similarity search on time series. Efficient algorithms have been proposed for computing it, e.g., STAMP, STOMP and SCRIMP++. All these algorithms use the z-normalized Euclidean distance to measure the distance between subsequences. However, as we observed, for some datasets other Euclidean measures are more useful for knowledge discovery from time series. In this paper, we propose efficient algorithms for computing the matrix profile for a general class of Euclidean distances. We first propose a simple but efficient algorithm called AAMP for computing the matrix profile with the "pure" (non-normalized) Euclidean distance. Then, we extend our algorithm to the p-norm distance. We also propose an algorithm, called ACAMP, that uses the same principle as AAMP, but for the z-normalized Euclidean distance. We implemented our algorithms and evaluated their performance experimentally. The experiments show excellent performance results. For example, they show that AAMP is very efficient for computing the matrix profile for non-normalized Euclidean distances. The results also show that ACAMP is significantly faster than SCRIMP++ (the state-of-the-art matrix profile algorithm) for the z-normalized Euclidean distance.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

ProdInra

HAL-Rennes 1
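A matrix profile with the plain (non-normalized) Euclidean distance can be computed diagonal by diagonal, updating each squared distance in constant time from its predecessor. The sketch below assumes this incremental-diagonal idea is close to the principle behind AAMP, but it is an illustration, not the authors' code; the exclusion zone of m/4 is likewise an assumed convention.

```python
import math


def matrix_profile_euclidean(t, m):
    """Matrix profile of series t for window length m, using the plain
    (non-normalized) Euclidean distance, computed along diagonals.

    Along the diagonal j = i + k the squared distance is updated in O(1):
        d2(i, j) = d2(i-1, j-1) - (t[i-1] - t[j-1])**2 + (t[i+m-1] - t[j+m-1])**2
    """
    n = len(t) - m + 1            # number of subsequences
    excl = max(1, m // 4)         # assumed exclusion zone: skip pairs with |i - j| <= excl
    profile = [math.inf] * n
    index = [-1] * n
    for k in range(excl + 1, n):  # diagonal offset j - i
        # Full O(m) computation only for the first pair (0, k) of the diagonal.
        d2 = sum((t[p] - t[k + p]) ** 2 for p in range(m))
        for i in range(n - k):
            j = i + k
            if i > 0:
                # Drop the element leaving the window, add the one entering it.
                d2 += -(t[i - 1] - t[j - 1]) ** 2 + (t[i + m - 1] - t[j + m - 1]) ** 2
            d = math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding error
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index


# Usage on a small synthetic series.
series = [math.sin(0.3 * i) + 0.1 * (i % 7) for i in range(60)]
mp, mp_index = matrix_profile_euclidean(series, 8)
```

Each diagonal costs O(m) to start and O(1) per step afterwards, so the whole profile takes O(n²) time instead of the O(n²m) of a naive pairwise loop.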

Performance analysis: a case study on network management system using machine learning

Author: Poonem Kristhombuge Ananda Manoj Kumara
Publication date: 02/07/2018

Businesses run legacy distributed software systems that are beyond the reach of traditional data analysis methods because of their complexity. In addition, these systems evolve and become hard to understand even with knowledge of the system architecture. Machine learning and big data analytics techniques are widely used in many technical domains to gain insight from large volumes of business data, owing to their performance and accuracy. This study investigated the applicability of machine learning techniques to performance utilization modelling of Nokia's network management system. The objective was to develop resource utilization models based on system performance data, and to study future business needs for capacity analysis of the software's performance, so as to minimize manual tasks. The performance data were extracted from the network management system software and contain system-level and component-level resource usage measurements as a function of input load. The load normally simulated on a network management system is uniform, with little variance. To overcome this, different load profiles were simulated on the system during the research to assess its performance. The data were then processed and evaluated with a set of machine learning techniques (linear regression, MARS, K-NN, random forest, SVR and feed-forward neural networks) to construct resource utilization models. The goodness of the developed models was then evaluated on simulated test data and customer data. No single algorithm performed best on all resource entities, but neural networks performed well on most response variables as a multivariable output model. However, there were some differences in performance between the customer and test datasets, which were also studied. Overall, the results show that system resource utilization can feasibly be modelled for use in capacity analysis. The report also makes suggestions for future iterations, including further analysis of the remaining system nodes.

Trepo - Institutional Repository of Tampere University
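Linear regression, the simplest of the techniques listed, can be sketched as a one-predictor least-squares fit of resource usage against input load. The load and CPU figures below are made-up, synthetic values, not data from the study. The zero-variance check also shows why a uniform simulated load is a problem: if the load barely varies, the slope cannot be estimated.

```python
def fit_linear(load, usage):
    """Ordinary least-squares fit usage ~ a + b * load (single predictor)."""
    n = len(load)
    mean_x = sum(load) / n
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in load)
    if sxx == 0:
        # A perfectly uniform load carries no information about the slope.
        raise ValueError("load has zero variance; slope is undetermined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(load, usage))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, load):
    """Predicted resource usage for a given input load."""
    return a + b * load


# Synthetic measurements: input load (events/s) vs CPU utilisation (%).
load = [100, 200, 300, 400]
cpu = [12, 22, 31, 42]
a, b = fit_linear(load, cpu)      # a is about 2.0, b is about 0.099
forecast = predict(a, b, 500)     # about 51.5 % CPU at 500 events/s
```

A capacity-analysis model would use many predictors and, as the study reports, more flexible learners; this fit only illustrates the basic mapping from load to utilisation.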

Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

Author: Fang Ye
Publication venue: LSU Digital Commons
Publication date: 01/01/2016

Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising ways to extend the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator, while traditional CPUs handle the workloads that are not well suited to accelerators. A computer system that combines multiple types of processors is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPU, Xeon Phi massively parallel processor, and NVIDIA GPU; both common tuning methods and platform-specific tuning techniques are presented. An analysis then demonstrates the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program. The second section studies the performance model of the GeauxDock computing kernel. Specifically, it extracts features from the input data set and the target systems, and then uses various regression models to estimate the expected computation time. This helps explain why a certain processor is faster for certain sets of tasks, and it provides the essential information for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the strengths and weaknesses of different processors complement each other, so that higher performance can be achieved on heterogeneous computing systems.
A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms, with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to support power capping. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.

Louisiana State University
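How predicted run times feed a heterogeneous scheduler can be sketched with a simple greedy baseline: the minimum-completion-time (MCT) heuristic, which places each task on the processor where it would finish earliest. This is a generic textbook heuristic, not the dissertation's ROB/MR/MRR/ASTR algorithm, and the task names and predicted times are hypothetical.

```python
def schedule_min_completion(tasks, predicted):
    """Greedy minimum-completion-time scheduling on heterogeneous processors.

    tasks: task ids, scheduled in the given order.
    predicted: dict task -> dict processor -> predicted run time
               (e.g. the output of a performance model).
    Returns (assignment, makespan).
    """
    procs = {p for t in tasks for p in predicted[t]}
    ready = {p: 0.0 for p in procs}  # time at which each processor becomes free
    assignment = {}
    for t in tasks:
        # Earliest finish time wins; ties are broken by processor name.
        best = min(predicted[t], key=lambda p: (ready[p] + predicted[t][p], p))
        assignment[t] = best
        ready[best] += predicted[t][best]
    return assignment, max(ready.values(), default=0.0)


# Hypothetical predicted run times (seconds) for three docking tasks.
predicted = {
    "a": {"cpu": 4.0, "gpu": 1.0, "phi": 2.0},
    "b": {"cpu": 4.0, "gpu": 1.0, "phi": 2.0},
    "c": {"cpu": 1.0, "gpu": 3.0, "phi": 2.0},
}
plan, makespan = schedule_min_completion(["a", "b", "c"], predicted)
```

Because every decision uses the predicted times, errors in the performance model translate directly into poor placements, which is why the dissertation evaluates its scheduler across a range of prediction errors.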

Strategic and operational services for workload management in the cloud

Author: Ishakian Vatche
Publication date: 21/09/2015

In hosting environments such as Infrastructure as a Service (IaaS) clouds, desirable application performance is typically guaranteed through Service Level Agreements (SLAs), which specify the minimal fractions of resource capacity that a service provider must allocate for unencumbered use by customers to ensure proper operation of their workloads. Most IaaS offerings are presented to customers as fixed-size and fixed-price SLAs, which do not match the needs of specific applications well. Furthermore, arbitrary colocation of applications with different SLAs may result in inefficient utilization of hosts' resources, resulting in economically undesirable customer behavior. In this thesis, we propose the design and architecture of a Colocation as a Service (CaaS) framework: a set of strategic and operational services that allow the efficient colocation of customer workloads. CaaS strategic services give customers the means to specify their application workload using an SLA language that provides them the opportunity and incentive to take advantage of any tolerances they may have regarding the scheduling of their workloads. CaaS operational services provide the information necessary for, and carry out, the reconfigurations mandated by strategic services. Because there may be multiple, yet functionally equivalent, ways to express an SLA, we present a service that allows the provably safe transformation of SLAs from one form to another for the purpose of achieving more efficient colocation. Our CaaS framework could be incorporated into an IaaS offering by providers, or it could be implemented as a value-added proposition by IaaS resellers. To establish the practicality of such offerings, we present a prototype implementation of our proposed CaaS framework.

Boston University Institutional Repository (OpenBU)
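The colocation problem the abstract describes, fitting workloads whose SLAs each reserve a fraction of a host onto as few hosts as possible, can be illustrated with first-fit decreasing bin packing. This is a generic heuristic swapped in for illustration, not the CaaS services themselves, and the workload names and reserved fractions are hypothetical.

```python
def first_fit_decreasing(demands, capacity=1.0):
    """Pack workloads onto hosts with the first-fit decreasing heuristic.

    demands: dict workload -> fraction of one host's capacity that its
             SLA reserves.
    Returns a list of hosts, each a list of workload names.
    """
    hosts, free = [], []
    # Largest reservations first; ties broken by name for determinism.
    for w, d in sorted(demands.items(), key=lambda kv: (-kv[1], kv[0])):
        if d > capacity:
            raise ValueError(f"{w} does not fit on a single host")
        for h, f in enumerate(free):
            if d <= f + 1e-12:  # small tolerance for floating-point sums
                hosts[h].append(w)
                free[h] -= d
                break
        else:
            # No existing host has room: open a new one.
            hosts.append([w])
            free.append(capacity - d)
    return hosts


# Hypothetical SLA reservations (fraction of one host).
sla = {"web": 0.5, "db": 0.7, "batch": 0.3, "cache": 0.2, "etl": 0.3}
placement = first_fit_decreasing(sla)
```

With fixed-size reservations the packing is fully constrained by the numbers above; SLAs that expose scheduling tolerances, as CaaS proposes, give a provider extra freedom to pack workloads more tightly than this.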

BlueDBM: An Appliance for Big Data Analytics

Author: Ankcorn John
Arvind Arvind
Hicks Jamey
Jun SangWoo
King Myron Decker
Lee Sungjin
Liu Ming Gang
Xu Shuotao
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/06/2015

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. In many domains, such as genomics, geological data and daily Twitter feeds, the datasets of interest are 5 TB to 20 TB. For such a dataset, one would need a cluster of 100 servers, each with 128 GB to 256 GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could easily be stored in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture that has flash-based storage with in-store processing capability and a low-latency, high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a RAM-cloud system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.
Sponsors: Quanta Computer (Firm), Samsung (Firm), Lincoln Laboratory (PO7000261350), Intel Corporation

DSpace@MIT
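The abstract's two quantitative claims can be checked with simple arithmetic: the DRAM capacity of the 100-server cluster, and why a RAM-cloud slows sharply when only a few percent of references miss DRAM. The average-latency model and the 100 ns DRAM / 100 µs flash latencies below are assumed, order-of-magnitude figures, not numbers from the paper.

```python
def cluster_dram_tb(servers, gb_per_server):
    """Total DRAM of a cluster in TB (decimal units, 1 TB = 1000 GB)."""
    return servers * gb_per_server / 1000


def effective_access_ns(miss_fraction, t_dram_ns, t_secondary_ns):
    """Average latency when `miss_fraction` of references go to secondary storage."""
    return (1 - miss_fraction) * t_dram_ns + miss_fraction * t_secondary_ns


# Capacity check for the 100-server cluster in the abstract.
low, high = cluster_dram_tb(100, 128), cluster_dram_tb(100, 256)  # 12.8 TB and 25.6 TB

# Assumed, order-of-magnitude latencies (not taken from the paper).
T_DRAM_NS = 100
T_FLASH_NS = 100_000
for p in (0.0, 0.05, 0.10):
    t = effective_access_ns(p, T_DRAM_NS, T_FLASH_NS)
    print(f"{p:.0%} misses: {t:8.0f} ns ({t / T_DRAM_NS:.0f}x all-DRAM)")
```

Under these assumptions, a 5% miss rate already makes the average access roughly 51 times slower than an all-DRAM access, because the rare slow accesses dominate the mean.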

Virtualisation of FPGA-Resources for Concurrent User Designs Employing Partial Dynamic Reconfiguration

Author: Genßler Paul Richard
Publication date: 12/03/2015

Reconfigurable hardware in a cloud environment is a power-efficient way to increase the processing power of future data centers beyond today's maximum. This work enhances an existing framework to support concurrent users on a virtualized reconfigurable FPGA resource. The FPGAs provide a flexible, fast and very efficient platform for users, who access it through a simple cloud-based interface. Fast partial reconfiguration is achieved through the Internal Configuration Access Port (ICAP), combined with a PCIe connection and a combination of custom and Tcl scripts to control the tool flow. This allows a user space on an FPGA to be reconfigured in a few milliseconds while providing a simple single-action interface to the user.

Technische Universität Dresden: Qucosa
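A back-of-the-envelope estimate shows why ICAP-based partial reconfiguration lands in the millisecond range: if the configuration port accepts one word per clock cycle and is the bottleneck, the time is simply bitstream size divided by port bandwidth. The 32-bit width at 100 MHz (400 MB/s) is an assumed, commonly quoted ICAP configuration, and the bitstream sizes are hypothetical; the thesis does not state these figures.

```python
def reconfig_time_ms(bitstream_bytes, port_width_bits=32, clock_hz=100e6):
    """Lower-bound partial-reconfiguration time in milliseconds.

    Assumes the configuration port (e.g. ICAP) accepts one word per clock
    cycle and that the host link (e.g. PCIe) keeps it fully fed.
    """
    bytes_per_second = port_width_bits / 8 * clock_hz
    return bitstream_bytes / bytes_per_second * 1e3


# Hypothetical partial bitstream sizes for one user region, in bytes.
for size in (500_000, 1_000_000, 4_000_000):
    print(f"{size / 1e6:.1f} MB -> {reconfig_time_ms(size):.2f} ms")
```

Under these assumptions, a 1 MB partial bitstream takes about 2.5 ms, consistent with the "few milliseconds" the abstract reports; a slower host link or tool overhead would only add to this bound.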

Hardware-Software Co-Design, Acceleration and Prototyping of Control Algorithms on Reconfigurable Platforms

Author: Edosa Desta Kumsa
Publication venue: Digital Scholarship@UNLV
Publication date: 01/12/2012

Differential equations play a significant role in many disciplines of science and engineering. Solving and implementing Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs) effectively is essential, as most complex dynamic systems are modeled by these equations. High Performance Computing (HPC) methodologies are required to compute and implement, at higher speed, complex and data-intensive applications modeled by differential equations. There are, however, some challenges and limitations in implementing dynamic systems modeled by non-linear ordinary differential equations on digital hardware. Modeling an integrator involves data approximation, which results in accuracy errors if data values are not handled properly. Accuracy and precision depend on the data types defined for each block of a system and its subsystems. Also, digital hardware mostly works on fixed-point data, which leads to some data approximation. Using a Field Programmable Gate Array (FPGA), it is possible to solve ordinary differential equations (ODEs) at high speed. FPGAs also provide scalable, flexible and reconfigurable features. The goal of this thesis is to explore and compare implementations of control algorithms on reconfigurable logic. This thesis focuses on implementing control algorithms modeled by second- and fourth-order PDEs and ODEs using the Xilinx System Generator (XSG) and LabVIEW FPGA Module synthesis tools. Xilinx System Generator for DSP allows integration of legacy HDL code, embedded IP cores, MATLAB functions, and hardware components targeted for Xilinx FPGAs to create complete system models that can be simulated and synthesized within the Simulink environment. The National Instruments (NI) LabVIEW FPGA Module extends LabVIEW graphical development to Field-Programmable Gate Arrays (FPGAs) on NI Reconfigurable I/O hardware. This thesis also focuses on efficient implementation and performance comparison of these implementations.
Optimization of area, latency and power has also been explored during implementation, and the comparison results are discussed.

University of Nevada, Las Vegas Repository
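The fixed-point approximation problem the abstract raises can be seen in a forward-Euler integrator run once in floating point and once in fixed point. The sketch uses Python, not XSG or LabVIEW FPGA; the test system dx/dt = -a·x and the Q16.16 format are assumptions chosen for illustration.

```python
FRAC_BITS = 16          # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Quantise a real number to Q16.16 (round to nearest)."""
    return int(round(x * ONE))


def to_float(q):
    return q / ONE


def euler_float(a, x0, dt, steps):
    """Forward Euler for dx/dt = -a*x in double-precision floating point."""
    x = x0
    for _ in range(steps):
        x = x + dt * (-a * x)
    return x


def euler_fixed(a, x0, dt, steps):
    """The same integrator with every value held in Q16.16 fixed point.

    The product of two Q16.16 numbers has 32 fractional bits, so it is
    shifted right by FRAC_BITS; Python's >> floors, i.e. truncates toward
    minus infinity, much like dropping low-order bits in hardware.
    """
    aq, xq, dtq = to_fixed(a), to_fixed(x0), to_fixed(dt)
    for _ in range(steps):
        deriv = ((-aq) * xq) >> FRAC_BITS
        xq = xq + ((dtq * deriv) >> FRAC_BITS)
    return to_float(xq)


x_float = euler_float(2.0, 1.0, 0.01, 100)
x_fixed = euler_fixed(2.0, 1.0, 0.01, 100)
print(x_float, x_fixed, abs(x_float - x_fixed))
```

Two error sources show up here: the step size 0.01 is not exactly representable in Q16.16, and every multiply truncates one LSB. Choosing the number of fractional bits per block is the accuracy trade-off the abstract describes.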

FPGA-based range-limited molecular dynamics acceleration

Author: Wu Chunshu
Publication date: 07/09/2023

Molecular Dynamics (MD) is a computer simulation technique that executes iteratively over small, discrete time steps. It has been widely used for many years in materials science and computer-aided drug design, and serves as a crucial benchmark in high-performance computing (HPC). Numerous MD packages have been developed and effectively accelerated using GPUs. However, as the limits of Moore's Law are reached, the performance of an individual computing node has hit a bottleneck, while the performance of multiple nodes is primarily limited by scalability issues, particularly for small datasets. This thesis therefore focuses on acceleration for small datasets. With the recent COVID-19 pandemic, drug discovery has gained significant attention, and MD has emerged as a crucial tool in this process. In drug discovery in particular, small simulations of approximately 50K particles are frequently employed. However, small simulations do not necessarily yield faster results, as long-term simulations comprising billions of MD iterations or more are essential in this context. In addition to dataset size, the problem of interest is further constrained. Regarded as the most computationally demanding part of MD, the evaluation of range-limited (RL) forces not only accounts for 90% of the MD computation workload but also involves irregular mapping of 3-D data onto 2-D processor networks. In short, this thesis centers on accelerating RL MD specifically for small datasets. To address the single-node bottleneck and the multi-node scaling challenges, the thesis is organized into two progressive stages of investigation.
The first stage examines in depth how to enhance single-node efficiency, considering factors such as workload mapping from 3-D to 2-D, data routing, and data locality. The second stage studies multi-node scalability, with particular emphasis on strong scaling, bandwidth demands, and the synchronization mechanisms between nodes. The results show that our design on a Xilinx U280 FPGA achieves 51.72x and 4.17x speedups with respect to an Intel Xeon Gold 6226R CPU and a Quadro RTX 8000 GPU, respectively. Our work on strong scaling also demonstrates that 8 Xilinx U280 FPGAs connected to a switch achieve a 4.67x speedup compared to an Nvidia V100 GPU.

Boston University Institutional Repository (OpenBU)
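Range-limited force evaluation only considers particle pairs closer than a cutoff radius. A common software way to find those pairs is a cell list, which bins the 3-D box into cells at least one cutoff wide so that each particle only checks its own and the 26 neighbouring cells. The sketch below is that generic CPU technique in plain Python, not the thesis's FPGA design; the cubic periodic box, the cutoff and the particle positions are assumptions for illustration.

```python
import itertools


def min_image_dist2(a, b, box):
    """Squared distance under the minimum-image convention (cubic periodic box)."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)  # wrap to the nearest periodic image
        d2 += d * d
    return d2


def cell_list_pairs(positions, box, cutoff):
    """All pairs (i, j), i < j, closer than `cutoff` in a periodic cube.

    positions: 3-D coordinates in [0, box).
    Cells are at least `cutoff` wide, so every partner of a particle lies
    in its own cell or one of the 26 neighbouring cells.
    """
    if cutoff > box / 2:
        raise ValueError("cutoff must not exceed half the box length")
    ncell = max(1, int(box // cutoff))  # cells per dimension
    size = box / ncell                  # cell width, >= cutoff
    cells = {}
    for i, p in enumerate(positions):
        key = tuple(int(c // size) % ncell for c in p)
        cells.setdefault(key, []).append(i)
    rc2 = cutoff * cutoff
    pairs = set()  # a set also absorbs duplicates when ncell < 3 wraps the stencil
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            nkey = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
            for i in members:
                for j in cells.get(nkey, ()):
                    if i < j and min_image_dist2(positions[i], positions[j], box) < rc2:
                        pairs.add((i, j))
    return pairs
```

A force kernel (for example Lennard-Jones) would then loop over the returned pairs. The irregular, data-dependent cell occupancy this produces is one reason mapping RL force evaluation onto a 2-D processor network is hard.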

Queuing network models and performance analysis of computer systems

Author: Wijbrands R.J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1988

Repository TU/e

Pure OAI Repository