A Prediction-Based Replication Algorithm for Improving Data Availability in Grid Environment
Data replication is a key optimization technique for reducing access latency and managing large data by storing replicas of data wisely. In this paper, we propose a data replication algorithm, called the Prediction-Based Dynamic Replication (PBDR) algorithm, that improves file access time. Because storage capacity is limited, it is essential to design an effective strategy for the replica replacement task. PBDR deletes files by considering four important factors: the number of requests predicted for the replica in the future, its availability, the size of the replica, and the last time the replica was requested. It can also minimize access latency by selecting the best replica when various sites hold replicas of a dataset. The algorithm is simulated using OptorSim, a data grid simulator developed by the European Data Grid project. The experimental results show that the PBDR strategy performs better than the other algorithms and prevents unnecessary creation of replicas, which leads to efficient storage usage.
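As a rough illustration of such a replacement policy, the Python sketch below scores replicas by the four factors the abstract names; the scoring formula, field names, and make_room helper are hypothetical, since the abstract does not give PBDR's exact weighting.

    # Hypothetical sketch of a PBDR-style replica-replacement score; the
    # paper's exact weighting is not given in this abstract.
    from dataclasses import dataclass
    import time

    @dataclass
    class Replica:
        name: str
        size_mb: float           # size of the replica
        predicted_requests: int  # predicted future requests for this replica
        availability: float      # fraction of other sites holding it, in [0, 1]
        last_request: float      # UNIX time of the most recent request

    def eviction_score(r: Replica, now: float) -> float:
        """Lower score = better eviction candidate: few predicted requests,
        high availability elsewhere, large size, and a long idle time all
        make a replica cheaper to delete."""
        idle = now - r.last_request
        return (r.predicted_requests + 1) / (r.size_mb * (1 + r.availability) * (1 + idle))

    def make_room(replicas, needed_mb):
        """Delete the lowest-scoring replicas until enough space is freed."""
        now, freed, victims = time.time(), 0.0, []
        for r in sorted(replicas, key=lambda r: eviction_score(r, now)):
            if freed >= needed_mb:
                break
            victims.append(r)
            freed += r.size_mb
        return victims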
An Effective Weighted Data Replication Strategy for Data Grid
A Data Grid is a good solution to large-scale data management problems, including efficient file transfer and replication. Dynamic data replication in a Data Grid aims to improve data access time and to utilize network and storage resources efficiently. Since the data files are very large and Grid storage is limited, managing replicas in storage for more effective utilization requires careful attention. In this paper, a dynamic data replication strategy called Modified Latest Access Largest Weight (MLALW) is proposed. This strategy is an enhanced version of the Latest Access Largest Weight strategy. MLALW deletes files by considering three important factors: the least frequently used replicas, the least recently used replicas, and the size of the replica. MLALW stores each replica at an appropriate site, i.e., the site in the region expected to have the highest number of future accesses for that particular replica. The algorithm is simulated using OptorSim, a Data Grid simulator developed by the European Data Grid project. The experimental results show that the MLALW strategy performs better than the other algorithms and prevents unnecessary creation of replicas, which leads to efficient storage usage.
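A minimal sketch of how the three named factors might combine into a single replica weight follows; the multiplicative form, and the assumption that a large size (costly to re-transfer) raises rather than lowers the weight, are guesses not stated in the abstract.

    # Hypothetical sketch of an MLALW-style replica weight; the abstract
    # names the three factors but not their exact combination.
    import time

    def mlalw_weight(access_count, last_access, size_mb, now=None):
        """Higher weight = more valuable replica; evict the lowest weight.
        Frequent access (LFU), recent access (LRU), and a large size
        (expensive to re-transfer) all raise the weight here."""
        now = time.time() if now is None else now
        recency = 1.0 / (1.0 + now - last_access)  # recent access -> near 1
        return access_count * recency * size_mb

    # Example: pick the eviction victim when storage is full.
    replicas = {"f1": (120, time.time() - 3600, 500.0),
                "f2": (3, time.time() - 60, 200.0)}
    victim = min(replicas, key=lambda name: mlalw_weight(*replicas[name]))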
Data Management Challenges in Cloud Environments
Recently, the cloud computing paradigm has been receiving considerable attention in new research. Cloud computing has the potential to change a large part of IT activity, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with novel ideas for new Internet services no longer require large capital outlays in hardware to deploy their service, or the human expense to operate it. These cloud applications use large data centers and powerful servers that host Web applications and Web services. This report presents an overview of what cloud computing means and its history, along with its advantages and disadvantages. In this paper we describe the problems and opportunities of deploying data management systems on these emerging cloud computing platforms. We argue that large-scale data analysis jobs, decision support systems, and application-specific data marts are more likely to benefit from cloud computing platforms than operational, transactional database systems.
 
A New Hybrid Filter-Wrapper Feature Selection using Equilibrium Optimizer and Simulated Annealing
Data dimensions and networks have grown exponentially with the Internet and communications. The challenge of high-dimensional data is increasing for machine learning and data science. This paper presents a hybrid filter-wrapper feature selection method based on the Equilibrium Optimizer (EO) and Simulated Annealing (SA). The proposed algorithm is named Filter-Wrapper Binary Equilibrium Optimizer Simulated Annealing (FWBEOSA). We use SA to address the local-optimum problem so that EO can be more accurate and better able to select the best subset of features. FWBEOSA utilizes a filtering phase that increases accuracy as well as reducing the number of selected features. The proposed method is evaluated on 17 standard UCI datasets using Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) classifiers and compared with ten state-of-the-art algorithms: Binary Equilibrium Optimizer (BEO), Binary Gray Wolf Optimization (BGWO), Binary Salp Swarm Algorithm (BSSA), Binary Genetic Algorithm (BGA), Binary Particle Swarm Optimization (BPSO), Binary Social Mimic Optimization (BSMO), Binary Atom Search Optimization (BASO), Modified Flower Pollination Algorithm (MFPA), Bare Bones Particle Swarm Optimization (BBPSO), and Two-phase Mutation Gray Wolf Optimization (TMGWO). Based on the SVM classification results, the proposed method achieved the highest accuracy on 13 of the 17 datasets (76%) and the fewest selected features on 15 of the 17 datasets (88%). Furthermore, with the KNN classifier, it achieved the highest accuracy on 14 datasets (82%) and the fewest selected features on 13 datasets (76%).
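To make the hybrid idea concrete, here is a sketch of an SA local-refinement step that could polish a candidate feature mask produced by a population search such as EO; the fitness weighting, step counts, and KNN settings are assumptions, not FWBEOSA's actual parameters.

    # Hypothetical sketch of the SA refinement step inside a filter-wrapper
    # loop; FWBEOSA's actual EO update rules and fitness weights are not
    # given in this abstract.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def fitness(mask, X, y, alpha=0.99):
        """Wrapper objective: reward accuracy, penalize subset size."""
        if not mask.any():
            return 0.0
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                              X[:, mask], y, cv=5).mean()
        return alpha * acc + (1 - alpha) * (1 - mask.mean())

    def sa_refine(mask, X, y, t0=1.0, cooling=0.95, steps=50, rng=None):
        """Simulated-annealing local search around a candidate mask, used
        to escape local optima that the population search can stall in."""
        rng = np.random.default_rng() if rng is None else rng
        cur, cur_fit = mask.copy(), fitness(mask, X, y)
        best, best_fit = cur.copy(), cur_fit
        temp = t0
        for _ in range(steps):
            cand = cur.copy()
            cand[rng.integers(len(cand))] ^= True  # flip one feature bit
            cand_fit = fitness(cand, X, y)
            # Accept improvements always; accept worse moves with a
            # probability that shrinks as the temperature cools.
            if cand_fit > cur_fit or rng.random() < np.exp((cand_fit - cur_fit) / temp):
                cur, cur_fit = cand, cand_fit
                if cur_fit > best_fit:
                    best, best_fit = cur.copy(), cur_fit
            temp *= cooling
        return best, best_fit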
A Hybrid Approach for Scheduling based on Multi-criteria Decision Method in Data Grid
Grid computing environments have emerged from the demand of scientists for very high computing power and storage capacity. One of the challenges in using these environments is performance. To improve performance, scheduling techniques are used. Most existing scheduling strategies in Grids focus on only one kind of Grid job, either data-intensive or computation-intensive. However, considering only one kind of job in scheduling does not yield suitable scheduling from the viewpoint of the whole system, and it sometimes wastes resources on the other side. To address the challenge of considering both kinds of jobs simultaneously, a new Hybrid Job Scheduling (HJS) strategy is proposed in this paper. On the one hand, the HJS algorithm considers both the data and computational resource availability of the network; on the other hand, considering the corresponding requirements of each job, it assigns a value called W to the job. Using the W value, the relative importance of the two aspects (being data-intensive or computation-intensive) is determined for each job, and the job is then assigned to the available resources. Simulation results with OptorSim show that HJS outperforms the existing algorithms reported in the literature as the number of jobs increases.
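One plausible reading of the W value is a normalized ratio of data-transfer time to total job time, as in the sketch below; the normalization, thresholds, and dictionary fields are illustrative assumptions, since the abstract does not define W's formula.

    # Hypothetical sketch of an HJS-style W value; W near 1 marks a job as
    # data-intensive, W near 0 as computation-intensive.
    def job_weight(input_data_mb, cpu_time_s, bandwidth_mbps, cpu_speed):
        transfer_cost = input_data_mb / bandwidth_mbps  # time to move the data
        compute_cost = cpu_time_s / cpu_speed           # time to run the job
        return transfer_cost / (transfer_cost + compute_cost)

    def pick_site(job, sites):
        """Favor data locality for data-heavy jobs, free CPUs otherwise."""
        w = job_weight(job["data_mb"], job["cpu_s"], job["bw_mbps"], job["speed"])
        if w > 0.5:                                          # data-intensive
            return max(sites, key=lambda s: s["local_data_mb"])
        return max(sites, key=lambda s: s["free_cpus"])      # compute-intensive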
Improve the Performance of Data Grids by Cost-Based Job Scheduling Strategy
Grid environments have gained tremendous importance in recent years as application requirements have increased drastically. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling strategies in Grids focus on only one kind of Grid job, either data-intensive or computation-intensive. However, considering only one kind of job in scheduling does not yield suitable scheduling from the viewpoint of the whole system, and it sometimes wastes resources on the other side. To address the challenge of considering both kinds of jobs simultaneously, a new Cost-Based Job Scheduling (CJS) strategy is proposed in this paper. On the one hand, the CJS algorithm considers both the data and computational resource availability of the network; on the other hand, considering the corresponding requirements of each job, it assigns a value called W to the job. Using the W value, the relative importance of the two aspects (being data-intensive or computation-intensive) is determined for each job, and the job is then assigned to the available resources. Simulation results with OptorSim show that CJS outperforms the existing algorithms reported in the literature as the number of jobs increases.
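A cost-based variant of the same idea can be sketched as a single completion-cost estimate per site, combining data transfer, queue wait, and compute time; the cost model and field names below are assumed for illustration, as the abstract does not give CJS's formula.

    # Hypothetical sketch of a cost-based site choice in the spirit of CJS.
    def completion_cost(job, site):
        transfer = job["missing_data_mb"].get(site["name"], 0.0) / site["bandwidth_mbps"]
        queue = site["queued_jobs"] * site["avg_job_time_s"] / site["cpus"]
        compute = job["cpu_s"] / site["cpu_speed"]
        return transfer + queue + compute  # data, waiting, and compute terms

    def schedule(job, sites):
        """Send the job to the site with the lowest estimated total cost,
        weighing data-transfer and computation demands together."""
        return min(sites, key=lambda s: completion_cost(job, s))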