679 research outputs found
Optimizing Data Intensive Flows for Networks on Chips
Data flow analysis and optimization is considered for homogeneous rectangular
mesh networks. We propose a flow matrix equation which allows a closed-form
characterization of the nature of the minimal time solution, speedup and a
simple method to determine when and how much load to distribute to processors.
We also give a rigorous mathematical proof that an optimal solution to the flow
matrix equation exists and is unique. The methodology introduced
here is applicable to many interconnection networks and switching protocols (as
an example we examine toroidal networks and hypercube networks in this paper).
An important application is improving chip area and chip scalability for
networks on chips processing divisible style loads.
Load-Balancing Models for Scheduling Divisible Load on Large Scale Data Grids
In many data grid applications, data can be decomposed into multiple independent
sub datasets and distributed for parallel execution. This property has been successfully
employed using Divisible Load Theory (DLT), which has proven to be a
powerful tool for modeling divisible load problems in large scale data grids. Load
balancing in such an environment plays a critical role in achieving high utilization of
resources and in scheduling applications efficiently through joint consideration of communication
and computation time. Several scheduling models have
been studied, such as Constraint DLT (CDLT), Task Data Present (TDP) and Genetic
Algorithm (GA). However, no optimal solution has been reached. At the same
time, effective schedulers are required to minimize not only the maximum completion
time (makespan) of the jobs, but also the execution time of the schedulers.
This thesis proposes several load balancing models for scheduling divisible load on
large scale data grids, when both processor and communication link speed are heterogeneous.
The proposed models can be decomposed into three stages. The first stage
is to develop new DLT based models for multiple sources scheduling. Closed form
solutions for the load allocation are derived. The new models are called Adaptive
DLT (ADLT) and A2DLT models. In the second stage, an Iterative DLT (IDLT)
model is proposed. Recursive numerical equations are derived to find the optimal
workload assigned to the grid node. The closed form solutions are derived for the
optimal load allocation. Although the IDLT model is proposed for a single source, it
has also been applied in the case of multiple sources. The third stage integrates the proposed
DLT based models with a GA to solve the time consumption problem.
In addition, the proposed DLT model has also been integrated with the Simulated Annealing
(SA) algorithm.
The experimental results show that the proposed models yield better performance
than previous models in terms of makespan and scheduler execution time. The
ADLT and A2DLT models reduce the makespan by 21% and 37% respectively
compared to the CDLT model. The IDLT model is capable of producing an almost optimal
solution for single source scheduling with low time complexity. In addition, the integration
of the proposed DLT model with the GA and SA algorithms has also significantly
improved the performance. SA is 64.70% better than GA in terms of makespan.
Thus, the proposed models can balance the processing loads efficiently, so they
can be integrated into existing data grid schedulers to improve their performance.
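As a rough illustration of the kind of closed-form load allocation that Divisible Load Theory provides, the sketch below uses the textbook single-source star model with sequential distribution. It is not the ADLT, A2DLT or IDLT model from the thesis. The function names and the parameters `w` (compute time per unit load) and `z` (link time per unit load) are assumptions introduced here. The allocation follows from requiring all workers to finish at the same instant, which gives alpha[i] * w[i] = alpha[i+1] * (z[i+1] + w[i+1]).

```python
def dlt_star_allocation(w, z):
    """Load fractions for a single-source star network (textbook DLT model).

    w[i]: compute time per unit load on worker i (assumed parameter)
    z[i]: link time per unit load from the source to worker i (assumed parameter)
    The source sends fractions one worker at a time; a worker starts
    computing only after it has received its whole fraction.
    """
    if not w or len(w) != len(z):
        raise ValueError("w and z must be non-empty and of equal length")
    # Equal finish times imply alpha[i] * w[i] == alpha[i+1] * (z[i+1] + w[i+1]).
    ratios = [1.0]
    for i in range(1, len(w)):
        ratios.append(ratios[-1] * w[i - 1] / (z[i] + w[i]))
    total = sum(ratios)
    return [r / total for r in ratios]  # normalize so the fractions sum to 1


def finish_times(alpha, w, z):
    """Finish time of each worker under sequential distribution."""
    times, sent = [], 0.0
    for a, wi, zi in zip(alpha, w, z):
        sent += a * zi               # time at which this worker has its data
        times.append(sent + a * wi)  # then it computes its fraction
    return times


if __name__ == "__main__":
    w, z = [2.0, 3.0, 4.0], [1.0, 1.0, 2.0]
    alpha = dlt_star_allocation(w, z)
    print(alpha)
    print(finish_times(alpha, w, z))
```

Under these assumptions, every entry returned by `finish_times` is the same value, which is the makespan of the schedule.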
Agentless robust load sharing strategy for utilising heterogeneous resources over wide area network
Resource monitoring and performance prediction services have always been regarded as important keys to improving the performance of load sharing strategies. However, the traditional methodologies usually require specific performance information, which can only be collected by installing proprietary agents on all participating resources. This requirement of implementing a single unified monitoring service may not be feasible because of the differences in the underlying systems and organisation policies. To address this problem, we define a new load sharing strategy which bases the load decision on a simple performance estimate that can be measured easily at the coordinator node. Our proposed strategy relies on stage-based dynamic task allocation to handle the imprecision of our performance estimate and to correct the load distribution on the fly. The simulation results show that the performance of our strategy is comparable to or better than that of traditional strategies, especially when the performance information from the monitoring service is not accurate.
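One way to picture stage-based allocation driven by coordinator-side estimates is the toy simulation below. It is only a sketch under stated assumptions, not the strategy evaluated in the paper. The function names, the `fraction` parameter (share of the remaining tasks handed out per stage) and the rate-update rule (tasks completed divided by elapsed time, as seen from the coordinator) are all hypothetical.

```python
def allocate_stage(remaining, est_rates, fraction=0.5):
    """Split one stage's share of the remaining tasks in proportion to estimated rates."""
    if remaining <= 0:
        return [0] * len(est_rates)
    total_rate = sum(est_rates)
    if total_rate <= 0:
        raise ValueError("at least one estimated rate must be positive")
    stage = max(1, int(remaining * fraction))  # tasks handed out in this stage
    shares = [stage * r / total_rate for r in est_rates]
    counts = [int(s) for s in shares]          # round down first
    leftover = stage - sum(counts)
    # Give the leftover tasks to the workers with the largest fractional parts.
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def run_stages(total_tasks, true_rates, initial_rates, fraction=0.5):
    """Simulate stages; after each one, re-estimate rates from observed completions."""
    est = list(initial_rates)
    remaining = total_tasks
    history = []
    while remaining > 0:
        counts = allocate_stage(remaining, est, fraction)
        history.append(counts)
        remaining -= sum(counts)
        for i, n in enumerate(counts):
            if n > 0:
                elapsed = n / true_rates[i]  # simulated time observed by the coordinator
                est[i] = n / elapsed         # hypothetical update rule: tasks per unit time
    return history


if __name__ == "__main__":
    for stage in run_stages(100, true_rates=[1.0, 4.0], initial_rates=[1.0, 1.0]):
        print(stage)
```

In this toy setting the first stage splits tasks evenly because the initial estimates are equal, and later stages shift work toward the faster worker once its completions have been observed.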
Improving Structural Features Prediction in Protein Structure Modeling
Proteins play a vital role in the biological activities of all living species. In nature, a protein folds into a specific and energetically favorable three-dimensional structure which is critical to its biological function. Hence, there has been a great effort by researchers in both experimentally determining and computationally predicting the structures of proteins.
The current experimental methods of protein structure determination are complicated, time-consuming, and expensive. On the other hand, the sequencing of proteins is fast, simple, and relatively inexpensive. Thus, the gap between the number of known sequences and the number of determined structures is growing, and is expected to keep expanding. Computational approaches that can generate three-dimensional protein models with high resolution are therefore attractive, due to their broad economic and scientific impacts. Accurately predicting protein structural features, such as secondary structures, disulfide bonds, and solvent accessibility, is a critical intermediate stepping stone toward ultimately obtaining correct three-dimensional models.
In this dissertation, we report a set of approaches for improving the accuracy of structural features prediction in protein structure modeling. First of all, we derive a statistical model to generate context-based scores characterizing the favorability of segments of residues in adopting certain structural features. Then, together with other information such as evolutionary and sequence information, we incorporate the context-based scores in machine learning approaches to predict secondary structures, disulfide bonds, and solvent accessibility. Furthermore, we take advantage of emerging high performance computing architectures on GPUs to accelerate the calculation of pairwise and high-order interactions in context-based scores. Finally, we make these prediction methods available to the public via web services and software packages.
Computing resources sensitive parallelization of neural networks for large scale diabetes data modelling, diagnosis and prediction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Diabetes has become one of the most severe diseases due to an increasing number of diabetes patients globally. A large amount of digital data on diabetes has been collected through various channels. How to utilize these data sets to help doctors make decisions on the diagnosis, treatment and prediction of diabetic patients poses many challenges to the research community. The thesis investigates mathematical models with a focus on neural networks for large scale diabetes data modelling and analysis by utilizing modern computing technologies such as grid computing and cloud computing. These computing technologies provide users with an inexpensive way to access extensive computing resources over the Internet for solving data and computationally intensive problems. This thesis evaluates the performance of seven representative machine learning techniques in the classification of diabetes data, and the results show that neural networks produce the best classification accuracy but incur high overhead in data training. As a result, the thesis develops MRNN, a parallel neural network model based on the MapReduce programming model, which has become an enabling technology in support of data intensive applications in the clouds.
By partitioning the diabetic data set into a number of equally sized data blocks, the workload in training is distributed among a number of computing nodes for speedup in data training. MRNN is first evaluated in small scale experimental environments using 12 mappers and subsequently evaluated in large scale simulated environments using up to 1000 mappers. Both the experimental and simulation results have shown the effectiveness of MRNN in classification, and its high scalability in data training.
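The partitioning step described above can be sketched in a few lines. This is a minimal illustration, not the MRNN implementation. The function name `partition_equal` and the rule that any remainder is spread over the first blocks are assumptions made here.

```python
def partition_equal(records, n_mappers):
    """Split records into n_mappers contiguous blocks whose sizes differ by at most one."""
    if n_mappers <= 0:
        raise ValueError("n_mappers must be positive")
    base, extra = divmod(len(records), n_mappers)
    blocks, start = [], 0
    for i in range(n_mappers):
        size = base + (1 if i < extra else 0)  # the first `extra` blocks get one more record
        blocks.append(records[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    blocks = partition_equal(list(range(10)), 3)
    print([len(b) for b in blocks])
```

Each block would then be handed to one mapper. With equal block sizes, balanced training time depends on the nodes having similar computing capabilities.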
MapReduce does not have a sophisticated job scheduling scheme for heterogeneous computing environments in which the computing nodes may have varied computing capabilities. For this purpose, this thesis develops a load balancing scheme based on genetic algorithms with the aim of balancing the training workload among heterogeneous computing nodes. The nodes with more computing capacity receive more MapReduce jobs for execution. Divisible load theory is employed to guide the evolutionary process of the genetic algorithm with the aim of achieving fast convergence. The proposed load balancing scheme is evaluated in large scale simulated MapReduce environments with varied levels of heterogeneity using different sizes of data sets. All the results show that the genetic algorithm based load balancing scheme significantly reduces the makespan of job execution in comparison with the time consumed without load balancing. This work is funded by the EPSRC and China Market Association.
A resource aware distributed LSI algorithm for scalable information retrieval
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Latent Semantic Indexing (LSI) is one of the popular techniques in the information retrieval field. Unlike traditional information retrieval techniques, LSI is not based simply on keyword matching. It uses statistics and algebraic computations. Using Singular Value Decomposition (SVD), the high dimensional matrix is converted to a lower dimensional approximate matrix, from which noise can be filtered. The issues of synonymy and polysemy in the traditional techniques can also be overcome by investigating the relations between terms and documents. However, LSI notably suffers from a scalability issue due to the computational complexity of SVD.
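The SVD step behind LSI can be illustrated with a small rank-k truncation. This is a single-machine sketch using NumPy, not the distributed MR-LSI algorithm of the thesis. The function name `lsi_rank_k`, the toy term-document matrix and the choice to represent each document as a row of (diag(s_k) V_k^T)^T are assumptions made for illustration.

```python
import numpy as np


def lsi_rank_k(term_doc, k):
    """Rank-k approximation of a term-document matrix via truncated SVD.

    Returns the approximate matrix and the documents' coordinates
    in the k-dimensional latent space (one row per document).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]  # keep the k largest singular values
    approx = Uk @ np.diag(sk) @ Vtk           # lower-rank approximation of the input
    doc_vectors = (np.diag(sk) @ Vtk).T       # documents in the latent space
    return approx, doc_vectors


if __name__ == "__main__":
    # Toy matrix: rows are terms, columns are documents (hypothetical counts).
    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    approx, docs = lsi_rank_k(A, k=2)
    print(np.round(approx, 3))
    print(np.round(docs, 3))
```

Queries can be compared with documents in the latent space, for example by cosine similarity. The cost of computing the SVD on a large matrix is the scalability issue the abstract refers to.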
This thesis presents MR-LSI, a resource aware distributed LSI algorithm which addresses the scalability issue using the Hadoop framework, based on the distributed computing model MapReduce. It also addresses the overhead caused by the clustering algorithm involved. The evaluations indicate that MR-LSI achieves a significant improvement over the other strategies when processing large collections of documents. One notable advantage of Hadoop is that it supports heterogeneous computing environments, which brings the issue of unbalanced load among nodes to the fore. Therefore, a load balancing algorithm based on a genetic algorithm is proposed for balancing load in static environments. The results show that it can improve the performance of a cluster depending on the level of heterogeneity.
For dynamic Hadoop environments, a dynamic load balancing strategy with a varying window size is proposed. The algorithm relies on data selection decisions and on modeling Hadoop parameters and working mechanisms. By employing an improved genetic algorithm to obtain an optimized scheduler, the algorithm enhances the performance of a cluster with certain heterogeneity levels.